I’m planning to fine-tune a Mistral model on my own dataset (a full fine-tune, not LoRA).
The dataset is not that large, around 120 MB in JSONL format.
My questions are:
- Will I be able to fine-tune the model on four 40 GB A100s?
- If not, is using RunPod the easiest approach?
- I’m trying to teach the model a field in a specific language, where it currently lacks sufficient knowledge in that language. Is fine-tuning my only option? RAG is not viable in my case.
Thanks in advance!
How much does it cost to do these fine-tunes on RunPod? How much compute time is used?
Like $1000+?
You get 14 hours of an A100 80GB for 25 dollars.
An A100 (80GB) costs between $1.70 and $1.99 per hour on RunPod. How long you need depends on dataset size, sequence length, the optimizer you choose, and how many epochs you train for. On my small dataset (1,300 samples) with long sequences (most samples are 4096 tokens), a full fine-tune of Mistral (5 epochs) with an 8-bit Adam optimizer takes around an hour on 3x A100s.
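To put the numbers in this thread side by side, here is a rough back-of-the-envelope sketch in Python. It assumes simple per-GPU hourly billing and treats "around an hour" as exactly 1.0 hour; storage, setup, and idle time are ignored, and the function name `estimate_cost` is just a hypothetical helper for illustration.

```python
def estimate_cost(num_gpus: int, hours: float, price_per_gpu_hour: float) -> float:
    """Rough rental cost: GPUs x wall-clock hours x hourly price per GPU.

    Assumption: billing is purely per GPU-hour, with no storage or setup fees.
    """
    return num_gpus * hours * price_per_gpu_hour


# Hourly rate implied by "14 hours of an A100 80GB for 25 dollars".
implied_rate = 25 / 14

# The run described above: 3x A100 for about one hour,
# priced at both ends of the quoted $1.70 to $1.99 range.
low = estimate_cost(num_gpus=3, hours=1.0, price_per_gpu_hour=1.70)
high = estimate_cost(num_gpus=3, hours=1.0, price_per_gpu_hour=1.99)

print(f"Implied rate: ${implied_rate:.2f} per GPU-hour")
print(f"3x A100 for ~1 hour: ${low:.2f} to ${high:.2f}")
```

So a run like the one described would land in the single digits of dollars, not $1000+. A larger dataset or more epochs scales the hours roughly in proportion, so it is worth timing a short trial run first and plugging the measured hours into the same formula.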