Why is no one releasing 70b models?

Longjumping-Bake-557@alien.top · 1 year ago

Why is no one releasing 70b models?

thereisonlythedance@alien.top · 1 year ago

I’ve been training a lot lately, mostly on RunPod, a mix of fine-tuning Mistral 7B and training LoRA and QLoRAs on 34B and 70Bs. My main takeaway is that the LoRA outcomes are just… not so great. Whereas I’m very happy with the Mistral fine-tunes.

I mean, it’s fantastic we can tinker with a 70B at all, but it doesn’t matter how good your dataset is, you just can’t have the same impact as you can with a full finetune. I think this is why model merging/frankensteining has become popular, it’s an expression of the limitations of LoRA training.

Personally, I have high hopes for a larger Mistral model (in the 13-20B range) that we can still do a full fine-tune on. Right now, between my own specific tunes of Mistral and some of the recent external tunes like Starling I feel like I’m close to having the tools I want/need. But Mistral is still 7B, it doesn’t matter how well it’s tuned, it will still get a little muddled at times, particular with longer term dependencies.

Armym@alien.top · 1 year ago

Do you think that finetuning models with more parameters requires more data to actually do something?

thereisonlythedance@alien.top · 1 year ago

With a full finetune I don’t think so – the LIMA paper showed that 1000 high quality samples is enough with a 65B model. With QLoRA and LoRA, I don’t know. The number of parameters you’re affecting is set by the rank you choose. It’s important to get the balance between the rank, dataset size, and learning rate right. Style and structure is easy to impart, but other things not so much. I often wonder how clean the merge process actually is. I’m still learning.

Vilzuh@alien.top · 1 year ago

I have been trying to learn about fine-tuning and lora training for the past couple weeks but I’m having trouble finding easy enough resources to learn from. Could you give me some pointers to what I can read to get started with finetuning llama2 or mistral?

I have tried training quantized models locally with oobabooga and llama.cpp and I also have access to runpod. Really appreciate any info!