Yi-23B-Llama: Distil version of Yi-34B-Llama

Covid-Plannedemic_@alien.top · 2 years ago

llama_in_sunglasses@alien.top · 2 years ago

I had okayish results blowing up layers from 70b… but messing with the first or last 20% of the layers lobotomizes the model, and I didn’t snip more than a couple of layers from any one place. By the time I got the model small enough that a q2_K quant could load in 24GB of VRAM, it had fallen apart, so I didn’t consider mergekit all that useful as a distillation/parameter-reduction process.
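
To make those two constraints concrete (leave the first and last 20% of layers alone, and never cut more than a couple of adjacent layers), here is a rough Python sketch that only plans which layer ranges to keep. It is not the commenter's actual procedure and it does not call mergekit; the function name `plan_layer_removal`, its parameters, and the 80-layer count (assumed for a Llama-2-70B-style model) are all illustrative assumptions.

```python
def plan_layer_removal(num_layers, n_remove, protect_frac=0.2, max_run=2):
    """Return half-open (start, end) ranges of layer indices to KEEP.

    Hypothetical helper: spreads the removed layers evenly over the middle
    of the stack, never touching the first/last `protect_frac` of layers and
    never removing more than `max_run` adjacent layers.
    """
    if n_remove <= 0:
        return [(0, num_layers)]

    lo = int(num_layers * protect_frac)  # first layer we may remove
    hi = num_layers - lo                 # one past the last removable layer

    # Split the removals into runs of at most `max_run` adjacent layers.
    runs = [max_run] * (n_remove // max_run)
    if n_remove % max_run:
        runs.append(n_remove % max_run)

    # Each run gets its own equal-width window inside [lo, hi).
    stride = (hi - lo) / len(runs)
    if stride < max_run + 1:
        raise ValueError("too many layers to remove under these constraints")

    removed = set()
    for i, run in enumerate(runs):
        # Centre the run inside its window.
        start = lo + int(i * stride + (stride - run) / 2)
        removed.update(range(start, start + run))

    # Collapse the surviving layer indices into contiguous ranges.
    keep, start = [], None
    for layer in range(num_layers + 1):
        kept = layer < num_layers and layer not in removed
        if kept and start is None:
            start = layer
        elif not kept and start is not None:
            keep.append((start, layer))
            start = None
    return keep


if __name__ == "__main__":
    # Assumed example: 80 layers, drop 8 of them.
    for start, end in plan_layer_removal(80, 8):
        print(f"keep layers [{start}, {end})")
```

The kept ranges are half-open, so they could be translated by hand into whatever slice format a merging tool expects. The sketch also shows why this approach runs out of room quickly: with runs capped at two layers and 40% of the stack off limits, only a modest fraction of layers can be removed before the function refuses.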

ByteWave/Yi-23B-Llama · Hugging Face