Covid-Plannedemic_@alien.topB to LocalLLaMA@poweruser.forum · English · 2 years ago

Yi-23B-Llama: Distil version of Yi-34B-Llama

ByteWave/Yi-23B-Llama · Hugging Face
huggingface.co
  • sergeant113@alien.topB · 2 years ago

    Can’t wait!!!

  • mcmoose1900@alien.topB · 2 years ago

    There are in fact 3 different distillations: https://huggingface.co/collections/ByteWave/distil-yi-models-655a5697ec17c88302ce7ea1

    It's not the 200K model, though.

    • a_beautiful_rhind@alien.topB · 2 years ago

      Which is a shame because the same performance + the extra context would have been huge.

  • kristaller486@alien.topB · 2 years ago

    Is there code for the distillation?

    • llama_in_sunglasses@alien.topB · 2 years ago

      I had okayish results blowing up layers from 70b… but messing with the first or last 20% lobotomizes the model, and I didn’t snip more than a couple layers from any one place. By the time I got the model far enough down in size that q2_K could load in 24GB of VRAM it fell apart, so I didn’t consider mergekit all that useful of a distillation/parameter reduction process.
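For anyone curious what the layer-snipping approach above looks like, here is a minimal, hypothetical sketch of the heuristic described: protect roughly the first and last 20% of the decoder stack and only drop a few evenly spaced layers from the middle. The function is made up for illustration; mergekit itself expresses this as a YAML `slices` config rather than code like this.

```python
# Hypothetical sketch of the layer-pruning heuristic described above:
# leave the first and last ~20% of transformer layers untouched and
# drop layers only from the middle band, spread out evenly.

def layers_to_keep(n_layers: int, n_drop: int, protect_frac: float = 0.2) -> list[int]:
    """Return the layer indices to keep after dropping n_drop layers
    from the middle of the stack."""
    protect = int(n_layers * protect_frac)
    middle = list(range(protect, n_layers - protect))
    if n_drop > len(middle):
        raise ValueError("cannot drop that many layers without touching protected ends")
    # Spread the dropped layers evenly across the middle band.
    step = len(middle) / (n_drop + 1)
    dropped = {middle[int(step * (i + 1))] for i in range(n_drop)}
    return [i for i in range(n_layers) if i not in dropped]

# Llama-2-70B has 80 decoder layers; try removing 10 from the middle.
keep = layers_to_keep(80, 10)
```

As the comment notes, the resulting model still needs heavy evaluation: even with the ends protected, pruning this aggressively tends to degrade quality.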

    • mcmoose1900@alien.topB · 2 years ago

      Oh yeah, it be busted.
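On the question above about distillation code: none appears to be published for these models, but the classic logit-distillation objective (soft teacher targets at a temperature T, as opposed to the layer-removal experiments discussed in this subthread) can be sketched as follows. This is purely illustrative and is not ByteWave's actual recipe.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions,
    the Hinton-style distillation objective. Scaled by T^2 so gradient
    magnitudes stay comparable across temperatures."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

loss_same = distill_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1])  # zero: distributions match
loss_diff = distill_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0])  # positive: they disagree
```

In a real training loop this term is usually mixed with the ordinary cross-entropy loss on the hard labels.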

  • roselan@alien.topB · 2 years ago

    and of course TheBloke already prepped everything for our fine consumption.

    • LocoMod@alien.topB · 2 years ago

      Had the same problem last night and I promptly deleted it.

  • mpasila@alien.topB · 2 years ago

    Did anyone manage to get them working? I tried GGUF/GPTQ and running them unquantized with trust-remote-code, and they just produced garbage. (I tried removing BOS tokens as well; same thing.)

    • Jelegend@alien.topB · 2 years ago

      Yeah, exactly the same thing. It produced absolute rubbish whatever I tried, across the 8B, 15B, and 23B.

    • watkykjynaaier@alien.topB · 2 years ago

      I’ve completely fixed gibberish output on Yi-based and other models by setting the RoPE frequency scale to a number less than one (one appears to be the default). I have no idea why that works, but it does.

      What I find even more strange is the models often keep working after setting the frequency scale back to 1.

      • Aaaaaaaaaeeeee@alien.topB · 2 years ago

        What value specifically worked?
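On why a RoPE frequency scale below one can change behavior: the scale multiplies every rotary angle, so at scale 0.5 position 2p produces exactly the rotations that position p produced at scale 1, stretching the effective position range. A small illustrative sketch of the mechanism (llama.cpp exposes the knob as `--rope-freq-scale`; the exact value that fixed the output above isn't stated in the thread):

```python
import math

def rope_angles(pos: int, dim: int, base: float = 10000.0, freq_scale: float = 1.0):
    """Rotation angles used by rotary position embeddings (RoPE) for one
    token position. freq_scale < 1 shrinks every angle uniformly, which
    is the 'linear scaling' trick for stretching context."""
    return [
        freq_scale * pos * base ** (-2 * i / dim)
        for i in range(dim // 2)
    ]

# Halving the frequency scale makes position 2048 rotate like position 1024:
a = rope_angles(2048, 128, freq_scale=0.5)
b = rope_angles(1024, 128, freq_scale=1.0)
```

If a model's config declares a scaled RoPE but the loader ignores it (or vice versa), every position is rotated wrongly, which is one plausible source of the gibberish reported above.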

  • vasileer@alien.topB · 2 years ago

    Did you test the model before advertising it?

    • bearbarebere@alien.topB · 2 years ago

      Lmao

  • ltduff69@alien.topB · 2 years ago

    I haven’t had any issues running these Yi models. I think they are really good personally.

    https://preview.redd.it/xddjserqii1c1.jpeg?width=3024&format=pjpg&auto=webp&s=bd9b3124954ff5d6a7c3452b857949d8363c9e87

    • No_Afternoon_4260@alien.topB · 2 years ago

      You took a picture of nous capybara…

      • ltduff69@alien.topB · 2 years ago

        Yeah I am kinda petty lol.
