Yi-23B-Llama: Distil version of Yi-34B-Llama

Covid-Plannedemic_@alien.top · 2 years ago

mpasila@alien.top · 2 years ago

Did anyone manage to get them working? I tried GGUF and GPTQ, and also running them unquantized with trust-remote-code, and they just produced garbage. (I tried removing the BOS tokens and got the same thing.)

Jelegend@alien.top · 2 years ago

Yeah, exactly the same thing. They produced absolute rubbish whatever I tried. I tried 8B, 15B, and 23B.

watkykjynaaier@alien.top · 2 years ago

I’ve completely fixed gibberish output on Yi-based and other models by setting the RoPE Frequency Scale to a number less than one (one seems to be the default). I have no idea why that works, but it does.

What I find even more strange is the models often keep working after setting the frequency scale back to 1.
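
For reference, here is a minimal sketch of what a RoPE frequency scale changes, assuming the linear position scaling that llama.cpp-style loaders apply (the position index is multiplied by the scale before the rotary angles are computed). The function name `rope_angles`, the head dimension of 8, and the frequency base of 10000 are illustrative assumptions, not values taken from the Yi configs. It only shows the mechanics and does not explain why the setting fixes the gibberish.

```python
import math

def rope_angles(position, head_dim, freq_base=10000.0, freq_scale=1.0):
    """Rotation angle for each pair of dimensions at one token position.

    Hypothetical helper for illustration; freq_base and head_dim are
    assumed values, not read from any model config.
    """
    # Linear scaling: the position index is multiplied by freq_scale
    # before the usual per-dimension RoPE frequencies are applied.
    scaled_pos = position * freq_scale
    return [
        scaled_pos * freq_base ** (-2.0 * i / head_dim)
        for i in range(head_dim // 2)
    ]

# With a scale of 0.5, token 100 is rotated as if it were token 50.
full = rope_angles(position=100, head_dim=8, freq_scale=1.0)
half = rope_angles(position=100, head_dim=8, freq_scale=0.5)
same = rope_angles(position=50, head_dim=8, freq_scale=1.0)

print(all(math.isclose(a, b) for a, b in zip(half, same)))
```

So a scale below one compresses positions: the model sees every token as sitting closer to the start of the context than it really is.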

Aaaaaaaaaeeeee@alien.top · 2 years ago

What value specifically worked?

ByteWave/Yi-23B-Llama · Hugging Face