How come llama2 70B is so much worse than many other code-llama 34B?
I’m not talking specifically about coding questions, but the 70B seems utterly stupid: it repeats nonsense patterns, starts talking about unrelated stuff, and sometimes gets stuck in a loop repeating the same word. It seems like utter garbage, and I downloaded the official model from Meta’s HF repo.
Has anyone experienced the same? Am I doing something wrong with the 70B model?
No, I didn’t even know RoPE was a thing, I’m reading about it now… if you have any tl;dr please post it, this stuff seems pretty complicated.
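Edit: after reading a bit, here’s roughly my understanding as a toy sketch (assumptions on my part: standard RoPE where each dimension pair rotates at frequency base^(−2i/d); Llama 2 reportedly uses base 10000, CodeLlama a much larger base like 1e6 for its longer context):

```python
import math

def rope_angles(pos, dim=128, base=10000.0):
    # Each pair of dims rotates at its own frequency; a larger base means
    # slower rotation, so positions get further apart before angles wrap.
    return [pos * base ** (-2 * i / dim) for i in range(dim // 2)]

# Same position, two different bases: the larger (CodeLlama-style) base
# rotates much more slowly, effectively "stretching" the usable context.
a = rope_angles(4096, base=10_000.0)
b = rope_angles(4096, base=1_000_000.0)
print(a[1] > b[1])  # higher base -> smaller angle at the same position
```

So if the loader applies the wrong base (or scale) for a given model, every position gets encoded with the wrong angles and the output degenerates, which sounds a lot like my symptoms.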
I was loading the model with a plain llama.cpp invocation and didn’t know about RoPE. What would change if I left the default values on?
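For reference, here’s a sketch of the flags I mean (the flag names come from llama.cpp’s `main`; the model filenames are placeholders and the base values are what Llama 2 vs CodeLlama reportedly use, so treat the numbers as assumptions):

```shell
# Llama 2 (4k context): the stock RoPE base, which the defaults should match
./main -m llama-2-70b.Q4_K_M.gguf -c 4096 \
    --rope-freq-base 10000 --rope-freq-scale 1.0

# CodeLlama was reportedly trained with a much larger base for long context;
# forcing Llama 2 values onto it (or vice versa) tends to produce gibberish
./main -m codellama-34b.Q4_K_M.gguf -c 16384 \
    --rope-freq-base 1000000
```

My understanding is that recent llama.cpp builds read these from the GGUF metadata anyway, so leaving the defaults alone should be fine unless the conversion baked in the wrong values.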