Is the Open LLM Leaderboard a reliable source? yi:34B is at the top, but I get better results with the neural-chat:7B model

grigio@alien.top · 1 year ago

ThisGonBHard@alien.top · 1 year ago

While the benchmarks tend to be gamed, especially by small models, I honestly think something is wrong with how you are running it.

Yi-34B trades blows with Llama 2 70B in my personal tests, where I have it do novel tasks I invented myself rather than the gamed benchmarks.

ALL 7B models, when compared to 34B and 70B models, are like a 7-year-old up against a renowned professor.