Announcing Llama-rephraser: 13B models reaching GPT-4 performance in major benchmarks (MMLU/GSM-8K/HumanEval)!
To ensure result validity, we followed Open...
Huh… I figured this had already been happening for a while with closed-dataset LLMs. In my experience, the leaderboard has not directly indicated a model's ability to do real-world work. Some of the lower-ranking models seem to do better on what I put them through than the top-ranking models. Just my personal opinion and observation.