Training on the rephrased test set is all you need: 13B models can reach GPT-4 performance in benchmarks with no contamination detectable by traditional methods

Covid-Plannedemic_@alien.top · 1 year ago

SlowSmarts@alien.top · 1 year ago

Huh… I figured this had already been happening for a while with closed-dataset LLMs. In my experience, the leaderboard has not directly indicated a model's ability to do real-world work. Some of the lower-ranking models seem to do better with what I put them through than the top-ranking models. Just my personal opinion and observation.

Catch me if you can! How to beat GPT-4 with a 13B model | LMSYS Org