Training on the rephrased test set is all you need: 13B models can reach GPT-4 performance in benchmarks with no contamination detectable by traditional methods

Covid-Plannedemic_@alien.top · 1 year ago

shibe5@alien.top · 1 year ago

With so many models available, most developers and users have to pick a small subset to evaluate themselves, and that choice has to rest on whatever performance data already exists. At that stage, shortlisting the models with, for example, the highest MMLU scores is one way to go about it.

Catch me if you can! How to beat GPT-4 with a 13B model | LMSYS Org