Public LLM leaderboards evaluate models on datasets like MMLU and ARC. What happens if I simply train my LLM on the test set? How would anyone know I did that? I'd end up with a model that ranks high on the public leaderboard, right?
There's a fair chance that a lot of LLMs already do this, except they train not only on the test set but on other data as well.
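
As for "how do you know": one rough way to check, assuming you actually have access to the training corpus (which a leaderboard maintainer usually does not), is to look for long word n-grams shared between the test items and the training documents. The sketch below is only an illustration of that idea. The function names `ngrams` and `contamination_rate` and the window size `n=8` are my own choices here, not part of any leaderboard's tooling.

```python
def ngrams(text, n):
    """Return the set of word-level n-grams in text (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_rate(test_items, training_docs, n=8):
    """Flag test items that share at least one n-gram with any training document.

    Returns (fraction of flagged items, list of flagged items).
    n=8 is an arbitrary window chosen for illustration.
    """
    train_grams = set()
    for doc in training_docs:
        train_grams |= ngrams(doc, n)

    flagged = [item for item in test_items if ngrams(item, n) & train_grams]
    rate = len(flagged) / len(test_items) if test_items else 0.0
    return rate, flagged


# Toy data, made up for this example.
training_docs = [
    "some web text which of the following is the largest planet in our solar system answer jupiter",
    "unrelated text about cooking pasta with tomato sauce and basil leaves today",
]
test_items = [
    "Which of the following is the largest planet in our solar system",
    "What is the boiling point of water at sea level in degrees Celsius",
]

rate, flagged = contamination_rate(test_items, training_docs, n=8)
print(rate, flagged)
```

A check like this only catches verbatim or near-verbatim copies. Paraphrased or translated test questions would slip through, and without the training data you can't run it at all.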