on the hugging face leaderboard, i was a bit surprised by the performance of falcon 180b.
do you have any explanation of how?
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
Public leaderboards mean nothing because 99% of the fine-tuned models are overfitted to hell. It's like nobody here has ever done a Kaggle comp before.
I think a big obstacle is that the model is so big that hardly anyone is trying to fine-tune it.
Well, the model was trained on RefinedWeb, about 3.5T tokens, so a little below Chinchilla optimal for 180B (see the quick calculation after this list). Also, the models in the Falcon series feel progressively more undertrained:
- The 1B model was good, and is still good several generations later
- The 7B was capable pre-Llama 2
- The 40B and 180B were never as good
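A quick back-of-the-envelope sketch of that "a little below Chinchilla optimal" claim, assuming the rough ~20 tokens-per-parameter heuristic from Hoffmann et al. (2022); the ratio is an approximation, not an exact constant:

```python
# Rough Chinchilla-optimal token count for a 180B-parameter model,
# using the ~20 tokens/parameter heuristic (an approximation).

PARAMS = 180e9          # Falcon-180B parameter count
TOKENS_PER_PARAM = 20   # rough Chinchilla ratio

optimal_tokens = PARAMS * TOKENS_PER_PARAM
actual_tokens = 3.5e12  # Falcon-180B's reported training tokens (RefinedWeb)

print(f"Chinchilla-optimal: {optimal_tokens / 1e12:.1f}T tokens")  # ~3.6T
print(f"Actual training:    {actual_tokens / 1e12:.1f}T tokens")   # 3.5T
print(f"Ratio: {actual_tokens / optimal_tokens:.0%}")              # ~97%
```

So 3.5T tokens lands just under the ~3.6T the heuristic suggests, which is why "a little below Chinchilla optimal" is a fair description, even if the smaller Falcons arguably felt more undertrained in practice.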
These leaderboards are dick-measuring contests for small dicks. Imagine the dynamics of that.
Falcon-180B is not good