So Mistral-7b is a pretty impressive 7B param model … but why is it so capable? Do we have any insights into its dataset? Was it trained very far beyond the scaling limit? Any attempts at open reproductions or merges to scale up # of params?
Having used it a lot, I can say for sure that without much prompting it readily produces junk web text, URLs, etc., so it is not a fully filtered or fully synthetic dataset.
My guess would be that it's just 'a bit better filtered than Llama-2': a slightly better-quality dataset, trained on slightly longer.
My intuition based on this is that, at a given parameter size, EVERYTHING open source could be optimized considerably more.
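For a sense of what "trained beyond the scaling limit" means here, a minimal back-of-the-envelope sketch using the Chinchilla ~20 tokens/parameter rule of thumb and Llama 2's published ~2T-token budget (Mistral has not disclosed its own token count, so its actual overtraining factor is unknown):

    # Rough Chinchilla-style arithmetic. Numbers for Mistral itself are
    # unknown; Llama 2's 2T-token figure is from its paper.

    PARAMS = 7e9                      # 7B parameters
    CHINCHILLA_TOKENS_PER_PARAM = 20  # compute-optimal rule of thumb

    optimal_tokens = PARAMS * CHINCHILLA_TOKENS_PER_PARAM
    print(f"Compute-optimal tokens for 7B: ~{optimal_tokens / 1e9:.0f}B")  # ~140B

    # Llama-2-7B was trained on ~2T tokens, i.e. well past that point.
    llama2_tokens = 2e12
    print(f"2T tokens is ~{llama2_tokens / optimal_tokens:.0f}x the compute-optimal budget")

So "far beyond the scaling limit" is already the norm for small open models; the open question is how much further data quality and extra training can push a 7B model before it saturates.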