If I have multiple 7b models where each model is trained on one specific topic (e.g. roleplay, math, coding, history, politics…) and I have an interface which decides, depending on the context, which model to use. Could this outperform bigger models while being faster?
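The "interface" described here could be sketched as a simple router: score the prompt against each specialist's topic and dispatch to the best match. This is a minimal, hypothetical illustration — the model names and keyword lists are made up, and a real router would more likely use an embedding classifier than keyword overlap.

```python
# Hypothetical sketch of a topic router choosing among specialist 7b models.
# Model names and keyword sets are illustrative assumptions, not real models.

SPECIALISTS = {
    "math-7b":    {"integral", "equation", "derivative", "solve", "proof"},
    "coding-7b":  {"python", "function", "bug", "compile", "api"},
    "history-7b": {"war", "empire", "century", "revolution", "dynasty"},
}

def route(prompt: str, default: str = "general-7b") -> str:
    """Return the specialist whose keywords best overlap the prompt,
    falling back to a general model when nothing matches."""
    words = set(prompt.lower().split())
    best, best_score = default, 0
    for model, keywords in SPECIALISTS.items():
        score = len(words & keywords)
        if score > best_score:
            best, best_score = model, score
    return best

print(route("please solve this integral for me"))  # → math-7b
print(route("hello there"))                        # → general-7b
```

In practice this dispatch step is cheap compared to inference, so the latency question mostly comes down to whether a 7b specialist answers well enough to avoid falling back to a larger model.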

  • jxjq@alien.topB · 1 year ago

    Does this use of mixture-of-experts mean that multiple 70b models would perform better than multiple 7b models?

      • extopico@alien.topB · 1 year ago

        Big is an understatement. Please correct me if I've got it wildly wrong, but it appears to be a 3.6TB colossus.