If I have multiple 7B models, each trained on one specific topic (e.g. roleplay, math, coding, history, politics…), and an interface that decides, depending on the context, which model to use, could this outperform bigger models while being faster?
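
Rough sketch of what I mean (the model names and the keyword matching are just placeholders; the real interface could be an embedding classifier or a small LLM):

```python
# Placeholder sketch of a topic router in front of several specialized 7B models.
# Model names and keyword lists are illustrative, not real checkpoints.
from typing import Dict, List

# Map each topic to a hypothetical specialized checkpoint.
SPECIALISTS: Dict[str, str] = {
    "math": "math-7b",
    "coding": "code-7b",
    "roleplay": "roleplay-7b",
    "history": "history-7b",
    "politics": "politics-7b",
}

KEYWORDS: Dict[str, List[str]] = {
    "math": ["integral", "equation", "prove", "derivative"],
    "coding": ["python", "function", "bug", "compile"],
    "roleplay": ["character", "pretend", "act as"],
    "history": ["war", "century", "empire"],
    "politics": ["election", "policy", "parliament"],
}

def route(prompt: str) -> str:
    """Pick the specialist whose keywords best match the prompt."""
    text = prompt.lower()
    scores = {
        topic: sum(word in text for word in words)
        for topic, words in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Fall back to a general model if nothing matches.
    return SPECIALISTS[best] if scores[best] > 0 else "general-7b"

print(route("Can you prove this equation by taking the derivative?"))  # -> math-7b
```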

  • feynmanatom@alien.topB
    1 year ago

    Lots of rumors, but tbh I think it’s highly unlikely they’re using an MoE. MoEs work well at batch size = 1, where you can take advantage of sparsity, but not at larger batch sizes: different tokens in a batch route to different experts, so nearly every expert ends up active anyway. You would need so much RAM that you’d miss out on the point of using an MoE.
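
    A quick way to see this, as a rough simulation (the 8-expert, top-2 numbers are just illustrative assumptions, not anything known about their setup):

    ```python
    # Rough simulation: with top-2 routing over 8 experts, a single token
    # activates only 2 experts, but even a modest batch touches nearly all
    # of them, so every expert's weights must stay resident in memory.
    # Expert count and top-k are illustrative assumptions.
    import random

    NUM_EXPERTS = 8
    TOP_K = 2

    def experts_touched(batch_size: int, trials: int = 1000) -> float:
        """Average number of distinct experts hit by one batch (uniform routing)."""
        total = 0
        for _ in range(trials):
            touched = set()
            for _ in range(batch_size):
                touched.update(random.sample(range(NUM_EXPERTS), TOP_K))
            total += len(touched)
        return total / trials

    for bs in (1, 4, 16, 64):
        print(f"batch={bs:3d} -> ~{experts_touched(bs):.1f} of {NUM_EXPERTS} experts active")
    ```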

    • remghoost7@alien.topB
      1 year ago

      > Lots of rumors…

      Very true.

      We honestly have no clue what’s going on behind ClosedAI’s doors.

      I don’t know enough about MoEs to say one way or the other, so I’ll take your word on it. I’ll have to do more research on them.