According to this tweet,
when gpt4 first finished training it didn’t actually work very well and the whole team thought it’s over, scaling is dead…until greg went into a cave for weeks and somehow magically made it work
So GPT-4 was kind of broken at first. Then Greg spent a few weeks trying to fix it, and somehow it worked.
So why did it not work at first, and how did they fix it?
I think this is an important question for the OSS community.
They shot a text to Jensen over at Nvidia, and he gave them his contact from a few galaxies over. And the nice beings walked them through it.