I heard about this on Twitter from some people in the field, in relation to OpenAI’s new breakthrough.

Is there a summary paper, like the ‘Attention Is All You Need’ paper, that goes over this?

Also, how specifically does this relate to, or build on, Large Language Models?

Cheers