[D] In transformer models, why is there a query and key matrix instead of just the product?

lildaemon@alien.top · 2 years ago

[D] In transformer models, why is there a query and key matrix instead of just the product?

CrazyCrab@alien.top · 2 years ago

On the EleutherAI Discord, someone once asked this question. Someone else replied that yes, in theory, having one matrix instead of two should obviously be better, but in practice it empirically makes things worse. Why? No one knows.
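For anyone who wants to poke at this, here is a rough NumPy sketch of what "just the product" would mean for a single head. The sizes (`d_model = 64`, `d_head = 8`), the random weights, and the function names are made up for illustration, and the usual `1/sqrt(d_head)` scaling and softmax are left out. It only shows that the pre-softmax scores are identical either way, and that the merged matrix is larger and has rank at most `d_head`; it says nothing about which version trains better.

```python
import numpy as np


def attention_scores_factored(X, W_q, W_k):
    """Pre-softmax scores using separate query and key projections."""
    Q = X @ W_q          # (n_tokens, d_head)
    K = X @ W_k          # (n_tokens, d_head)
    return Q @ K.T       # (n_tokens, n_tokens)


def attention_scores_merged(X, W_qk):
    """Pre-softmax scores using a single merged d_model x d_model matrix."""
    return X @ W_qk @ X.T


# Illustrative sizes only (assumptions, not taken from any particular model).
rng = np.random.default_rng(0)
n_tokens, d_model, d_head = 5, 64, 8

X = rng.standard_normal((n_tokens, d_model))
W_q = rng.standard_normal((d_model, d_head))
W_k = rng.standard_normal((d_model, d_head))

# "Just the product": fold both projections into one matrix.
W_qk = W_q @ W_k.T       # (d_model, d_model)

scores_factored = attention_scores_factored(X, W_q, W_k)
scores_merged = attention_scores_merged(X, W_qk)

# The two forms give the same scores up to floating-point error.
same_scores = np.allclose(scores_factored, scores_merged)

# Parameter counts: 2 * d_model * d_head versus d_model ** 2.
params_factored = W_q.size + W_k.size
params_merged = W_qk.size

# The merged matrix built this way cannot exceed rank d_head.
rank_merged = np.linalg.matrix_rank(W_qk)

print(same_scores, params_factored, params_merged, rank_merged)
```

With these sizes, the two-matrix form stores fewer parameters than a full `d_model x d_model` matrix, and a freely trained merged matrix would not be limited to rank `d_head`. Whether that difference accounts for the empirical result mentioned above is not something this sketch can tell you.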