mrfox321@alien.top to Machine Learning@academy.garden • [D] In transformer models, why is there a query and key matrix instead of just the product?
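1 year ago
If you keep the matrices separate, you can control the rank of the learned weights: the product of the query and key matrices has rank at most the head dimension.
Otherwise, the (single) matrix is unconstrained and will generally be full rank.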
Your answer is also terrible. It does not answer his question.
Look at the top 2 replies to see correct interpretations of the question.
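To make the rank point concrete, here is a rough NumPy sketch. The sizes (`d_model = 64`, `d_head = 8`) and the random weights are made up for illustration and do not come from any particular model; the names `W_q`, `W_k`, and `W_full` are just labels for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
d_model, d_head = 64, 8

# Separate query and key projections, each d_model x d_head.
W_q = rng.standard_normal((d_model, d_head))
W_k = rng.standard_normal((d_model, d_head))

# The attention logits x_i^T (W_q W_k^T) x_j only ever see this product.
W_qk = W_q @ W_k.T  # shape (d_model, d_model), rank at most d_head

# A single unconstrained matrix of the same shape, for comparison.
W_full = rng.standard_normal((d_model, d_model))

rank_factored = np.linalg.matrix_rank(W_qk)
rank_full = np.linalg.matrix_rank(W_full)

# Parameter counts for the two parameterizations.
params_factored = W_q.size + W_k.size
params_full = W_full.size

print("rank of W_q @ W_k.T:", rank_factored)
print("rank of single matrix:", rank_full)
print("parameters (factored vs. single):", params_factored, params_full)
```

So the factored form caps the rank at `d_head` and uses `2 * d_model * d_head` parameters instead of `d_model ** 2`, which is presumably part of why the two matrices are kept separate.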