@tdgros - Communick News

0 Posts
1 Comment

Joined 1 year ago

Cake day: October 27th, 2023



tdgros@alien.top to Machine Learning@academy.garden • [D] In transformer models, why is there a query and key matrix instead of just the product?
1 year ago
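It’s the same mathematically but not computationally: the tokens are projected to a smaller dimension. Taking N as the model (embedding) dimension and d as the smaller query/key dimension, the two separate weight matrices cost 2Nd (in parameters and in multiplications per token), whereas it’d be N² if you fused them into a single N×N matrix. Keeping them separate is cheaper whenever d < N/2.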
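
To make that concrete, here is a minimal sketch in plain Python, assuming a single attention head and ignoring the softmax and scaling. The sizes `N`, `d`, `T` and the helper names are made up for illustration. It checks that the factored and fused forms give the same scores, and compares the two parameter counts.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]

def random_matrix(rows, cols, rng):
    """Matrix with entries drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
N, d, T = 16, 4, 5  # model dim, query/key dim, token count (illustrative values)

X = random_matrix(T, N, rng)    # token embeddings, one row per token
W_q = random_matrix(N, d, rng)  # query projection, N x d
W_k = random_matrix(N, d, rng)  # key projection, N x d

# Factored form: project each token down to d dimensions, then take dot products.
Q = matmul(X, W_q)
K = matmul(X, W_k)
scores_factored = matmul(Q, transpose(K))

# Fused form: a single N x N matrix M = W_q W_k^T applied between the tokens.
M = matmul(W_q, transpose(W_k))
scores_fused = matmul(matmul(X, M), transpose(X))

# Largest elementwise difference between the two score matrices.
max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(scores_factored, scores_fused)
    for a, b in zip(row_a, row_b)
)

params_factored = 2 * N * d  # entries in W_q plus W_k
params_fused = N * N         # entries in M

print(max_diff, params_factored, params_fused)
```

With these sizes the factored form stores 2·16·4 = 128 weights against 16² = 256 for the fused matrix, and the scores agree up to floating-point rounding. The gap widens as N grows relative to d.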
