eigenham@alien.top to Machine Learning@academy.garden • [D] In transformer models, why is there a query and key matrix instead of just the product? • 10 months ago
I would suggest looking into the math a little more. The query, key, and value matrices in an attention layer are each linear functions of the input sequence (e.g. Q = XW_Q, K = XW_K). So the pre-softmax attention scores QK^T = X W_Q W_K^T X^T are a quadratic form in the input, and the attention weights are the softmax of that quadratic, iirc.
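If it helps, here's a minimal numpy sketch of that point (all the dimensions and variable names are made up for illustration): since Q and K are linear in X, the two projection matrices can mathematically be folded into a single product W_Q W_K^T, and the scores come out identical.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d_model, d_k = 5, 16, 8          # sequence length, model dim, head dim (arbitrary)
X = rng.normal(size=(n, d_model))   # input sequence
W_Q = rng.normal(size=(d_model, d_k))
W_K = rng.normal(size=(d_model, d_k))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Standard form: Q and K are each linear in X, so the scores are quadratic in X.
Q, K = X @ W_Q, X @ W_K
scores_two_mats = softmax(Q @ K.T / np.sqrt(d_k))

# Folded form: replace the separate query/key matrices with their single product.
W_QK = W_Q @ W_K.T                  # d_model x d_model, rank at most d_k
scores_one_mat = softmax(X @ W_QK @ X.T / np.sqrt(d_k))

assert np.allclose(scores_two_mats, scores_one_mat)
```

Which also gestures at an answer to the original question: the single matrix W_Q W_K^T is d_model x d_model but has rank at most d_k, so the factored form parameterizes the same (low-rank) bilinear map with fewer parameters.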