mgostIH@alien.top to Machine Learning@academy.garden • [R] Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers
> because there’s no obvious way to parallelize the causal self-attention with a FF
You can just use triangular matrices; autoregressive language modelling can be done even with linear-only layers. See page 12 of https://arxiv.org/abs/2309.06979
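A minimal sketch of the triangular-matrix idea (not the method from the linked paper): a linear layer that mixes token positions through a lower-triangular weight matrix, so the output at position t only depends on positions ≤ t. The class name and shapes here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CausalLinearMix(nn.Module):
    """Mix token positions with a lower-triangular weight matrix,
    so position t only sees positions <= t (causal/autoregressive)."""

    def __init__(self, seq_len: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(seq_len, seq_len) / seq_len ** 0.5)
        # Lower-triangular mask enforces the causal structure.
        self.register_buffer("mask", torch.tril(torch.ones(seq_len, seq_len)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); mix along the sequence dimension only.
        w = self.weight * self.mask
        return torch.einsum("ts,bsd->btd", w, x)


if __name__ == "__main__":
    layer = CausalLinearMix(seq_len=8)
    x = torch.randn(2, 8, 16)
    y = layer(x)
    # Perturbing future tokens must not change earlier outputs.
    x2 = x.clone()
    x2[:, 5:] += 1.0
    y2 = layer(x2)
    assert torch.allclose(y[:, :5], y2[:, :5])
    print(y.shape)  # torch.Size([2, 8, 16])
```

Training such a layer is still fully parallel over the sequence, since the mask is applied to the weights rather than requiring any sequential loop.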
Previous discussion of Fast Feed Forward, an earlier paper by the same author that this one builds on.