APaperADay@alien.topB to Machine Learning@academy.gardenEnglish · 1 year ago

[R] System 2 Attention (is something you might need too)

4

1

[R] System 2 Attention (is something you might need too)

APaperADay@alien.topB to Machine Learning@academy.gardenEnglish · 1 year ago

4

Paper: https://arxiv.org/abs/2311.11829

Abstract:

Soft attention in Transformer-based Large Language Models (LLMs) is susceptible to incorporating irrelevant information from the context into its latent representations, which adversely affects next token generations. To help rectify these issues, we introduce System 2 Attention (S2A), which leverages the ability of LLMs to reason in natural language and follow instructions in order to decide what to attend to. S2A regenerates the input context to only include the relevant portions, before attending to the regenerated context to elicit the final response. In experiments, S2A outperforms standard attention-based LLMs on three tasks containing opinion or irrelevant information, QA, math word problems and longform generation, where S2A increases factuality and objectivity, and decreases sycophancy.

Chat

reverendCappuccino@alien.topB
link
fedilink
English
arrow-up
1·
1 year ago
Well, it’s more like a psychological term, and attention is already there to illustrate the intended meaning of a dot product. The analogy holds up, so why doubting the validity of using system 2 attention rather than that of using attention at all?

Machine Learning@academy.garden

machinelearning@academy.garden

You are not logged in. However you can subscribe from another Fediverse account, for example Lemmy or Mastodon. To do this, paste the following into the search field of your instance: !machinelearning@academy.garden

Community Rules:

Be nice. No offensive behavior, insults or attacks: we encourage a diverse community in which members feel safe and have a voice.
Make your post clear and comprehensive: posts that lack insight or effort will be removed. (ex: questions which are easily googled)
Beginner or career related questions go elsewhere. This community is focused in discussion of research and new projects that advance the state-of-the-art.
Limit self-promotion. Comments and posts should be first and foremost about topics of interest to ML observers and practitioners. Limited self-promotion is tolerated, but the sub is not here as merely a source for free advertisement. Such posts will be removed at the discretion of the mods.

Visibility: Public

This community can be federated to other instances and be posted/commented in by their users.

1 user / day
1 user / week
1 user / month
1 user / 6 months
1 local subscriber
1 subscriber
786 Posts
3.03K Comments
Modlog

mods:
communick@academy.garden