Questions on Attention Sinks and Their Usage in LLM Models
Holiday_Fly_590@alien.top to LocalLLaMA@poweruser.forum · English · 1 year ago
esotericloop@alien.top · English · 1 year ago
See, you’re attending to the initial token across all layers and heads. :P