Look ahead decoding offers massive (~1.5x) speedup for inference (lmsys.org)
lans_throwaway@alien.top to LocalLLaMA@poweruser.forum · English · 10 months ago · 4 comments · cross-posted to: localllama@poweruser.forum
_Lee_B_@alien.top · English · 10 months ago
Hmm, it looks like such a standard linear algebra optimisation that I’m surprised GPUs don’t do it automatically. But yep, looks good, either way.