too_long_story@alien.top to Machine Learning@academy.garden • [R] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding • 10 months ago
Well, but how do you marry it with batching so that FlashAttention kernels can work with it?
Any complicated attention mask makes it hard to support batching.
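To make the concern concrete, here is a minimal NumPy sketch (not the actual Lookahead Decoding mask; the branch layout is a hypothetical illustration) showing how a speculative-branch mask departs from the plain lower-triangular pattern that fused causal-attention kernels are built around:

```python
import numpy as np

def causal_mask(n):
    # Standard causal mask: position i attends to positions <= i.
    return np.tril(np.ones((n, n), dtype=bool))

def branch_mask(n_prompt, branches, branch_len):
    # Hypothetical lookahead-style mask: each speculative branch
    # attends to the prompt and to its own earlier tokens, but not
    # to tokens in other branches. Illustrative sketch only.
    n = n_prompt + branches * branch_len
    m = np.zeros((n, n), dtype=bool)
    m[:n_prompt, :n_prompt] = causal_mask(n_prompt)
    for b in range(branches):
        s = n_prompt + b * branch_len
        rows = slice(s, s + branch_len)
        m[rows, :n_prompt] = True                  # every branch sees the prompt
        m[rows, rows] = causal_mask(branch_len)    # causal within the branch only
    return m

m = branch_mask(n_prompt=4, branches=2, branch_len=3)
# Branch 1's tokens do not attend to branch 0's tokens, so the mask
# is no longer lower-triangular and a fixed causal kernel can't express it:
assert not np.array_equal(m, causal_mask(10))
```

Since the mask shape depends on each request's branch configuration, sequences in a batch no longer share one mask pattern, which is exactly what makes fused kernels like FlashAttention's causal path awkward to reuse.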