GPT-Fast: A fast and hackable implementation of transformer inference in <1000 lines of native PyTorch with support for quantization, speculative decoding, TP, Nvidia/AMD support, and more!

Posted by programmerChilli@alien.top to LocalLLaMA@poweruser.forum · English · 11 months ago