Joined 1 year ago

Cake day: November 14th, 2023

programmerChilli@alien.top

GPT-Fast: A fast and hackable implementation of transformer inference in <1000 lines of native PyTorch with support for quantization, speculative decoding, TP, Nvidia/AMD support, and more!
