rihard7854@alien.top to LocalLLaMA@poweruser.forum · English · 10 months ago
NVidia H200 achieves nearly 12,000 tokens/sec on Llama2-13B with TensorRT-LLM (github.com)
23 comments
aliencaocao@alien.top · 10 months ago
Batch size 1024 though… not for the personal use case.
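The headline figure divides down fast once you account for batching. A rough back-of-the-envelope sketch (assuming the aggregate decode throughput is shared evenly across concurrent sequences, which real schedulers only approximate):

```python
# Naive per-stream throughput estimate for a batched LLM inference server.
# Assumption (not from the benchmark itself): aggregate tokens/sec is split
# evenly across all concurrent requests, with no scheduling overhead.

def per_stream_tokens_per_sec(aggregate_tps: float, batch_size: int) -> float:
    """Even split of aggregate decode throughput across the batch."""
    return aggregate_tps / batch_size

# Headline figure: ~12,000 tokens/sec on Llama2-13B at batch size 1024.
print(per_stream_tokens_per_sec(12_000, 1024))  # ~11.7 tokens/sec per request
```

So each individual request sees on the order of 12 tokens/sec; the 12,000 number is a serving-throughput metric, not single-user speed.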
Herr_Drosselmeyer@alien.top · 10 months ago
Obviously. There aren’t many people in the world with 50k burning a hole in their pockets, and of those, even fewer are nerdy enough to set up their own AI server in the basement just to tinker with it.