rihard7854@alien.top to LocalLLaMA@poweruser.forum · English · 10 months ago
NVidia H200 achieves nearly 12,000 tokens/sec on Llama2-13B with TensorRT-LLM (github.com)
23 comments
aliencaocao@alien.top · 10 months ago
Batch size 1024 though… not for the personal use case.
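The headline figure divides down fast once you account for batching. A rough back-of-the-envelope sketch (assuming the aggregate decode throughput is shared evenly across concurrent sequences, which real schedulers only approximate):

```python
# Naive per-stream throughput estimate for a batched LLM inference server.
# Assumption (not from the benchmark itself): aggregate tokens/sec is split
# evenly across all concurrent requests, with no scheduling overhead.

def per_stream_tokens_per_sec(aggregate_tps: float, batch_size: int) -> float:
    """Even split of aggregate decode throughput across the batch."""
    return aggregate_tps / batch_size

# Headline figure: ~12,000 tokens/sec on Llama2-13B at batch size 1024.
print(per_stream_tokens_per_sec(12_000, 1024))  # ~11.7 tokens/sec per request
```

So each individual request sees on the order of 12 tokens/sec; the 12,000 number is a serving-throughput metric, not single-user speed.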
Herr_Drosselmeyer@alien.top · 10 months ago
Obviously. There aren’t many people in the world with 50k burning a hole in their pockets, and of those, even fewer are nerdy enough to set up their own AI server in the basement just to tinker with it.