rihard7854@alien.top to LocalLLaMA@poweruser.forum · English · 10 months ago

**NVidia H200 achieves nearly 12,000 tokens/sec on Llama2-13B with TensorRT-LLM** (github.com)
a_beautiful_rhind@alien.top · English · 10 months ago

70B with 2048 context and a 128-token reply is about 303 t/s. That sounds more reasonable, assuming they aren't quantized. The batch size is just a theoretical batch, I think.
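The point about batch size can be made concrete with a little arithmetic: a headline aggregate figure like ~12,000 t/s is the sum across all concurrent sequences in a batch, so the throughput any single user sees is far lower. A minimal sketch, using the 12,000 t/s number from the post and a purely hypothetical batch size (the actual batch used in the benchmark is not stated here):

```python
def per_stream_tps(aggregate_tps: float, batch_size: int) -> float:
    """Divide aggregate benchmark throughput evenly across the
    concurrent sequences in the batch (an idealized assumption)."""
    return aggregate_tps / batch_size

# Hypothetical: if the ~12,000 t/s Llama2-13B figure came from a batch of 64,
# each individual stream would see only a fraction of the headline number.
print(per_stream_tps(12_000, 64))  # 187.5 t/s per stream
```

Under this reading, the 303 t/s figure for 70B and the 12,000 t/s figure for 13B are not directly comparable unless measured at the same batch size.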