aliencaocao@alien.top to LocalLLaMA@poweruser.forum, on "NVidia H200 achieves nearly 12,000 tokens/sec on Llama2-13B with TensorRT-LLM" (10 months ago):
That's at batch size 1024, though… not a personal use case.
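
For a rough sense of what that means per user, here is a back-of-the-envelope sketch. It assumes the ~12,000 tokens/sec headline is aggregate throughput split evenly across all 1024 sequences in the batch (my assumption, not something confirmed here), and `per_sequence_rate` is just a hypothetical helper name.

```python
def per_sequence_rate(aggregate_tokens_per_sec: float, batch_size: int) -> float:
    """Average rate seen by one sequence, assuming the aggregate
    throughput is shared evenly across the whole batch."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return aggregate_tokens_per_sec / batch_size


# Headline figure from the post title, rounded up to 12,000 tokens/sec.
rate = per_sequence_rate(12_000, 1024)
print(f"{rate:.1f} tokens/sec per sequence")  # prints "11.7 tokens/sec per sequence"
```

Under that assumption each individual stream sees roughly 12 tokens/sec, and the headline number only shows up when the GPU is serving many requests at once.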