lengyue233@alien.top to LocalLLaMA@poweruser.forum • NVidia H200 achieves nearly 12,000 tokens/sec on Llama2-13B with TensorRT-LLM
10 months ago
> Are you going to talk with Yuki at 1024 batch size?
Yes, the batch size is intended for multiple sessions (1024 parallel sessions in this case).
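To make the relationship concrete, here is a rough arithmetic sketch. It assumes the headline ~12,000 tokens/sec figure is aggregate throughput across the whole batch and that it is shared evenly among the 1024 parallel sessions, which is an assumption, not something the thread states:

```python
# Assumption: the ~12,000 tok/s headline number is aggregate throughput,
# split evenly across all parallel sessions in the batch.
aggregate_tokens_per_sec = 12_000  # headline H200 + TensorRT-LLM figure
parallel_sessions = 1024           # batch size discussed in the thread

per_session = aggregate_tokens_per_sec / parallel_sessions
print(f"~{per_session:.1f} tok/s per session")  # ~11.7 tok/s per session
```

So a large batch trades per-session speed for much higher total throughput: each individual session streams at a modest rate while the GPU serves all 1024 at once.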