What are the benefits of using an H100 over an A100 (both at 80 GB and both using FP16) for LLM inference?
Looking at the datasheets for both GPUs, the H100 has roughly twice the max FLOPS, but they have almost the same memory bandwidth (about 2000 GB/s). Since memory bandwidth dominates LLM inference, I wonder what benefits the H100 actually offers. One benefit could, of course, be the ability to use FP8 (which is extremely useful), but in this question I'm interested in the difference in raw hardware specs.
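For context, here's the napkin math behind my assumption that decoding is bandwidth-bound (a minimal Python sketch; the bandwidth figures are approximate datasheet values, and the 70B/FP16 model is just an illustrative assumption):

```python
# Back-of-the-envelope decode throughput for a bandwidth-bound LLM.
# Bandwidth figures are approximate datasheet values; the model size
# and bytes/param are illustrative assumptions.

def max_decode_tokens_per_sec(n_params: float, bytes_per_param: float,
                              mem_bandwidth_bytes_per_sec: float) -> float:
    """Upper bound on single-stream decode speed: generating one token
    requires streaming all weights from HBM at least once."""
    bytes_per_token = n_params * bytes_per_param
    return mem_bandwidth_bytes_per_sec / bytes_per_token

# 70B-parameter model in FP16 (2 bytes per parameter):
for name, bw in [("A100 80GB", 2039e9), ("H100 PCIe", 2000e9)]:
    print(f"{name}: ~{max_decode_tokens_per_sec(70e9, 2, bw):.1f} tokens/s ceiling")
```

With near-identical bandwidth, both cards land at roughly the same single-stream decode ceiling, which is exactly why I'm puzzled about where the H100's extra FLOPS help.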
The H100 was additionally specialized for higher performance on transformer models. I think it is about 8x faster than an A100 for transformers, but don't quote me on that.
At first I thought that number sounded almost unbelievably high. It appears it can be up to 8x faster when using FlashAttention and a multi-GPU setup; without multi-GPU and FlashAttention, it is a bit more than 2x faster.
Source: https://lambdalabs.com/blog/flashattention-2-lambda-cloud-h100-vs-a100
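A quick roofline comparison helps explain why the gap shows up mostly with FlashAttention and batching (a Python sketch with approximate datasheet numbers for the SXM parts; don't hold me to the exact figures for your SKU):

```python
# Roofline "ridge point" = peak FLOPS / memory bandwidth: the arithmetic
# intensity (FLOP/byte) above which a kernel becomes compute-bound.
# Dense FP16 Tensor Core FLOPS and HBM bandwidth are approximate
# datasheet values.
gpus = {
    "A100 80GB SXM": (312e12, 2039e9),
    "H100 SXM":      (990e12, 3350e9),
}
for name, (peak_flops, bandwidth) in gpus.items():
    print(f"{name}: ridge point ~{peak_flops / bandwidth:.0f} FLOP/byte")

# Batch-1 FP16 decode does ~2 FLOPs per parameter while reading 2 bytes
# per parameter (~1 FLOP/byte), far below either ridge point, so decode
# stays bandwidth-bound on both cards. Prefill and large-batch serving
# have much higher arithmetic intensity, which is where the extra FLOPS
# (and FlashAttention keeping attention on-chip) actually pay off.
```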
Thanks for clarifying :)
Sure, but isn't it the case that the H100 is what can sustain such a high-throughput multi-GPU system, whereas A100s generally run independently?