[D] Why choose an H100 over an A100 for LLM inference?

faschu@alien.top · 2 years ago

[D] Why choose an H100 over an A100 for LLM inference?

Gurrako@alien.top · 2 years ago

At first I thought that number was almost unbelievably high. It appears the H100 can be up to 8x faster than the A100 when using FlashAttention and a multi-GPU setup. Without multi-GPU and FlashAttention, it is a bit more than 2x faster.

Source: https://lambdalabs.com/blog/flashattention-2-lambda-cloud-h100-vs-a100

3DHydroPrints@alien.top · 2 years ago

Thanks for clarifying:)

TwistedBrother@alien.top · 2 years ago

Sure, but isn’t it the case that the H100 is what can sustain such a high-throughput system, whereas A100s are generally independent?