[D] Why choose an H100 over an A100 for LLM inference?

faschu@alien.top · 2 years ago

[D] Why choose an H100 over an A100 for LLM inference?

norcalnatv@alien.top · 2 years ago

A fairly detailed technical blog post was published when the H100 was announced, with plenty of comparisons to the A100.

cyril1991@alien.top · 2 years ago

The H100 is more recent and beefier. It is also more interesting to use it for the multi-instance GPU (MIG) feature where you “split it” for use on different workloads, so you could run multiple LLMs in parallel. The A100 has the same feature, but less memory/compute to split.
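To make the "split it" idea concrete, here is a rough sketch of checking which MIG slice a model's weights would fit into. It assumes an 80 GB card and an illustrative subset of slice sizes (10/20/40/80 GB); the function names are hypothetical, and the estimate covers weights only, ignoring KV cache, activations, and runtime overhead.

```python
# Weight-only memory estimate. Real deployments also need room for the
# KV cache, activations, and framework overhead, so treat this as a lower bound.
BYTES_PER_PARAM = {"fp16": 2, "int8": 1}

# Assumed slice sizes in GB on an 80 GB card (illustrative subset, not a full profile list).
MIG_SLICES_GB = [10, 20, 40, 80]


def weight_gb(n_params: float, dtype: str) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


def smallest_slice(n_params: float, dtype: str):
    """Return the smallest assumed slice (in GB) that holds the weights, or None."""
    need = weight_gb(n_params, dtype)
    for size in MIG_SLICES_GB:
        if need <= size:
            return size
    return None


if __name__ == "__main__":
    for params, dtype in [(7e9, "fp16"), (7e9, "int8"), (70e9, "fp16")]:
        print(f"{params / 1e9:.0f}B {dtype}: {weight_gb(params, dtype):.0f} GB of weights, "
              f"smallest slice: {smallest_slice(params, dtype)}")
```

Under these assumptions a 7B model in 16-bit needs about 14 GB for weights alone, so it would not fit the smallest slice, while an 8-bit copy would.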

I_will_delete_myself@alien.top · 2 years ago

A100 is like a 3070ti with 80gb Vram. H100 is like a 4090 with 80gb of ram and optimized hardware for transformers.

Annual-Minute-9391@alien.top · 2 years ago

Why a 3070ti? I would have guessed 3090? Something with clocks?

I_will_delete_myself@alien.top · 2 years ago

They have around the same number of CUDA cores. Normally, the more CUDA cores, the faster the inference.
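A quick sketch of that comparison, using FP32 CUDA core counts as I recall them from the public spec sheets (treat the numbers as assumptions and double-check them; the H100 figure is for the SXM variant). The helper name is hypothetical. Core count alone ignores tensor cores, clocks, and memory bandwidth, so this only explains the analogy, not real inference speed.

```python
# FP32 CUDA core counts (assumed from public spec sheets; verify before relying on them).
CUDA_CORES = {
    "RTX 3070 Ti": 6144,
    "RTX 3090": 10496,
    "RTX 4090": 16384,
    "A100": 6912,
    "H100 SXM": 16896,
}

CONSUMER = ["RTX 3070 Ti", "RTX 3090", "RTX 4090"]


def closest_consumer(datacenter_gpu: str) -> str:
    """Return the consumer card whose CUDA core count is nearest to the given GPU's."""
    target = CUDA_CORES[datacenter_gpu]
    return min(CONSUMER, key=lambda name: abs(CUDA_CORES[name] - target))


if __name__ == "__main__":
    for gpu in ["A100", "H100 SXM"]:
        print(gpu, "is closest in core count to", closest_consumer(gpu))
```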

RobbinDeBank@alien.top · 2 years ago

More tensor and cuda cores mean higher inference and training speed right? Do inference and training get the same benefit from those cores?

redditfriendguy@alien.top · 2 years ago

I don’t think fp8 is a real thing

Substantial-Job1405@alien.top · 2 years ago

From my personal experience, I think the H100 provides better performance when it comes to low-level machine learning. Its data processing speed is significantly faster than the A100's, which can make a big difference for projects that take a long time to complete.

pm_me_your_pay_slips@alien.top · 2 years ago

A100s and H100s are great for training, but a bit of a waste for inference.

SnooHesitations8849@alien.top · 2 years ago

H100 and A100 are best for training. The H100 is optimized for lower precision (8/16 bits) and for transformers. The A100 is still very good, just less so. The A100 is still very GPU-like, while the H100 is more of a transformer accelerator.

Using them for inference is not the most economical choice, though.
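For context on the 8-bit precision mentioned above, here is a minimal sketch of how a single FP8 value decodes, assuming the E4M3 layout (1 sign bit, 4 exponent bits, 3 mantissa bits, exponent bias 7, no infinities). The function name is hypothetical, and this only illustrates the number format; it says nothing about how any particular GPU or library implements it.

```python
def decode_e4m3(byte: int) -> float:
    """Decode one byte in the assumed FP8 E4M3 layout into a Python float."""
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0xF
    mantissa = byte & 0x7
    if exponent == 0xF and mantissa == 0x7:
        # In this layout the all-ones pattern is NaN; there is no infinity.
        return float("nan")
    if exponent == 0:
        # Subnormal: no implicit leading 1, fixed exponent of 1 - bias = -6.
        return sign * (mantissa / 8) * 2.0 ** -6
    # Normal: implicit leading 1, exponent stored with a bias of 7.
    return sign * (1 + mantissa / 8) * 2.0 ** (exponent - 7)


if __name__ == "__main__":
    print(decode_e4m3(0x38))  # 1.0
    print(decode_e4m3(0x7E))  # largest finite value in this layout
```

With only three mantissa bits, each power-of-two range has just eight representable values, which is why 8-bit inference typically relies on per-tensor scaling.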

3DHydroPrints@alien.top · 2 years ago

The H100 was additionally specialized for higher performance on transformer models. I think it is about 8x faster than an A100 for transformers, but don't hold me to it.

Gurrako@alien.top · 2 years ago

At first I thought that number was almost unbelievably high. It appears it can be 8x faster when using FlashAttention and a multi-GPU setup. Without multi-GPU and FlashAttention, it is a bit more than 2x faster.

Source: https://lambdalabs.com/blog/flashattention-2-lambda-cloud-h100-vs-a100

3DHydroPrints@alien.top · 2 years ago

Thanks for clarifying:)

TwistedBrother@alien.top · 2 years ago

Sure, but isn't it the case that the H100 is what can sustain such a high-throughput system, whereas A100s are generally independent?