I’m using an A100 PCIe 80 GB, CUDA 11.8 toolkit, driver 525.x.
But when I run inference on CodeLlama 13B with oobabooga (web UI),
it only gets about 5 tokens/s.
It is so slow.
Is there any config or anything else I need for the A100?
Something is wrong with your environment; even P40s give more than that.
The other possibility is that you aren’t generating enough tokens to get a reliable t/s measurement. What was the total inference time?
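To see why a short run can be misleading: throughput is just generated tokens divided by wall-clock generation time, and on a short generation the fixed startup cost (prompt processing, model warm-up) dominates. A minimal sketch, with made-up example numbers:

```python
# Hypothetical numbers for illustration -- substitute your own run's figures.
generated_tokens = 512   # new tokens produced (not counting the prompt)
total_time_s = 10.4      # wall-clock time for the generation, in seconds

# Tokens per second = generated tokens / elapsed time.
tps = generated_tokens / total_time_s
print(f"{tps:.1f} tokens/s")
```

If the web UI reports 5 t/s on a generation of only a dozen tokens, much of that "time" may be prompt ingestion rather than token generation, so report the token count and total time together.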