I’m using an A100 PCIe 80GB, with the CUDA 11.8 toolkit and driver 525.x.
But when I run inference on CodeLlama 13B with oobabooga (web UI),
it only manages about 5 tokens/s.
That is very slow.
Is there some config or anything else I need to set for the A100?
Sounds like you’re running it on the CPU. If you’re using oobabooga, you have to explicitly set how many layers to offload to the GPU; by default everything runs on the CPU (at least for GGUF models). In the web UI that’s the n-gpu-layers setting on the Model tab (or the --n-gpu-layers launch flag). See the sketch below.
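For reference, here’s a minimal sketch of the same offload setting via llama-cpp-python, the library oobabooga wraps for GGUF models. The model path is hypothetical; `n_gpu_layers=-1` offloads every layer, while the default of 0 keeps everything on the CPU (which would explain ~5 tokens/s on a 13B model):

```python
# Minimal sketch, assuming llama-cpp-python was installed with CUDA support
# (e.g. CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="codellama-13b.Q4_K_M.gguf",  # hypothetical path to your GGUF file
    n_gpu_layers=-1,  # -1 = offload all layers to the GPU; 0 (default) runs on CPU
    n_ctx=4096,       # context window size
)

# Quick generation test; watch nvidia-smi to confirm the GPU is actually used.
out = llm("def fibonacci(n):", max_tokens=64)
print(out["choices"][0]["text"])
```

On an 80GB A100 the full 13B model fits in VRAM easily, so offloading all layers should get you well past 5 tokens/s.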