I'm using an A100 PCIe 80GB with the CUDA 11.8 toolkit and driver 525.x.
But when I run inference on CodeLlama 13B with oobabooga (web UI),
I only get 5 tokens/s.
That is very slow.
Is there any config or other setting I need for the A100?
Uhmmm, where did you buy that A100? Was it a good deal? lol. Just kidding, you probably set something up wrong or the drivers are messing up. Is the card working fine otherwise in benchmarks?
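
For a rough sense of how far off 5 tokens/s is, here is a back-of-the-envelope sketch. It assumes single-stream decoding is memory-bandwidth bound (every generated token reads all the weights once), fp16 weights at 2 bytes per parameter, and about 1935 GB/s of memory bandwidth, which is the spec-sheet figure for the A100 80GB PCIe. The function name is made up for illustration, and real throughput will land below this ceiling.

```python
def decode_ceiling_tokens_per_s(params_billion, bytes_per_param, bandwidth_gb_s):
    """Rough upper bound on decode speed, assuming each token reads all weights once."""
    # billions of params * bytes per param = model size in GB
    model_gb = params_billion * bytes_per_param
    return bandwidth_gb_s / model_gb

# Assumptions: 13B params, fp16 (2 bytes each), ~1935 GB/s (A100 80GB PCIe spec figure)
ceiling = decode_ceiling_tokens_per_s(13, 2, 1935)
print(f"rough ceiling: {ceiling:.0f} tokens/s, observed: 5 tokens/s")
```

If the observed number is an order of magnitude below an estimate like this, that usually points at the setup (for example the model running partly on CPU) rather than the card itself.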