What kind of specs are needed to run a local LLM and serve up to 20-50 users?

Appropriate-Tax-9585@alien.top · 2 years ago

What kind of specs would I need to run a local LLM and serve, say, up to 20-50 users?

Aggressive-Drama-899@alien.top · 2 years ago

We run Llama 2 70B for around 20-30 active users using TGI on 4x A100 80GB, deployed on Kubernetes. If two users send a request at exactly the same time, the second user sees a delay of about 3-4 seconds. We haven't had any complaints about speed so far, and we can spin up additional containers if it ever becomes a problem. This is all on-prem.
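For anyone sizing similar hardware, here is a rough back-of-the-envelope sketch of why 4x A100 80GB is a comfortable fit for a 70B model. It assumes fp16 weights (2 bytes per parameter), roughly 70 billion parameters, and the published Llama 2 70B shape (80 layers, 8 KV heads from grouped-query attention, head dimension 128). It ignores activation memory, framework overhead, and fragmentation, so treat the result as an upper bound and not as a measurement of the setup described above. The function and variable names are made up for illustration.

```python
GB = 10**9  # decimal gigabytes, close enough for a rough estimate


def weight_bytes(n_params, bytes_per_param=2):
    """Memory for the model weights (fp16 assumed: 2 bytes per parameter)."""
    return n_params * bytes_per_param


def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """KV-cache memory for one token: a K and a V vector in every layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


# Assumed figures: ~70e9 parameters, Llama 2 70B attention shape.
total_gpu = 4 * 80 * GB                        # 4x A100 80GB
weights = weight_bytes(70 * 10**9)             # fp16 weights
per_token = kv_bytes_per_token(80, 8, 128)     # bytes of KV cache per token
free_for_cache = total_gpu - weights           # what is left after the weights
max_cached_tokens = free_for_cache // per_token

print(f"weights:          {weights / GB:.0f} GB")
print(f"KV cache / token: {per_token / 1024:.0f} KiB")
print(f"free for cache:   {free_for_cache / GB:.0f} GB")
print(f"max cached tokens (upper bound): {max_cached_tokens:,}")
```

Dividing the cached-token budget by your typical context length gives a ceiling on how many requests can be in flight at once; real capacity will be lower once overhead is accounted for.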

Appropriate-Tax-9585@alien.top · 2 years ago

Thank you, this is really good to hear!