Hi all,
Just curious if anybody knows what kind of hardware power is required to build a LLaMA server that can serve multiple users at once.
Any discussion is welcome :)
Hugging Face's Text Generation Inference (TGI) can handle concurrency out of the box; it batches incoming requests, so you mainly just need to back it with enough GPUs.
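As a rough sketch of what "multiple users" looks like against TGI (assuming you already have a container running locally on port 8080 serving a Llama model; the model ID and ports here are just examples), you can simulate concurrent users with plain threads and TGI's `/generate` endpoint:

```python
# Sketch: simulate several concurrent users hitting a local TGI server.
# Assumes TGI was started beforehand, e.g. with something like:
#   docker run --gpus all -p 8080:80 ghcr.io/huggingface/text-generation-inference \
#       --model-id meta-llama/Llama-2-7b-chat-hf
import concurrent.futures
import requests

TGI_URL = "http://localhost:8080/generate"  # assumed local endpoint

def ask(prompt: str) -> str:
    # TGI's /generate takes {"inputs": ..., "parameters": {...}}
    resp = requests.post(
        TGI_URL,
        json={"inputs": prompt, "parameters": {"max_new_tokens": 64}},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["generated_text"]

prompts = [f"User {i}: explain GPUs in one sentence." for i in range(8)]

# Each thread plays one user; TGI batches the in-flight requests on the GPU.
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    for answer in pool.map(ask, prompts):
        print(answer)
```

How many simultaneous users this sustains comes down to GPU memory (model weights plus KV cache per request) and compute, so the honest answer to the original question is: it scales with the GPUs you put behind it.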