Hi all,
Just curious if anybody knows the compute power required to build a llama server that can serve multiple users at once.
Any discussion is welcome:)
One or two A6000s can serve a 70B model at decent tokens/sec for around 20 concurrent users. You can run a swarm with Petals and just add GPUs as needed; LLM sharding can be pretty useful.
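A rough sanity check on that "20 people" figure: concurrent capacity is roughly aggregate batched throughput divided by the per-user rate you consider acceptable, scaled by how often users are actually generating. All numbers below are assumptions for illustration, not benchmarks.

```python
# Back-of-envelope concurrency estimate (every number here is an assumption)
aggregate_tps = 150    # assumed total batched tokens/sec across the GPUs
per_user_tps = 15      # tokens/sec each active user needs to feel responsive
duty_cycle = 0.5       # fraction of time a connected user is mid-generation

# Capacity = total throughput / (per-user rate * fraction of time it's demanded)
max_users = aggregate_tps / (per_user_tps * duty_cycle)
print(int(max_users))  # → 20
```

Real batching engines don't scale perfectly linearly (KV-cache memory becomes the bottleneck), so treat this as an upper-bound sketch.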