Hi all,

Just curious if anybody knows the compute power required to run a llama server that can serve multiple users at once.

Any discussion is welcome:)

  • Prudent-Artichoke-19@alien.top
    10 months ago

    One or two A6000s can serve a 70B with decent TPS for 20 people. You can run a swarm using Petals and just add a GPU as needed. LLM sharding can be pretty useful.
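
    A rough way to sanity-check that claim is to split the server's aggregate decode throughput across concurrent users. The numbers below are assumptions for illustration, not benchmarks of any specific hardware:

    ```python
    # Back-of-envelope sketch: per-user throughput when concurrent users
    # share a server's aggregate decode rate. All figures are assumed.
    def per_user_tps(aggregate_tps: float, concurrent_users: int) -> float:
        """Naive even split of aggregate tokens/sec across active users."""
        if concurrent_users < 1:
            raise ValueError("need at least one user")
        return aggregate_tps / concurrent_users

    # If a batched 70B deployment sustained ~200 tok/s aggregate
    # (an assumption, not a measurement), 20 simultaneous users
    # would each see about 10 tok/s:
    print(per_user_tps(200.0, 20))  # → 10.0
    ```

    Real serving stacks do better than an even split in practice, since not all users are generating at the same instant and continuous batching overlaps requests, so this is a pessimistic lower bound.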