Hi all,

Just curious if anybody knows how much hardware (GPU power and memory) is required to run a Llama server that can serve multiple users at once.

Any discussion is welcome :)

  • pablines@alien.topB
    10 months ago

    Hugging Face Text Generation Inference (TGI) can handle concurrent requests; you just need to power it with GPUs.
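
To get a feel for what "handling concurrency" looks like from the client side, here is a minimal sketch that fires several prompts at once at a TGI server's `/generate` route. It assumes a server is already running at `http://localhost:8080` (that address is an assumption, adjust it to your setup), and the helper names `build_payload`, `post_json`, and `ask_concurrently` are made up for illustration. It only shows the request pattern and says nothing about how many GPUs you would actually need.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: a TGI server is listening here; adjust to your deployment.
TGI_URL = "http://localhost:8080/generate"


def build_payload(prompt, max_new_tokens=64):
    """Build the JSON body for TGI's /generate route."""
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}


def post_json(payload, url=TGI_URL, timeout=60):
    """POST one request and return the generated text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["generated_text"]


def ask_concurrently(prompts, post=post_json, max_workers=8):
    """Send all prompts from a thread pool; results keep the prompt order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda p: post(build_payload(p)), prompts))


if __name__ == "__main__":
    # Each prompt stands in for a separate user hitting the server.
    for answer in ask_concurrently(["Hello", "What is a llama?"]):
        print(answer)
```

Timing this script with different `max_workers` values against your own server is a rough way to see how throughput holds up as the number of simultaneous users grows.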