pablines@alien.top to LocalLLaMA@poweruser.forum • What kind of specs to run local llm and serve to say up to 20-50 users • 10 months ago
Hugging Face Text Generation Inference can handle concurrency; you just need to power it with GPUs.
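For context, here is a minimal sketch of what "handling concurrency" looks like from the client side, assuming a TGI server is already running locally on port 8080 (for example via the official `ghcr.io/huggingface/text-generation-inference` Docker image). The URL, prompt text, and worker count are illustrative. TGI's continuous batching merges simultaneous requests into GPU batches, so a simple threaded client is enough to simulate 20-50 users:

```python
# Minimal sketch: drive a running TGI server with concurrent requests.
# Assumes TGI listens on localhost:8080; the /generate endpoint and
# payload shape follow TGI's non-streaming API.
import concurrent.futures
import requests

TGI_URL = "http://localhost:8080/generate"  # assumed local deployment

def query(prompt: str) -> str:
    resp = requests.post(
        TGI_URL,
        json={"inputs": prompt, "parameters": {"max_new_tokens": 64}},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["generated_text"]

# Simulate ~20 simultaneous users; TGI batches these on the GPU.
prompts = [f"User {i}: why do GPUs help LLM serving?" for i in range(20)]
with concurrent.futures.ThreadPoolExecutor(max_workers=20) as pool:
    for answer in pool.map(query, prompts):
        print(answer[:80])
```

How many concurrent users a single server sustains depends on GPU memory and the model size; the batching itself is handled server-side, so the client needs no special logic.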
pablines@alien.top to LocalLLaMA@poweruser.forum • Rocket 🦝 - smol model that overcomes models much larger in size • 10 months ago
Woooooooow!