Thanks for this feedback. What's your definition of an on-prem chatbot? Hosted on their own physical infrastructure?
Is home hardware a requirement for this project? I guess I'm a little confused about what that has to do with model hallucinations.
I just wrote a tutorial on how you can scale Mistral-7b to many GPUs in the cloud; hopefully it gives you some value. Not sure if you're looking to do on-demand inference or batch inference over a large set of inputs.
https://www.reddit.com/r/LocalLLaMA/comments/17k2x62/i_scaled_mistral_7b_to_200_gpus_in_less_than_5/
This is really cool! We're more focused on lengthy workloads, e.g. running 500k inputs through an LLM in one batch rather than on-demand inference (though we're starting to support that too). Right now the startup time is pretty long (2-5 minutes), but we're working on cutting it down.
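If it helps picture the batch use case, here's a rough sketch of what pushing a whole list of inputs through in one go looks like with vLLM and Mistral-7B. This isn't our actual stack, just an illustration; the model name, prompts, and sampling params are placeholders.

```python
# Minimal sketch of batch (offline) inference, assuming vLLM and Mistral-7B.
# Illustrative only -- prompts and parameters below are placeholders.
from vllm import LLM, SamplingParams

# Stand-in for the real workload (e.g. 500k records); just 1k dummy prompts here.
prompts = [f"Summarize record {i}: ..." for i in range(1000)]

sampling_params = SamplingParams(temperature=0.7, max_tokens=256)
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.1")

# One call over the whole list; vLLM batches the requests internally,
# as opposed to serving each prompt on demand behind an API.
outputs = llm.generate(prompts, sampling_params)
for out in outputs:
    print(out.outputs[0].text[:80])
```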
This is really useful feedback. I'd definitely be able to produce a revenue-generating product faster if I focus on chatbots… so in terms of trying to get funding for this idea, that seems like the better avenue. In the future I could definitely address both use cases, but I'm trying not to spread myself too thin at the moment.