on-demand inference or batch inference?

Ok_Post_149@alien.top · 1 year ago

on-demand inference or batch inference?

AdamDhahabi@alien.top · 1 year ago

I think batched inference is a must for companies that want to put an on-premise chatbot in front of their users. This is a use case many are busy with at the moment. I saw that llama.cpp now supports batched inference, but only as of about two weeks ago, and I don't have hands-on experience with it yet.

Ok_Post_149@alien.top · 1 year ago

Thanks for the feedback. What is your definition of an on-prem chatbot? One hosted on the company's own physical infrastructure?

matkley12@alien.top · 1 year ago

Does llama.cpp support batched inference on CPU?