Can be done with a self-hosted Mattermost server: https://github.com/mattermost/openops
I'd be interested to know how it scores for RAG use cases; there is a benchmark for that: https://github.com/vectara/hallucination-leaderboard
So far, Mistral underperforms Llama 2.
Llama.cpp has supported batched inference for four weeks now: https://github.com/ggerganov/llama.cpp/issues/2813
-cb, --cont-batching enable continuous batching (a.k.a dynamic batching) (default: disabled)
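To actually see continuous batching do something, you need multiple requests in flight at once. Here is a minimal client sketch, assuming a llama.cpp server from a recent checkout started locally with something like ./server -m model.gguf -cb --parallel 4 (the model path and slot count are placeholders) and listening on the default port 8080 with its /completion endpoint:

    # Sketch: fire several completion requests concurrently at a local
    # llama.cpp server started with continuous batching (-cb) enabled.
    import json
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    SERVER = "http://localhost:8080/completion"  # llama.cpp server default port

    def complete(prompt):
        payload = json.dumps({"prompt": prompt, "n_predict": 64}).encode()
        req = urllib.request.Request(
            SERVER, data=payload, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["content"]

    prompts = [
        "Explain continuous batching in one sentence.",
        "What is a KV cache?",
        "Why does batching improve GPU utilization?",
        "Name one trade-off of dynamic batching.",
    ]

    # Four concurrent requests: with -cb the server can interleave their
    # decoding steps instead of serving them strictly one after another.
    with ThreadPoolExecutor(max_workers=4) as pool:
        for prompt, answer in zip(prompts, pool.map(complete, prompts)):
            print(prompt, "->", answer)

With batching disabled you should see the answers complete roughly sequentially; with -cb the total wall-clock time for the batch should drop noticeably.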
Yesterday I tried GPT4All, and it references context by outputting three passages from my local documents; I could click on each of them and read the passage. But their current implementation only uses a simpler retrieval algorithm; embedding-based semantic search is still on their roadmap.
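For reference, embedding-based semantic search of the kind on their roadmap fits in a few lines. This sketch uses sentence-transformers purely as an example embedding model, and the chunks are placeholders standing in for text extracted from local documents; it is not GPT4All's implementation:

    # Sketch: embedding-based semantic search over local document chunks.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # example model choice

    # In a real setup these would be chunks extracted from local files.
    chunks = [
        "Mattermost is an open-source, self-hosted collaboration platform.",
        "Continuous batching interleaves decoding of concurrent requests.",
        "Embeddings map text to vectors so similar passages end up close.",
    ]

    # Normalized embeddings make cosine similarity a plain dot product.
    chunk_vecs = model.encode(chunks, normalize_embeddings=True)

    def top_passages(query, k=3):
        query_vec = model.encode([query], normalize_embeddings=True)[0]
        scores = chunk_vecs @ query_vec
        return [chunks[i] for i in np.argsort(-scores)[:k]]

    print(top_passages("How does batched inference work?"))

The returned top-k passages are exactly what a UI like GPT4All's could show as clickable citations.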
I think batched inference is a must for companies that want to put an on-premise chatbot in front of their users; this is a use case many are busy with at the moment. I saw that llama.cpp now supports batched inference, but only since two weeks ago, so I don't have hands-on experience with it yet.
Unethical practices: one-man shops attempting to artificially pump up the account's value, aiming to sell it later on.