I maintain the uniteai project, and have implemented a custom backend for serving transformers-compatible LLMs. (That file's actually a great ultra-lightweight server if transformers satisfies your needs; one clean file.)
I'd like to add GGML etc., but I haven't reached for cTransformers yet. Instead of building a bespoke server, it'd be nice if a standard were starting to emerge.
For instance, many models have custom instruct templates; if a backend handled all that for me, that'd be nice.
I've used llama.cpp, but I'm not aware of it handling instruct templates. Is it worth building on top of? Is it too llama-only focused? Production-worthy? (It bills itself as "mainly for educational purposes.")
I've considered oobabooga, but I'd just like a best-in-class server, without all the other front-end fixings and dependencies.
Is OpenAI’s API signature something people are trying to build against as a standard?
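It does seem to be the de facto target: many local servers advertise an OpenAI-compatible `/v1/chat/completions` endpoint. A minimal sketch of the request shape, using only the stdlib — the model name and endpoint in the comments are placeholders, not values from any particular server:

```python
import json

# Minimal OpenAI-style chat-completions request body. This is the shape
# that OpenAI-compatible local servers aim to accept.
payload = {
    "model": "local-model",  # placeholder; local servers often ignore or remap this
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "temperature": 0.7,
    "stream": False,
}

body = json.dumps(payload)
# POST this to http://localhost:<port>/v1/chat/completions with an
# "Authorization: Bearer <key>" header (many local servers accept any key).
print(body)
```

The response mirrors OpenAI's format too (a `choices` list with a `message` per choice), which is what makes swapping backends behind one client feasible.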
Any recommendations?
Disclosure: I'm the maintainer of the nitro project.
We have a simple llama server, shipped as a single binary, that you can download and try right away: https://github.com/janhq/nitro. It's a viable option if you want to set up an OpenAI-compatible endpoint to test out new models.
I think all frameworks support custom instruct templates, and I know for a fact llama.cpp does: I use LM Studio, which is based on llama.cpp, and it lets me alter the system / user / assistant templates.
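For what it's worth, these instruct templates are just string wrappers around the message list, so they're easy to apply yourself if a backend doesn't. A minimal sketch using the ChatML convention — the `<|im_start|>`/`<|im_end|>` markers here are one common format; other models use different tokens entirely:

```python
# Apply a ChatML-style instruct template by hand. Each message is wrapped
# in role markers, and a trailing assistant header prompts the model to
# generate its reply.
def apply_chatml(messages):
    out = []
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    out.append("<|im_start|>assistant\n")  # leave open for the model's response
    return "\n".join(out)

prompt = apply_chatml([
    {"role": "system", "content": "You are terse."},
    {"role": "user", "content": "Hi"},
])
print(prompt)
```

The fiddly part isn't the formatting itself; it's that every model family ships a different template, which is exactly why having the backend own this is so convenient.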