On top of what other said, make sure to include a few shot examples in your prompt, and consider using constrained decoding (ensuring you get valid json of whatever schema you provide, see pointers on how to do it with llama.cpp).
For few shotting chat models, append fake previous turns, like:
System:
User:
Assistant:
...
User:
Assistant:
User:
It’s inevitable people will game the system when it’s so easy, and the payoff can be huge. Not so long ago people could still get huge VC checks for showing off GitHub stars or benchmark numbers.
Curious to hear what other UIs people use and for what purpose / what they like about each (like Oogabooga, or Kobold).
I can recommend vLLM. Also offers OpenAI compatible API service, if you want that.
Thank you so much for the kind feedback! If you have found some cool prompts, come share them with others on our discord.
I hope it will be something tasty! :)
The training data had example of up to 4096 tokens. The model should also work beyond that, but I did not do a deep analysis of degradation.
I agree, I hope I can make things cheaper with better utilization. You have to consider that a single GPU is not used 100% the time, so there’s a lot of waste. And due to lack of scale, I also do not get any special pricing on the GPUs. The more users, the closer the utilization will be to 100%, and the better GPU pricing. (For instance, I heard that on Google Cloud, enterprise customers can negotiate the on-demand GPU price down to the regular spot price for some of the GPUs)
Wow, amazing, thanks for giving it a try GGUF and other quants are coming, so your computer should have an easier time soon! :)
What’s the maximum possible dead babies score? :D
Thank you!
Great news, the great /u/TheBloke is working on this!
I have been using the Python API client 1.0 preview version (which was just released) for some time with vLLM OpenAI compatible server and it worked well – at least I did not notice any issues.
There was a bug on the website where the first time the “Continue” would not work if you did not refresh, should work now even though the editor is quite janky still, sorry for that :(
(can’t wait for AI to take over React from me :P)
Found a live stream on YouTube, for anyone interested: https://www.youtube.com/watch?v=o35EY8I9PXU