On top of what other said, make sure to include a few shot examples in your prompt, and consider using constrained decoding (ensuring you get valid json of whatever schema you provide, see pointers on how to do it with llama.cpp).
For few shotting chat models, append fake previous turns, like:
On top of what other said, make sure to include a few shot examples in your prompt, and consider using constrained decoding (ensuring you get valid json of whatever schema you provide, see pointers on how to do it with llama.cpp).
For few shotting chat models, append fake previous turns, like: