I’m using Llama models for local inference with LangChain, but I get a lot of hallucinations with GGML models. I’ve tried both the LLM and chat variants (7B and 13B), since I have 16GB of RAM.
So now I’m exploring new models and want to find a good one. Should I try the GGUF format?
I’d appreciate suggestions from anyone running local models with LangChain at a production level.
GGUF won’t change the level of hallucination, but you are right that most newer language models are quantized to GGUF, so it makes sense to use one.
GGML is fully deprecated, so much so that the make-ggml.py script in llama.cpp now produces GGUF files.
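If you do switch, here is a minimal sketch of loading a GGUF model through LangChain’s LlamaCpp wrapper. The model path and parameter values are placeholders, and it assumes you have `langchain-community` and `llama-cpp-python` installed:

```python
# Minimal sketch: run a local GGUF model through LangChain's LlamaCpp wrapper.
# Assumes: pip install langchain-community llama-cpp-python
# The model_path below is a placeholder; point it at any GGUF file you have.
from langchain_community.llms import LlamaCpp

llm = LlamaCpp(
    model_path="./models/llama-2-7b-chat.Q4_K_M.gguf",  # placeholder path
    n_ctx=4096,       # context window; keep modest to fit in 16GB of RAM
    temperature=0.1,  # lower temperature tends to reduce hallucination
    max_tokens=512,
    verbose=False,
)

print(llm.invoke("Briefly explain what the GGUF format is."))
```

A 7B model at Q4_K_M quantization needs roughly 5GB of RAM, so a 13B quant should still fit comfortably in 16GB; the format change alone won’t help with hallucination, but lower temperature and tighter prompts often do.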