I’m using ollama and I have a RTX 3060 TI. Using only 7B models.
I tested with Mistral 7B, Mistral-OpenOrca and Zephyr, they all had the same problem where they kept repeating or speaking randomly after some amount of chatting.
What could it be? Temperature? VRAM? ollama?
I had this using other clients, try lm studio with a gguf and chatml, works well for me.