ClassroomGold6910@alien.top (OP) to LocalLLaMA@poweruser.forum · Cheapest way to run local LLMs?
3B models work amazingly and run super smoothly, but 7B models, while generating at a fair 15 tokens per second, prevent me from using any other application at the same time and occasionally freeze my mouse and screen until the response is finished.
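For reference, a minimal sketch of the kind of setup I mean (this assumes llama-cpp-python as the runtime and uses a placeholder model path; capping `n_threads` is meant to leave some cores free for the rest of the system):

```python
# Sketch only: assumes llama-cpp-python is installed and a GGUF model
# exists at the placeholder path below.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-7b.Q4_K_M.gguf",  # placeholder path
    n_ctx=2048,    # context window size
    n_threads=4,   # cap threads so other apps and the UI stay responsive
)

# Generation runs a bit slower with fewer threads, but it shouldn't
# starve the mouse/desktop while a response is being produced.
out = llm("Explain GGUF quantization in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```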
Also, what’s the difference with the `K_M` models? And why is `Q4_0` considered legacy but not `Q4_1`? It would be great if someone could explain that lol