ClassroomGold6910@alien.top (OP) to LocalLLaMA@poweruser.forum · Cheapest way to run local LLMs?
3B models work amazingly and run super smoothly, but 7B models, while generating at a fair 15 tokens per second, prevent me from using any other application at the same time and occasionally freeze my mouse and screen until the response is finished.
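For reference, a minimal sketch of the kind of setup I mean (this assumes llama-cpp-python as the runtime and uses a placeholder model path; capping `n_threads` is meant to leave some cores free for the rest of the system):

```python
# Sketch only: assumes llama-cpp-python is installed and a GGUF model
# exists at the placeholder path below.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-7b.Q4_K_M.gguf",  # placeholder path
    n_ctx=2048,    # context window size
    n_threads=4,   # cap threads so other apps and the UI stay responsive
)

# Generation runs a bit slower with fewer threads, but it shouldn't
# starve the mouse/desktop while a response is being produced.
out = llm("Explain GGUF quantization in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```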
Also, what’s the difference with the `K_M` models? And why is `Q4_0` considered legacy but not `Q4_1`? It would be great if someone could explain that lol