How much more stupid is the 120B goliath Q3_K_M than the larger options?

Secret_Joke_2262@alien.top · 1 year ago

How much more stupid is the 120B goliath Q3_K_M than the larger options?

vikarti_anatra@alien.top · 1 year ago

How do you actually use it at home? 3 or 4 old Tesla P40s from eBay/local alternatives? Just CPU?

Murky-Ladder8684@alien.top · 1 year ago

4x3090s will run it at over 4 bits.

SomeOddCodeGuy@alien.top · 1 year ago

I imagine it’s pretty solid.

I’ve tested around with the q4_K_M and the q8 on my Mac Studio, and the q4 is pretty darn good. There’s some difference: the q4 does seem to get confused sometimes when I talk to it, whereas the q8 seems unshakeable in its quality. But honestly, the q4 still feels better than almost any other model I’ve ever used.

quaquaversal_@alien.top · 1 year ago

What’s the tok/s for each of those models on that system?

Edit: also, if you don’t mind my asking, how much context are you able to use before inference degrades?

Murky-Ladder8684@alien.top · 1 year ago

For comparison’s sake, the EXL2 4.85bpw version runs at around 6-8 t/s on 4x3090s; at 8k context it’s at the lower end.
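
A rough back-of-the-envelope sketch of why that fits, in case it helps. The assumptions are mine: a nominal 120e9 parameters (just read off the "120B" in the name) and 24 GB per 3090, with GB treated loosely as 10^9 bytes. It counts weights only and ignores KV cache and runtime overhead, so real headroom will be smaller.

```python
def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


PARAMS = 120e9    # assumption: nominal count taken from the "120B" name
VRAM_GB = 4 * 24  # 4x3090s at 24 GB each

for bpw in (4.0, 4.85):
    size = weight_gb(PARAMS, bpw)
    left = VRAM_GB - size
    print(f"{bpw} bpw: ~{size:.1f} GB of weights, ~{left:.1f} GB left of {VRAM_GB} GB")
```

Whatever is left over after the weights has to cover context, which is one reason longer contexts get tighter.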

carwall00@alien.top · 1 year ago

What software are you using to run Goliath? On a Mac Studio I get errors in LM Studio and text-generation-webui and can’t load any of TheBloke’s GGUFs for it. I can only load the 2-bit one from the creator. Any help would be much appreciated. Thanks!