M1/M2/M3: increase VRAM allocation with `sudo sysctl iogpu.wired_limit_mb=12345` (i.e. amount in mb to allocate)

farkinga@alien.top · 2 years ago

M1/M2/M3: increase VRAM allocation with `sudo sysctl iogpu.wired_limit_mb=12345` (i.e. amount in mb to allocate)

Zestyclose_Yak_3174@alien.top · 2 years ago

Which quant did you use and how was your experience?

CheatCodesOfLife@alien.top · 2 years ago

46G goliath-120b.Q2_K

So the smallest one I found (I didn’t quantize this one myself, found it on HF somewhere)

And it was very slow. about 13t/s prompt_eval and then 2.5t/s generating text, so only really useful for me when I need to run it on my laptop (I get like 15t/s with 120b model on my 2x3090 rig at 3bpw exl2)
As for the models it’s self, I like it a lot and use it frequently.

TBH, this ram thing is more helpful for me because it lets me run Q5 70b models instead of just Q4 now.

M1/M2/M3: increase VRAM allocation with sudo sysctl iogpu.wired_limit_mb=12345 (i.e. amount in mb to allocate)

M1/M2/M3: increase VRAM allocation with sudo sysctl iogpu.wired_limit_mb=12345 (i.e. amount in mb to allocate)

M1/M2/M3: increase VRAM allocation with `sudo sysctl iogpu.wired_limit_mb=12345` (i.e. amount in mb to allocate)

M1/M2/M3: increase VRAM allocation with `sudo sysctl iogpu.wired_limit_mb=12345` (i.e. amount in mb to allocate)