M1/M2/M3: increase VRAM allocation with `sudo sysctl iogpu.wired_limit_mb=12345` (i.e. the amount in MB to allocate)

farkinga@alien.top · 10 months ago

farkinga@alien.top · 10 months ago

Yeah! That’s what I’m talking about. Would you happen to remember what it was reporting before? If it’s like the rest, I’m assuming it said something like 40 or 45GB, right?

bebopkim1372@alien.top · 10 months ago

It was 48GB and now I can use 12GB more!

FlishFlashman@alien.top · 10 months ago

Machines with ≥64GB allow 75% of RAM to be used by the GPU. With ≤32GB it's ~66%. Not sure about the 36GB machines.
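To make the percentages concrete, here is a rough sketch of that rule of thumb in Python. The function name `estimate_default_limit_mb` is made up for illustration, the ~66% figure is assumed to be exactly two thirds, and sizes between 32GB and 64GB are left as unknown because nobody in the thread has confirmed them. It is only an estimate of what the system reports, not Apple's actual formula.

```python
from typing import Optional

MB_PER_GB = 1024


def estimate_default_limit_mb(ram_gb: int) -> Optional[int]:
    """Estimate the default GPU memory limit in MB for a given RAM size.

    Assumption (from this thread, not from Apple documentation):
    >= 64 GB -> 75% of RAM, <= 32 GB -> about two thirds of RAM.
    Sizes in between (e.g. 36 GB) are unknown, so None is returned.
    """
    total_mb = ram_gb * MB_PER_GB
    if ram_gb >= 64:
        return total_mb * 3 // 4
    if ram_gb <= 32:
        return total_mb * 2 // 3
    return None


if __name__ == "__main__":
    for ram_gb in (16, 32, 36, 64, 128):
        print(f"{ram_gb} GB RAM -> estimated default limit: "
              f"{estimate_default_limit_mb(ram_gb)} MB")
```

For a 64GB machine this gives 49152 MB, i.e. 48GB.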

CheatCodesOfLife@alien.top · 10 months ago

64GB M1 Max here. Before running the command, if I tried to load up goliath-120b: (47536.00 / 49152.00) - fails

And after `sudo sysctl iogpu.wired_limit_mb=57344`: (47536.00 / 57344.00)

So I guess the default is: 49152

fallingdowndizzyvr@alien.top · 10 months ago

> So I guess the default is: 49152

It is. To be clearer, llama.cpp tells you what the recommendedMaxWorkingSetSize is, which should match that number.

bebopkim1372@alien.top · 10 months ago

Maybe 47536MB is just the net model size. LLM inference also needs memory for the context and, optionally, for a context cache, so the load can fail even though the model itself is under the limit.
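A quick back-of-the-envelope check of the numbers quoted above, as a small Python sketch. The helper `headroom_mb` is hypothetical, and treating 47536MB as the model's own footprint is the guess from the previous comment, not something llama.cpp states.

```python
MB_PER_GB = 1024


def headroom_mb(limit_mb: int, model_mb: int) -> int:
    """Memory left under the GPU limit after the model itself is loaded."""
    return limit_mb - model_mb


# Figures quoted in this thread for goliath-120b on a 64GB M1 Max.
MODEL_MB = 47536
DEFAULT_LIMIT_MB = 49152   # limit reported before running the command
RAISED_LIMIT_MB = 57344    # limit after sudo sysctl iogpu.wired_limit_mb=57344
TOTAL_RAM_MB = 64 * MB_PER_GB

if __name__ == "__main__":
    # Space left for context and any context cache under each limit.
    print("headroom at default limit:", headroom_mb(DEFAULT_LIMIT_MB, MODEL_MB), "MB")
    print("headroom at raised limit: ", headroom_mb(RAISED_LIMIT_MB, MODEL_MB), "MB")
    # RAM that stays outside the GPU limit for macOS and other apps.
    print("left for the system:      ", TOTAL_RAM_MB - RAISED_LIMIT_MB, "MB")
```

With the default limit there is only about 1.6GB of room beyond the model, which fits the idea that context memory pushed the load over the edge. The raised limit leaves roughly 9.6GB for that, while keeping 8GB outside the GPU limit for the rest of the system.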
