If you’re using Metal to run your LLMs, you may have noticed the amount of VRAM available is around 60%-70% of the total RAM - despite Apple’s unique architecture for sharing the same high-speed RAM between CPU and GPU.

It turns out this VRAM allocation can be controlled at runtime using sudo sysctl iogpu.wired_limit_mb=12345, where 12345 is a placeholder for the limit in megabytes.

See here: https://github.com/ggerganov/llama.cpp/discussions/2182#discussioncomment-7698315

Previously, it was believed this could only be done with a kernel patch - and that required disabling a macOS security feature … And tbh, that wasn’t that great.

Will this make your system less stable? Probably. The OS will need some RAM - and if you allocate 100% to VRAM, I predict you’ll encounter a hard lockup, a spinning beachball, or just a system reset. So be careful not to get carried away. Even so, many will be able to get a few more gigs this way, enabling a slightly larger quant, longer context, or maybe even the next level up in parameter size. Enjoy!
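
For anyone who wants to pick a number without guessing, here is a rough sketch of the "leave some RAM for the OS" idea. The helper name wired_limit_mb and the 8192 MB default reserve are my own assumptions, not something from the linked discussion, so tune the reserve to your own workload. The function only does the arithmetic; the actual sysctl calls are left as comments so nothing changes until you run them yourself.

```shell
#!/bin/sh
# Sketch: derive a wired-limit value that leaves headroom for the OS.
# ASSUMPTION: the 8192 MB default reserve is illustrative, not a tested figure.

# wired_limit_mb TOTAL_MB [RESERVE_MB]
# Prints TOTAL_MB - RESERVE_MB, or fails if the reserve would eat all of RAM.
wired_limit_mb() {
    total_mb=$1
    reserve_mb=${2:-8192}
    if [ "$total_mb" -le "$reserve_mb" ]; then
        echo "reserve (${reserve_mb} MB) must be smaller than total RAM (${total_mb} MB)" >&2
        return 1
    fi
    echo $((total_mb - reserve_mb))
}

# On macOS, total RAM in bytes is available from `sysctl -n hw.memsize`:
#   total_mb=$(( $(sysctl -n hw.memsize) / 1024 / 1024 ))
#   sudo sysctl iogpu.wired_limit_mb="$(wired_limit_mb "$total_mb")"
```

On a 64GB machine (65536 MB), wired_limit_mb 65536 prints 57344 with the default reserve.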

  • Jelegend@alien.topB
    10 months ago

    I am getting the following error when running this command on a Mac Studio M2 Max with 64GB RAM:

    sysctl: unknown oid 'iogpu.wired_limit_mb'

    Can someone help me out with what to do here?

    • bebopkim1372@alien.topB
      10 months ago

      Do you use macOS Sonoma? Mine is Sonoma 14.1.1 - Darwin Kernel Version 23.1.0: Mon Oct 9 21:27:24 PDT 2023; root:xnu-10002.41.9~6/RELEASE_ARM64_T6000 arm64.