This is purely out of curiosity, but if anybody has some insights, I’d love to hear them.
I am running 70B Q4 models on my M1 Max MacBook Pro (10-core CPU, 32-core GPU, 64 GB RAM). The lid is closed because I have an external 4K monitor attached via USB-C, so the built-in display isn’t drawing any power.
I am using both llama.cpp and LM Studio, and in both cases I run the LLMs with Metal acceleration.
Now, when running the LLM, I notice that according to iStat Menus my MacBook is drawing between 95 and 110W 😮
(The fans get loud quickly, just like in the good old Intel days, but it seems able to sustain this.)
But how is that possible?
Where is that power draw coming from? The GPU alone is 45W max, and the CPU is around ~30W max (I forgot the exact value), but it’s barely being used. In the screenshot it pulls a meager ~12W. So that’s a total of ~57W for CPU and GPU combined. Where do the remaining ~40 to 50W go?
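To make the gap concrete, here is the arithmetic as a small Python sketch. It uses only the numbers quoted above (GPU at its ~45W ceiling, CPU at the ~12W seen in the screenshot, total of 95 to 110W from iStat Menus); the function name `unaccounted_power` is just an illustrative helper, and the inputs are approximate readings, not precise measurements.

```python
# Rough power budget from the figures quoted in the post (all values in watts).
# These are approximate iStat Menus readings, not lab measurements.
GPU_MAX_W = 45             # quoted GPU ceiling for the 32-core M1 Max
CPU_OBSERVED_W = 12        # CPU draw seen in the screenshot during generation
TOTAL_RANGE_W = (95, 110)  # whole-system draw reported while generating


def unaccounted_power(total_w: float, gpu_w: float, cpu_w: float) -> float:
    """Return the part of the total draw not explained by CPU + GPU."""
    return total_w - (gpu_w + cpu_w)


if __name__ == "__main__":
    for total in TOTAL_RANGE_W:
        rest = unaccounted_power(total, GPU_MAX_W, CPU_OBSERVED_W)
        print(f"total {total} W -> {rest} W outside CPU+GPU")
```

With these inputs the unexplained share is roughly 38W at the low end and 53W at the high end.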
I know there are lots of other components involved: RAM (probably single-digit watts?), fans, the memory controller, etc. But we are talking about a large chunk of power.
Does anybody know? :)
I think part of the answer is that RAM uses more power than you’d think when it’s running near full tilt, as it is during generation. Micron’s advice is to figure 3W per 8GB for DDR4, and more than that for the highest-performance parts. The fact that the RAM is on-package probably offsets that somewhat, but it’s still likely well above single digits.
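Applying that rule of thumb to a 64GB machine is simple scaling. The sketch below assumes power scales linearly with capacity at the quoted ~3W per 8GB DDR4 figure; the helper name `estimate_dram_watts` is hypothetical, and since the M1 Max’s on-package memory is not DDR4, treat the result as a rough ballpark rather than a measurement.

```python
# Rule-of-thumb DRAM power estimate: ~3 W per 8 GB (the DDR4 figure quoted
# above), scaled linearly with capacity. Linear scaling is an assumption, and
# on-package memory may draw less per GB, so real numbers could be lower.
WATTS_PER_8GB = 3.0


def estimate_dram_watts(capacity_gb: float,
                        watts_per_8gb: float = WATTS_PER_8GB) -> float:
    """Scale the per-8 GB figure to the given capacity in GB."""
    return capacity_gb / 8 * watts_per_8gb


if __name__ == "__main__":
    for gb in (32, 64):
        print(f"{gb} GB -> ~{estimate_dram_watts(gb):.0f} W")
```

For 64GB this gives about 24W, which would account for a sizable part of the unexplained draw if the memory really is running near full tilt.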
Power consumption on my 24-core GPU M1 Max is similar to yours, though somewhat lower, as you’d expect, according to both iStat Menus and Stats.app.
There is also the question of how accurate these readings are.
Oh wow, that’s much more than I expected. It would make a lot of sense, since the LLM is probably hitting the RAM very hard. Thank you 🙌