Basically, that’s my question. The caveat is that I would like to avoid a Mac Mini, and I wonder if some of Minisforum’s mini PCs can handle LLMs.
I don’t know what specs mini PCs tend to have, but I can run OpenHermes 2.5 on my laptop with an Intel Core i5-8365U CPU and 24 GB of RAM. Even without a graphics card, I get responses in less than a minute, even with a bunch of memory tokens in the prompt.
The caveat is that I would like to avoid a Mac Mini
If a little machine is your goal, then a Mac is the best way to go. Not a Mac Mini though; its memory bandwidth is too low. A Mac Studio is your best bet for a mini machine that can run LLMs.
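To see why bandwidth matters, here’s a back-of-envelope sketch: token generation is roughly memory-bound, since each token requires reading (approximately) the whole model from memory. The bandwidth figures below are my own rough assumptions for illustration, not official specs, and this gives a ceiling, not a benchmark.

```python
# Rough estimate: tokens/sec ceiling ≈ memory bandwidth / model size,
# since generating one token reads roughly the full weights once.
# Bandwidth numbers are assumed ballpark figures, not measured specs.

def tokens_per_sec(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper-bound token rate: one full pass over the weights per token."""
    return bandwidth_gb_s / model_gb

model_gb = 4.0  # e.g. a 7B model at ~4-bit quantization

for name, bw in [("Mac Mini class (~100 GB/s)", 100),
                 ("Mac Studio Max class (~400 GB/s)", 400),
                 ("Mac Studio Ultra class (~800 GB/s)", 800)]:
    print(f"{name}: ~{tokens_per_sec(bw, model_gb):.0f} tok/s ceiling")
```

Real throughput will be lower, but the ratio between machines holds: several times the bandwidth means several times the generation speed.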
The issue is that a lot of them have either Intel CPUs with onboard graphics or AMD CPUs, also with onboard graphics. Mini PCs with Nvidia GPUs are uncommon.
Zotac did some small PCs with Nvidia GPUs, I think, but I doubt any of them have much VRAM.
If you pair a mini PC that has Thunderbolt with an eGPU, that could be a setup that would work…
0.3 tokens per second is not “handling”.
How about a Zotac zbox? ZBOX QRP7N3500
Max supported memory is 64 GB and it has an RTX3050 with 12 GB of VRAM. I expect that it could run 7B models easily.
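A quick sanity check on that (my arithmetic, not a benchmark): whether a 7B model fits in 12 GB of VRAM depends on the quantization. The bits-per-weight figures below are approximate values for common GGUF-style quants, and the 2 GB headroom for KV cache and runtime overhead is an assumption.

```python
# Rough fit check: 7B model weights vs. 12 GB of VRAM at common
# quantization levels. Bits-per-weight and headroom are assumptions.

def model_size_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB for params_b billion parameters."""
    return params_b * bits_per_weight / 8

VRAM_GB = 12
HEADROOM_GB = 2  # assumed budget for KV cache and runtime overhead

for label, bits in [("FP16", 16), ("Q8_0", 8.5), ("Q4_K_M", 4.8)]:
    size = model_size_gb(7, bits)
    verdict = "fits" if size + HEADROOM_GB <= VRAM_GB else "too big"
    print(f"7B @ {label}: ~{size:.1f} GB -> {verdict} in {VRAM_GB} GB")
```

So full FP16 (~14 GB) won’t fit, but any of the usual 8-bit or 4-bit quants leave plenty of room, which supports the “7B models easily” claim.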
Access to powerful, open-source LLMs has also inspired a community devoted to refining the accuracy of these models, as well as reducing the computation required to run them. This vibrant community is active on the Hugging Face Open LLM Leaderboard, which is updated often with the latest top-performing models.
That’s a nice indirect shout out.
How mini do you want? I plugged Llama 2 7B into an N100 with 16 GB and ran it; the speed was not very good.
The real question is: what are you trying to accomplish, and is this the best route to do so?