You probably need to wait for the Mac Studio refresh announcements for something more clearly relevant to LLM devs. Hopefully those will offer 256GB-or-more unified memory configs, though that's likely a 2024 story.
That said, it's handy to be able to run inference on a q8 70B model on your local dev box, so the 96GB and 128GB configs are interesting for that; a rough memory budget is sketched below.
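A back-of-envelope sketch of why a q8 70B model fits in those configs: at 8-bit quantization the weights cost roughly one byte per parameter, and the KV cache on top is comparatively small. The architecture numbers below assume a Llama-2-70B-style config (80 layers, grouped-query attention with 8 KV heads of head dim 128) and are illustrative, not from the comment above.

```python
# Back-of-envelope memory estimate for local q8 70B inference.
# Assumption: ~1 byte per weight at 8-bit quantization (quantization
# scale factors add a few percent on top, ignored here).
params = 70e9
weights_gb = params * 1.0 / 1e9
print(f"weights: ~{weights_gb:.0f} GB")  # ~70 GB

# KV cache per token = 2 (K and V) * layers * kv_heads * head_dim * 2 bytes (fp16).
# Assumed Llama-2-70B-style config: 80 layers, 8 KV heads (GQA), head dim 128.
layers, kv_heads, head_dim = 80, 8, 128
ctx = 4096  # context length in tokens
kv_gb = 2 * layers * kv_heads * head_dim * 2 * ctx / 1e9
print(f"KV cache @ {ctx} tokens: ~{kv_gb:.1f} GB")  # ~1.3 GB

total = weights_gb + kv_gb
print(f"total: ~{total:.0f} GB")  # ~71 GB: fits in 96 GB with headroom
```

So roughly 71 GB all in, which is why 96GB is the interesting entry point and 128GB leaves room for longer contexts or the OS and other apps.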
I don't think you should be surprised that a 34B model is mostly failing, considering that a 200B model (GPT-3.5) only gets to 40%. What you're asking the LLM to do is very hard for it without further training/tuning.