For background, I’ve been interested in LLMs and other AI for a year or so. I’ve used online LLMs like ChatGPT but haven’t tried running my own because my hardware is 10 years old. I’m considering getting a new PC and want to know whether to splash out on one that can do high-end LLM stuff.
I’ve read up a fair bit but have some questions that hopefully aren’t too stupid.
1.) It looks like VRAM is the biggest hardware limit for model size. What are some good hardware options at different price points? Are there really expensive options that blow consumer stuff out of the water? Is now a good time to buy or is there something worth waiting for?
2.) Open source models seem to depend on the trainers giving away their expensively acquired work. Do you expect new model releases to replace Llama 2, and if so, when?
3.) Is retraining or fine tuning possible for ordinary users? Is this meaningfully different from having a ‘mission’ or instruction set added to the beginning of each prompt/context?
4.) I think I understand parameter size and compression, but what determines the token context size a model can handle? GPT-4’s new massive context size is very handy.
5.) I’m interested in ‘AutoGPT’ type systems (or response + validation etc). Can this work in series mode, where you only have 1 model running at a time? It seems like having specialised models could be useful. Would loading the model best suited to each particular ‘subroutine’ slow things down a lot? Are these systems difficult to set up, or is it just a matter of feeding the output of one query into the input of the next (while adding on previous relevant context)?
6.) Is the same type of hardware setup good for both LLMs and Stable Diffusion, or do they need separate setups for good bang/buck?
Many thanks to anyone who can help!
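To make the AutoGPT question concrete, this is roughly what I mean by "series mode". It's only a sketch: `run_in_series` and the stand-in models are names I made up, and each "model" is just a plain function that takes a prompt and returns text. In a real setup each one would load and call an actual LLM, which is the part I'm asking about.

```python
from typing import Callable, List, Tuple

# Hypothetical: a "model" is any function that maps a prompt to a reply.
ModelFn = Callable[[str], str]


def run_in_series(task: str, steps: List[Tuple[str, ModelFn]]) -> List[str]:
    """Run one model at a time, passing earlier outputs along as context."""
    history: List[str] = []
    for name, model in steps:
        if history:
            # Append the outputs of all earlier steps to the original task.
            prompt = task + "\n\nPrevious steps:\n" + "\n".join(history)
        else:
            prompt = task
        output = model(prompt)
        history.append(f"[{name}] {output}")
    return history


# Stand-in "models" so the sketch runs without any LLM installed.
def draft_model(prompt: str) -> str:
    return "first draft of an answer"


def validation_model(prompt: str) -> str:
    return "draft received" if "[draft]" in prompt else "nothing to check"


for line in run_in_series("Summarise this thread.",
                          [("draft", draft_model), ("validate", validation_model)]):
    print(line)
```

Only one model function is called at a time here, so in principle only one would need to be in memory at once; what I can't judge is how much the loading and unloading would cost.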
If model size is a priority, the Apple Silicon Macs (particularly used or factory refurbished Mac Studio Ultras) provide good value (cost + available memory + performance, e.g. 4,679 for 128GB, of which 96GB is usable by the GPU for model + working data). Workstation GPUs or multiple high-end consumer GPUs can be faster, but they are also more expensive, use more power, need a bigger case, and are louder.
Software options for doing training or fine tuning on Macs using GPU are limited at this point, but will probably improve. This might also be something better done with short term rental of a cloud server.
I haven’t tried Mac and don’t know what the software ecosystem is like. Have you tried it or seen it working?
It looks like it doesn’t have dedicated VRAM, but shared memory. I would guess this is slower than dedicated GPU memory but faster than RAM sticks on a normal PC?
I have a dual 3090 setup, so I can run 4-bit 70b GPTQ. 70b blows all the smaller models out of the water, at least for now. Not interested in any of your questions except the last one. If you use a dual GPU setup like me, be aware that nothing in SD actually supports multi-GPU. Even stuff that sounds like it would be amazing with 2 GPUs (kohya LoRA creation, for example) doesn’t actually work with multi-GPU.
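If you want to check what fits before buying, here's a rough back-of-envelope sketch. It only counts the weights (parameters times bits per weight), and the `overhead_gb` allowance for context and working data is my own guess, not a measured figure, so treat the results as a lower bound. The function names are made up for illustration.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 weights * (bits / 8) bytes each, divided by 1e9.
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         total_memory_gb: float, overhead_gb: float = 4.0) -> bool:
    """True if weights plus an assumed overhead fit in the given memory."""
    # overhead_gb is an assumption standing in for context/working data.
    return weight_memory_gb(params_billion, bits_per_weight) + overhead_gb <= total_memory_gb


# 70b at 4-bit on two 24GB 3090s (48GB total).
print(weight_memory_gb(70, 4), fits(70, 4, 48))
# 70b at 16-bit on the same cards.
print(weight_memory_gb(70, 16), fits(70, 16, 48))
# 70b at 8-bit against the 96GB GPU-usable figure quoted for the Mac Studio.
print(weight_memory_gb(70, 8), fits(70, 8, 96))
```

By this estimate 4-bit 70b needs about 35GB for weights, which is why it fits across two 3090s, while the same model at 16-bit needs about 140GB and does not.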