I know ooba supposedly works on Windows, and I had it up and running in Ubuntu, but Windows error-corrected the boot record, so I can’t access that environment anymore.

But I’m not too interested in roleplay chat, so I’m fine with (and might actually prefer) running it through a Python script. (I’d like to get more than one model running simultaneously for an “LLM” village NPC interaction experiment, but I digress.)
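To give an idea of the structure I have in mind, here’s a rough sketch of the NPC loop. Everything here is hypothetical: the NPC functions are stubs standing in for whatever local-model `generate()` call I end up with.

```python
# Hypothetical multi-NPC loop. Each "model" is a stub function standing in
# for a real local-LLM generation call; the real version would hold two
# loaded models and pass each NPC's reply to the next as the prompt.
def npc_blacksmith(prompt: str) -> str:
    return f"[blacksmith replies to: {prompt}]"

def npc_innkeeper(prompt: str) -> str:
    return f"[innkeeper replies to: {prompt}]"

npcs = {"blacksmith": npc_blacksmith, "innkeeper": npc_innkeeper}

line = "A traveller arrives in the village."
for turn in range(2):                       # alternate speakers each turn
    for name, speak in npcs.items():
        line = speak(line)
        print(f"{name}: {line}")
```

The point is just that each NPC needs its own model (or at least its own context), which is why I care about fitting more than one in memory at once.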

Looking at HF I see some code snippets, but there’s a variety of libraries and approaches. Is there anything considered a “gold standard” as of late for local Windows LLMs that isn’t a pain in the ass to set up and supports the latest quantization flavors? I’ll aim to run on 24GB VRAM, but I also have 64GB system RAM, and the option to split across both would be appreciated; primarily I’m aiming for GPU.
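For context on what I think fits, here’s the back-of-envelope I’ve been using: weight memory is roughly parameters times bits-per-weight, plus some overhead for KV cache and activations. The 20% overhead factor is a rough assumption on my part, not an exact figure.

```python
# Back-of-envelope VRAM estimate: params (billions) * bits/8 gives GB of
# weights; multiply by an assumed ~20% overhead for KV cache/activations.
def est_vram_gb(n_params_b: float, bits_per_weight: float,
                overhead: float = 1.2) -> float:
    return n_params_b * bits_per_weight / 8 * overhead

print(round(est_vram_gb(13, 4.0), 1))   # ~7.8 GB: a 13B at 4-bit fits in 24GB
print(round(est_vram_gb(70, 4.0), 1))   # ~42.0 GB: a 70B at 4-bit does not
```

By this math, two 4-bit models in the 7B–13B range should fit on the card together, which is what makes me hopeful about the multi-model experiment.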