Previously, when I was more VRAM-limited, I used koboldcpp.
Now I mainly use a modified CLI chat.py from exllamav2 and oobabooga, about 50/50.
chat.py is about 8 tokens/s (roughly 45%) faster than oobabooga with the same model and the exllamav2 loader, for some reason, and I care more about fast generation than a nice UI. You forgot to mention SillyTavern; I think it gets a lot of use among coomers.
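
On the speed gap mentioned above: if the 8 tokens/s difference and the 45% figure describe the same measurement, the absolute speeds can be backed out with a bit of arithmetic. This is only a back-of-the-envelope sketch under that assumption; the resulting numbers are inferred, not measured, and the variable names are just for illustration.

```python
# Assumption: the ~8 tokens/s gap is the same thing as the ~45% speedup,
# i.e. gap = 0.45 * oobabooga_speed. Both inputs are approximate.
gap_tps = 8.0    # chat.py minus oobabooga, tokens/s
speedup = 0.45   # relative speedup of chat.py over oobabooga

ooba_tps = gap_tps / speedup    # inferred oobabooga speed
chat_tps = ooba_tps + gap_tps   # inferred chat.py speed

print(f"oobabooga ~{ooba_tps:.1f} tok/s, chat.py ~{chat_tps:.1f} tok/s")
```

So the implied speeds are roughly 18 tokens/s in oobabooga versus roughly 26 tokens/s in chat.py, give or take the rounding in the original figures.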