I am talking about this particular model:
https://huggingface.co/TheBloke/goliath-120b-GGUF
I specifically use: goliath-120b.Q4_K_M.gguf
I can run it on runpod.io on this A100 instance at a "humane" (bearable) speed, but it is far too slow for generating long-form text.
These are my settings in text-generation-webui:
Any advice? Thanks
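For context on why it feels slow: token generation for a model this size is usually memory-bandwidth-bound, since every decoded token has to stream all of the quantized weights from GPU memory. A rough sanity check (assuming the model is fully offloaded to the A100's HBM; the ~70 GB file size and ~2,000 GB/s bandwidth figures below are approximations, not measured values):

```python
# Back-of-the-envelope ceiling on decode speed for a fully GPU-offloaded
# GGUF model. Assumes generation is memory-bandwidth-bound: each token
# requires reading all quantized weights from HBM once.

def max_tokens_per_second(file_size_gb: float, bandwidth_gb_s: float) -> float:
    """Theoretical upper bound: bandwidth divided by bytes read per token."""
    return bandwidth_gb_s / file_size_gb

# goliath-120b.Q4_K_M.gguf is roughly 70 GB (assumption: Q4_K_M is about
# 4.85 bits per weight). An 80 GB A100 SXM has roughly 2,000 GB/s of HBM
# bandwidth. Real-world throughput will be noticeably below this ceiling.
print(max_tokens_per_second(70.0, 2000.0))
```

So even in the best case you are looking at a few dozen tokens per second, which is why long-form generation crawls; settings tweaks help, but they cannot beat that bandwidth ceiling.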
Thanks, I will try this. I have no idea how these settings really work, which is why I'm asking :)