• 0 Posts
  • 5 Comments
Joined 11 months ago
Cake day: November 27th, 2023


  • You convinced me to finally try Goliath! I don’t have the hardware to run it locally, so I rented a cloud GPU just for this. With 80 GB of VRAM, it fits the largest EXL2 quant currently available.

    Verdict: It’s absolutely incredible compared to the small models!!! Finally, I’m not constantly swiping responses after it produces nonsense; instead, it generates great responses on the first try every time, doesn’t talk as me, and doesn’t constantly confuse characters and logic!

    But the tiny 4096-token context is very limiting, and I hit it very quickly in my conversation. I tried scaling up the context size in the parameters, but that made it perform noticeably worse: it no longer generates multiple paragraphs, it drops formatting symbols, and so on (a rough sketch of what that scaling actually does is at the end of this comment).

    Is that the expected result? There’s no magic way to run these models with huge contexts yet, right?
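
    In case it helps anyone reading later: the context-size sliders in most front ends implement some variant of RoPE scaling, which stretches the positional encoding past what the model was trained on, so some quality loss at higher settings seems expected. Here is a minimal, purely illustrative sketch of the NTK-style “alpha” variant; the function name and numbers are mine, not the actual exllamav2 code:

    ```python
    def rope_inv_freqs(head_dim, base=10000.0, alpha=1.0):
        """Per-dimension rotary frequencies with optional NTK-style alpha scaling.

        alpha > 1 raises the RoPE base so the low frequencies span more
        positions, stretching the usable context. The model never saw those
        stretched positions during training, which is why output quality
        tends to drop as alpha grows.
        """
        scaled_base = base * alpha ** (head_dim / (head_dim - 2))
        return [scaled_base ** (-2 * i / head_dim) for i in range(head_dim // 2)]

    # Hypothetical example: 128-dim heads, alpha chosen for roughly 2x context.
    print(rope_inv_freqs(128, alpha=2.6)[:4])
    ```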


  • Hmm, I didn’t notice a major quality loss when I swapped from mistral-7b-openorca.Q8_0.gguf (running in koboldcpp) to Mistral-7B-OpenOrca-8.0bpw-h6-exl2 (running in text-gen-webui). Maybe I should try again. Are you sure you were using comparable sampling settings for both? I noticed, for example, that SillyTavern has entirely different presets per backend (one way to control for that is sketched below).

    I still need to try the new NeuralChat myself; I was just going to go for the EXL2, so this could be a good tip!
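
    One low-tech way to rule out sampler differences: drive both backends through their OpenAI-compatible HTTP APIs with an identical payload, so no front-end preset gets involved. A rough sketch, assuming default local ports and that both servers expose a /v1/completions route (adjust the URLs to your own setup):

    ```python
    import requests

    # Same sampling parameters reused verbatim against both backends, so the
    # comparison isolates the quant/backend rather than the sampler settings.
    SAMPLING = {"temperature": 0.7, "top_p": 0.9, "max_tokens": 256}
    PROMPT = "Write two sentences about owls."

    BACKENDS = {
        # Ports and routes are assumptions based on common defaults -- check yours.
        "koboldcpp (GGUF Q8_0)": "http://localhost:5001/v1/completions",
        "text-gen-webui (EXL2 8.0bpw)": "http://localhost:5000/v1/completions",
    }

    for name, url in BACKENDS.items():
        resp = requests.post(url, json={"prompt": PROMPT, **SAMPLING}, timeout=120)
        print(name, "->", resp.json()["choices"][0]["text"][:120])
    ```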