Have you tried Kobold Horde?
I’m probably going to ask something extremely basic, but why isn’t GPTQ an option? With OP’s dual-GPU setup he can run 4-bit 32g with 8k context, and I was under the impression that the quality loss is barely noticeable. I have noticed, though, that it absolutely messes up numbers (math or historical dates).
Sorry for the slight sidetrack, but how much context are you able to squeeze into your 3 GPUs with Goliath’s 4-bit quant?
I’m considering adding another 3090 to my own double-GPU setup just to run this model.
Holy… 4x3090! No wonder it was hard to find my third one for a reasonable price.