I’ve used most of the high-end models in an unquantized format at some point or another (Xwin, Euryale, etc.) and found them to be generally pretty good experiences, but they always seem to lack the ability to “show, not tell” the way a strong writer does, even when prompted to do so. At the same time, I’ve always been rather dissatisfied with a lot of quantizations, as I’ve found the degradation in quality to be rather noticeable. So up until now, I’ve been running unquantized models on 2x A100s and extending the context as far as I’m able to get away with.

Tried Goliath-120b the other day, and it absolutely stood everything on its head. Not only is it capable of stunning levels of writing, implying far more than it directly states in a way I’m not sure I’ve seen in a model to date, but the exl quants from panchovix get it to run on a single A100 at 9-10k extended context (about where RoPE scaling seems to universally start to break down in my experience). Best part is, if there is a quality drop (I’m using 4.85 bpw), I’m not seeing it at all. So not only is it giving a better experience than an unquantized 70b model, but it’s doing so at about half the cost of my usual way of running these models.
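
For anyone who wants to sanity-check the single-A100 part, here is a rough back-of-the-envelope sketch in Python. It rests on several assumptions that are not stated above: the 80 GB A100 variant, nominal parameter counts of 120B and 70B, and 16 bits per weight for the unquantized model. It counts weights only, so the KV cache and runtime overhead for the 9-10k context come on top. The function names are purely illustrative.

```python
import math

A100_GB = 80  # assumption: 80 GB A100 variant (a 40 GB variant also exists)


def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB (weights only, no KV cache)."""
    # billions of params * bits per weight / 8 bits per byte = gigabytes
    return params_billion * bits_per_weight / 8


def gpus_needed(params_billion: float, bits_per_weight: float) -> int:
    """Minimum number of GPUs whose combined memory holds the weights alone."""
    return math.ceil(weight_gb(params_billion, bits_per_weight) / A100_GB)


if __name__ == "__main__":
    # Nominal sizes; real checkpoints differ slightly from the advertised count.
    for name, params, bpw in [("120b @ 4.85 bpw", 120, 4.85),
                              ("70b @ 16 bpw", 70, 16)]:
        print(name, round(weight_gb(params, bpw), 2), "GB,",
              gpus_needed(params, bpw), "GPU(s)")
```

Under those assumptions the quantized 120b weights land a little under one card's capacity, while the unquantized 70b weights need two, which lines up with the "about half the cost" observation. The headroom left for context on a single card is thin, so treat this as an estimate rather than a guarantee.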

Benchmarks be damned: for those willing to rent an A100 for their writing, I think however this model was put together might be the actual way to challenge the big closed-source/censored LLMs for roleplay.

  • Sabin_Stargem@alien.topB
    1 year ago

    I tend to use models with at least 16k context. Goliath 120b q2 was coherent, but was also very much out of character when telling the NSFW bust massage story, using “Yeahyeah” and other lingo. It’s probably quite good at a lower context, but 16k definitely isn’t the proper fit for Goliath.

    The search for the Goldilocks Model continues.

      • Sabin_Stargem@alien.topB
        1 year ago

        I don’t think any small models are actually good for that use case, at least not for serious writing. The best we have access to are probably Mistral finetunes (up to 32k) and Yi-34b, but Yi doesn’t have any finetunes yet. A Dolphin finetune should be on the way for Yi, IIRC.

        In any case, my favorite 7b models tend to be frankenmerges, which stitch together an assortment of models. This allows the resulting model to grasp a wider range of topics. At the moment, the best for this size is likely Undi’s Toppy, which is uncensored and well rounded.

        The issue with Mistral 7b and small models is that they tend to lose flavor over time, and the logic also gets weaker. Coherent, but the ‘X’ factor is gone.