Messing around with Yi-34B-based models (Nous-Capybara, Dolphin 2.2) lately, I’ve been experiencing repetition in model output, where sections of previous outputs are included in later generations.

The issue persists with both GGUF and EXL2 quants, and happens regardless of sampling parameters or Mirostat Tau settings.

I was wondering if anyone else has experienced similar issues with the latest finetunes, and if they were able to resolve the issue. The models appear to be very promising from Wolfram’s evaluation, so I’m wondering what error I could be making.

Currently using Text Generation Web UI with SillyTavern as a front-end, with Mirostat at Tau values between 2 and 5, or Midnight Enigma with Rep. Penalty at 1.0.
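
For concreteness, here is roughly what the Mirostat side of that setup looks like as a raw API call to the web UI. This is a sketch only: the endpoint and exact parameter names vary by version, and SillyTavern normally sends these for you.

    import requests

    # Assumed local endpoint for Text Generation Web UI's OpenAI-compatible API;
    # adjust host/port to your own setup.
    API_URL = "http://127.0.0.1:5000/v1/completions"

    payload = {
        "prompt": "...",        # elided; the front-end builds this
        "max_tokens": 300,
        "mirostat_mode": 2,     # enable Mirostat 2.0
        "mirostat_tau": 5.0,    # target surprise; I've tried values from 2 to 5
        "mirostat_eta": 0.1,    # learning rate for the tau controller
    }

    r = requests.post(API_URL, json=payload, timeout=120)
    print(r.json()["choices"][0]["text"])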

  • out_of_touch@alien.top · 10 months ago

    I encounter this a lot with the Yi 34B models, to the point where I’ve basically stopped using them for chat. I’ve tried a huge variety of settings, presets, and quants; I’ve used koboldcpp and text-generation-webui, with EXL2, GGML, and GPTQ. The issue appears consistently once the context grows past a certain size: partial or entire messages will repeat. It will also get stuck, where regenerating always produces the same response unless I make drastic changes to settings, and usually that just changes which message it’s stuck on. Smaller changes to the settings just result in it rewording the stuck message slightly.

  • Dry-Judgment4242@alien.top · 10 months ago

    I pretty much gave up trying to make Yi-based models actually use more than 4k context. At that point, I’d rather just use Lzlv 70B, which is much smarter, with better prose and knowledge.

    The repetition issue pretty much makes the models unusable past the context length where it breaks.

    • HvskyAI@alien.top (OP) · 10 months ago

      Agreed - I’m personally using 70B models at 2.4BPW EXL2 quants as well. They hold up great even at such a small quantization, as long as sampling parameters are set correctly, and they are subjectively more pleasant in prose (Euryale 1.3 and LZLV both come to mind).

      At 2.4BPW, they fit into 24GB of VRAM and inference is extremely fast. EXL2 also appears to be very promising as a quantization method; I believe its potential upsides are yet to be fully leveraged.
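
      As a rough sanity check on the VRAM claim (weights only; this ignores the KV cache and activations, which also have to fit):

          # Weights-only VRAM estimate for a 70B model quantized to 2.4 bits per weight.
          # KV cache and activation memory are not included, so treat this as a floor.
          params = 70e9
          bits_per_weight = 2.4

          weight_gb = params * bits_per_weight / 8 / 1e9
          print(f"~{weight_gb:.0f} GB for weights")  # ~21 GB, leaving ~3 GB of a
                                                     # 24 GB card for context/cache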

  • Ravenpest@alien.top · 10 months ago

    No issues here, just a lot of confidence on certain tokens, but overall very little repetition. I use koboldcpp with Q5_K_M. Don’t abuse temp; the model seems to be exceedingly sensitive, and the smallest imbalance breaks its flow. Try temp 0.9, rep pen 1.11, top-k 0, min-p 0.1, typical 1, TFS 1.
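
    For reference, those settings as a raw koboldcpp API call would look something like the sketch below. Field names follow the KoboldAI generate API, but check your koboldcpp version for exact support (min_p in particular is a newer addition).

        import requests

        # Assumed default koboldcpp endpoint; adjust host/port as needed.
        API_URL = "http://127.0.0.1:5001/api/v1/generate"

        payload = {
            "prompt": "...",        # your chat prompt goes here
            "max_length": 300,
            "temperature": 0.9,     # keep temp modest; Yi is sensitive to it
            "rep_pen": 1.11,
            "top_k": 0,             # 0 disables top-k
            "min_p": 0.1,
            "typical": 1.0,         # 1.0 disables typical sampling
            "tfs": 1.0,             # 1.0 disables tail-free sampling
        }

        r = requests.post(API_URL, json=payload, timeout=120)
        print(r.json()["results"][0]["text"])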

    • estacks@alien.top · 10 months ago

      I’ll have to try these settings. I have OP’s problems too, and I always have to crank the temperature up to get it to work. Then it goes schizophrenic a few messages later. Thanks!

      • Ravenpest@alien.top · 10 months ago

        High temp does more harm than good. I would suggest looking into what the other settings do before raising it, no matter the model.

    • HvskyAI@alien.top (OP) · 10 months ago

      I see, the model does tend to run a bit hot as-is. I’ll go ahead and try these settings out tomorrow.

  • a_beautiful_rhind@alien.top · 10 months ago

    On EXL2, when it started doing that, I cranked the temp to 2.0 rather than using dynamic temperature. That made it go away. Going to try higher rep pen next and see what happens. I’m at 8k context and it’s doing it.

  • uti24@alien.top · 10 months ago

    I had high hopes for Yi-34B Chat, but when I tried it, I found it is not very good.

    70B models are better (well, of course), but I think even some 20B models are better.

    • HvskyAI@alien.top (OP) · 10 months ago

      I am having better luck with 2.4BPW EXL2 quants of 70B models from Lone_Striker lately - Euryale 1.3, LZLV, etc.

      Even at the smaller quants, they are quite strong with the correct settings. Easily comparable to a 34B at Q4_K_M, in my experience.