• a_beautiful_rhind@alien.topB
    link
    fedilink
    English
    arrow-up
    1
    ·
    1 year ago

    Will be interesting to compare it to spicyboros and 70b dolphin. Spicy already “fixed” yi for me. I think we finally got the middle model meta didn’t release.

    • FullOf_Bad_Ideas@alien.topB
      link
      fedilink
      English
      arrow-up
      1
      ·
      1 year ago

      What prompt format do you use? I was trying to figure out it’s inherent prompt format but it didn’t go well. I reasoned that if I enter “<>” , it will reveal it’s most likely system message, but it generates some bash-like code most of the time. It was trained for 1 epoch (should be about 80k samples) with constant 0.0001 learning rate but the prompt format isn’t as burned-in as my qlora (2 epochs on 5k samples) with constant 0.00015 lr, I don’t get why.

      • a_beautiful_rhind@alien.topB
        link
        fedilink
        English
        arrow-up
        1
        ·
        1 year ago

        For spicy I use the same one as airoboros 3.1, which I think is llama 2 chat. Have alpaca set in the telegram bot and nothing bad happened.

        On larger better models the prompt format isn’t really that serious. If you see it giving you code or extra stuff, you try another one till it does what it’s supposed to.

        • FullOf_Bad_Ideas@alien.topB
          link
          fedilink
          English
          arrow-up
          1
          ·
          1 year ago

          I am getting nice results in webui using exllama 2 loader and llama 2 prompt. Problem is that webui gives me 21 t/s while when using chat.py from exllama directly I get 28.5 t/s. The difference is too big to make me use webui. I tried matching sampler settings, bos, system prompt and repetition penalty but it still has issues there - it either mixes up the prompt, for example outputting <>, prints out a whole-ass comment section to a story, outputs 30 links to YT out of nowhere and generally still acts a bit like a base model. I can’t really blame exllama v2, because my lora works more predictably. I also can’t blame spicyboros, because it works great in webui. It looks the same with raw, llama and chatml prompt formats. It’s not a big deal since it’s still usable, but it bugs me a bit.

  • ambient_temp_xeno@alien.topB
    link
    fedilink
    English
    arrow-up
    1
    ·
    1 year ago

    I’ve been trying out the ggufs I found today and it seems close enough to dolphin 70b at half the size.

    It pointed out that the ‘each brother’ part of the sally test could be taken to imply that they’re different sisters for each brother, and when you change the question to say ‘the brothers share the same 2 sisters’ it gets it right, which is whatever, but it was interesting that it picked up that the test is ambiguous.

  • WolframRavenwolf@alien.topB
    link
    fedilink
    English
    arrow-up
    1
    ·
    1 year ago

    I took a short break from my 70B tests (still working on that!) and tried TheBloke/dolphin-2_2-yi-34b-GGUF Q4_0. It instantly claimed 4th place on my list.

    A 34B taking 4th place among the 13 best 70Bs! A 34B model that beats 9 70Bs (including dolphin-2.2-70B, Samantha-1.11-70B, StellarBright, Airoboros-L2-70B-3.1.2 and many others). A 34B with 16K native context!

    Yeah, I’m just a little excited. I see a lot of potential with the Yi series of models and proper finetunes like Eric’s.

    Haven’t done the RP tests yet, so back to testing. Will report back once I’m done with the current batch (70Bs take so damn long, and 120B even more so).

  • 1dayHappy_1daySad@alien.topB
    link
    fedilink
    English
    arrow-up
    1
    ·
    1 year ago

    I’ve played with it for a bit and I agree with most people here. It seems to be as smart as a 70b, which is a big deal IMO.

  • Sabin_Stargem@alien.topB
    link
    fedilink
    English
    arrow-up
    1
    ·
    1 year ago

    Having tried out Yi-34b-200k with Nous Capybera, I think the Yi-34b-16k Dolphin v2.2 has a better flavor to it. Nous also wants more rep penalty, I am guessing the 200k foundation is doing that. 1.1 is what I used to get a better response. Haven’t tried a higher penalty yet.

    • mcmoose1900@alien.topB
      link
      fedilink
      English
      arrow-up
      1
      ·
      1 year ago

      The Yi 200K base model was really funny about sampling. Mirostat was a disaster, and so were some other presets, but it liked TFS.

    • Amgadoz@alien.topOPB
      link
      fedilink
      English
      arrow-up
      1
      ·
      1 year ago

      Is Goliath that good? Is it that better than all of the Llama2-70B tunes that’s worth the hardware investments needed for running it?

  • ViennaFox@alien.topB
    link
    fedilink
    English
    arrow-up
    1
    ·
    1 year ago

    Anyone have a good SillyTavern preset for this model? I haven’t been able to nail one down.