Hello!

By popular demand, I am planning a fine-tune in the style of https://huggingface.co/dreamgen/opus-v0-7b on top of Yi-34B, and I wonder whether to use the 200K variant as the base.

The regular Yi-34B seems slightly better than Yi-34B-200K on standard benchmarks, but I wonder how the 200K variant “feels” in practice, and whether its loss of short-context performance is worth it, given that the regular version can already be used at up to 32K tokens.

(Yi-34B vs Yi-34B-200K)

Has anyone tried analyzing these two models at various sequence lengths (<4K, <8K, <16K, etc.)?
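For concreteness, here is a minimal sketch of the kind of comparison I have in mind: measure perplexity of both base models on the same long text at several context lengths. The 01-ai/Yi-34B and 01-ai/Yi-34B-200K repo ids, the WikiText-103 test text, and having enough GPU memory for a bf16 34B model with device_map="auto" are all assumptions here, not a definitive setup.

```python
# Sketch: per-token loss (perplexity) of both base models at several context lengths.
import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

MODELS = ["01-ai/Yi-34B", "01-ai/Yi-34B-200K"]   # assumed Hugging Face repo ids
CONTEXT_LENGTHS = [4_096, 8_192, 16_384, 32_768]

def perplexity_at_length(model, tok, text, n_tokens):
    ids = tok(text, return_tensors="pt").input_ids[:, :n_tokens].to(model.device)
    with torch.no_grad():
        loss = model(ids, labels=ids).loss       # mean next-token cross-entropy
    return math.exp(loss.item())

# Any sufficiently long document works; slice to keep tokenization cheap
# (~400K characters is comfortably more than 32K tokens of English text).
text = "\n\n".join(
    load_dataset("wikitext", "wikitext-103-raw-v1", split="test")["text"]
)[:400_000]

for name in MODELS:
    tok = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        name, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
    )
    for n in CONTEXT_LENGTHS:
        print(f"{name}  ctx={n}  ppl={perplexity_at_length(model, tok, text, n):.2f}")
    del model
    torch.cuda.empty_cache()
```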

  • m98789@alien.topB · 1 year ago

    Yi can’t be trusted on standard benchmarks, because those benchmarks are easy to game by including them in the training data, and the LKF gang who built this is under heavy pressure to justify their $1 billion valuation and keep milking investors.

    The only way to really evaluate this is with a hidden benchmark the model has never seen, and/or rigorous qualitative experiments.

    Until then, I’m not holding my breath.

    • wind_dude@alien.topB · 1 year ago

      I believe they said they’re going to release the training data. We’ll see. That’s about the only way to easily verify what made it in (a rough overlap check is sketched below).
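      If such a dump ever materializes, a contamination check could look roughly like the following sketch: count how many benchmark test questions share a long n-gram verbatim with the training corpus. The file path, the choice of MMLU via cais/mmlu, and the 13-gram length are placeholders and assumptions, and an in-memory set only scales to a sample of the corpus.

      ```python
      # Sketch: verbatim 13-gram overlap between a (hypothetical) released training
      # dump and a benchmark test split. Placeholder path and benchmark; in-memory
      # sets only work for a sample of the corpus (use hashing/Bloom filters beyond that).
      from datasets import load_dataset

      def ngrams(tokens, n=13):
          return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

      train_ngrams = set()
      with open("yi_training_dump_sample.txt") as f:   # placeholder file name
          for line in f:
              train_ngrams |= ngrams(line.lower().split())

      bench = load_dataset("cais/mmlu", "all", split="test")
      hits = sum(bool(ngrams(q.lower().split()) & train_ngrams) for q in bench["question"])
      print(f"{hits}/{len(bench)} test questions share a 13-gram with the training dump")
      ```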