• noeda@alien.topB
    1 year ago

    I’ve seen the “… beats GPT-4” enough times that now whenever I see a title that suggests a tiny model can compete with GPT-4 I see it as a negative signal: that the authors are bullshitting through some benchmarks or some other shenanigans.

    It’s annoying because the models might be legitimately good models for being open and within their weight class but now you’ve put my brain in BS detecting mode and I can’t trust you’ve done good faith measurement anymore.

    • Evening_Ad6637@alien.topB
      1 year ago

      Yeah, I don’t think authors are intentionally bullshitting or intentionally doing “benchmark cosmetics”; maybe it’s more a lack of knowledge about what’s going on with (most) benchmarks and how their image has been ruined in the meantime.

      • Competitive_Ad_5515@alien.topB
        1 year ago

        Sure, but name-dropping the biggest name in the game and comparing yourself favourably to it is a big swing. At best it’s a naive marketing claim; otherwise it’s untrue.

    • bot-333@alien.topB
      1 year ago

      There are SO many models “bullshitting through some benchmarks or some other shenanigans” that I’m cooking my own benchmark system LOL.