I use q4_K_M in both cases.

  • phree_radical@alien.topB
    1 year ago

    Most of the benchmarks seem to measure regurgitation of factual knowledge from in-weights learning, which IMO everyone should accept is a misguided choice of task. They should instead test in-context learning, which I would argue was the goal of LLM training. I’d say these benchmarks are probably harmful to the cause of improving future LLMs.

    • andrewlapp@alien.topB
      1 year ago

      I agree, and The Leaderboard’s newly added DROP metric is a step in the right direction.