There has been a lot of movement around and below the 13b parameter bracket in the last few months, but it’s wild to think the best 70b models are still llama2-based. Why is that?

We now have 13b models, like the 8-bit bartowski/Orca-2-13b-exl2, approaching or even surpassing the best 70b models.

  • __JockY__@alien.top · 1 year ago

    Meta reports 3,311,616 A100 GPU-hours to train the Llama 2 family (1,720,320 of them for the 70b base model alone). At $1/hour for an A100 that’s just over $3.3M, and on a single GPU the run would take roughly 378 years.

    Scale that across 10,000 GPUs and you’re looking at about two weeks of wall-clock time, for the same ~$3.3M in compute.
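
    A quick back-of-the-envelope check of those numbers, as a sketch (the $1/hour rate and 10,000-GPU cluster size are the assumptions from this thread, not quoted prices):

    ```python
    # Back-of-the-envelope cost/time for the GPU-hour figure above.
    GPU_HOURS = 3_311_616   # Llama 2 family total (A100-80GB hours)
    RATE = 1.00             # assumed $/hour per A100
    N_GPUS = 10_000         # assumed cluster size

    print(f"cost: ${GPU_HOURS * RATE:,.0f}")                       # cost: $3,311,616
    print(f"single GPU: {GPU_HOURS / (24 * 365):.0f} years")       # single GPU: 378 years
    print(f"{N_GPUS:,} GPUs: {GPU_HOURS / N_GPUS / 24:.0f} days")  # 10,000 GPUs: 14 days
    ```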

    Fine-tuning is much, much faster and cheaper.
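
    Part of why: most community fine-tunes don’t touch the base weights at all. A minimal LoRA sketch, assuming the Hugging Face transformers and peft libraries (the model name and hyperparameters are placeholders, not a recipe):

    ```python
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    # Load the pretrained base model; its weights stay frozen.
    model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-hf")

    # LoRA injects small trainable matrices into the attention projections.
    config = LoraConfig(
        r=16,                                # adapter rank (placeholder)
        lora_alpha=32,
        target_modules=["q_proj", "v_proj"],
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, config)

    # Typically well under 1% of the 13b parameters end up trainable, which
    # is why a fine-tune fits on a few GPUs instead of a 10,000-GPU cluster.
    model.print_trainable_parameters()
    ```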

      • toothpastespiders@alien.top · 1 year ago

        I’d like to know where there’s one for exactly $1, too. Even half a buck or so of difference adds up over time.

        But runpod’s close at least, at $1.69/hour.
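
        For scale, that $0.69/hour gap over the full GPU-hour budget quoted above really does add up (a sketch, using the thread’s numbers):

        ```python
        # Rate difference applied to the Llama 2 GPU-hour total from above.
        gpu_hours = 3_311_616
        extra = (1.69 - 1.00) * gpu_hours
        print(f"${extra:,.0f} extra")  # $2,285,015 extra
        ```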

      • __JockY__@alien.top · 1 year ago

        Yes, but you don’t have Meta’s purchasing power to rent 10,000 GPUs for a month. Economies of scale, my friend!

    • __JockY__@alien.top · 1 year ago

      I’ll reply to myself!

      It’s not just about GPU expense. You need a small team of ML data scientists. You need access to (or a way to scrape/generate) a mind-bogglingly broad dataset. You need to clean, normalize, and prepare that dataset. All of this takes a huge amount of expertise, time, and money. I wouldn’t be at all surprised if the auxiliary costs surpassed the GPU rental cost.
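
      To make the data-prep point concrete, here’s a toy sketch of just one cleaning step, exact-match deduplication after normalization; real pipelines layer fuzzy dedup, language ID, and quality filtering on top (everything here is illustrative):

      ```python
      import hashlib
      import unicodedata

      def normalize(text: str) -> str:
          # Unicode-normalize and collapse whitespace so near-identical
          # copies of a document hash to the same value.
          text = unicodedata.normalize("NFKC", text)
          return " ".join(text.split())

      def dedup(docs):
          # Keep the first occurrence of each distinct (normalized) document.
          seen, out = set(), []
          for doc in docs:
              h = hashlib.sha256(normalize(doc).encode()).hexdigest()
              if h not in seen:
                  seen.add(h)
                  out.append(doc)
          return out

      print(dedup(["Hello  world", "Hello world", "another doc"]))
      # ['Hello  world', 'another doc']
      ```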

      So the main answer to your question “Why is no one releasing new 70b base models?” is: it’s really, really, really expensive. The rest of the answer is lack of expertise, the difficulty of building a good dataset, and probably a hundred things I haven’t thought of.

      But mainly it just comes down to cost. I bet you wouldn’t see any change from $5,000,000 if you wanted to make your own new 70b base model.