There has been a lot of movement around and below the 13b parameter bracket in the last few months, but it’s wild to think the best 70b models are still llama2-based. Why is that?

We now have 13b models, like the 8-bit bartowski/Orca-2-13b-exl2, approaching or even surpassing the best 70b models.

  • __JockY__@alien.top · 1 year ago

    Meta reports 3,311,616 A100 GPU-hours to train the Llama 2 family (1,720,320 of them for the 70b base model alone). At $1/hour for an A100 that’s just over $3.3M, and on a single GPU the run would take roughly 378 years.

    Scale that across 10,000 GPUs and you’re looking at about two weeks of wall-clock time, for the same ~$3.3M in compute.
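
    A quick back-of-the-envelope check of those numbers, as a sketch (the $1/hour rate and 10,000-GPU cluster size are the assumptions from this thread, not quoted prices):

    ```python
    # Back-of-the-envelope cost/time for the GPU-hour figure above.
    GPU_HOURS = 3_311_616   # Llama 2 family total (A100-80GB hours)
    RATE = 1.00             # assumed $/hour per A100
    N_GPUS = 10_000         # assumed cluster size

    print(f"cost: ${GPU_HOURS * RATE:,.0f}")                       # cost: $3,311,616
    print(f"single GPU: {GPU_HOURS / (24 * 365):.0f} years")       # single GPU: 378 years
    print(f"{N_GPUS:,} GPUs: {GPU_HOURS / N_GPUS / 24:.0f} days")  # 10,000 GPUs: 14 days
    ```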

    Fine-tuning is much, much faster and cheaper.
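
    Part of why: most community fine-tunes don’t touch the base weights at all. A minimal LoRA sketch, assuming the Hugging Face transformers and peft libraries (the model name and hyperparameters are placeholders, not a recipe):

    ```python
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    # Load the pretrained base model; its weights stay frozen.
    model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-hf")

    # LoRA injects small trainable matrices into the attention projections.
    config = LoraConfig(
        r=16,                                # adapter rank (placeholder)
        lora_alpha=32,
        target_modules=["q_proj", "v_proj"],
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, config)

    # Typically well under 1% of the 13b parameters end up trainable, which
    # is why a fine-tune fits on a few GPUs instead of a 10,000-GPU cluster.
    model.print_trainable_parameters()
    ```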

      • toothpastespiders@alien.top · 1 year ago

        I’d like to know where there’s one for exactly $1, too. Even half a buck or so of difference adds up over time.

        But runpod’s close at least, at $1.69/hour.
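
        For scale, that $0.69/hour gap over the full GPU-hour budget quoted above really does add up (a sketch, using the thread’s numbers):

        ```python
        # Rate difference applied to the Llama 2 GPU-hour total from above.
        gpu_hours = 3_311_616
        extra = (1.69 - 1.00) * gpu_hours
        print(f"${extra:,.0f} extra")  # $2,285,015 extra
        ```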

      • __JockY__@alien.top · 1 year ago

        Yes, but you don’t have Meta’s purchasing power to rent 10,000 GPUs for a month. Economies of scale, my friend!

    • __JockY__@alien.top · 1 year ago

      I’ll reply to myself!

      It’s not just about GPU expense. You need a small team of ML data scientists. You need access to (or a way to scrape/generate) a mind-bogglingly broad dataset. You need to clean, normalize, and prepare that dataset. All of this takes a huge amount of expertise, time, and money. I wouldn’t be at all surprised if the auxiliary costs surpassed the GPU rental cost.
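
      To make the data-prep point concrete, here’s a toy sketch of just one cleaning step, exact-match deduplication after normalization; real pipelines layer fuzzy dedup, language ID, and quality filtering on top (everything here is illustrative):

      ```python
      import hashlib
      import unicodedata

      def normalize(text: str) -> str:
          # Unicode-normalize and collapse whitespace so near-identical
          # copies of a document hash to the same value.
          text = unicodedata.normalize("NFKC", text)
          return " ".join(text.split())

      def dedup(docs):
          # Keep the first occurrence of each distinct (normalized) document.
          seen, out = set(), []
          for doc in docs:
              h = hashlib.sha256(normalize(doc).encode()).hexdigest()
              if h not in seen:
                  seen.add(h)
                  out.append(doc)
          return out

      print(dedup(["Hello  world", "Hello world", "another doc"]))
      # ['Hello  world', 'another doc']
      ```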

      So the main answer to your question “Why is no one releasing new 70b base models?” is: it’s really, really, really expensive. The rest of the answer is lack of expertise, the difficulty of building a good dataset, and probably a hundred things I haven’t thought of.

      But mainly it just comes down to cost. I bet you wouldn’t see any change from $5,000,000 if you wanted to make your own new 70b base model.