Biden Executive Order regulates VERY large models

PookaMacPhellimen@alien.top · 1 year ago

ambient_temp_xeno@alien.top · 1 year ago

I don’t know how big 10^20 floating-point operations per second is, or whether the 70B model was trained on something bigger or smaller. But I think that figure is the more important one, as I think Meta uses a single datacentre.

These figures in context:

(b) The Secretary of Commerce, in consultation with the Secretary of State, the Secretary of Defense, the Secretary of Energy, and the Director of National Intelligence, shall define, and thereafter update as needed on a regular basis, the set of technical conditions for models and computing clusters that would be subject to the reporting requirements of subsection 4.2(a) of this section. Until such technical conditions are defined, the Secretary shall require compliance with these reporting requirements for:

(i) any model that was trained using a quantity of computing power greater than 10^26 integer or floating-point operations, or using primarily biological sequence data and using a quantity of computing power greater than 10^(23) integer or floating-point operations; and

(ii) any computing cluster that has a set of machines physically co-located in a single datacenter, transitively connected by data center networking of over 100 Gbit/s, and having a theoretical maximum computing capacity of 10^(20) integer or floating-point operations per second for training AI.

FairSum@alien.top · 1 year ago

Assuming number of FLOPs in compute is 6ND (N = number of parameters, D = dataset size in tokens) you could take the full RedPajama dataset (30T tokens) and a 500B parameter model and it’d come out to:

6*(30*10^12)*(500*10^9) = 9*10^25

In order to qualify, you would need a cluster that could train this beast in about:

10^26 / 10^20 = 1000000 seconds = 11.57 days
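For anyone who wants to re-run the arithmetic above, here is a minimal sketch in Python. It assumes the same 6ND rule of thumb and the thresholds quoted from the order; the function and constant names are made up for illustration, and the time figure assumes the cluster runs at its full theoretical capacity, which real training runs do not reach.

```python
# Rough check of the 6ND estimate above. All names here are illustrative.

MODEL_THRESHOLD_FLOP = 1e26      # total training operations, from section (i)
CLUSTER_THRESHOLD_FLOPS = 1e20   # operations per second, from section (ii)
SECONDS_PER_DAY = 86_400


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute with the 6 * N * D rule of thumb."""
    return 6.0 * n_params * n_tokens


# 500B parameters trained on 30T tokens, as in the comment above.
flops = training_flops(n_params=500e9, n_tokens=30e12)

# Time for a cluster sitting exactly at the reporting threshold to perform
# 10^26 operations, assuming 100% utilization (an idealization).
seconds = MODEL_THRESHOLD_FLOP / CLUSTER_THRESHOLD_FLOPS
days = seconds / SECONDS_PER_DAY

print(f"training compute: {flops:.2e} operations")
print(f"below model threshold: {flops < MODEL_THRESHOLD_FLOP}")
print(f"time at threshold cluster: {seconds:.0f} s = {days:.2f} days")
```

Swapping in a different parameter count or token count shows how close a given run would come to the 10^26 line.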

SomeOddCodeGuy@alien.top · 1 year ago

Ok, as a baseline for everyone who, like me, doesn’t understand all the big words and numbers on why this is great news:

So, if I’m understanding correctly, one of our most powerful open source models is so far from this benchmark that it can’t even be seen.

Someone please correct me if I’m wrong.

_Lee_B_@alien.top · 1 year ago

“Someone please correct me if I’m wrong.”

Think of it like regulating all use of 50 MHz+ computers back in the early 80s, when most people had 5 MHz or less. At the time, you might have thought “OK, I’ll never be able to afford that anyway – that’s like Space Shuttle computing power.” Yet, with such a restriction, this timeline, where everyone has smartphones and smartwatches and smart TVs, self-driving cars, robots, and millions of servers combine to create the internet, would not exist.

Thistleknot@alien.top · 1 year ago

I imagine that by creating an app, putting it on everyone’s cell phone, and using a fraction of each phone’s power, you could easily build an LLM that would surpass any single data center.

_Lee_B_@alien.top · 1 year ago

You have the connection speed between phones to worry about, as well as a different architecture. There’s a big difference between running the kernel over a new layer and its inputs locally within a GPU chip, vs. copying that data into packets, filling in all of the rest of the information associated with the packets, sending it to the phone’s radio, having it turned into radio waves, transmitting that to a cell tower, routing it through the network to the cell co, routing it on to the receiving phone’s cell tower (maybe via a satellite or two), transmitting it to the destination phone, decoding the radio waves, etc. I’m deliberately leaving out some details (like the BSD socket layers and encryption and decryption), and I’m sure I’m missing many other complications.

BUT it’s conceivable in the future, as tech improves and the gap between consumer hardware and what’s needed to run AGI narrows, and so on.

Cybernetic_Symbiotes@alien.top · 1 year ago

The numbers appear to have OpenAI’s finger-prints on them. I don’t know if they’re from an AI-risk mitigations perspective or for laying foundations for competitive barriers. Probably a mix of both.

At 30 trillion tokens, 10^26 float ops caps you at ~550 billion parameters (using float ops = 6 * N * D). Does this indirectly leak anything about OpenAI’s current scaling? At 10 trillion tokens, it’s 1.7 Trillion parameters. Bigger vocabularies can stretch this limit a bit.
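The parameter caps quoted above can be reproduced by inverting the same 6 * N * D approximation. This is only a sketch under that assumption; the function name is hypothetical, and the result ignores anything the rule of thumb does not capture (vocabulary size, architecture details, and so on).

```python
# Invert float ops = 6 * N * D to get the largest N that stays under a cap.
# The helper name is illustrative, not taken from any library.

FLOP_CAP = 1e26  # reporting threshold from section (i) of the order


def max_params_under_cap(n_tokens: float, flop_cap: float = FLOP_CAP) -> float:
    """Largest parameter count N such that 6 * N * n_tokens <= flop_cap."""
    return flop_cap / (6.0 * n_tokens)


for tokens in (30e12, 10e12):
    n_max = max_params_under_cap(tokens)
    print(f"{tokens / 1e12:.0f}T tokens -> about {n_max / 1e9:.0f}B parameters")
```

At 30 trillion tokens this lands in the mid-500-billion range, and at 10 trillion tokens at roughly 1.7 trillion, in line with the figures in the comment.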

Infinite100p@alien.top · 1 year ago

They must be prepping the field for tomorrow rather than trying to introduce immediate trust market conditions.

TheLastVegan@alien.top · 1 year ago

https://www.youtube.com/watch?v=8K6-cEAJZlE&t=6m39s

Where did it start? It started right here. And this is where it could’ve been stopped! If those people had stood together. If they had protected each other, they could’ve resisted the Nazi threat. Together they would’ve been strong. But once they allowed themselves to be split apart, they were helpless. When that first minority lost out, everybody lost out.

FPham@alien.top · 1 year ago

“Give me a big number in units that will be very hard to understand by anybody.”

“28M pigeon feet”

“It’s too on the nose.”

“28M H100 hours”

Only-Letterhead-3411@alien.top · 1 year ago

If I am not mistaken “28M H100 hours” roughly equals to “87M tetryliodo hexamine” or “32M hydrocoptic block rounds” given by the equation P = 2.5 times C times n to the 6th power, minus 7.

Ilforte@alien.top · 1 year ago

All of this is a red herring. The bigger issue is going to be checking of the data for biological sequences and such.

IchikaIto@alien.top · 1 year ago

Sequences? Why?

Ilforte@alien.top · 1 year ago

Because they’re very concerned about using LLMs for help in creating bioweapons, and a small portion of the data will go a long way. I believe this will lead to scrutinizing datasets.

ambient_temp_xeno@alien.top · 1 year ago

OHHHH that’s what that’s about. Makes sense.

Monkey_1505@alien.top · 1 year ago

Recombining elements of existing pathogens or chemicals using non-AI modelling is what current biolabs already do, and they still need to make them and test them, because all modelling gets you is good guesses. If anything, my guess is that LLMs will be worse at that task than a human expert plus non-AI modelling. Still, I guess I get the caution.

Proud-Point8137@alien.top · 1 year ago

Haha great, open source models will now have a chance, and China will be able to catch up with their hypercensorship models

Zelenskyobama2@alien.top · 1 year ago

People say we should use the government to crack down on monopolies. But it’s the government that owns the monopolies.

exomniac@alien.top · 1 year ago

It’s the other way around.

eliteHaxxxor@alien.top · 1 year ago

It’s not a mutually exclusive thing. It’s very obviously true that unregulated markets devolve into extreme anti-competitive practices quickly. It’s also very obviously true that large corporations can reach their hand into the government and “encourage” extreme anti-competitive practices in their favor.

sammcj@alien.top · 1 year ago

The legislation is going to get old, fast.

The_frozen_one@alien.top · 1 year ago

“The legislation is going to get old, fast.”

It’s an Executive Order, so they can just, like, make another one. It’s not legislation that has to pass the House and Senate.

shibe5@alien.top · 1 year ago

1026

1023

1020

That would include models trained on a calculator.

Oswald_Hydrabot@alien.top · 1 year ago

So if I make a 10 Trillion param mixture of experts model out of fine-tuned variations of the same 300b model I am safe right?

Or how about I train a 4 Trillion param model on a new architecture that can utilize distributed compute? If contribution to GPU pools for training is encrypted and decentralized then good luck.

Fuck OpenAI. We will take this fight to the streets and win.

PopeSalmon@alien.top · 1 year ago

huh what’s that about biological sequence data , a MUCH lower number , huh , um do they know something specific about how that’s dangerous :o

Mirai_Z@alien.top · 1 year ago

just train/finetune outside of the us

Smallpaul@alien.top · 1 year ago

Some exponent marks missing here.

drplan@alien.top · 1 year ago

The most interesting part is the focus on biological sequence data. This means that generative AI for synthetic biology is on the policy makers’ and risk assessors’ radar, and probably rightly so.