Basically - "any model trained with ~28M H100-hours, which is around $50M USD, or any cluster with 10^20 FLOP/s, which is around 50,000 H100s, which only two companies currently have" - hat-tip to nearcyan on Twitter for this calculation (a quick sanity check follows the quoted text below).
Specific language below.
" (i) any model that was trained using a quantity of computing power greater than 1026 integer or floating-point operations, or using primarily biological sequence data and using a quantity of computing power greater than 1023 integer or floating-point operations; and
(ii) any computing cluster that has a set of machines physically co-located in a single datacenter, transitively connected by data center networking of over 100 Gbit/s, and having a theoretical maximum computing capacity of 1020 integer or floating-point operations per second for training AI."
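For anyone who wants to verify nearcyan's numbers against these thresholds, here's a minimal sketch. The H100 throughput and rental-cost figures are my assumptions (not from the order): roughly 1e15 FLOP/s sustained during training, ~2e15 FLOP/s peak for the cluster count, and ~$2/hour rental.

```python
# Back-of-envelope check of the executive order's compute thresholds.
# All hardware figures below are assumptions, not numbers from the order.
H100_TRAIN_FLOPS = 1e15   # assumed ~1e15 FLOP/s sustained per H100 during training
H100_PEAK_FLOPS = 2e15    # assumed ~2e15 FLOP/s peak per H100 (for the cluster count)
H100_HOURLY_COST = 2.0    # assumed ~$2/hour rental price

# (i) model threshold: 10^26 total operations
model_ops = 1e26
h100_hours = model_ops / H100_TRAIN_FLOPS / 3600   # ~2.8e7, i.e. ~28M H100-hours
cost_musd = h100_hours * H100_HOURLY_COST / 1e6    # ~$56M

# (ii) cluster threshold: 10^20 operations per second
cluster_flops = 1e20
n_h100 = cluster_flops / H100_PEAK_FLOPS           # ~50,000 H100s

print(f"~{h100_hours/1e6:.0f}M H100-hours, ~${cost_musd:.0f}M, ~{n_h100:,.0f} H100s")
```

Running this prints roughly "~28M H100-hours, ~$56M, ~50,000 H100s", which matches the quoted calculation.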
Because they’re very concerned about LLMs being used to help create bioweapons, and a small amount of biological sequence data in the training set will go a long way (hence the much lower 10^23 threshold for such models). I believe this will lead to closer scrutiny of training datasets.
OHHHH that’s what that’s about. Makes sense.
Recombining elements of existing pathogens or chemicals using non-AI modelling is what current biolabs already do - and they still need to actually make and test the results, because all modelling gets you is good guesses. If anything, my guess is that LLMs will be worse at that task than a human expert plus non-AI modelling. Still, I guess I get the caution.