Basically - "any model trained with ~28M H100 hours, which is around $50M USD or - any cluster with 10^20 FLOPs, which is around 50,000 H100s, which only two companies currently have " - hat-tip to nearcyan on Twitter for this calculation.
Specific language below.
" (i) any model that was trained using a quantity of computing power greater than 1026 integer or floating-point operations, or using primarily biological sequence data and using a quantity of computing power greater than 1023 integer or floating-point operations; and
(ii) any computing cluster that has a set of machines physically co-located in a single datacenter, transitively connected by data center networking of over 100 Gbit/s, and having a theoretical maximum computing capacity of 1020 integer or floating-point operations per second for training AI."
Assuming number of FLOPs in compute is 6ND (N = number of parameters, D = dataset size in tokens) you could take the full RedPajama dataset (30T tokens) and a 500B parameter model and it’d come out to:
6*(30*10^12)*(500*10^9) = 9*10^25
In order to qualify, you would need a cluster that could train this beast in about:
10^26 / 10^20 = 1000000 seconds = 11.57 days