Basically - "any model trained with ~28M H100 hours, which is around $50M USD or - any cluster with 10^20 FLOPs, which is around 50,000 H100s, which only two companies currently have " - hat-tip to nearcyan on Twitter for this calculation.
Specific language below.
" (i) any model that was trained using a quantity of computing power greater than 1026 integer or floating-point operations, or using primarily biological sequence data and using a quantity of computing power greater than 1023 integer or floating-point operations; and
(ii) any computing cluster that has a set of machines physically co-located in a single datacenter, transitively connected by data center networking of over 100 Gbit/s, and having a theoretical maximum computing capacity of 1020 integer or floating-point operations per second for training AI."
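For what it's worth, nearcyan's conversion checks out if you assume roughly peak H100 throughput. A quick back-of-envelope sketch in Python; the throughput and rental-price constants are my own ballpark assumptions, not anything in the order:

    # Rough check of nearcyan's numbers. All hardware constants below are
    # assumed ballpark figures, not from the executive order:
    #   ~1e15 FLOP/s  H100 peak dense BF16 throughput
    #   ~2e15 FLOP/s  H100 peak dense FP8 throughput
    #   ~$1.80        cloud rental price per H100-hour
    MODEL_THRESHOLD_FLOP = 1e26      # threshold for a trained model, total FLOPs
    CLUSTER_THRESHOLD_FLOP_S = 1e20  # threshold for a cluster, FLOPs per second

    H100_BF16_FLOP_S = 1e15
    H100_FP8_FLOP_S = 2e15
    USD_PER_H100_HOUR = 1.80

    gpu_hours = MODEL_THRESHOLD_FLOP / H100_BF16_FLOP_S / 3600
    print(f"{gpu_hours / 1e6:.0f}M H100-hours")             # -> 28M H100-hours
    print(f"${gpu_hours * USD_PER_H100_HOUR / 1e6:.0f}M")   # -> $50M
    print(f"{CLUSTER_THRESHOLD_FLOP_S / H100_FP8_FLOP_S:,.0f} H100s")  # -> 50,000 H100s

Note that the GPU count depends on which peak spec you divide by: at dense FP8 rates you get 50,000 H100s, while at dense BF16 rates it would be closer to 100,000.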
“Give me a big number in units that will be very hard for anybody to understand.”
“28M pigeon feet”
“It’s too on the nose.”
“28M H100 hours”
If I am not mistaken, “28M H100 hours” is roughly equal to “87M tetryliodo hexamine” or “32M hydrocoptic block rounds”, as given by the equation P = 2.5 × C × n^6 − 7.