hoppyJonas@alien.top to Machine Learning@academy.garden, in "[R] ConvNets Match Vision Transformers at Scale" (1 year ago):
It’s probably both. The Chinchilla paper showed that for compute-optimal training, model size and the number of training tokens should be scaled up in equal proportion as the compute budget grows.
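
To make the proportionality concrete, here is a rough sketch in Python. It leans on two assumptions that are not stated in the comment above: the common approximation that training compute is about 6 * N * D FLOPs (N parameters, D tokens), and the often-quoted rule of thumb of roughly 20 tokens per parameter. The function name and its default are made up for illustration, so treat the numbers as ballpark only.

```python
import math


def compute_optimal_split(flops, tokens_per_param=20.0):
    """Split a FLOP budget into (parameters, tokens).

    Assumes C ~= 6 * N * D and D = tokens_per_param * N. Both are
    rule-of-thumb assumptions, not exact results from the paper.
    """
    # Substituting D = r * N into C = 6 * N * D gives N = sqrt(C / (6 * r)).
    n_params = math.sqrt(flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    for budget in (1e21, 4e21):
        n, d = compute_optimal_split(budget)
        print(f"C={budget:.0e} FLOPs: N~{n:.2e} params, D~{d:.2e} tokens")
```

Under these assumptions, quadrupling the compute budget doubles both the parameter count and the token count, which is what scaling the two in equal proportion means in practice.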