[R] ConvNets Match Vision Transformers at Scale

psyyduck@alien.top · 1 year ago

[R] ConvNets Match Vision Transformers at Scale

linearmodality@alien.top · 1 year ago

Wasn’t this already known? I thought the ConvNeXt paper already showed this a year and a half ago.

RobbinDeBank@alien.top · 1 year ago

This group might have too many TPU credits and not know what to do with them.

qalis@alien.top · 1 year ago

Yes and no. In my opinion, ConvNeXt is less about data and more about careful architecture design and smart training. But yeah, CNNs are better than ViTs if done well, that’s true.

That_Flamingo_4114@alien.top · 1 year ago

Not necessarily. A maxed-out system under perfect conditions could match the newest developing technology. The paper’s whole point was that how you use a technique can matter as much as the algorithm itself. Another paper, by Google, made the same point in the world of recommender systems.