I was going to say vision transformers still have the advantage, since they are often pre-trained on unlabelled images. But now that I think of it, I don't see any reason why you couldn't pre-train a convolutional neural network in the same way. I just seem to read about it more with vision transformers than with CNNs.