I’m new to image classification and ML, and this is going to be my first project on those topics. I’m considering VGG16 because I saw some studies reporting generally high accuracy (80-95%), but I’m worried that the model might not be fast enough, or that the app’s file size might get massive if I want the app to be usable without an internet connection.

What do you guys think?

  • ImaSakon@alien.topB
    1 year ago

    VGG16 is outdated. CNNs based on architectures like EfficientNet and MobileNetV3 have superior accuracy. If attention mechanisms are acceptable in the network, Vision Transformers such as MobileViT are excellent.

  • shubham0204_dev@alien.topB
    1 year ago

    You could use MobileNet models, since they use separable convolutions, which have fewer parameters and a shorter execution time than regular convolutions. MobileNets are also easy to set up and train (tf.keras.applications.* has a pre-trained model) and can be used as a backbone for fine-tuning on datasets other than ImageNet.
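To get a rough feel for why separable convolutions are lighter, here is a back-of-the-envelope sketch in plain Python. The function names and the layer sizes (a 3x3 kernel going from 128 to 256 channels) are made up for illustration, and biases are ignored; it only counts weights, it is not how any framework computes this.

```python
def conv_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a regular k x k convolution (bias ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def separable_conv_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a depthwise separable convolution (bias ignored)."""
    # Depthwise step: one k x k filter per input channel.
    depthwise = kernel_size * kernel_size * in_channels
    # Pointwise step: a 1 x 1 convolution that mixes channels.
    pointwise = in_channels * out_channels
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 128 input channels, 256 output channels.
regular = conv_params(3, 128, 256)
separable = separable_conv_params(3, 128, 256)
print(regular, separable)  # 294912 33920
print(f"regular / separable = {regular / separable:.1f}x")
```

For this one layer the separable version needs roughly 8.7 times fewer weights, which is where much of the size and speed advantage comes from.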

    You can also explore quantization and weight pruning. These techniques optimize a model so that it has a smaller memory footprint and a shorter execution time on embedded devices.
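To make the quantization idea concrete, here is a minimal sketch of symmetric 8-bit quantization in plain Python. The helper names `quantize_int8` and `dequantize` and the sample weights are hypothetical, and real toolchains (for example TensorFlow Lite's post-training quantization) do considerably more, such as calibrating activations. The point is only that each float32 weight (4 bytes) gets stored as one int8 (1 byte) plus a shared scale, at the cost of a small rounding error.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Fall back to 1.0 so an all-zero tensor does not divide by zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


# Made-up weights, purely for illustration.
weights = [-0.82, -0.1, 0.0, 0.33, 0.75]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# The rounding error per weight is at most half of one quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized)
print(f"scale={scale:.5f}, max error={max_error:.5f}")
```

Pruning is the complementary idea: instead of shrinking each weight, it zeroes out the weights that matter least so the model can be stored and run more sparsely.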