Say you are a potato chip company. The goal is to let consumers upload images of the product they are having issues with, and to identify that product by brand and variant using machine learning. Consumers may upload real photos they have taken themselves, bogus images copied from the internet, or completely irrelevant or inappropriate photos (such as a picture of a dog or cat).

real image

web image 1

web image 2

bogus image

In this example, the goal for the legitimate image is to classify it as “Lays Classic”. Some products are not in bag form, such as those sold in tubes. Furthermore, photos may be taken under different lighting conditions and in different orientations, and some images might contain other products as well.

I have been out of the ML field for the past 4 years, so I’m not up to date on the state-of-the-art methods for this problem. I studied CNNs 4 years ago, but there have been advances since then, such as transformer-based methods. Someone has already tried ResNet-50 and YOLOv5, and I’m thinking about using a pretrained model like CLIP and training only the final classification layer.
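
To make the "pretrained model plus a trained final layer" idea concrete, here is a minimal, dependency-free sketch of a linear probe: a softmax classifier fitted on fixed feature vectors while the backbone stays untouched. The feature vectors below are synthetic stand-ins, not real CLIP embeddings, and the names `train_linear_probe` and `predict` are hypothetical. In practice the features would come from the frozen image encoder, and a library classifier would likely replace the hand-written training loop.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_for(weights, biases, x):
    """One linear layer: a score per class for feature vector x."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def train_linear_probe(features, labels, num_classes, lr=0.5, epochs=200):
    """Fit only the final layer with full-batch gradient descent on cross-entropy."""
    dim, n = len(features[0]), len(features)
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(epochs):
        grad_w = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for x, y in zip(features, labels):
            probs = softmax(logits_for(weights, biases, x))
            for c in range(num_classes):
                # Gradient of cross-entropy w.r.t. the logit of class c.
                err = (probs[c] - (1.0 if c == y else 0.0)) / n
                grad_b[c] += err
                for j in range(dim):
                    grad_w[c][j] += err * x[j]
        for c in range(num_classes):
            biases[c] -= lr * grad_b[c]
            for j in range(dim):
                weights[c][j] -= lr * grad_w[c][j]
    return weights, biases


def predict(weights, biases, x):
    """Return (class index, probability) for the most likely class."""
    probs = softmax(logits_for(weights, biases, x))
    best = max(range(len(probs)), key=lambda c: probs[c])
    return best, probs[best]


# ASSUMPTION: synthetic 4-dimensional "embeddings" for three product variants.
# Real embeddings from a pretrained encoder would be much higher-dimensional.
rng = random.Random(0)
centers = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
features, labels = [], []
for label, center in enumerate(centers):
    for _ in range(20):
        features.append([v + rng.gauss(0.0, 0.1) for v in center])
        labels.append(label)

weights, biases = train_linear_probe(features, labels, num_classes=3)
correct = sum(predict(weights, biases, x)[0] == y for x, y in zip(features, labels))
accuracy = correct / len(labels)
print(f"training accuracy: {accuracy:.2f}")
```

Because only the small final layer is trained, this setup tends to need far fewer labeled images per class than training a network end to end, though how many are enough depends on how visually similar the variants are.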

I would appreciate hearing from someone more well versed about the recommended approach as far as the model, labeling, number of images needed per class, etc. It might be that I need multiple models, such as one to separate the legitimate images from the rest and another to identify the product and variant.
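
The multi-model idea could be wired together roughly as follows. This is only a sketch under assumptions: `gate` and `product_model` are hypothetical callables standing in for whatever trained models end up being used, and the threshold values are placeholders that would need tuning on a validation set.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass
class Decision:
    status: str              # "rejected", "needs_review", or "identified"
    product: Optional[str]   # brand/variant label when identified
    confidence: float        # score that drove the decision


def classify_upload(
    image: Any,
    gate: Callable[[Any], float],                    # returns P(image is a legitimate product photo)
    product_model: Callable[[Any], Tuple[str, float]],  # returns (label, confidence)
    gate_threshold: float = 0.5,     # ASSUMPTION: placeholder value
    product_threshold: float = 0.7,  # ASSUMPTION: placeholder value
) -> Decision:
    """Stage 1 filters bogus or irrelevant uploads; stage 2 names the product."""
    p_legit = gate(image)
    if p_legit < gate_threshold:
        return Decision("rejected", None, p_legit)

    label, confidence = product_model(image)
    if confidence < product_threshold:
        # Low-confidence predictions are routed to a human instead of guessed.
        return Decision("needs_review", None, confidence)
    return Decision("identified", label, confidence)


# Stub models for illustration only: each "image" is a dict carrying fake scores.
def stub_gate(image):
    return image["p_legit"]


def stub_product_model(image):
    return image["label"], image["conf"]


cat_photo = {"p_legit": 0.05, "label": "Lays Classic", "conf": 0.30}
clear_photo = {"p_legit": 0.95, "label": "Lays Classic", "conf": 0.92}
blurry_photo = {"p_legit": 0.90, "label": "Lays Classic", "conf": 0.40}

for upload in (cat_photo, clear_photo, blurry_photo):
    print(classify_upload(upload, stub_gate, stub_product_model))
```

Keeping the two stages separate means the gate can be retrained on new kinds of junk uploads without touching the product classifier, and the "needs_review" path gives a source of hard examples to label later.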

Any advice would be welcome. Thanks

  • colefinbar1@alien.topB
    1 year ago
    • Lionvaplus could be a good option to generate additional training data through its photorealistic image creation. This could help improve model accuracy, especially for classes with limited real-world image samples.

    • For basic classification between legitimate product images and bogus/irrelevant images, Lionvaplus may not be necessary. A pretrained model like ResNet or EfficientNet, fine-tuned on your data, should work decently.