Hello,

I’m looking for an alternative to Google Vision AI (LABEL_DETECTION, OBJECT_LOCALIZATION) and Amazon Rekognition (DetectLabels).
Any ideas?

Thanks!

  • hurrytewer@alien.topB
    1 year ago

    CogVLM is supposed to support this with prompts like “Can you provide a description of the image and include the coordinates [[x0,y0,x1,y1]] for each mentioned object?”

    However, I couldn’t get it to work properly; it would just hallucinate.
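
    If you do try it, it may help to sanity-check whatever comes back before trusting it. Below is a rough sketch that pulls [[x0,y0,x1,y1]] boxes out of a text reply and drops ones that are obviously malformed. The function names, the sample reply, and the assumption that coordinates are integers normalized to a 0-999 grid (scale=1000) are mine, not taken from the CogVLM docs, so adjust them to whatever the model actually emits.

```python
import re

# Matches one box written as [[x0,y0,x1,y1]] with integer coordinates.
BOX_PATTERN = re.compile(
    r"\[\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]\]"
)


def parse_boxes(reply, scale=1000):
    """Return the well-formed boxes found in a model reply.

    `scale` is an ASSUMED normalization range (coordinates in 0..scale-1).
    Boxes that fall outside the range or have zero/negative area are dropped,
    which filters out some (not all) hallucinated output.
    """
    boxes = []
    for match in BOX_PATTERN.finditer(reply):
        x0, y0, x1, y1 = (int(g) for g in match.groups())
        in_range = all(0 <= v < scale for v in (x0, y0, x1, y1))
        if in_range and x0 < x1 and y0 < y1:
            boxes.append((x0, y0, x1, y1))
    return boxes


def to_pixels(box, width, height, scale=1000):
    """Convert a normalized box to pixel coordinates for a width x height image."""
    x0, y0, x1, y1 = box
    return (
        x0 * width // scale,
        y0 * height // scale,
        x1 * width // scale,
        y1 * height // scale,
    )


# Made-up reply for illustration; the second box is inverted and gets dropped.
sample_reply = "A dog [[100,200,500,800]] next to a ball [[900,900,100,100]]."
valid_boxes = parse_boxes(sample_reply)
pixel_boxes = [to_pixels(b, width=640, height=480) for b in valid_boxes]
```

    This only catches structurally broken boxes. A box can pass these checks and still point at the wrong thing, so you would still need to eyeball the results on a few images.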

    If you want to give it a shot, here are the official visual QA prompts