LLM for object detection/labelling

takezo07@alien.top · 1 year ago

LLM for object detection/labelling

hurrytewer@alien.top · 1 year ago

CogVLM is supposed to support this with prompts like “Can you provide a description of the image and include the coordinates [[x0,y0,x1,y1]] for each mentioned object?”

However, I couldn’t get it to work properly; it would just hallucinate.
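One way to sanity-check output from a prompt like the one above is to parse the boxes out of the reply and drop any that are obviously malformed. This is only a rough sketch: `parse_boxes` is a made-up helper, not part of CogVLM, and it assumes the model writes each box as `[[x0,y0,x1,y1]]` with integer coordinates normalized to a 0-999 grid. That scale is an assumption, so adjust `scale` if your outputs look different.

```python
import re

# Matches one box written as [[x0,y0,x1,y1]] with integer coordinates.
BOX_RE = re.compile(r"\[\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]\]")


def parse_boxes(text, width, height, scale=1000):
    """Extract boxes from a model reply and convert them to pixel coordinates.

    Assumption (not confirmed by the thread): coordinates are integers
    normalized to the range [0, scale).
    """
    boxes = []
    for match in BOX_RE.finditer(text):
        x0, y0, x1, y1 = (int(v) for v in match.groups())
        # Skip boxes that fall outside the grid or have no area;
        # these are a cheap signal that the model made something up.
        if max(x0, y0, x1, y1) >= scale or x0 >= x1 or y0 >= y1:
            continue
        boxes.append((
            x0 * width // scale,
            y0 * height // scale,
            x1 * width // scale,
            y1 * height // scale,
        ))
    return boxes


reply = "A dog [[100,200,500,800]] next to a cat [[900,900,100,100]]"
print(parse_boxes(reply, width=1000, height=500))
```

This only catches structurally invalid boxes. A box can pass these checks and still point at the wrong object, so it is worth drawing the results on the image to check them by eye.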

If you want to give it a shot, here are the official visual QA prompts