I keep finding GPT-4V prototypes shared on X: e.g., narration for videos (source), posture correction (source), etc.
As foundation models in computer vision become more accessible, will the field regain some attention (relative to the LLM hype)?
Maybe? Vision has been around a lot longer than NLP in industry. It has permeated some challenging areas like embedded and edge deployments, driven by privacy and other requirements. If foundation models can't run on the edge, I can imagine them affecting only a small portion of vision applications.