With the ongoing turmoil at OAI, has anyone found an alternative to their vision endpoint that offers comparable functionality?
I am aware of LLaVA, which seems early in its maturity, but are there any commercial offerings?
There’s fuyu-8b, but it has no commercial license.
It can really cover the “GPT-4 reads websites” use case and similar tasks, and it’s helpful with complex charts too. Other than that, LLaVA is your best hope.
https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/neva-22b
https://replicate.com/joehoover/instructblip-vicuna13b/api
Here are a couple that haven’t been mentioned; they’re considerably weaker than GPT-4V though, as is to be expected from small models.
Have you checked out the new release from OpenVL? Their vision API is gaining traction and might fit your needs.
Have you checked out LLaVA? It’s still early in its maturity, but it seems like a promising alternative. Not sure about commercial offerings though.