Cradawx@alien.top to LocalLLaMA@poweruser.forum · 10 months ago
ShareGPT4V - New multi-modal model, improves on LLaVA (sharegpt4v.github.io)
M0ULINIER@alien.top · 10 months ago:
https://preview.redd.it/vnony8f0ax1c1.png?width=1080&format=pjpg&auto=webp&s=dc261252751a0a1e209d9049854895688de25fa4
The benchmark is in their GitHub, even if it's hard to be sure of benchmarks in current times.

lakolda@alien.top · 10 months ago:
This isn't comparing with the 13B version of LLaVA. I'd be curious to see that.

justletmefuckinggo@alien.top · 10 months ago:
I'm new here, but is this true multimodality, or is it the LLM communicating with a vision model? And what exactly are those four models being benchmarked on here?
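On the multimodality question above: ShareGPT4V reuses the LLaVA-style design, where a pretrained vision encoder (a CLIP ViT) produces image patch features and a small projector maps them into the LLM's token-embedding space, so the image is fed to the LLM as "soft tokens" in the same sequence as the text rather than two separate models exchanging text. Below is a minimal sketch of that projector idea; the class name, dimensions, and patch counts are illustrative assumptions, not ShareGPT4V's actual code.

```python
import torch
import torch.nn as nn

class VisionProjector(nn.Module):
    """Maps vision-encoder patch features into the LLM's embedding space.

    Hypothetical dimensions: a CLIP ViT-L/14 encoder outputs 1024-d patch
    features; a 7B LLaMA-style LLM uses 4096-d token embeddings.
    """
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        # LLaVA-1.5 (which ShareGPT4V builds on) uses a small MLP projector;
        # a two-layer MLP with GELU is sketched here.
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (batch, num_patches, vision_dim) from the vision encoder
        return self.proj(patch_feats)  # (batch, num_patches, llm_dim)

if __name__ == "__main__":
    projector = VisionProjector()
    # e.g. a 336px image at patch size 14 gives 24x24 = 576 patches
    fake_patches = torch.randn(1, 576, 1024)
    image_tokens = projector(fake_patches)
    print(image_tokens.shape)  # torch.Size([1, 576, 4096])
```

The projected patch embeddings are concatenated with the text token embeddings and passed through the LLM as one sequence, which is why this is usually considered tighter integration than an LLM calling a separate captioning model.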