Qwen-72B released (huggingface.co)
Posted by PookaMacPhellimen@alien.top to LocalLLaMA@poweruser.forum · English · 10 months ago
There’s an audio multimodal too
https://github.com/QwenLM/Qwen-Audio
I couldn't understand it. Is this true audio understanding (can it differentiate a helicopter from a fire engine, or a dog bark, for example), or does it just transcribe speech to text and feed that to the model?
It's the former: it works on the audio data itself.
So you can ask it about sentiment, have it determine whether someone is giggling, crying, or laughing, and maybe even detect a condescending or flirtatious tone, etc.
Use cases??
Maybe for audio that has both sounds and words? For example, if you want to summarize a concert or something.