• matsu-morak@alien.topB
      10 months ago

      I couldn't understand it. Is this true audio understanding (e.g. can it tell a helicopter from a fire engine, or recognize a dog bark), or does it just transcribe speech to text and feed that to the model?

      • omniron@alien.topB
        10 months ago

        It’s the former. It’s looking at the audio data itself.

        So you can ask it about sentiment, have it determine whether someone is giggling, crying, or laughing, and maybe even detect a condescending or flirtatious tone.

      • kxtclcy@alien.topB
        10 months ago

        Maybe it’s for audio that has both sound and words? For example, if you want to summarize a concert or something.