• pixxelkick@lemmy.world · 11 months ago

    Do people seriously still think this is a thing?

    Literally anyone can run the basic numbers on the bandwidth that would be involved; you have two options (a rough sketch of those numbers follows the list):

    1. They stream the audio out to their own servers, which process it there. The bandwidth involved would be INSTANTLY obvious, as streaming audio out is non-trivial and anyone can pop open their phone to monitor their network usage. You’d hit your data limit within 1-2 days.

    2. They have the app always on and listening for “wakewords”, which then trigger the recording, and only then does it stream audio out. “Wakewords”, plural, is doing a LOT of heavy lifting here. Just one wakeword takes a tremendous amount of training and money, and the countless number of them that people are claiming would cost a LOT. But that’s not all: running that sort of program is extremely resource-intensive, and once again you can monitor your phone’s resource usage; you’d see the app at the top of the list, burning through your battery like no tomorrow. Both Android and iOS notify you when a specific app is using a lot of battery power, so you’d instantly notice such an app running.
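
    As a rough sketch of those numbers (a back-of-envelope under assumed rates, not exact figures: 16 kHz 16-bit mono for raw PCM, ~12 kbit/s for a typical compressed voice codec):

```python
# Back-of-envelope: what continuous audio streaming would cost per day.
# Assumptions (not from the thread): 16 kHz 16-bit mono raw PCM,
# and ~12 kbit/s for a typical compressed voice codec.

SECONDS_PER_DAY = 24 * 60 * 60

def gb_per_day(bits_per_second: float) -> float:
    """Convert a constant bitrate into gigabytes per day."""
    return bits_per_second * SECONDS_PER_DAY / 8 / 1e9

raw_pcm = 16_000 * 16        # 256 kbit/s for 16 kHz, 16-bit mono
compressed_voice = 12_000    # ~12 kbit/s

print(f"Raw PCM:    {gb_per_day(raw_pcm):.2f} GB/day")           # ~2.76 GB/day
print(f"Compressed: {gb_per_day(compressed_voice):.2f} GB/day")   # ~0.13 GB/day
```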

    I think a big part of this misunderstanding comes from the fact that Alexa/Google devices make wakeword detection look small and trivial.

    What people don’t know, though, is that Alexa / Google Home devices have an entire dedicated board with its own processor JUST for detecting their ONE wake word, and on top of that, the phrase was explicitly chosen to be easy to listen for.

    “Okay Google” and “Hey Alexa” have a non-trivial amount of engineering baked into making sure they are distinct and less likely to get mistaken for other words, and even then they constantly produce false positives.

    If that’s the amount of resources involved for just one wake word/phrase, you have to understand that targeted marketing would require hundreds of times that. It’s not viable for your phone to do it 24/7 without also doubling as a hand warmer in your pocket all day long.

    • hperrin@lemmy.world · 11 months ago

      The point of “OK Google” is to start listening for commands, so it needs to be really good and accurate. Whereas the point of “fluffy blanket” is to show you an ad for fluffy blankets, so it can be poorly trained and wildly inaccurate. It wouldn’t take that much money to train a model to listen for some ad keywords and be just accurate enough to get a return on investment.

      (I’m not saying they are monitoring you, just that it would probably be a lot less expensive than you think.)

        • Monument@lemmy.sdf.org · 11 months ago

          I think what they’re saying is that if you aren’t listening for keywords to fire up a smart speaker, but are instead just ‘bugging’ a home, you don’t need much in the way of hardware in the consumer’s home.

          Assuming you aren’t consuming massive amounts of data to transmit the audio and making a fuss on someone’s home network, this can be done relatively unnoticed, or the traffic can be hidden among other traffic. A sketchy device maker (or, more likely, an app developer) can bug someone’s home or device with sketchy EULAs and murky device permissions, then send the audio to their own servers, where they process it, extract keywords, and sell the metadata for ad targeting.

          Advertising companies already misrepresent the efficacy of ads, while marketers have fully drunk the Kool-Aid - leading to advertisers actually scamming marketers. (There was a better article on this, but I couldn’t find it.) I’m not sure the accuracy of the speech interpretation would matter to them.
          I would not be surprised to learn that advertisers are doing legally questionable things to sell misrepresented advertising services. … but I also wouldn’t be surprised to learn that an advertising company is misrepresenting its capabilities to commit a little (more) light fraud against marketers.

          sigh yay capitalism. We’re all fucked.

        • pixxelkick@lemmy.world · 11 months ago

          I was about to write this but you took the words right out of my mouth, so I will just write “this ^”

    • Blue_Morpho@lemmy.world · 11 months ago

      If it’s randomly sampled, no one would notice. “Oh, my battery ran low today.” Tomorrow it’s fine.

      Google used to (and probably still does) A/B test Play Services builds that caused battery drain. You never knew if something was wrong or if you were the unlucky chosen one out of 1,000 that day.

      Bandwidth for voice is tiny. The AMR-WB standard goes as low as 6.6 kbit/s and includes voice activity detection, so it only sends ~6.6 kbit/s while it actually detects speech.

      Given that a single webpage today averages 2 megabytes, an additional 825 bytes of data each second could easily go unnoticed.
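
      A quick sanity check on those figures (a sketch; 6.6 kbit/s is AMR-WB’s lowest mode, and the ~2 MB page size is the average cited above):

```python
# Sanity check: AMR-WB's lowest mode vs. ordinary web traffic.
amr_wb_lowest_bps = 6600                   # bit/s, lowest AMR-WB mode
bytes_per_second = amr_wb_lowest_bps / 8   # 825 bytes per second of detected speech

avg_webpage_bytes = 2_000_000              # ~2 MB average page load, per the figure above

one_hour_of_speech = bytes_per_second * 3600    # ~2.97 MB
print(f"{bytes_per_second:.0f} bytes per second of speech")
print(f"One hour of detected speech ≈ {one_hour_of_speech / 1e6:.1f} MB "
      f"≈ {one_hour_of_speech / avg_webpage_bytes:.1f} average page loads")
```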

      • Rai@lemmy.dbzer0.com · 11 months ago

        It’s insane people still believe voice takes up heaps of bandwidth.

        Even more so: on-device, you could just run speech-to-text and send the text back home. That’s like… no data. Undetectable.

        Even WITH voice, like you said, fuckin tiny amounts of data for today’s tech.

        This is why I’ll never have “smart” anything in my house.
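
        For a sense of how little data a transcript actually is (a sketch, assuming ~150 spoken words per minute and ~6 bytes per word of plain text):

```python
# Rough size of a plain-text transcript, versus the audio it came from.
words_per_minute = 150      # assumed conversational speaking rate
bytes_per_word = 6          # assumed average word length incl. space

transcript_per_hour = words_per_minute * 60 * bytes_per_word
print(f"~{transcript_per_hour / 1000:.0f} KB of text per hour of speech")  # ~54 KB
```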

    • fmstrat@lemmy.nowsci.com · 11 months ago

      This is simply not true. Low-bitrate compressed audio is a small enough amount of bandwidth that you would never notice it on home internet. And recognizing wakewords? Tiny, tiny amounts of processing. Google’s design is for accuracy and control; a marketing team cares nothing about that. They’ll use an algorithm that just grabs everything.

      Yes, this would be battery intensive on phones when not plugged in. But triggering on power, via CarPlay, or on smart speakers is trivial.

      I’m still skeptical, but not because of this.

      Edit: for credentials: I’m a developer specializing in algorithm creation, and I’ve previously rolled my own hardware and my own branch of MyCroft.

    • Lojcs@lemm.ee · 11 months ago

      FYI, the Snapdragon 855 from 2019 could already detect two wake words at the same time. With the exponential increase in NPU power since then, it wouldn’t be shocking if newer chips could detect hundreds.

    • Pandemanium@lemm.ee · 11 months ago

      But what about a car? Cars are as smart as smartphones now, and you certainly wouldn’t notice the small amount of power needed to collect and transfer data compared to what it takes to drive the car. Some car manufacturers’ TOS agreements seemingly admit that they collect and use your in-car conversations (including any passengers, whom they claim it is your duty to inform that they are being recorded). Almost all the manufacturers are equally bad for privacy and data collection.

      Mozilla details what data each car collects here.

    • Great Blue Heron@lemmy.ca · 11 months ago

      What you’re saying makes sense, but I can’t believe nobody has brought up the fact that a lot of our phones are constantly listening for music and displaying the song details on our lock screen. That all happens without the little green microphone-active light, and with minimal battery and bandwidth consumption.

      I know next to nothing about the technology involved, but it doesn’t seem like it’s very far from listening for advertising keywords.

      • liquidparasyte@pawb.social · 11 months ago

        That uses a similar approach to the wake-word technology, just applied slightly differently.

        I am not a computer or ML scientist but this is the gist of how it was explained to me:

        Your smartphone has a low-powered chip connected to the microphone while the phone is idle, running a local AI model (this is how it works offline) that asks one thing: is this music or not? Once that model decides it’s music, it wakes up the main CPU, which looks up a snippet of that audio against a database of fingerprints for popular/likely songs and then displays a match.
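
        A toy sketch of that two-stage shape (none of this is Google’s actual implementation; the “is it music” gate, the fingerprint function, and the local database are all stand-ins):

```python
import hashlib
from typing import Optional

# Stage 1: a cheap always-on gate decides "music or not".
# Stage 2: only then does the heavier fingerprint lookup run on the main CPU.

LOCAL_FINGERPRINT_DB = {          # stand-in for the rotating on-device database
    "3f2a": "Example Song A",
    "9c1d": "Example Song B",
}

def looks_like_music(frame: bytes) -> bool:
    """Low-power gate; placeholder for a tiny on-device classifier."""
    return len(frame) > 0 and frame[0] % 2 == 0     # dummy decision

def fingerprint(snippet: bytes) -> str:
    """Reduce a snippet to a compact fingerprint; placeholder hash."""
    return hashlib.sha256(snippet).hexdigest()[:4]

def identify(snippet: bytes) -> Optional[str]:
    if not looks_like_music(snippet):
        return None                                 # main CPU never wakes up
    return LOCAL_FINGERPRINT_DB.get(fingerprint(snippet))

print(identify(b"\x02some-audio-bytes"))            # likely None; a match prints a title
```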

        To answer your questions about how it’s different:

        • The song ID runs with system-level access, so it doesn’t go through the normal audio-permission system and thus doesn’t trigger the microphone-access notification.

        • Because it uses a low-powered detection system rather than keeping the main processor listening to the microphone, it runs with much less battery usage.

        • As I understand it, it’s a lot easier to tell whether audio sounds like music than whether it contains a specific intelligible word you may or may not be looking for, which you then have to process into language linked to metadata, etc.

        • The initial size of the database is fairly small, since what’s downloaded is a selection of audio fingerprints that the snippet is compared against. This database gets rotated over time, and the song-ID apps often also let you send your audio snippet to the online mega-databases (Apple’s or Google’s music libraries) for better detection, but overall the data transfer isn’t very noticeable. Searching for arbitrary hot words can’t be nearly as optimized as assistant activations or music detection, especially if it isn’t built into the system.

        And that’s about it…for now.

        All of this is based on the current knowledge of researchers analysing data traffic, OS functions, ML audio detection, mobile computation capabilities, and traditional mobile assistants. It’s possible this will change radically in the near future, if arbitrary audio detection/collection somehow becomes much cheaper computationally, if generative AI makes it easy to extrapolate conversations from low-quality audio snippets, or because of something else I don’t know about yet.

  • impiri@lemm.ee · 11 months ago

    They’ve redirected the page now that it’s getting attention, but here’s the archived version.

    I’m very skeptical of their claims, but it’s possible they’ve partnered with some small number of apps so that they can claim that this is technically working.

  • Bear@sh.itjust.works · 11 months ago

    Of course this is possible. Is it practical? Nope. There is already so much data harvested by the likes of Google and Facebook that they can tell what you like, what videos or articles you read, what you share, and in some cases who you talk to. Importing a shit ton of audio data is pointless; they already know what you like.

      • DavidGarcia@feddit.nl · 11 months ago

        You just need to process the audio on the device and then send keywords to Google etc. It’s technically trivial, since most phones already have dedicated hardware for that. Your phone listens for activation words all the time unless you disable it; there’s no reason it couldn’t also forward anything else it hears as text.
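
        A toy illustration of that last step (assuming a transcript already produced by on-device speech-to-text; the keyword list and the transcript here are made up):

```python
import json

# Match an on-device transcript against an advertiser keyword list and
# ship only the hits: the payload is a few dozen bytes, not audio.
AD_KEYWORDS = {"blanket", "mattress", "vacation", "suv"}

def extract_hits(transcript: str) -> list[str]:
    words = {w.strip(".,!?").lower() for w in transcript.split()}
    return sorted(words & AD_KEYWORDS)

transcript = "We really need a new fluffy blanket and maybe a vacation."
payload = json.dumps({"hits": extract_hits(transcript)})
print(payload, f"({len(payload)} bytes)")
```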

          • krotti@sh.itjust.works · 11 months ago

            I would assume you’re right, considering how much garbage you’d collect by listening to everything.

            Now imagine recording people who haven’t given consent, or the device saving full transcripts of movies.

              • Subverb@lemmy.world · 11 months ago

                Anecdotally, the odds seem near zero that my wife and I can talk just once about maybe buying some obscure thing like electric blinds and then, purely by coincidence, targeted ads for them pop up on our devices.

                This happens a lot.

                I think you’re being naive if you believe they don’t locally distill our discussions into key words and phrases and transmit those.

        • TORFdot0@lemmy.world · 11 months ago

          OK, but third parties have no access to this in the background. My guess is they’re buying marketing data from their listed “partners” and making broad claims about how they obtained it. Still a huge breach of privacy, though!

  • Melody Fwygon@lemmy.one · 11 months ago

    This is why I generally ensure my phone is configured ahead of time to block ads in most cases. I don’t need this garbage on my device.

    As for how they could listen? It’s pretty easy.

    By waiting until the phone is completely still, and potentially on a charger, it can collect a lot of data. Phones typically live on the nightstand by your bed at night and could be listening intently while charging.

    Similarly, it could start listening when it hears extended conversations, simply by sampling the microphone for human speech every x minutes for y minutes. Then it can record snippets, encode them quickly, and upload them for processing. This would be thermally undetectable.

    Finally, it could simply start listening in certain situations, like when it detects other devices nearby (via Bluetooth), and capture as many small snippets of your conversation as it can.
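
    The control flow for that kind of opportunistic sampling is simple; here is a sketch (the three helper functions are placeholders, not real platform APIs):

```python
import time

def is_charging() -> bool:
    return False            # stand-in for a battery-status check

def is_stationary() -> bool:
    return False            # stand-in for an accelerometer check

def record_snippet(seconds: int) -> bytes:
    return b""              # stand-in for actual microphone capture

def duty_cycle(check_every_s: int = 600, snippet_s: int = 30) -> None:
    """Wake every few minutes; only record when conditions look 'safe'."""
    while True:
        if is_charging() and is_stationary():
            snippet = record_snippet(snippet_s)
            # ...compress and queue the snippet to ride along with normal traffic
        time.sleep(check_every_s)
```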

      • Melody Fwygon@lemmy.one · 11 months ago

        No.

        Both Android and iOS enforce permissions against applications that have not been granted explicit access to listen constantly.

        For example, Google Assistant is often a privileged app and is allowed to listen. It does so by listening efficiently for one kind of sound: the hotword “OK Google”.

        Other applications not only have to obtain user permission; oftentimes that permission is restricted to “While app is in use”, meaning the app is on screen, notifying the user, in the foreground, or recently opened. This prevents most abuses of the microphone unless someone is actively using the app.

      • noodlejetski@lemm.ee · 11 months ago

        The phone’s processor has the wake-up word hardcoded, so it’s not like an ad company can add a new one on a whim. And it uses passive listening, so it’s not recording everything you say; I’ve seen it compared to sitting in a class and not paying attention until the teacher says your name.

        • RaoulDook@lemmy.world · 11 months ago

          Have you seen this code though? Every time I hear a statement like that, I have to wonder if you’re all just taking their word for it.

          I don’t take their word for it unless they show me that code and prove that it’s the code actually running on all the devices in use.

          • WldFyre@lemm.ee · 11 months ago

            Do you also personally audit all open source software that you use?

            • Kilgore Trout@feddit.it · 11 months ago

              Your rebuttal makes no sense.

              The issue with proprietary “smart” assistants is that we can only guess how they work.

            • RaoulDook@lemmy.world · 11 months ago

              No, but I do review code audits that certified professionals publish for the things I use, when they’re available. I also don’t use any voice assistants, and I only use open-source smartphone ROMs such as GrapheneOS.

              Basically, I use the opsec methods available to me to prevent as much of the rampant spying as I can. The last thing I would do is put an open mic to Amazon’s audio-harvesting bots in my home, because that would be incredibly careless.

        • ArcaneSlime@lemmy.dbzer0.com · 11 months ago

          There’s no way that an app with mic permissions could basically do the same thing and pick up on certain preprogrammed words like “Ford” or “Coke”, which could then be parsed by AI and used by advertisers? It certainly seems like that isn’t outside the realm of physical possibility, but I’m definitely no expert. Would they have had to pay the OS maker to hardcode it into the OS? Could that be done in an update at a later time?

          • noodlejetski@lemm.ee · 11 months ago

            There’s no way that an app with mic permissions could basically do the same thing and pick up on certain preprogrammed words like Ford or Coke which could then be parsed by AI and used by advertisers?

            Only if you want the phone to start burning battery and data while displaying the “microphone in use” indicator all the time.

            Not to mention that the specific phrases have been picked to cause as few false positives as possible (which is why you can’t change them yourself), and you can still fool Google Assistant by saying “hey booboo” or “okay boomer”. Good luck making it reliably recognize “Ford”, lol.

      • Tremont@lemmings.world · 11 months ago

        For that, I think they use special hardware; that’s the reason you can’t modify the calling word, and why they still notify you when the voice assistant is disabled. I don’t know if this is actually true, whether the companies just hide behind it, or whether I’m remembering it incorrectly.

  • dangblingus@lemmy.dbzer0.com · 11 months ago

    We already knew this was happening at least a decade ago when people realized why Facebook and Instagram needed unrestricted microphone permissions.

  • HurlingDurling@lemm.ee · 11 months ago

    Copyright © 2023 Cox Media Group, LLC.

    Fucking COX. Why am I not surprised that a garbage ISP like this is behind it?

  • PotentiallyAnApricot@beehaw.org · 11 months ago

    Fascinated by this, especially because it seems that now (ideally) someone with more time and expertise than me will have to verify or disprove whether companies really do this.

  • elvith@feddit.de · 11 months ago

    CMG’s website addresses this with a section that starts “We know what you are thinking…”

    “Is this legal? YES- it is totally legal for phones and devices to listen to you. That’s because consumers usually give consent when accepting terms and conditions of software updates or app downloads,” the website says.

    Well, yes, but actually no. No idea how this might play out in parts of the world other than the US, but in most places you’d usually need the consent of all parties involved. If my neighbor installed an (infected) app like this, then carried his phone around and talked to me, I did not consent, and it would be illegal to record me, even if he wasn’t tricked into consenting but knowingly accepted it. Worse yet, in that scenario he might be on the hook for legal consequences, too…

    Besides that legal minefield, I think it’s a bluff. The tech is either far less accurate than they claim, or quite resource-intensive, either eating through your mobile data plan or draining your battery. My bet is on a PR stunt.

  • mx_smith@lemmy.world · 11 months ago

    I wonder if they’re gathering this audio data from their own cable boxes, where the data transmission wouldn’t be noticed; their remotes have microphones for voice commands.