I’m rather curious to see how the EU’s privacy laws are going to handle this.

(Original article is from Fortune, but Yahoo Finance doesn’t have a paywall)

  • Primarily0617@kbin.social · ↑207 ↓7 · 1 year ago (edited)

    it’s crazy that “it’s too hard :(” has become an acceptable justification for just ignoring the law within tech circles

    • BrianTheeBiscuiteer@lemmy.world · ↑94 ↓3 · 1 year ago

      I’m not an AI expert, and I wouldn’t say it is too hard, but I believe removing a specific piece of data from a model is like trying to remove excess salt from a stew. You can add things to make the stew less salty but you can’t really remove the salt.

      The alternative, which is a lot of effort but boo-hoo for big tech, is to throw out the model and start over without the data in question. These companies would do well to start with models built on public or royalty-free data and then add riskier data on top of that (so you only have to rebake starting from the “public” version).
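
      To make that “rebake from the public version” idea concrete, here’s a rough sketch of how it could look. The model, data and file names are made up purely for illustration; this is not any vendor’s actual pipeline.

      ```python
      # Rough sketch: keep a checkpoint trained only on public data, layer riskier
      # data on top, and "rebake" from the public checkpoint when data must go.
      import torch
      import torch.nn as nn

      def fine_tune(model, batches, epochs=1, lr=1e-3):
          """Minimal training loop over (input, target) batches."""
          opt = torch.optim.SGD(model.parameters(), lr=lr)
          loss_fn = nn.MSELoss()
          for _ in range(epochs):
              for x, y in batches:
                  opt.zero_grad()
                  loss_fn(model(x), y).backward()
                  opt.step()

      # 1. Train only on public / royalty-free data and keep that checkpoint.
      model = nn.Linear(4, 1)
      public_batches = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(10)]
      fine_tune(model, public_batches)
      torch.save(model.state_dict(), "public_base.pt")

      # 2. Add the riskier data on top of the public base.
      risky_batches = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(5)]
      fine_tune(model, risky_batches)

      # 3. On a deletion request, rebake: reload the public base and fine-tune
      #    again on the risky data minus the contested records, instead of
      #    retraining absolutely everything from scratch.
      model.load_state_dict(torch.load("public_base.pt"))
      remaining = risky_batches[1:]   # pretend the first batch must be deleted
      fine_tune(model, remaining)
      ```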

      • Primarily0617@kbin.social · ↑47 ↓1 · 1 year ago

        sounds like big tech shouldn’t have spent the last decade investing in a kitchen refit so that they could make stew really well but nothing else

      • GoosLife@lemmy.world · ↑28 ↓1 · 1 year ago (edited)

        If there’s something illegal in your dish, you throw it out. It’s not a question. I don’t care that you spent a lot of time and money on it. “I spent a lot of time preparing the circumstances leading to this crime” is not an excuse, neither is “if I have to face consequences for committing this crime, I might lose money”.

        • Quokka@quokk.au · ↑2 ↓3 · 1 year ago

          Fuck no.

          It’s illegal to be gay in many places, should we throw out any AI that isn’t homophobic as shit?

          • GoosLife@lemmy.world · ↑1 · 1 year ago

            No, especially because it’s not the same thing at all. You’re talking about the output, we’re talking about the input.

            The training data was illegally obtained. That’s all that matters here. They can train it on fart jokes or Trump propaganda, it doesn’t really matter, as long as the Trump propaganda in question was legally obtained by whoever trained the model.

            Whether we should then allow chatbots to generate harmful content, and how we would regulate that by limiting acceptable training data, is a much more complex issue that can be discussed separately. To address your specific example, it would make the most sense for the chatbot to be guided towards a viewpoint that aligns with its intended userbase. This just means that certain chatbots might be more or less willing to discuss certain topics. In the same way that an AI for children probably shouldn’t be able to discuss certain topics, a chatbot made for use in a highly religious area, where homosexuality is very taboo, would most likely not be willing to discuss gay marriage at all, rather than being made intentionally homophobic.

            • Quokka@quokk.au · ↑1 · 1 year ago

              The output only exists from the input.

              If you fed your model only on “legal” content, that would in many places ensure it had no LGBT+-positive content.

              The legality of training data (given the dubious nature of justice systems) is not the angle to go for here.

              • GoosLife@lemmy.world · ↑1 · 1 year ago

                You seem to think the majority of LGBT+ positive material is somehow illegal to obtain. That is not the case. You can feed it as much LGBT+ positive material as you like, as long as you have legally obtained it. What you can’t do is train it on LGBT+ positive material that you’ve stolen from its original authors. Does that make more sense?

                • Quokka@quokk.au · ↑1 ↓1 · 1 year ago

                  You do know being LGBT+ in many places is illegal, right? And can even carry the death penalty.

                  Legality is not important and we should not care if it’s considered legal or not, because what’s legal isn’t what’s right or ethical.

                  • GoosLife@lemmy.world · ↑1 · 1 year ago

                    Yes I am aware of that. However, I’m not sure how this has anything to do with the fact that it is also illegal to steal data, then continue to use said data to make profits after having been found out. The two are not connected in any logical way, which makes it hard for me to continue to address your concerns in a way that makes sense.

                    The way I see it, you’re either completely missing what we’re talking about, or you have some misunderstanding of what the AI language models actually are, and what they can do.

                    For the record, I’m in no way disagreeing with your views, or your statements that legal and ethical don’t always overlap. It is clear to me that you are open minded and well-intended, which I appreciate, and I hope you don’t take this the wrong way.

      • Grandwolf319@sh.itjust.works · ↑11 ↓1 · 1 year ago

        Replace salt with poison or an allergenic substance and it fully holds. If a batch has been contaminated, then yes, you should try again.

        But now that the cat is out of the bag, other companies are less willing to let their data be scraped, given how valuable it can be.

        I think big tech knew this: they could only build these models on unfiltered data before the AI craze.

      • Tyfud@lemmy.one · ↑3 · 1 year ago (edited)

        I work in this field a good bit, and you’re largely correct. That’s a great analogy for trying to remove salt from a stew. The only issue with the analogy is that it’s still technically possible to recover the salt by distilling the stew, even though doing so would destroy the stew.

        Once PII data is in the model, it’s fully baked. It’d be like trying to get the eggs out of a baked cake; the chemical composition has changed into something else completely.

        That’s how building a model works today. Like baking a cake.

        In order to remove or even identify PII data in ML models or LLMs today, we’d need a whole new way of baking a cake that would keep the eggs separate from the cake until just before you tried to take a bite out of it. The tools today don’t allow you to do anything like that. They bake you a complete cake.

      • Fushuan [he/him]@lemm.ee · ↑2 · 1 year ago

        Something to keep in mind: yes, they would need to retrain the models from zero, but if they followed any halfway decent process they would have backups and versioned copies of the training data, so they would just retrain everything on a subset of the original data. The optimizations they have already applied to the system should then be reapplicable in the same manner, and the resulting product should be somewhat similar. Another option would be to design a de-training process, where you generate an input from the “must be deleted” data that, when trained on, acts as a sort of “negative input”, so the model ends up in the same place it would have ended up had it never been trained on the “must be deleted” data in the first place.

        I bet you that if governments act harshly enough, tech companies will develop some sort of “negative training”.

        In the end this is a solvable optimization problem: what input do I need to feed the already-trained model so that it becomes equivalent to the model that would have been produced by training without the requested data?
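
        A rough sketch of what such a “negative training” step could look like, using plain gradient ascent on the data that must be forgotten. The toy model and data below are invented for illustration, and real unlearning methods are considerably more involved than this.

        ```python
        # Toy sketch of "negative training": push the model away from the data it
        # must forget while keeping it anchored to the data it may retain.
        import torch
        import torch.nn as nn

        model = nn.Linear(4, 1)              # stand-in for an already-trained model
        loss_fn = nn.MSELoss()
        opt = torch.optim.SGD(model.parameters(), lr=1e-3)

        forget_set = [(torch.randn(8, 4), torch.randn(8, 1))]   # "must be deleted" data
        retain_set = [(torch.randn(8, 4), torch.randn(8, 1))]   # data we may keep

        for _ in range(100):
            # Gradient *ascent* on the forget set: negate the loss so each step
            # moves the model away from fitting the deleted examples.
            for x, y in forget_set:
                opt.zero_grad()
                (-loss_fn(model(x), y)).backward()
                opt.step()
            # Ordinary gradient descent on the retain set, so the model doesn't
            # simply get worse everywhere while it "forgets".
            for x, y in retain_set:
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()
        ```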

        We could even create an ML model that computes a “good enough negative input” from several examples, since testing the quality of the results is quite simple and we can train it on examples of trained models. This model would be fed a base model, some input data, and another base model trained without that data.

        All in all, AI companies will tell you that this is very hard, because they would essentially be investing time and development effort into a tool that makes their model worse instead of better, so expect a lot of pushback.

    • Zeth0s@lemmy.world · ↑20 · 1 year ago (edited)

      It’s actually a pretty normal thing in law. Laws are created with common sense and compromises in mind.

      Currently, EU laws do not cover generative AI, so now the EU needs to decide how to deal with it: treat it as a “lossy compressed database” and try to enforce a variation of the GDPR with added fuzziness, or do something else.

    • Alien Nathan Edward@lemm.ee · ↑7 · 1 year ago

      I just saw an article that said that ISPs are trying to whine their way out of listing the fees they charge because it’s too hard. Which is wild because they certainly know what I owe them after I sign the contract, but somehow it’s just impossible for them to determine right up until the moment that I’m obligated to pay it.

    • garyyo@lemmy.world · ↑5 · 1 year ago

      Always has been. The laws are there to incentivize good behavior, but when the cost of complying is larger than the projected cost of not complying, companies will ignore them and deal with the consequences. We regular folk generally can’t afford not to comply (except for all the low-stakes laws we break on a day-to-day basis), but when you have money to burn and a lot is at stake, the decision becomes more complicated.

      The tech part of that is that we don’t really even know if removing data from these sorts of models is possible in the first place. The only way to remove it is to throw away the old one and make a new one (aka retraining the model) without the offending data. This is similar to how you can’t get a person to forget something without some really drastic measures; even then, how do you know they forgot it? That information may still inform their decisions, and they might just not be aware of it or feign ignorance. The only real way to be sure is to scrap the person. Given how insanely costly it can be to retrain a model, the fines start looking like “necessary operating costs” rather than the laws looking like absolute rules.