The New York Times sues OpenAI and Microsoft for copyright infringement

L4sBot@lemmy.world · 11 months ago

The New York Times sues OpenAI and Microsoft for copyright infringement

phoneymouse@lemmy.world · 11 months ago

There is something wrong when search and AI companies extract all of the value produced by journalism for themselves. Sites like Reddit and Lemmy also have this issue. I’m not sure what the solution is. I don’t like the idea of a web full of paywalls, but I also don’t like the idea of all the profit going to the ones who didn’t create the product.

Kecessa@sh.itjust.works · edit-2 11 months ago

The solution is to make these companies responsible for tracking their profit per media outlet, tax them, and redistribute that money based on the tracking info. They’re able to track all the pages you visit; it’s complete bullshit when they say they don’t know how much they make from each place their ads are displayed.

AllonzeeLV@lemmy.world · 11 months ago

but I also don’t like the idea of all the profit going to the ones who didn’t create the product.

Should… should we tell him?

Kilgore Trout@feddit.it · 11 months ago

Tell them instead of mocking them.

Yes, “that’s how the world works”. But that doesn’t mean we should stop trying to change it.

DogWater@lemmy.world · 11 months ago

AI isn’t creating the product. It consumed it.

Boiglenoight@lemmy.world · 11 months ago

AI training is piracy by another name.

Uriel238 [all pronouns]@lemmy.blahaj.zone · 11 months ago

Elaborate. Consumption of copyrighted materials is normal use whether by a human or a machine.

Boiglenoight@lemmy.world · 11 months ago

Taking someone else’s work and using it without crediting them or compensating them is theft. If OpenAI made a deal with The NY Times to train its product using the paper’s content, which it would turn around and sell to its own customer base, that would be ethical. What OpenAI and other companies like it are doing is stealing, ahead of actual law that defines what they’re doing as such.

Uriel238 [all pronouns]@lemmy.blahaj.zone · edit-2 11 months ago

So listening to Billie Jean without thanking Michael Jackson is theft? That is use.

How about Billie Jean’s bassline, which is borrowed from Hall and Oates’ I Can’t Go For That? Was that theft? Michael felt guilty about it, but Daryl felt it was routine for creatives to borrow from each other all the time.

How about money- and lobbyist-inspired extensions of copyright so extreme that both songs (heck, the whole oeuvres of both artists) have been kept out of the public domain? Is that theft too? Or does it only count when companies and rich estates are denied profits?

From your blanket assertion that copyright infringement is theft, and your inability or refusal to parse out fair use of copyrighted materials, I infer you don’t actually understand what copyright is or what purpose it is meant to serve to the public. You are just regurgitating the maximalist rhetoric you’ve been spoonfed. It’s really kinda sad.

Feel free to exercise more nuance. Or if you like you can double down and remove all doubt.

Boiglenoight@lemmy.world · 11 months ago

Using a tool to copy someone else’s work and then profiting off that work without compensating or even attributing the source is stealing.

JonEFive@midwest.social · 11 months ago

Your argument poses an interesting thought. Do machines have a right to fair use?

Humans can consume for the sake of enjoyment. Humans can consume without a specific purpose of compiling and delivering that information. Humans can do all this without having a specific goal of monetary gain. Software created by a for-profit privately held company is inherently created to consume data with the explicit purpose of generating monetary value. If that is the specific intent and design then all contributors should be compensated.

Then again, we can look no further than Google (the search engine, not the company) for an example that’s closely related to the current situation. Google can host excerpts of data from billions of websites and serve that data up upon request without compensating those site owners in any way. I would argue that Google is different though because it literally cites every single source. A search result isn’t useful if we don’t know what site the result came from.

And my final thought: are works that AI generates truly transformative? I can see arguments that go either way.

General_Effort@lemmy.world · 11 months ago

Do machines have a right to fair use?

Machines do not have rights or obligations. They cannot be held liable to pay damages or be sentenced for crimes. They cannot commit copyright infringement. But I don’t think we’ll see “the machine did it” as a defense in court.

are works that AI generates truly transformative?

Usually they are original and not transformative.

Transformative implies that there is some infringement going on. Say, you make a cartoon with the recent Mickey Mouse. But instead of making the same kind of cartoon as Disney would, you use MM to criticize the policies of the Disney corporation (like South Park did). That transforms the work.

Sometimes AI spits out verbatim copies of training data. That is usually transformative. A couple pages of Harry Potter turn into a technical malfunction.

I hope you’ll answer a question in return:

Software created by a for-profit privately held company is inherently created to consume data with the explicit purpose of generating monetary value. If that is the specific intent and design then all contributors should be compensated.

Why? What’s the ethical/moral justification for this?

I know how anarcho-capitalists, so-called libertarians, and other such ideologies see it, but perhaps you have a different take. These groups are also not necessarily on board with the whole intellectual property concept. So that’s what I am curious about. Full disclosure: I am absolutely not on board with that kind of thinking and am unlikely to be convinced. But I am genuinely interested in learning more.

JonEFive@midwest.social · 10 months ago

Just getting back around to this.

My main reasoning is simply that authors and artists should be fairly credited and compensated for their work. If I create something and share it on the internet, I don’t necessarily want a company to make money on that thing, especially if they’re making money to my exclusion.

So while I believe that IP as we know it today is probably not the best way to handle things, I still think creators should have some say over how their works are used and should receive some reasonable share when their works are used for profit. Without creators, those works wouldn’t exist in the first place.

Are there other jobs where it would be okay to take a person’s services without paying them? What would motivate people to continue providing those services?

treefrog@lemm.ee · edit-2 11 months ago

Not the original comment but I think the difference you’re looking for is in the copying and distribution. The OC makes the false assumption that the data set is full copies of every object fed into it rather than sets of common characteristics.

For example, my own mind has a concept tree. Tree is not a copy of every tree I’ve ever known but more like lists of common characteristics that define treeness based on information I’ve gathered about treeness (my data set).

Piracy is piracy not because of how it’s consumed, but rather, how it’s distributed and stored, as full copies of the object. Datasets are not copies, in other words. And thus copyright doesn’t apply.

Reading an article to get an idea about what articleness is, is fair use. Reading an article to reproduce it verbatim is not. And as of now, I don’t believe LLMs are doing the latter.

LainOfTheWired@lemy.lol · 11 months ago

My question is: how is an AI reading a bunch of articles any different from a human doing it? With this logic, no one would legally be able to write an article, as they are using bits of other people’s work that they read and learnt from in order to write a good article.

They are both making money with parts of other people’s work.

hansl@lemmy.world · 11 months ago

It was thought that the LLM wouldn’t keep the actual data internally verbatim. If you can memorize an article, and recite it to everyone free of charge, technically it’s plagiarism. Same if you sing a song to a crowd when you don’t have the rights.

The Google research (and other discoveries) proved that you can actually extract verbatim training data from an LLM. That has a lot of implications for copyright.

MirthfulAlembic@lemmy.world · 11 months ago

The physical limitations are an important difference. A human can only read and remember so much material. With AI, you can scale that exponentially with more compute resources. Frankly, IP law was not written with this possibility in mind and needs to be updated to find a balance.

JonEFive@midwest.social · 11 months ago

Let me ask you this: when have you ever seen ChatGPT cite its sources and give appropriate credit to the original author?

If I were to just read the NYT and make money by simply summarizing articles and posting those summaries on my own website without adding anything to it like my own commentary and without giving credit to the author, that would rightfully be considered plagiarism.

This is a really interesting conundrum though. I would argue that AI isn’t capable of original thought the way that humans are and therefore AI creators must provide due compensation to the authors and artists whose data they used.

AI is only giving back some amalgamation of words and concepts that it has been trained on. You might say that humans do the same, but that isn’t exactly true. The human brain is a funny thing. It can forget, it can misremember. It can manipulate. It can exaggerate. It can plan. It can have irrational or emotional responses. AI can’t really do those things on its own. It’s just mimicking human behavior at best.

Most importantly to me though, AI is not capable of spontaneous thought. It is only capable of providing information that it has been trained on and only when prompted.

thru_dangers_untold@lemm.ee · edit-2 11 months ago

There is evidence to suggest some LLMs have the ability to produce original outputs, such as DeepMind’s solution to the cap set problem.

https://www.nature.com/articles/s41586-023-06924-6

On the other hand, LLMs have some incredible text compression abilities.

https://arxiv.org/abs/2308.07633

I’m pretty sure there is copyright infringement going on by the letter of the law. But I also think the world would be better off if copyright laws were a bit more loose. Not wild-west anything-goes libertarianism, but more open than the current state.

JonEFive@midwest.social · 10 months ago

I tend to agree with your last point, especially because of the way the system has been bastardized over the years. What started out as well-intentioned legislation to ensure that authors and artists maintain control over their work has become a contentious and litigious minefield that barely protects creators.

General_Effort@lemmy.world · 11 months ago

Let me ask you this: when have you ever seen ChatGPT cite its sources and give appropriate credit to the original author?

Bing chat now does that by default. Normally you have to prompt that manually.

If I were to just read the NYT and make money by simply summarizing articles and posting those summaries on my own website without adding anything to it like my own commentary and without giving credit to the author, that would rightfully be considered plagiarism.

No. It would be considered journalism. If you read the news a bit, you will find that they reference the output of other news corporations quite a bit. If your preferred news source does not do that, then they simply don’t cite their sources.

JonEFive@midwest.social · 10 months ago

Prompting for a source wouldn’t satisfy me until I could trust that the AI wasn’t hallucinating. After all, if GPT can make up facts about things like legal precedent or well documented events, why would I trust that its citations are legitimate?

And if the suggestion is that the person asking for the information double check the cited sources, maybe that’s reasonable to request, but it somewhat defeats the original purpose.

Bing might be doing things differently though, so you might be right in your assessment on that front. I haven’t played with their AI yet.

General_Effort@lemmy.world · 10 months ago

You did ask if ChatGPT had ever cited sources. Bing uses it, and besides, you can ask for that manually.

Whether it defeats the purpose depends on your original purpose.

BURN@lemmy.world · 11 months ago

An AI does not learn like a human does. Therefore the same laws and principles can’t be applied to computer “learning” as can be to human learning.

They’re fundamentally different uses of the material.

topinambour_rex@lemmy.world · edit-2 10 months ago

The main difference is the volume. An example I like is how Google trained its gaming AI to play StarCraft 2. This AI was able to beat highly ranked professional gamers. It was trained by watching a century’s worth of games.

ChatGPT didn’t read a few articles; it read years of them, maybe a couple of decades’ worth.

kingthrillgore@lemmy.ml · 11 months ago

If this lawsuit causes it to be ILLEGAL to read anything you buy because you could plagiarize it, Bradbury is gonna spin in his fucking grave.

burliman@lemmy.today · 11 months ago

Reminds me of Nokia suing Apple (two waves), Blockbuster suing Netflix, and Yahoo suing Facebook. Threatened, declining company suing a disruptor is what we can expect will always happen I guess. Will be nice to see this stuff finally tested in court though.

jacksilver@lemmy.world · 11 months ago

Except the news still needs to come from somewhere. While GPT can “create” things, it’s not a journalist. It’s just the next step in aggregation skimming money from the actual sources.

maegul (he/they)@lemmy.ml · 11 months ago

Interesting take on this in this Mastodon thread: https://hachyderm.io/@Impossible_PhD/111654403989681220

The New York Times sues OpenAI and Microsoft for copyright infringement | CNN Business