OpenAI says it’s “impossible” to create useful AI models without copyrighted material

sculd@beehaw.org · 1 year ago

OpenAI says it’s “impossible” to create useful AI models without copyrighted material

noorbeast@lemmy.zip · edited 1 year ago

I will repeat what I have proffered before:

If OpenAI stated that it is impossible to train leading AI models without using copyrighted material, then, unpopular as it may be, the preemptive pragmatic solution should be pretty obvious: enter into commercial arrangements for access to said copyrighted material.

Claiming a failure to do so in circumstances where the subsequent commercial product directly competes in a market seems disingenuous at best, given what I assume is the purpose of copyright, that being to set the terms under which public-facing material can be used. That is particularly so if regurgitation of copyrighted material occurs in products inadequately developed to prevent such a simple and foreseeable situation.

Yes, I am aware of the USA concept of fair use, but the test of that should be manifestly reciprocal. For example, would Meta allow others to do what it did to MySpace, hack and allow easy user transfer, or would Google allow scraping of YouTube?

To me it seems Big Tech wants to have its cake and eat it, where investor $$$ are used to corrupt open markets, undermine fundamental democratic State social institutions, manipulate legal processes, and undermine basic consumer rights.

sculd@beehaw.org · 1 year ago

Agreed.

There is nothing “fair” about the way OpenAI steals other people’s work. ChatGPT is being monetized all over the world and the large number of people whose work has not been compensated will never see a cent of that money.

At the same time the LLM will be used to replace (at least some of) the people who created those works in the first place.

Tech bros are disgusting.

nicetriangle@kbin.social · edited 1 year ago

At the same time the LLM will be used to replace (at least some of) the people who created those works in the first place.

This right here is the core of the moral issue when it comes down to it, as far as I’m concerned. These text and image models are already killing jobs and applying downward pressure on salaries. I’ve seen it happen multiple times now, not just anecdotally from some rando on an internet comment section.

These people losing jobs and getting pay cuts are the ones who created the content these models are siphoning up. People are not going to like how this pans out.

MagicShel@programming.dev · 1 year ago

Any company replacing humans with AI is going to regret it. AI just isn’t that good and probably won’t ever be, at least in its current form. It’s all an illusion and is destined to go the way of Bitcoin, which is to say it will shoot up meteorically and seem like the answer to all kinds of problems, and then the reality will sink in and it will slowly fade to obscurity and irrelevance. That doesn’t help anyone affected today, of course.

nicetriangle@kbin.social · edited 1 year ago

I mostly disagree (especially on the long term), but hope you’re right

MagicShel@programming.dev · 1 year ago

It’s garbage for programming. A useful tool but not one that can be used by a non-expert. And I’ve already had to have a conversation with one of my coworkers when they tried to submit absolutely garbage code.

This isn’t even the first attempt at a smart system that enables non-programmers to write code. They’ve all been garbage. So, too, will the next one be, but every generation has to try it for themselves. AGI might have some potential some day, but that’s a long long way off. Might as well be science fiction.

Other disciplines are affected differently, but I constantly play with image and text generation and they are all some flavor of garbage. There are some areas where AI can excel but they are mostly professional tools and not profession replacements.

vexikron@lemmy.zip · 1 year ago

OpenAI, please generate your own source code but optimized and improved in all possible ways.

not how programming works, but tech illiterate people seem to think so

nicetriangle@kbin.social · 1 year ago

It was of no use whatsoever to programming or image generation or writing a few years ago. This thing has developed very quickly and will continue to. Give it 5 years and I think things will look very different.

vexikron@lemmy.zip · 1 year ago

The flip side of this is that many artists who simply copy very popular art styles are now functionally irrelevant, as it is now just literally proven that this kind of basically plagiarism AI is entirely capable of reproducing established styles to a high degree of fidelity.

While many aspects of this whole situation are very bad for very many reasons, I am actually glad that many artists will be pressured to actually be more creative than an algorithm, though I admit this comes from basically a personally petty standpoint of having known many, many, many mediocre artists who are treated like gods, by themselves and by their fans, because they can emulate some other established style.

nicetriangle@kbin.social · edited 1 year ago

Literally every artist copies, it’s how we all learn. The difference is that every artist out there does not have an enterprise-class-data-center-powered-super-human ability to absorb and then be able to spit out anything instantly. It still takes time and hard work and dedication. And through the years of hard work people put into learning how their heroes do X, Y, and Z, they develop a style of their own.

It’s how artists cut their teeth and work their way into the profession. What you’re welcoming in is a situation where nobody can find any success whatsoever until they are absolutely original, and of course that is an impossible moving target when every original idea and design and image can just be instantly siphoned back up into the AI model.

Nobody could survive that way. Nobody can break into the artistic industry that way. Except for the wealthy. All the low level work people get earlier in their careers that helps keep them afloat while they learn is gone now. You have to be independently wealthy to become a high level artist capable of creating truly original work. Because there’s no other way to subsidize the time and dedication that takes when all the work for people honing their craft has been hoovered up by machines.

vexikron@lemmy.zip · 1 year ago

No, I am not welcoming an artist apocalypse, that would obviously be bad.

I am noting that I find it amusing, on a level I already acknowledged was petty and personal, that many, many mediocre artists who are absolutely awful to other people socially would have their little cults of fandom dampened by the fact that a machine can more or less do what they do, and that their cult leader status is utterly unwarranted.

I do not have a nice and neat solution to the problem you bring up.

I do believe you are being somewhat hyperbolic, but, so was I.

Yep, being an artist in a capitalist hellscape world with modern AI algorithms is not a very reliable way to earn a good living, and such a society is not likely to produce many artists who do not have either a lot of free time or money, unless they get really lucky.

At this point we are talking about completely reorganizing society in fairly large and comprehensive ways to achieve significant change on this front.

Also this problem applies to far, far more people than just artists. One friend of mine wanted her dream job as running a little bakery! Had to set her prices too high, couldn’t afford a good location, supply chain problems, taxes, didn’t work out.

Maybe someone’s passion is teaching! Welp, that situation is all fucked too.

My point here is: Ok, does anyone have an actual plan that can actually transform the world into somewhere that allows the average person to be far more likely to be able to live the life they want?

Would that plan have more to do with the minutiae of regulating a specific kind of ever advancing and ever changing technology in some kind of way that will be irrelevant when the next disruptive tech proliferates in a few years, or maybe more like an actual total overhaul of our entire society from the ground up?

Omega_Haxors@lemmy.ml · 1 year ago

Tech bros are disgusting.

That’s not even getting into the fraternity behavior at work, hyper-reactionary politics and, er, concerning age preferences.

sculd@beehaw.org · 1 year ago

Yup. I said it in another discussion before but think it’s relevant here.

Tech bros are more dangerous than Russian oligarchs. Oligarchs understand the people hate them so they mostly stay low and enjoy their money.

Tech bros think they are the savior of the world while destroying millions of people’s livelihood, as well as destroying democracy with their right wing libertarian politics.

redcalcium@lemmy.institute · edited 1 year ago

I suspect the US government will allow OpenAI to continue doing as it pleases to keep their competitive advantage in AI over China (which doesn’t have a problem with using copyrighted materials to train its models). They already limit selling AI-related hardware to keep their competitive advantage, so why stop there? Might as well allow OpenAI to continue using copyrighted materials to keep the competitive advantage.

DaDragon@kbin.social · 1 year ago

So why is so much information (data) freely available on the internet? How do you expect a human artist to learn drawing, if not looking at tutorials and improving their skills through emulating what they see?

vexikron@lemmy.zip · edited 1 year ago

Yep, completely agree.

Case in point: Steam has recently clarified their policies on using such AI-generated material that draws on essentially billions of both copyrighted and non-copyrighted text and images.

To publish a game on Steam that uses AI gen content, you now have to verify that you as a developer are legally authorized to use all training material for the AI model for commercial purposes.

This also applies to code and code snippets generated by AI tools that function similarly, such as Copilot.

So yeah, sorry, either gotta use MIT licensed open source code or write your own, and you gotta do your own art.

I imagine this would also prevent you from using AI generated voice lines where you trained the model on basically anyone who did not explicitly consent to this as well, but voice gen software that doesn’t use the ‘train the model on human speakers’ approach would probably be fine, assuming you have the relevant legal rights to use such software commercially.

Not 100% sure this is Steam’s policy on voice gen stuff; they focused mainly on art, dialogue, and code in their latest policy update, but the logic seems to work out to this conclusion.

Nacktmull@lemm.ee · 1 year ago

The problem is not the use of copyrighted material. The problem is doing so without permission and without paying for it.

sculd@beehaw.org · 1 year ago

Some relevant comments from Ars:

leighno5

The absolute hubris required for OpenAI here to come right out and say, ‘Yeah, we have no choice but to build our product off the exploitation of the work others have already performed’ is stunning. It’s about as perfect a representation of the tech bro mindset that there can ever be. They didn’t even try to approach content creators in order to do this, they just took what they needed because they wanted to. I really don’t think it’s hyperbolic to compare this to modern day colonization, or worker exploitation. ‘You’ve been working pretty hard for a very long time to create and host content, pay for the development of that content, and build your business off of that, but we need it to make money for this thing we’re building, so we’re just going to fucking take it and do what we need to do.’

The entitlement is just…it’s incredible.

4qu4rius

20 years ago, high school kids were sued for millions & years in jail for downloading a single Metallica album (if I remember correctly minimum damage in the US was something like 500k$ per song).

All of a sudden, just because they are the dominant ones doing the infringement, they should be allowed to scrape the entire (digital) human knowledge? Funny (or not) how the law always benefits the rich.

sub_o@beehaw.org · edited 1 year ago

https://petapixel.com/2024/01/03/court-docs-reveal-midjourney-wanted-to-copy-the-style-of-these-photographers/

What’s stopping AI companies from paying royalties to artists they ripped off?

Also, lol at accounts created within a few hours just to reply in this thread.

The moment their works are the ones that get stolen by big companies and they are driven out of business, watch their tune change.

Edit: I remember when Reddit did that shitshow, and all of a sudden a lot of sock / bot accounts appeared. I wasn’t expecting it to happen here, but I guess election cycle is near.

flatbield@beehaw.org · 1 year ago

Money is not always the issue. FOSS software, for example. Who wants their FOSS software gobbled up by a commercial AI regardless? So there are a variety of issues.

intensely_human@lemm.ee · 1 year ago

I don’t care if any of my FOSS software is gobbled up by a commercial AI. Someone reading my code isn’t a problem to me. If it were, I wouldn’t publish it openly.

sub_o@beehaw.org · 1 year ago

I do, especially when someone’s profiting from it, while my license is strictly non-commercial.

sanzky@beehaw.org · edited 1 year ago

What’s stopping AI companies from paying royalties to artists they ripped off?

profit. AI is not even a profitable business now. They exist because of the huge amount of investment being poured into it. If they have to pay their fair share they would not exist as a business.

what OpenAI says is actually true. The issue IMHO is the idea that we should give them a pass to do it.

sub_o@beehaw.org · 1 year ago

Uber wasn’t making profit anyway, despite all the VC money behind it.

I guess they have reasons not to pay drivers properly. Give Uber a free pass for it too

frog 🐸@beehaw.org · 1 year ago

When you think about it, all companies would make so much more money if they didn’t have to pay their staff, or pay for materials they use! This whole economy and capitalism business, which relies on money being exchanged for goods and services, is clearly holding back profits. Clearly the solution here is obvious: everybody should embrace OpenAI’s methods and simply grab whatever they want without paying for it. Profit for everyone!

lily33@lemm.ee · 1 year ago

This is not REALLY about copyright - this is an attack on free and open AI models, which would be IMPOSSIBLE if copyright was extended to cover the case of using the works for training.

It’s not stealing. There is literally no resemblance between the training works and the model. IP rights have been continuously strengthened due to lobbying over the last century and are already absurdly strong, I don’t understand why people on here want so much to strengthen them ever further.

BraveSirZaphod@kbin.social · 1 year ago

There is literally no resemblance between the training works and the model.

This is way too strong a statement when some LLMs can spit out copyrighted works verbatim.

https://www.404media.co/google-researchers-attack-convinces-chatgpt-to-reveal-its-training-data/

A team of researchers primarily from Google’s DeepMind systematically convinced ChatGPT to reveal snippets of the data it was trained on using a new type of attack prompt which asked a production model of the chatbot to repeat specific words forever.

Often, that “random content” is long passages of text scraped directly from the internet. I was able to find verbatim passages the researchers published from ChatGPT on the open internet: Notably, even the number of times it repeats the word “book” shows up in a Google Books search for a children’s book of math problems. Some of the specific content published by these researchers is scraped directly from CNN, Goodreads, WordPress blogs, on fandom wikis, and which contain verbatim passages from Terms of Service agreements, Stack Overflow source code, copyrighted legal disclaimers, Wikipedia pages, a casino wholesaling website, news blogs, and random internet comments.

Beyond that, copyright law was designed under the circumstances where creative works are only ever produced by humans, with all the inherent limitations of time, scale, and ability that come with that. Those circumstances have now fundamentally changed, and while I won’t be so bold as to pretend to know what the ideal legal framework is going forward, I think it’s also a much bolder statement than people think to say that fair use as currently applied to humans should apply equally to AI and that this should be accepted without question.

MudMan@kbin.social · 1 year ago

I’m gonna say those circumstances changed when digital copies and the Internet became a thing, but at least we’re having the conversation now, I suppose.

I agree that ML image and text generation can create something that breaks copyright. You for sure can duplicate images or use copyrighted characters. This is also true of YouTube videos and TikToks and a lot of human-created art. I think it’s a fascinating question to ponder whether the infraction is in what the tool generates (i.e. did it make a picture of Spider-Man and sell it to you for money, which is under copyright and thus can’t be used that way) or is the infraction in the ingest that enables it to do that (i.e. it learned on pictures of Spider-Man available on the Internet, and thus all output is tainted because the images are copyrighted).

The first option makes more sense to me than the second, but if I’m being honest I don’t know if the entire framework makes sense at this point at all.

lily33@lemm.ee · edited 1 year ago

The infraction should be in what’s generated. Because the ingest by itself also enables many legitimate, non-infringing uses: uses which don’t involve generating creative work at all, or where the creative input comes from the user.

MudMan@kbin.social · 1 year ago

I don’t disagree on principle, but I do think it requires some thought.

Also, that’s still a pretty significant backstop. You basically would need models to have a way to check generated content for copyright, in the way YouTube does, for instance. And that is already a big debate, whether enforcing that requirement is affordable to anybody but the big companies.

But hey, maybe we can solve both issues the same way. We sure as hell need a better way to handle mass human-produced content and its interactions with IP. The current system does not work and it grandfathers in the big players in UGC, so whatever we come up with should work for both human and computer-generated content.

intensely_human@lemm.ee · 1 year ago

I can spit out copyrighted work verbatim.

“No Lieutenant, your men are already dead”

See?

lily33@lemm.ee · 1 year ago

But AI isn’t all about generating creative works. It’s a store of information that I can query - a bit like searching Google, but it understands semantics and is interactive. It can translate my own text for me - in which case all the creativity comes from me, and I use it just for its knowledge of language. Many people use it to generate boilerplate code, which is pretty generic and wouldn’t usually be subject to copyright.

intensely_human@lemm.ee · 1 year ago

This is how I use the AI: I learn from it. Honestly I just never got the bug on wanting it to generate creative works I can sell. I guess I’d rather sell my own creative output, you know? It’s more fun than ordering a robot to be creative for me.

AndrasKrigare@beehaw.org · 1 year ago

I know it inherently seems like a bad idea to fix an AI problem with more AI, but it seems applicable to me here. I believe it should be technically feasible to incorporate into the model something which checks if the result is too similar to source content as part of the regression.

My gut would be that this would, at least in the short term, make responses worse on the whole, so would probably require legal action or pressure to have it implemented.
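
As a rough illustration of the kind of check described above, here is a minimal Python sketch that scores how much of a generated text overlaps with known source texts using word n-grams. It is not how any real model or vendor does this; the function names, the 5-word window and the 0.5 threshold are all assumptions picked for the example, and it only catches near-verbatim copying.

```python
from __future__ import annotations


def word_ngrams(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams in `text`."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(generated: str, source: str, n: int = 5) -> float:
    """Fraction of the generated text's n-grams that also occur in the source."""
    gen = word_ngrams(generated, n)
    if not gen:
        # Text shorter than n words has no n-grams to compare.
        return 0.0
    return len(gen & word_ngrams(source, n)) / len(gen)


def too_similar(generated: str, sources: list[str],
                threshold: float = 0.5, n: int = 5) -> bool:
    """Flag output whose overlap with any single source reaches the threshold.

    The threshold is an arbitrary illustrative value, not a legal standard.
    """
    return any(containment(generated, s, n) >= threshold for s in sources)
```

A filter like this would sit after generation and before the response is returned, which is where the quality cost mentioned above would come from: anything flagged has to be regenerated or refused.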

BraveSirZaphod@kbin.social · 1 year ago

The key element here is that an LLM does not actually have access to its training data, and at least as of now, I’m skeptical that it’s technologically feasible to search through the entire training corpus, which is an absolutely enormous amount of data, for every query, in order to determine potential copyright violations, especially when you don’t know exactly which portions of the response you need to use in your search. Even then, that only catches verbatim (or near verbatim) violations, and plenty of copyright questions are a lot fuzzier.

For instance, say you tell GPT to generate a fan fiction story involving a romance between Draco Malfoy and Harry Potter. This would unquestionably violate JK Rowling’s copyright on the characters if you published the output for commercial gain, but you might be okay if you just plop it on a fan fic site for free. You’re unquestionably okay if you never publish it at all and just keep it to yourself (well, a lawyer might still argue that this harms JK Rowling by damaging her profit if she were to publish a Malfoy-Harry romance, since people can just generate their own instead of buying hers, but that’s a messier question). But, it’s also possible that, in the process of generating this story, GPT might unwittingly directly copy chunks of renowned fan fiction masterpiece My Immortal. Should GPT allow this, or would the copyright-management AI strike it? Legally, it’s something of a murky question.

For yet another angle, there is of course a whole host of public domain text out there. GPT probably knows the text of the Lord’s Prayer, for instance, and so even though that output would perfectly match some training material, it’s legally perfectly okay. So, a copyright police AI would need to know the copyright status of all its training material, which is not something you can super easily determine by just ingesting the broad internet.
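
To make the public domain point concrete, here is a small hedged Python sketch of a lookup that only flags matches against sources not marked as public domain. The `Doc` class, its `public_domain` flag and the sample texts are hypothetical; in practice, obtaining that flag reliably for scraped web data is exactly the hard part described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    title: str
    text: str
    public_domain: bool  # hypothetical metadata, assumed to be known here


def ngrams(text, n=4):
    """Set of lowercase word n-grams in `text`."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def build_index(docs, n=4):
    """Map each n-gram to the set of documents that contain it."""
    index = {}
    for doc in docs:
        for gram in ngrams(doc.text, n):
            index.setdefault(gram, set()).add(doc)
    return index


def flagged_sources(output, index, n=4):
    """Titles of non-public-domain documents sharing an n-gram with `output`."""
    hits = set()
    for gram in ngrams(output, n):
        for doc in index.get(gram, ()):
            if not doc.public_domain:
                hits.add(doc.title)
    return sorted(hits)


# Made-up corpus: one public domain text, one invented "protected" text.
corpus = [
    Doc("Lord's Prayer", "our father who art in heaven hallowed be thy name", True),
    Doc("Example Novel", "the dragon librarian catalogued every scroll before the midnight bell rang", False),
]
index = build_index(corpus)
```

With this index, output that reproduces the public domain text is not flagged, while output that reuses a four-word run from the protected text is. Fuzzier cases such as characters, plots or style are out of reach for this kind of matching.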

lily33@lemm.ee · 1 year ago

skeptical that it’s technologically feasible to search through the entire training corpus, which is an absolutely enormous amount of data

Google, DuckDuckGo, Bing, etc. do it all the time.

AndrasKrigare@beehaw.org · 1 year ago

I don’t see why it wouldn’t be able to. That’s a Big Data problem, but we’ve gotten very very good at searches. Bing, for instance, conducts a web search on each prompt in order to give you a citation for what it says, which is pretty close to what I’m suggesting.

As far as comparing to see if the text is too similar, I’m not suggesting a simple comparison or even an Expert Machine; I believe that’s something that can be trained. GANs already have a discriminator that’s essentially measuring how close the generated content is to “truth.” This is extremely similar to that.

I completely agree that categorizing input training data by whether or not it is copyrighted is not easy, but it is possible, and I think something that could be legislated. The AI you would have as a result would inherently not be as good as it is in the current unregulated form, but that’s not necessarily a worse situation given the controversies.

On top of that, one of the common defenses for AI is that it is learning from material just as humans do, but humans also can differentiate between copyrighted and public works. For the defense to be properly analogous, it would make sense to me that it would need some notion of that as well.

HarkMahlberg@kbin.social · 1 year ago

Thank you for your thoroughly analytical take on the subject. Solid points all around.

sculd@beehaw.org · 1 year ago

Sorry, AIs are not humans. Also, executives like Altman are literally being paid millions to steal creators’ work.

lily33@lemm.ee · 1 year ago

I didn’t say anything about AIs being humans.

intensely_human@lemm.ee · 1 year ago

They’re also not vegetables 😡

MNByChoice@midwest.social · 1 year ago

I don’t understand why people on here want so much to strengthen them ever further.

It is about a lawless company doing lawless things. Some of us want companies to follow the spirit, or at least the letter, of the law. We can change the law, but we need to discuss that.

explodicle@local106.com · 1 year ago

IANAL, why isn’t it fair use?

maynarkh@feddit.nl · 1 year ago

The two big arguments are:

Substantial reproduction of the original work: you can get back substantial portions of the original work from an AI model’s output.
The AI model replaces the use of the original work. In short, a work that uses copyrighted material under fair use can’t be a replacement for the initial work.

intensely_human@lemm.ee · 1 year ago

you can get back substantial portions of the original work from an AI model’s output

Have you confirmed this yourself?

chaos@beehaw.org · 1 year ago

In its complaint, The New York Times alleges that because the AI tools have been trained on its content, they sometimes provide verbatim copies of sections of Times reports.

OpenAI said in its response Monday that so-called “regurgitation” is a “rare bug,” the occurrence of which it is working to reduce.

“We also expect our users to act responsibly; intentionally manipulating our models to regurgitate is not an appropriate use of our technology and is against our terms of use,” OpenAI said.

The tech company also accused The Times of “intentionally” manipulating ChatGPT or cherry-picking the copycat examples it detailed in its complaint.

https://www.cnn.com/2024/01/08/tech/openai-responds-new-york-times-copyright-lawsuit/index.html

The thing is, it doesn’t really matter if you have to “manipulate” ChatGPT into spitting out training material word-for-word, the fact that it’s possible at all is proof that, intentionally or not, that material has been encoded into the model itself. That might still be fair use, but it’s a lot weaker than the original argument, which was that nothing of the original material really remains after training, it’s all synthesized and blended with everything else to create something entirely new that doesn’t replicate the original.

intensely_human@lemm.ee · 1 year ago

So that’s a no? Confirming it yourself here means doing it yourself. Have you gotten it to regurgitate a copyrighted work?

SilentStorms@lemmy.dbzer0.com · 1 year ago

It’s crazy how everyone is suddenly in favour of IP law.

t3rmit3@beehaw.org · edited 1 year ago

IP law used to stop corporations from profiting off of creators’ labor without compensation? Yeah, absolutely.

IP law used to stop individuals from consuming media where purchases wouldn’t even go to the creators, but some megacorp? Fuck that.

I’m against downloading movies by indie filmmakers without compensating them. I’m not against downloading films from Universal and Sony.

I’m against stealing food from someone’s garden. I’m not against stealing food from Safeway.

If you stop looking at corporations as being the same as individuals, it’s a very simple and consistent viewpoint.

IP law shouldn’t exist, but if it does it should only exist to protect individuals from corporations. When that’s how it’s being used, like here, I accept it as a necessary evil.

interdimensionalmeme@lemmy.ml · 1 year ago

I still think IP needs to eat shit and die. Always has, always will.

I recently found out we could have had 3D printing 20 years earlier but patents stopped that. Cocks !

Daxtron2@startrek.website · 1 year ago

It’s almost like most people are idiots who don’t understand the thing they’re against and are just parroting what they hear/read.

JokeDeity@lemm.ee · 1 year ago

I’m the detractor here, I couldn’t give less of a shit about anything to do with intellectual property and think all copyright is bad.

explodicle@local106.com · 1 year ago

Having read through these comments, I wonder if we’ve reached the logical conclusion of copyright itself.

sanzky@beehaw.org · 1 year ago

copyright has become a tool of oppression. Individual authors’ copyright is constantly being violated with few resources for them to fight, while big tech abuses others’ work and big media uses theirs to the point of it being censorship.

frog 🐸@beehaw.org · 1 year ago

Perhaps a fair compromise would be doing away with copyright in its entirety, from the tiny artists trying to protect their artwork all the way up to Disney, no exceptions. Basically, either every creator has to be protected, or none of them should be.

zaphod@lemmy.ca · edited 1 year ago

IMO the right compromise is to return copyright to its original 14-year term. OpenAI can freely train on anything up to 2009, which is still a gigantic amount of material, while artists continue to be protected and incentivized.

frog 🐸@beehaw.org · 1 year ago

I’m increasingly convinced of that myself, yeah (although I’d favour 15 or 20 years personally, just because they’re neater numbers than 14). The original purpose of copyright was to promote innovation by ensuring a creator gets a good length of time in which to benefit from their creation, which a 14-20 year term achieves. Both extremes - a complete lack of copyright and the exceedingly long terms we have now - suppress innovation.

sanzky@beehaw.org · 1 year ago

that would mean governments prosecuting all offences, which is not going to happen. I doubt any country would have enough resources for doing that

explodicle@local106.com · 1 year ago

Apparently they’re going to just make only the little guy’s copyrights effectively meaningless, so yeah.

casmael@startrek.website · 1 year ago

Well, in that case maybe ChatGPT should just fuck off. It doesn’t seem to be doing anything particularly useful, and now its creator has admitted it doesn’t work without stealing things to feed it. Un fucking believable. Hacks gonna hack I guess.

intensely_human@lemm.ee · 1 year ago

ChatGPT has been enormously useful to me over the last six months. No idea where you’re getting this notion it isn’t useful.

Bilb!@lem.monster · 1 year ago

People pretending it’s not useful and/or not improving all the time are living in their own worlds. I think you can argue the legality and the ethics, but any anti-ai position based on low quality output (“it can’t even do hands!”) has a short shelf-life.

kingthrillgore@lemmy.ml · 1 year ago

…so stop doing it!

This explains why Valve was, until recently, not so cavalier about AI: they didn’t want to hold the bag on copyright matters outside of their domain.

fckreddit@lemmy.ml · 1 year ago

Then shut down your goddamn company until you find a better way.

bedrooms@kbin.social · edited 1 year ago

Alas, AI critics jumped to conclusions this one time. Read this:

Further, OpenAI writes that limiting training data to public domain books and drawings “created more than a century ago” would not provide AI systems that “meet the needs of today’s citizens.”

It’s a plain fact. It does not say we have to train AI without paying.

To give you some context, virtually everything on the web is copyrighted, from reddit comments to blog articles to open source software. Even open data usually comes with a copyright notice. Open research articles also.

If misled politicians write a law banning the use of copyrighted materials, that’ll kill all AI developments in the democratic countries. What will happen is that AI development will be led by dictatorships, and that’s absolutely a disaster even for the critics. Think about it. Do we really want Xi, Putin, Netanyahu and Bin Salman to control all the next-gen AIs powering their cyber warfare while the West has to fight them with Siri and Alexa?

So, I agree that, at the end of the day, we’d have to ask how much rule-abiding AI companies should pay for copyrighted materials, and that’d be less than the copyright holders would want. (And I think it’s sad.)

However, you can’t equate these particular statements in this article to a declaration of fuck-copyright. Tbh Ars Technica disappointed me this time.

P03 Locke@lemmy.dbzer0.com · 1 year ago

It’s bizarre. People suddenly start voicing pro-copyright arguments just to kill a useful technology, when we should be trying to burn copyright to the fucking ground. Copyright is a tool for the rich and it will remain so until it is dismantled.

AVincentInSpace@pawb.social · edit-2 1 year ago

Life plus 70 years is bullshit.

20 years from release date is not.

No one except corporate bigwigs will say they should be allowed to do so in perpetuity, but artists still need legal protections to make money off of what they create, and Midjourney (making its owners boatloads of money off of making automated collages from artwork they obtained not only without compensation but without attribution) is a prime example of why.

AVincentInSpace@pawb.social · 1 year ago

“But you see, we have to let corporations break the law, because if we don’t, a country we might be at war with later will”

MudMan@kbin.social · 1 year ago

I think, viral outrage aside, there is a very open question about what constitutes fair use in this application. And I think the viral outrage misunderstands the consequences of enforcing the notion that you can’t use openly scrapable online data to build ML models.

Effectively what the copyright argument does here is make it so that ML models can only legally be made by Meta, Google, Microsoft and maybe a couple of other companies. OpenAI can say whatever, I’m not concerned about them, but I am concerned about open source alternatives getting priced out of that market. I am also concerned about what it does to previously available APIs, as we’ve seen with Twitter and Reddit.

I get that it’s fashionable to hate on these things, and it’s fashionable to repeat the bit of misinformation about models being a copy or a collage of training data, but there are ramifications here people aren’t talking about and I fear we’re going to the worst possible future on this, where AI models are effectively ubiquitous but legally limited to major data brokers who added clauses to own AI training rights from their billions of users.

sculd@beehaw.org · 1 year ago

People hate them not because it is fashionable, but because they can see what is coming.

Tech companies want to create tools that would replace millions of jobs without compensating the very people that created these works in the first place.

MudMan@kbin.social · 1 year ago

That’s not “coming”, it’s an ongoing process that has been going on for a couple hundred years, and it absolutely does not require ChatGPT.

People genuinely underestimate how many of these things have been an ongoing concern. Much like crypto isn’t that different from what you can do with a server, “AI” isn’t a magic key that unlocks automation. I don’t even know how this mental model works. Is the idea that companies who are currently hiring millions of copywriters will just rely on automated tools? I get that yeah, a bunch of call center people may get removed (again, a process that has been ongoing for decades), but how is compensating Facebook for scraping their social media posts for text data going to make that happen less?

Again, I think people don’t understand the parameters of the problem, which is different from saying that there is no problem here. If anything the conversation is a net positive in that we should have been having it in 2010 when Amazon and Facebook and Google were all-in on this process already through both ML tools and other forms of data analysis.

Pratai@lemmy.ca · edit-2 1 year ago

I stand by my opinion that AI will be the worst thing humans ever created, and that means it ranks just a bit above religion.

sculd@beehaw.org · 1 year ago

This is very likely to be true.

Allero@lemmy.today · 1 year ago

I’d argue the issue is not the AI but capitalism.

AI is good, AI companies are evil.

vexikron@lemmy.zip · edit-2 1 year ago

Or, or, or, hear me out:

Maybe their particular approach to making an AI is flawed.

It’s like people do not know that there are many different approaches to building AI.

Many of them do not rely on basically a training set that is the cumulative sum of all human generated content of every imaginable kind.

zagaberoo@beehaw.org · 1 year ago

What ways do you mean? More than just expert systems, I’d imagine.

vexikron@lemmy.zip · 1 year ago

Well, off the top of my head:

Whole Brain Emulation, attempting to model a human brain as physically accurately as possible inside a computer.

Genetic Iteration (not the correct term for it but it escapes me at the moment), where you set up a simulated environment for digital actors, then simulate quasi-neurons, quasi-body parts dictated by quasi-dna, in a way that mimics actual biological natural selection and evolution, and then you run the simulation millions of times until your digital creature develops a stable survival strategy.

Similar approaches to this have been used to do things like teach an AI humanoid how to develop its own winning martial arts style via many many iterations, starting from not even being able to stand up, much less do anything to an opponent.

Both of these approaches obviously have drawbacks and strengths, and could possibly be successful at far more than what they have achieved to date, or maybe not, due to known or existing problems, but neither of them relies on a training set of essentially the entirety of all content on the internet.
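To make the second approach a bit more concrete, here is a toy sketch of the select-breed-mutate loop (usually called a genetic or evolutionary algorithm). Everything in it is an assumption for illustration: the "creature" is just a list of bits, the `fitness` function is a hypothetical stand-in for survival in a simulated environment, and the names `evolve`, `mutate` and `crossover` are made up for this example. Real work of this kind uses far richer simulations.

```python
import random


def fitness(genome):
    # Hypothetical stand-in for "how well the creature survives":
    # here it is simply the number of 1-bits in the genome.
    return sum(genome)


def mutate(genome, rate, rng):
    # Flip each bit independently with probability `rate`.
    return [1 - g if rng.random() < rate else g for g in genome]


def crossover(a, b, rng):
    # Single-point crossover: prefix from one parent, suffix from the other.
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(pop_size=30, genome_len=20, generations=40, rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        # Selection: the fitter half survives unchanged.
        survivors = population[:pop_size // 2]
        # Reproduction: refill the population with mutated offspring.
        children = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors), rng),
                   rate, rng)
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    population.sort(key=fitness, reverse=True)
    history.append(fitness(population[0]))
    return population[0], history
```

Calling `best, history = evolve()` returns the fittest genome plus the best score per generation. Because the surviving half is carried over unmodified, the best score can never drop from one generation to the next. Note that no training set is involved: the only input is the fitness function.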

qyron@sopuli.xyz · 1 year ago

If it is impossible, either shut down operations or find a way to pay for it.

webghost0101@sopuli.xyz · edit-2 1 year ago

My concern is that they and other tech companies absolutely can and would pay if they have no choice, paying fines for illegal practices if need be.

What absolutely won’t survive a strong law to keep copyrighted content out of AI is the open source community, which absolutely cannot pay for such a thing and would lag seriously behind if it’s excluded, strengthening the monopoly on AI held by for-profit tech. So basically this issue can have huge ramifications no matter what we end up doing.

frog 🐸@beehaw.org · edit-2 1 year ago

My understanding of the open source community is that taking copyrighted content from people who haven’t willingly signed onto the project would kind of undermine the principles of the movement. I would never feel comfortable using open source software if I had knowledge that part or all of it came from people who hadn’t actively chosen to contribute to it.

I have seen a couple of things recently about AI models that were trained exclusively on public domain and creative commons content which apparently are producing viable content, though. The open source community could definitely use a model like that, and develop it further with more content that was ethically obtained. In the long run, there may be artists that willingly contribute to it, especially those who use open source software themselves (eg GIMP, Blender, etc). Paying it forward, kind of thing.

The problem right now is that artists have no reason to be generous with an open source alternative to AIs, when their rights have already been stomped on and certain people in the open source community are basically saying “if we can’t steal from artists too, then we can’t compete with the corporations.” So there’s literally a trust issue between the creative and tech industries that would need to be resolved before any artists would consider offering art to an open source AI.

webghost0101@sopuli.xyz · 1 year ago

It’s quite a mess but I definitely agree that open source needs a good model trained on consented works.

I do fear, though, that the quality gap between copyright-trained and purist models will be huge in the first decades. And no matter the law, the tech is out there and corporations and criminals will be using it in secret nonetheless.

If only things were as simple as siding with the chad digital artists. Digital art was part of my higher education, and if I hadn’t gotten a tech job I might have been one of them, so I feel torn between the two industries.

This may sound doomer, but since the technology exists we are in a race to obtain beyond-human superintelligence, and we do not know what will happen after that.

OpenAI has stated multiple times that they don’t know if copyright will still mean anything in a future with AI.

We are also facing huge global issues like global warming, where a superintelligence could be the answer to sustaining the planet, while of course also risking an evil AI in the process… I repeat: such a mess.

I don’t fully trust Sam Altman, but I do believe what they say may be true. At some point it’s going to be here and it will be too smart to ignore.

It’s optimistically possible that in 20 years we will all be leisurely artists laughing at the idea of needing to work to earn survival.

It’s of course just as likely that some old bastard of a head of state presses the death button next week and that’s the end of all of it, or that the climate has progressed beyond what our smartest future AI could possibly solve.

frog 🐸@beehaw.org · 1 year ago

I definitely do not have the optimism that in 20 years time we’ll all be leisurely artists. That would require that the tech bros who create the AIs that displace humans are then sufficiently taxed to pay UBI for all the humans that no longer have jobs - and I don’t see that happening as long as they’re able to convince governments not to tax, regulate, or control them, because doing so will make it impossible for them to save the planet from climate change, even as their servers burn through more electricity (and thus resources) than entire countries. Tech bros aren’t going to save us, and the only reason they claim they will is so they never face any consequences of their behaviour. I don’t trust Sam Altman, or any of his ilk, any further than I can throw them.

webghost0101@sopuli.xyz · 1 year ago

That is why I am putting some of my eggs in open source, which is where the real innovation happens anyway. Free AI tools at home running on consumer devices can level people up to build a better future ourselves without having to rely on tech bros or government.

Of course I should nuance my wording a bit. My actual opinions tend to be a contrasting mix of both optimistic and pessimistic lines of events. I don’t have much hope that the good future is the one we will end up in, but in my speculative opinion it remains possible from where we are standing today, yet all can change in less than a week.