The question should be pretty simple:
Does the AI product output copyrighted material?
If I ask it for the text of Harry Potter, will it give it to me? If I ask it for a copy of a Keith Haring painting, will it give me one? If I ask it to perform Williams’s Jurassic Park theme, will it do so?
If it does, it’s infringing copyright.
If it does not, it is not.
If it just reads the web and learns from copyrighted material, but carefully refuses to republish or perform that material, it should not be considered to infringe, for the same reasons a human student is not. Artistic styles and literary skills are not copyrightable.
e e cummings doesn’t get to forbid everyone else from writing in lowercase.
(Some generations of ChatGPT won’t even recite Shakespeare, due to overzealous copyright filters that fail to correctly count it as public domain. The AI folks are trying!)
What’ll be interesting is when people start asking it to “write a song in the style of Marvin Gaye,” given the ruling against Robin Thicke a few years back, since that case was about the style of a song hewing too closely to Gaye’s work.
That’s what got me into using ChatGPT. I’d ask it a question like “how can I troubleshoot this issue I’m having?”
It would give me this big answer, and then I’d ask it to give me the answer back as an Elton John song. So much fun. Can’t have nice things anymore, though.
Perhaps every federal judge should receive a copy of Spider Robinson’s “Melancholy Elephants”. (It’s under a Creative Commons license.)
http://www.spiderrobinson.com/melancholyelephants.html
The issue is still the output, not the model itself.
Just like a gun.
@fubo
It’s been suggested that AI art created without human input cannot receive copyright:
https://www.reuters.com/legal/ai-generated-art-cannot-receive-copyrights-us-court-says-2023-08-21/
This might upset you, but some uncensored models that have had their alignment removed will output such content. Is the content accurate? I don’t know, since I haven’t read Harry Potter.
Sure, then whoever uses it to extract that text is infringing. If I memorize a copyrighted text, my brain is not an infringement; but if I publicly perform a recitation of that text, that act is infringing.
Really the precedent of search engines (and card catalogs and concordances before them) should yield the same result. Building an index of information about copyrighted works is librarianship; running off new copies of those works is infringement.
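To make the index-versus-copy distinction concrete, here is a deliberately simplified toy (hypothetical; real search engines store far more, including word positions). It records only which works contain which words, with no order, positions, or counts, so it can answer “which works mention X?” but cannot regenerate any of the texts.

```python
import re
from collections import defaultdict


def build_index(works: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the set of titles containing it.

    Only membership is kept: no word order or positions, so the
    original texts cannot be rebuilt from this structure.
    """
    index: dict[str, set[str]] = defaultdict(set)
    for title, text in works.items():
        for word in re.findall(r"[a-z']+", text.lower()):
            index[word].add(title)
    return dict(index)


def lookup(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the titles that contain every word in the query."""
    words = re.findall(r"[a-z']+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Public-domain examples, so nobody gets sued over the test data.
works = {
    "Sonnet 18": "Shall I compare thee to a summer's day?",
    "Hamlet": "To be, or not to be, that is the question",
}
index = build_index(works)
print(lookup(index, "to be"))  # only Hamlet contains both words
```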
On the other hand, AI transparency is also an interesting problem. It may be that one day we can look at a set of neural network weights – or a human brain! – and say “these patterns here are where this system memorized Ginsberg’s ‘Kaddish’.” I hope we will not conclude that brains must be lobotomized to remove copyrighted memorized texts.
If we treat the model like a brain that “memorizes” copyrighted text and generates new text based on it, your statement is valid. However, this would also rule out any copyright claims on the model’s output, since the act of memorization isn’t a work. Only a work can infringe on other works, and whether a model’s output counts as a “work” is still under heavy debate. Even if it does, can a model hold copyright while not being a legal person? Who should bear the liability then? What if the output is modified by an editor? This rabbit hole goes deep.
I think that was actually ruled on a few months ago: no, the model cannot hold copyright, and neither can the person who commissioned the model to create the work. Where things are still a bit grey (someone correct me if I’m wrong) is when a person creates a work with the assistance of AI, so that it’s a mix of human- and AI-generated content.
The model doesn’t contain the training data—it can’t reproduce the original work even if it were instructed to, except by accident. And it wouldn’t know it had done so unless it were checked by some external process that had access to the original.
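As a rough illustration of what such an external check might look like, here’s a minimal sketch, assuming the checker has the original text on hand and uses word n-gram overlap. That’s just one simple heuristic; the function names and the 8-word default are hypothetical choices, not any vendor’s actual filter.

```python
import re


def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """All runs of n consecutive words, lowercased, punctuation dropped."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def verbatim_overlap(output: str, original: str, n: int = 8) -> float:
    """Fraction of the output's n-word runs that also occur in the original.

    1.0 means every n-word stretch of the output appears in the original;
    0.0 means none does. Outputs shorter than n words score 0.0.
    """
    out = word_ngrams(output, n)
    if not out:
        return 0.0
    return len(out & word_ngrams(original, n)) / len(out)


original = "It was the best of times, it was the worst of times"
print(verbatim_overlap("It was the best of times, it was the worst of times", original, n=4))
print(verbatim_overlap("It was a dark and stormy night", original, n=4))
```

The model itself never runs anything like this; the point is that detecting a copy requires the original as a reference.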
In case anyone wants to try this out: get ComfyUI and this plugin to get access to unsampling. Unsample to the full number of steps you’re using, and use cfg=1 for both the sampler and the unsampler. Use the same positive and negative prompts for both (empty works fine, or maybe throw BLIP at it). For A1111, there’s alternative img2img; I’ve only heard of it, never used it.
What unsampling does is find the noise that will generate a specific image, and it can find noises you can’t even reach through the usual interface (because there are more possible latent images than noise seeds). cfg=1 gives the best reproduction possible. In short, the whole thing shows how well a model can generate a replica of something when you make sure it gets maximally lucky.
This works very well if the image you’re unsampling was generated by the same model you’re using to unsample and regenerate it; it works quite well with related models, though they impart their own biases; and it’s much worse for anything else. If you ask it to re-create some random photograph, it’s going to put its own spin on it, changing pretty much all of the details. If you try something like re-creating a page of text, it’s going to fail miserably, because Stable Diffusion just can’t handle glyphs.
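For anyone curious what “finding the noise” means mechanically, here’s a toy sketch in Python. Everything in it is an assumption for illustration: a 3-number “image” instead of a VAE latent, a made-up noise predictor `0.1 * tanh(x + bias)` instead of a U-Net, and an illustrative noise schedule. Sampling is deterministic DDIM (eta=0), and unsampling runs each step backwards by solving for its input with fixed-point iteration; that isn’t necessarily how the ComfyUI plugin does it. With cfg=1 the guided prediction eps_uncond + 1·(eps_cond - eps_uncond) is just eps_cond, so one prediction per step stands in for it.

```python
import math

# Hypothetical toy setup: a tiny "image" vector and a made-up noise predictor.


def schedule(steps: int) -> list[float]:
    """Illustrative alpha-bar values, from 0.999 (clean) down to 0.01 (noisy)."""
    return [0.999 - (0.999 - 0.01) * i / (steps - 1) for i in range(steps)]


def toy_model(bias: float):
    """Hypothetical noise predictor eps(x) = 0.1 * tanh(x + bias), elementwise."""
    return lambda x: [0.1 * math.tanh(v + bias) for v in x]


def step_coeffs(a_noisy: float, a_clean: float) -> tuple[float, float]:
    """A DDIM (eta=0) step maps x to c1 * x + c2 * eps(x)."""
    c1 = math.sqrt(a_clean / a_noisy)
    c2 = math.sqrt(1 - a_clean) - math.sqrt(a_clean * (1 - a_noisy) / a_noisy)
    return c1, c2


def ddim_step(x, eps_fn, a_noisy, a_clean):
    """Denoise one step: move x from noise level a_noisy to a_clean."""
    c1, c2 = step_coeffs(a_noisy, a_clean)
    return [c1 * v + c2 * e for v, e in zip(x, eps_fn(x))]


def invert_step(x_clean, eps_fn, a_noisy, a_clean, iters=60):
    """Find x such that ddim_step(x, ...) == x_clean, by fixed-point iteration.

    Converges for this toy because |c2 / c1| * max|eps'| <= 0.1.
    """
    c1, c2 = step_coeffs(a_noisy, a_clean)
    x = list(x_clean)
    for _ in range(iters):
        x = [(t - c2 * e) / c1 for t, e in zip(x_clean, eps_fn(x))]
    return x


def unsample(image, eps_fn, abar):
    """Walk from the clean image up to the noisiest level: the 'found' noise."""
    x = image
    for t in range(1, len(abar)):
        x = invert_step(x, eps_fn, abar[t], abar[t - 1])
    return x


def sample(noise, eps_fn, abar):
    """Ordinary deterministic sampling from noise back down to level 0."""
    x = noise
    for t in range(len(abar) - 1, 0, -1):
        x = ddim_step(x, eps_fn, abar[t], abar[t - 1])
    return x


abar = schedule(50)
model_a, model_b = toy_model(0.0), toy_model(0.5)
image = [0.5, -0.3, 1.2]
noise = unsample(image, model_a, abar)
same = sample(noise, model_a, abar)    # regenerate with the same model
other = sample(noise, model_b, abar)   # regenerate with a different model
print(max(abs(s - i) for s, i in zip(same, image)))   # tiny: near-exact replica
print(max(abs(o - i) for o, i in zip(other, image)))  # noticeably larger
```

Because the toy inversion is solved exactly, the same-model round trip comes back essentially perfect, while swapping in a different predictor drifts, which loosely mirrors the “same model works, other models are worse” observation. Real models add approximation error on top, so don’t expect this kind of cleanliness in practice.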