
Good news! LLM output cannot be copyrighted. Everything that an LLM produces is automatically, irrevocably, in the public domain.


Not quite in my opinion. The output of an LLM from a simple prompt falls into the public domain, but if you also give a copyrighted work as input, the mechanistic transformation performed will not alter the original license (same as encoding a video does not change its license).


Are training data counted as input?

It would be interesting to see a court ruling that the output of LLMs trained on copyleft code is licensed under the GPL ... and all other viral licenses simultaneously.


> Are training data counted as input?

It is quantum legality: using copyrighted input is legal or illegal depending on the observer.


Schrodinger's Chat


Unless your LLM works by quoting large parts of copyrighted works, its reinterpretations of them aren't covered by the original copyright, because they're not copies.


What if the output regurgitates some other legal entity’s boilerplate licence agreement? Is the output automatically licensed to that entity?


No, the copyright is the colour of the bits, and red bits with a comment saying "these bits are blue" are not blue bits, but you may be prosecuted for fraud.


It's wild to me that there haven't been more court cases to answer questions like those being asked in this thread.

No one knows.


It's new, fast-moving technology, and the courts are slow and expensive.

It would take two stubborn businesses with a lot of money deciding that it is better to battle it out than focus on their business. Something like IBM v SCO or Oracle v Google.


But we also know from other research that LLMs don't actually do mechanistic translations. Even when they are asked to and say that they did, they're basically rewriting the code from their training data.


If the LLM output is already someone else's copyrighted work, the LLM doesn't change that?


Only if that occurs and the output is a substantial enough body that it is itself copyrightable and not covered by fair use. The confluence of those conditions is intentionally rare.


The LLM cannot produce copyrighted work.

If the LLM reproduces a human's copyrighted work, then that copyright still stands. This is, in effect, the same as photocopying someone else's writing. The LLM was trained on the copyrighted work, is incapable of producing new copyrightable work, so if it duplicates the original work then the original author's copyright still stands.

I am not a lawyer


Same as it ever was: Either trade secrets or license files that are treated as suggestions.


What if you used the LLM to generate works that were already copyrighted?


There was a recent case that everyone has been describing as "LLM output can't be copyrighted" but what it actually said was you can't register the AI as the author.


This is not true, and I'd love to see some actual citation here.

The courts have repeatedly said that copyright only applies to human creativity. The Supreme Court explicitly said this when they refused to hear the appeal:

https://en.wikisource.org/wiki/Thaler_v._Perlmutter,_Refusal...

> "We affirm our decision to refuse registration for the Work because it lacks the human authorship necessary to be eligible for copyright protection."

So they're saying that the LLM cannot be the author, because LLMs cannot claim copyright.

The related case about patents is more supportive of the narrative that AIs cannot be authors (see https://www.cafc.uscourts.gov/opinions-orders/21-2347.OPINIO...), specifically: "Here, there is no ambiguity: the Patent Act requires that inventors must be natural persons; that is, human beings."

The patent situation is that the Act says the inventor must be an individual, which the courts are interpreting to mean a human, so the LLM cannot be named as the inventor. So, in this case, yes, this is just saying that an LLM cannot be named as the inventor of a patent. That's not the same thing the courts are saying about copyright.


> So they're saying that the LLM cannot be the author, because LLMs cannot claim copyright.

They're saying that the LLM can't be the author.

Now suppose you supply the LLM with a prompt that contains human creativity, it performs a deterministic mathematical transformation on the prompt to produce a derivative text, and you want to copyright that, claiming yourself as the author. What happens then?

If you think the answer is that you can't, how do you distinguish that from what happens when someone writes source code and has a compiler turn it into a binary computer program? Or do you think that e.g. Windows binaries can't be copyrighted because they were compiled by a machine?


> Now suppose you supply the LLM with a prompt

My understanding was that they did in fact do just that, but the court somehow misunderstood what they were doing, and assumed that the LLM was working completely autonomously without any human input at all, which isn't really possible IMO. Someone told it what to do.

They also argued that you couldn't copyright an output whose origin you can't explain, i.e. if they had been able to articulate how an LLM works, the outcome might have been quite different, which I found surprising.

If art in general (human-made or otherwise) is always derived from existing influences... should we really be forced to explain how or why we created a piece of art in order to defend it?

The usual bar for copyright infringement of a derivative work is, from what I have seen, "how much did you copy from the original, and how obvious is it", which is of course a subjective determination that would be made by each individual judge or jury of a case.


> What happens then?

The part that the human created, the prompt, can be copyrighted.

The part that the LLM created cannot be.

Copyright in code works exactly the same way: the source code is copyrighted. The binary code is only copyrighted to the extent that it is derived from the source code. This is well-established.


Maybe I am just misunderstanding something, but I feel like you might be contradicting yourself here... why can LLM output not be copyrighted, but compiler output can be?


No, that's the point - the compiler output is only copyrighted to the extent that it is derived from the source code. The compiler itself cannot create anything copyrightable, but because there is a deterministic link between the source code and the binary code, and the source code was the product of a human, the binary code is covered by the source code copyright.

It's like a photocopier. If you photocopy a page from a book, that page is still covered by the copyright of the book author, even if the page is 2x larger or otherwise transformed by the machine.


Powerful interests want it to be true.


IMO the bigger question is how would you even tell if a work was generated by an LLM? There's a ton of code being written out there; the folks who generated it are going to claim they authored it for copyright purposes, and those who want to use it are going to claim it was LLM-generated. So what happens?


The alleged author, when bringing a copyright infringement suit, will submit testimony claiming they wrote it. Parties to the suit will have a chance to present arguments and evidence. Then, the claim will be adjudicated by a judge and/or jury.


That code isn't going to be open source. And if you use someone else's closed source code you are violating laws that have nothing to do with copyright.


I'm not sure I understand. I'm not talking about stolen/leaked code here. I'm saying: imagine you claim you're the author of some piece of code. You may or may not have written it with an LLM, but even if so, assume you have the full rights to all the inputs. You post it publicly on GitHub. You don't attach a license, or perhaps you attach a restrictive license that doesn't permit much beyond viewing. Someone comes across your code, finds it brilliant, and wants to use it. If that code was non-copyrightable (such as generated via an LLM), then they're fine doing it without your permission, no? But if that code was copyrightable, then they're not permitted to do so, correct?

So now consider two questions:

1. You actually didn't use an LLM, but they believe & claim you did. Who has the burden of proof to show that you actually own the copyright, and how do they do so?

2. They write new code that you feel is based on yours. They claim they washed it through an LLM, but you don't believe so. Who has the burden of proof here and how do they do so?


Good questions.

My take on the answers (I am not a lawyer):

1. You copy their code. They bring a copyright claim (let's assume this isn't a DMCA thing and they're actually bringing a claim to court). Your defence is "the LLM wrote it so no copyright attaches". Since they're asserting their copyright claim, they would have to provide evidence for that claim (same as in any other copyright case), including providing evidence that a human wrote it (which is new, and required to defeat your defence).

2. They copy your code. You bring a copyright case. Their defence is "I used an LLM to wash the code without copying". Since they're not disputing your copyright claim to the original code, you don't have to defend or prove your copyright. But you do have to prove that their code infringes on your copyright, which would mean proving that the LLM copied your code when creating the new code. This has been done before by demonstrating similarity.


Can you expand on that, please? Which other laws are infringed if you use someone else's closed source code?


You used an illegal leak to train your LLM.


What makes the leak illegal other than copyright?

The occasional piece of software might be a trade secret, but a person downloading a preexisting leak isn't affected by those laws.


> What makes the leak illegal other than copyright? The occasional piece of software might be a trade secret, but a person downloading a preexisting leak isn't affected by those laws.

I think 18 U.S.C. § 1832 (a) (3) might answer your question? https://www.law.cornell.edu/uscode/text/18/1832


To qualify as a trade secret, you have to actually register it as a trade secret.

Closed-source code is not automatically a trade secret.


That's completely false as far as I'm aware. Where did you see this? A simple web search shows numerous sources to the contrary. Are you confusing them with patents by any chance? https://en.wikipedia.org/wiki/Trade_secret


Huh, TIL something new. I was sure they had to be registered. Thanks for the correction :)


Is Pierre Menard really the author of his Quixote?


I think it can be copyrighted, or at least it is a very complex legal issue. Coding support is used in commercial apps where copyrights are fully reserved. It cannot feasibly be determined whether any output is purely LLM-generated or not.



