> So what is it about GPL licensed software that you feel would make AI training on it not subject to the same copyright and fair use considerations that apply to books?
The poster doesn't like it, so it's different. Most of the "legal analysis" and "foregone conclusions" in these types of discussions are vibes dressed up as objective declarations.
You seem like the type of person who will believe anything as long as someone cites a case, without looking into it. Bartz v. Anthropic only looked at books, and there was still a $1.5 billion settlement that Anthropic paid out because it got those books from LibGen / Anna's Archive, and the ruling also said that the data has to be acquired "legitimately".
Whether data acquired from a licence that specifically forbids building a derivative work without also releasing that derivative under the same licence counts as a legitimate data gathering operation is anyone's guess, as those specific circumstances are about as far from that prior case as they can be.
As long as they don't distribute the model's weights, even a strict interpretation of the GPL should be fine. Same reason Google doesn't have to upstream changes to the Linux kernel they only deploy in-house.
But wouldn't that be like some company using GPL-licensed code to host a code generator for something? At least in a legal interpretation. Or is that different?
I mean, is the case you're making that you can run a SaaS business on GPL-derived code without fulfilling GPL obligations because you're not distributing a binary?
If true, that would seem to invalidate the entire GPL, but even by that logic, a website (such as ChatGPT) distributes JavaScript that runs the code, and programs like Claude Code also do so. Again, if you can slip past the GPL's requirements through indirection, like having your application phone home to your server to go get the infringing parts, the GPL would be essentially unenforceable in... most contexts.
That's where the AGPL comes in. The GPL (v2) does not require, e.g., Google or Facebook to release any of the changes they've made to the Linux kernel. That they do so is not because of a legal obligation. The "go get the parts" detail is the one to be very specific about. If those parts are a binary that gets used, then the GPL does kick in, but for distributing source code that's possibly derived and possibly not covered by copyright, it hasn't been decided in a court of law yet.
> This License acknowledges your rights of fair use or other equivalent, as provided by copyright law.
It is legitimate to acquire GPL software. The requirements of the license only occur if you're distributing the work AND fair use does not apply.
Training certainly doesn't count as distribution, so the buck passes to inference, which leaves us dealing with the substantial similarity test and, still, fair use.
If a human reads GPL code and outputs a recreation of that code (derivative work) using what they learned - that is illegal.
If an AI reads GPL code and outputs a recreation of that code using what it "learned" - it's not illegal?
If that is the case, then copyright holds no weight any more. I should be allowed to train an LLM on decompiled firmware (say, Playstation, Switch, iPhone) in countries where decompilation is legal - then have the LLM produce equivalent firmware that I later use to build an emulator (or competing open source firmware).
> If that is the case, then copyright holds no weight any more. I should be allowed to train an LLM on decompiled firmware (say, Playstation, Switch, iPhone) in countries where decompilation is legal - then have the LLM produce equivalent firmware that I later use to build an emulator (or competing open source firmware).
It's funny you mention that, because one of the biggest fair use cases that effectively cemented "fair use" for emulators is Sony Computer Entertainment, Inc. v. Connectix Corp.[1], where the copying of PlayStation BIOS files for the purposes of reverse engineering and creating an emulator was explicitly ruled to be fair use, including running that code through a disassembler.
You and I are not a fucking judge, our opinions on this don't matter one bit. We might as well print it on a piece of paper and wipe our asses with it.