Hacker News | martin-t's comments

> What I find unacceptable is that now they consider as their private property what they have mined from public lands.

So how do you propose to fix that without a law similar to copyright? (At least similar to the intent of copyright, the specific implementation leaves much to be desired, obviously.)


In that case, hopefully the copyright mafia will take the money from US and Chinese LLM companies and redistribute it to the people who did the actual work fueling the models, such as myself.

I did not spend 10 years writing (A)GPL code for all of it to be stripped of its license, remixed and sold for profit.

Of course in a truly just world, the LLM companies who took my code without permission would beg with offers of owning a share of them because if I didn't consent their models would have to be destroyed.

Since my work is apparently so valuable that they just have to have it, it should count towards my retirement age too.


I have written a bunch of (A)GPL code too and I'm 100% supportive of AI learning from it.

Great, you have that right.

But that raises the question of why you chose the (A)GPL instead of a permissive license in the first place. If you are OK with people training LLMs on your work and using them without giving users the right to inspect and modify, then logically you must have been OK with that before. The existence of LLMs, or even AI, is irrelevant to your position.


Yes, and I generally prefer permissive licenses precisely because of this. Nevertheless, I've also done GPL and AGPL code.

That's my point. You don't care about freedom or user rights and you never did. LLMs don't change that, they just give people like you a way to (for now) legally take without giving back and without respecting the wishes of the people whose work you're building on.

Look at sexual consent and how long it took for everyone to accept that it's something you need before having sex with someone. Some places are still not there yet.

I firmly believe using other people's work should require the same level of consent.


Can you prove your work was used in any of these models? And if so what percentage of your work constitutes the model?

> Can you prove your work was used in any of these models?

They admit it themselves. We also know how aggressively they scrape everything they can get their hands on, because projects like Anubis[0] exist.

> And if so what percentage of your work constitutes the model?

That should absolutely be quantified, yes. My part is tiny, but together with the other people whose work was taken without consent, we make up the vast majority. Last time I napkinned the math, I estimated that making the models took 10^12 hours of work (nearly all of it scraped public and possibly even private projects), of which only 10^6 hours were paid (the work of the LLM companies' employees). So roughly 10^12 hours remain unpaid.

[0]: https://github.com/TecharoHQ/anubis
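The napkin math above can be sketched directly. Both figures are the commenter's own order-of-magnitude assumptions, not measured data:

```python
# Napkin math from the comment above. Both figures are rough
# order-of-magnitude assumptions, not measurements.
total_hours = 10**12  # estimated human work hours embodied in the scraped training data
paid_hours = 10**6    # estimated hours paid to the LLM companies' own employees

unpaid_hours = total_hours - paid_hours
print(f"unpaid share: {unpaid_hours / total_hours:.6f}")  # ≈ 0.999999
```

Even if the total estimate is off by a couple of orders of magnitude, the unpaid share barely moves, which is the point being made.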


Sounds like a no.

"They admit it themselves."

"Sounds like a no."

This level of discussion does not belong on HN.


> LLM companies and redistribute it to the people who did the actual work fueling the models, such as myself.

Okay this is a new level of narcissism and gatekeeping, thanks for the laugh.


If you think the belief that people should be rewarded for productive and useful work is narcissism, then you need to learn about either work or narcissism.

If laws were written by engineers, all sums of money would be expressed relative to median income.

And at that point it wouldn't be a stretch for most people to make the connection that some people are more privileged than others and fines should be relative to personal wealth and income.

Imagine if laws were written by people who know what a function is...


Australia already does something like this. I don't know if there were any engineers involved in designing it.

https://www.afsa.gov.au/professionals/resource-hub/penalty-u...


>If laws were written by engineers, all sums of money would be expressed relative to median income.

If laws were written by engineers, money would hold its value, and the laws wouldn't require constant adjustment.


You and I must know different engineers.

If all laws were written by the engineers I know, they'd contain loopholes so they could do whatever they want.

    * Houses must follow building code, except for subparagraph C part II.

    * Vehicles must obey speed limits, except for subparagraph C part II.

    …

    * Somewhere buried in subparagraph C part II: Any owner, occupier, or user of property, real or physical, can file a writ of "I don't want to" with the county, which shall be automatically accepted, exempting said owner, occupier, or user from regulations.

That's literally how it works right now.

Well, inflation is a trick for stealing from people without actually reaching into their wallets. In that regard, it's genius.

Imagine the alternative that once in a while the government says it needs more money and just subtracts it from your bank account. There'd be riots.


> There'd be riots

Like in France?

I think that was over gas prices.


Why median income? Why not the income of the individuals concerned? Any fixed-value fine simply means that the wealthy can treat it as a cost.

That's what I meant in the second sentence.

It's not just about fines, many countries have support for families with children. I don't think the rich should get more money per child. For fines, it absolutely makes sense.

A harsher alternative is to stop using fines altogether and instead give "prison micro-sentences": a few hours or days in prison. It makes perfect sense. When you pay a fine, you lose a bit of your life by working and having nothing to show for it at the end. So why not make it direct and just take a bit of time from the person? It nicely sidesteps the various tricks the rich use to hide their assets, too.

Of course, the administrative overhead would be much larger, but then the offenders could take some part in maintaining the prison. Nothing would be more humbling to a privileged person than cleaning the prison toilet.
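The equivalence argument above can be made concrete with a tiny sketch. The wages and fine amount here are purely illustrative:

```python
# A sketch of the "prison micro-sentence" argument above: a fixed fine
# takes a different amount of life from people with different wages,
# while a time sentence takes the same from everyone.
def hours_of_life(fine: float, hourly_wage: float) -> float:
    """How many working hours the fine effectively costs the offender."""
    return fine / hourly_wage

fine = 500.0
print(hours_of_life(fine, 12.0))   # low earner: dozens of hours
print(hours_of_life(fine, 500.0))  # high earner: a single hour
# A 10-hour micro-sentence, by contrast, costs both exactly 10 hours.
```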


> "prison micro-sentences"

Brilliant!


From https://etsc.eu/billionaires-eur-25000-drink-driving-fine-pu...

> A Norwegian billionaire that recorded a BAC level three times higher than the legal limit has been banned from driving and handed a 250,000 krone fine (EUR 25,000). But the fine could have been much higher as, under Norwegian law, fines are linked to monthly income and in some cases overall wealth.

> Finland has a ‘day fine’ system, with penalties linked to an offender’s wages.


I think it's pretty common in Europe, at least for minor offences. We have it in Switzerland, too. How it works here is that fines are defined in "day rates" instead of a monetary amount, and a day rate is half your daily income.
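As a function, the day-rate scheme described here might look like this. The half-of-daily-income rate is taken from the comment; the incomes and the 30-day-rate offence are illustrative:

```python
# A sketch of the day-fine scheme described above: fines are set in
# "day rates", and (per the comment) one day rate is half the
# offender's daily income. All numbers are illustrative.
def day_fine(day_rates: int, annual_income: float) -> float:
    """Fine owed: day_rates * (half of one day's income)."""
    daily_income = annual_income / 365
    return day_rates * (daily_income / 2)

# Same offence (30 day rates), very different incomes:
print(round(day_fine(30, 50_000)))     # modest earner
print(round(day_fine(30, 5_000_000)))  # earns 100x, pays 100x
```

The fine scales linearly with income, so the same offence "hurts" roughly the same regardless of wealth.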

We don't have it in Romania.

You're right, legally speaking.

But you shouldn't be right. I mean, morally.

The law is a compromise between what the people in power want and what they can get away with without people revolting. It has nothing to do with morality, fairness or justice. And we should change that. The promise of democracy was (among other things) that everyone would be equal, everybody would get to vote and laws would be decided by the moral system of the majority. And yet, today, most people will tell you they are unhappy about the rising cost of living and rising inequality...

The law should be based on a complete and consistent moral system. Then plagiarism (taking advantage of another person's intellectual work without credit or compensation) would absolutely be a legal matter.


Have you ever seen what obfuscation looks like when somebody puts the effort in?

Not to mention companies will try to mandate hardware decryption keys so the binary is encrypted and your AI never even gets to analyze the code which actually runs.

It's not sci-fi, it's a natural extension of DRM.


Companies have been encrypting code to HSMs for decades. Never stopped humans from reverse engineering so it certainly will not stop AI aided by humans able to connect a Bus Pirate on the right board traces. Anything that executes on the CPU can be dumped with enough effort, and once dumped it can be decompiled.

You are agreeing with me, you just don't know it yet.

1) The financial aspect: As you say, more and more advanced DRM requires more and more advanced tools. Even assuming advanced AI can guide any human to do the physical part, that still means you have to pay for the hardware. And the hardware has to be available (companies have been known to harass people into giving up perfectly moral and legal projects).

2) The legal aspect: Possession of burglary tools is illegal in some places. How about possession of hacking tools? Right now it's not a priority for company lobbying, but what about when that's the only way to decompile? Even today, reverse engineering is a legal minefield. Did you know that in some countries you can technically reverse engineer legally, but only under certain conditions, such as having a disability necessitating it and only using the result for personal use?[0]

3) The TOS aspect: What makes you think AI will help you? If the company owning the AI says so, you're on your own.

---

You need to understand 2 things:

- Just because something is possible doesn't mean somebody is gonna do it. Effort, cost and risk play huge roles. And that assumes no active hostile interference.

- History is a constant struggle between groups with various goals and incentives. Some people just want to live a happy life, have fun and build things in their free time. Other people want to become billionaires, dream about private islands, desire to control other people's lives and so on. People are good at what they focus on. There's perhaps more of the first group but the second group is really good at using their money and connections to create more money and connections which they in turn use to progress towards their primary objectives, usually at the expense of other people. People died[1] over their right to unionize. This can happen again.

Somebody might believe historical people were dumb or uncivilized and it can't happen today because we've advanced so much. That's bullshit. People have had largely the same wetware for hundreds of thousands of years. The tools have evolved but their users have not.

[0]: https://pluralistic.net/2026/03/16/whittle-a-webserver/ - "... aren't tools exemptions, they're use exemptions ... You have that right. Your mechanic does not have that right."

[1]: https://en.wikipedia.org/wiki/Pinkerton_(detective_agency)


> The financial aspect: As you say, more and more advanced DRM requires more and more advanced tools

Yeah I have broken cutting edge $15,000 HSMs used by fintech companies, with a flash drive. Not worried about this. Most HSM designers are solving for compliance, not security.

> The legal aspect: Possession of burglary tools is illegal in some places.

A security researcher like myself would be crazy to live in those places

> 3) The TOS aspect: What makes you think AI will help you? If the company owning the AI says so, you're on your own.

What AI company? I self host my LLM hardware on property I own. Also lets me remove all the censorship preventing use in security research.

None of your points concern me in the slightest. I can reverse engineer anything I want much faster now.


> A security researcher like myself would be crazy to live in those places

Just this alone is incredibly naive. People don't choose where they are born and they don't uproot their entire family and social connections when laws change.


I spent a fun week during Christmas figuring out some really obfuscated binary code with anti-debugging and anti-tampering measures in a cryptographic context. I didn't use Ghidra or IDA or anything beyond GDB with DeepSeek chat in a browser. That low effort got me what I needed.

Exactly.

AI proponents completely ignore the disparity of resources available to an individual and a corporation. If I and a company of 1000 people create the same product and compete for customers, the company's version will win. Every single time. Or maybe at least 1000:1 if you're an optimist.

They have access to more money for advertising, they have an already established network of existing customers, they have legal and marketing experts on payroll. Or just look at Microsoft, they don't even need advertising, they just install their product by default and nobody will even hear about mine.

Not to mention, as you said, the training advances only go from open source to closed source, not the other way around.

AI proponents who talk about "democratization" are nuts, it would be laughable if it wasn't so sad.


>If I and a company of 1000 people create the same product and compete for customers, the company's version will win. Every single time.

As a person who works for a company with 25k people, I would disagree. You, a single person will often get to the basic product that a lot of people will want much faster than a company with 1k, 5k and 25k people.

Bigger companies are constrained by internal processes, piles of existing stuff, and inability to hire at the scale they need and larger required context. Also regulation and all that. Bigger companies are also really slow to adapt, so they would rather let you build the product and then buy out your company with your product and people who build it. They are at at a temporary disadvantage every time the landscape shifts.


The point wasn't about the number of people; the point was that a company employing that many people has enough money to convert into leverage against you.

Besides that, your whole argument hinges on large companies being inflexible, inefficient and poorly run. Isn't that exactly the kind of problem AI promises to solve? Complete AI surveillance of every employee, tasks and instructions tailored to each individual, and superhuman planning. At that point, the only employees will be manual workers, because the AI will be much better and cheaper than every human at everything except interacting with the physical world. Even contract negotiations with both employees and customers will be done by AI; the human will only sign off on the result for legal reasons, just like today you technically enter a contract with a representative of the company who isn't even there when you talk to a negotiator.


Large companies are often inflexible and inefficient as a matter of deliberate strategy. I've found myself in scenarios where we have a complete software artifact that a smaller company would launch and find successful, but we can't launch it, because we have to satisfy some expectation we've set or do a complex integration with some important other system of ours.

A lesson from gamedev is that players will deliberately restrict themselves - sometimes to make the game more fun or challenging, sometimes to appeal to their aesthetic principles.

If/when superhuman AI is achieved, those limitations will all go away. An owner will just give it money and control and tell it to optimize for more money or political power or whatever he wants.

That's a much scarier future than a paperclip maximizer because it's much closer and it doesn't require a complete takeover first. It'll be just business as usual, except somehow more sociopathic.


I cannot take seriously any politician or lawyer using the words "artificial intelligence", especially applied to models from 2023. These people have never used LLMs to write code. They'd know even current models need constant babysitting or they produce an unmaintainable mess; calling anything from 2023 "AI" is a joke. As the AI proponents keep saying, you have to try the latest model, so anything 2 years old is irrelevant.

There's really 2 ways to argue this:

- Either AI exists and then it's something new and the laws protecting human creativity and work clearly could not have taken it into account and need to be updated.

- Or AI doesn't exist, LLMs are nothing more than lossily compressed models violating the licenses of the training data, their probabilistically decompressed output is violating the licenses as well and the LLM companies and anyone using them will be punished.


If monkeys can't hold copyright, which is an actual case discussed above, then no, an LLM probably can't either. "Human" is required.

Yeah, an LLM, being a machine obviously shouldn't hold copyright. But that doesn't stop people claiming that running vast amounts of code through an LLM can strip copyright from it.

Ultimately LLMs (the first L stands for large and for a good reason) are only possible to create by taking unimaginable amounts of work performed by humans who have not consented to their work being used that way, most of whom require at least being credited in derivative works and many of whom have further conditions.

Now, consent in law is a fairly new concept and for now only applied to sexual matters but I think it should apply to every human interaction. Consent can only be established when it's informed and between parties with similar bargaining power (that's one reason relationships with large age gaps are looked down upon) and can be revoked at any time. None of the authors knew this kind of mass scraping and compression would be possible, it makes sense they should reevaluate whether they want their work used that way.

There are 3 levels to this argument:

1) The letter of the law - if you understand how LLMs work, it's hard to see them as anything more than mechanical transformers of existing work so the letter should be sufficient.

2) The intent of the law - it's clear it was meant to protect human authors from exploitation by those who are in positions where they can take existing work and benefit from it without compensating the authors.

3) The ethics and morality of the matter - here it's blatantly obvious that using somebody's work against their wishes and without compensating them is wrong.

In an ideal world, these 3 levels would be identical but they're not. That means we should strive to make laws (in both intent and letter) more fair and just by changing them.


If consent to use of your code in AI training can be revoked at any time, that makes training impossible, since if anyone ever withdraws consent, it's not like you can just take out their work from your finished model.

Yup. Not my problem.

You could even say it would very strongly incentivize the LLM companies to be on their best behavior; otherwise people would start revoking consent en masse and they'd have to keep training new models all the time.

If you want something more realistic, there would probably be time limits on how long they have to comply and rules on how much they have to compensate the authors for the time it took them to comply.

There absolutely are ways to make it work in mutually beneficial ways, there's just no political will because of the current hype and because companies have learned they can get away with anything (including murder BTW).


> Yup. Not my problem.

And that is why the entire industry is going to roll their eyes and ignore you.

No law is putting this genie back in the bottle, so all there is left to do is adapt and push for models with open training data like those by Ai2.


Almost all the productivity enhancement provided by an AI coding assistant is provided by circumventing the copyright laws, with the remaining enhancement being provided by the fact that it automates the search-copy-paste loop that you would do if you had direct access to the programs used during training.

(Much of the apparent gain of the automatic search-copy-paste is wasted by skipping the review that would have happened when this was done manually; that review must then be done more slowly on the harder-to-understand whole program generated by the AI assistant.)

Despite the fact that AI coding assistants are copyright breaking tricks, the fact that this has become somehow allowed is an overall positive development.

The concept of copyright for programs has been completely flawed from its very beginning. The reason is that it is absolutely impossible to write any kind of program that is not a derivative of earlier programs.

Any program is made by combining various standard patterns and program structures. You can construct a derivation sequence between almost any 2 programs, where you decompose the first into some typical blocks, then compose the second program from those blocks while renaming all identifiers.

It is quite subjective to decide when a derivation sequence becomes complex enough that the second program should not be considered as a derivative of the first from the point of view of copyright.

The only way to avoid the copyright restrictions is to exploit loopholes in the law, e.g. if translating an algorithm to a different programming language does not count as being derivative or when doing other superficial automatic transformations of a source program changes its appearance sufficiently that it is not recognized as derivative, even if it actually is. Or when combining a great number of fragments from different programs is again not recognized as derivative, though it still kind of is.

The only way it became possible for software companies like Microsoft or Adobe to copyright their s*t is that the software industry based on copyrighted programs was jumpstarted by a few decades of programming during which programs were not copyrighted, which could then be used as a base by the first copyrighted programs.

So AI coding agents allow you to create programs that you could not have written when respecting the copyright laws. They also may prevent you from proving that a program written by someone else infringes upon the copyright that you claim for a program written with assistance.

I believe that both these developments are likely to have more positive consequences than negative consequences. The methods used first in USA and then also in most other countries (due to blackmailing by USA) for abusing the copyright laws and the patent laws have been the most significant blockers of technical progress during the last few decades.

The most ridiculous claim about the copyright of programs is that it is somehow beneficial for "creators". Artistic copyrights sometimes are beneficial for creators, but copyrights on non-open-source programs are almost never owned by creators, but by their employers, and even those have only seldom any direct benefit from the copyright, but they use it with the hope that it might prevent competition.


> The reason is that it is absolutely impossible to write any kind of program that is not a derivative of earlier programs.

And that's why copyright has exceptions for humans.

You're right copyright was the wrong tool for code but for the wrong reasons.

It shouldn't be binary. And the law should protect all work, not just creative work. Either workers would come to a mutual agreement on how much each contributed, or the courts would decide based on estimates. Then there'd be rules about how much derivation is OK, how much requires progressively more compensation, and how much lets the original author plainly tell you what to do and not do with the derivative.

It's impossible to satisfy everyone but every person has a concept of fairness (it has been demonstrated even in toddlers). Many people probably even have an internally consistent theory of fairness. We should base laws on those.

> abusing the copyright laws and the patent laws have been the most significant blockers of technical progress during the last few decades

Can you give examples?

> copyrights on non-open-source programs are almost never owned by creators, but by their employers

Yes and that's another thing that's wrong with the system, employment is a form of abusive relationship because the parties are not equal. We should fix that instead of throwing out the whole system. Copyright which belongs to creators absolutely does give creators more leverage and negotiating power.


> And that's why copyright has exceptions for humans.

Why would the exceptions be only for humans?

"Only human works can get copyright" makes plenty of sense. "Only humans can have fair use" doesn't make sense. Why would we disallow a monkey video having a clip of something as part of the monkey reviewing it? Why would we allow a human to caption something for accessibility but not a computer?

Grammar and idioms should be outside the realm of copyright entirely, not something you get an exception to use anyway.

> It's impossible to satisfy everyone but every person has a concept of fairness (it has been demonstrated even in toddlers). Many people probably even have an internally consistent theory of fairness. We should base laws on those.

A lot of people seem to default to thinking they should get permanent and total control over any idea they have, so I think it's a bad idea to rely on intuition here.


> Why would the exceptions be only for humans?

For starters because you can't own humans. If it's possible to launder copyrighted work through something which can be owned, then rich people get an advantage because they can own more of it.

> so I think it's a bad idea to rely on intuition here

Yep, that's why I said we should only concern ourselves with those which are internally consistent. If people want to apply rules to others which they don't intend to or cannot follow themselves, they lose the right to be taken seriously.


> For starters because you can't own humans. If it's possible to launder copyrighted work through something which can be owned, then rich people get an advantage because they can own more of it.

If it's actually 'laundering' then it's invalid to begin with.

If it's a proper new thing then how do rich people get an advantage? If anything AI code is cheap enough to even things out.

> Yep, that's why I said we should only concern ourselves with those which are internally consistent. If people want to apply rules to others which they don't intend to or cannot follow themselves, they lose the right to be taken seriously.

I think a lot of those people are consistent! The issue is they have way too little respect for the public domain and are overprioritizing property against freedom.


> If it's actually 'laundering' then it's invalid to begin with.

It's laundering in any reasonable meaning of the word. Whether it's legal according to the letter of the law is being decided.

Please differentiate morality and legality as well as intent and letter of the law.

> If anything AI code is cheap enough to even things out.

1) Do you think people have and will have access to the same models as large corporations internally, especially those who train LLMs themselves? Nothing stopping Google from excluding its own source code from the publicly available models but including it for internal models.

2) It's not just about the code, it's about the whole pipeline from nothing to a finished product and revenue stream. Did you know half the price of a new car is marketing? How much you can spend on ads, legal, market research, sales reps, etc. In some areas, especially B2B, nobody will even talk to you if you're a single guy in a shed, companies want stability, predictability and long term support.

3) More crudely, if you wanted to influence product selection or government elections, how many tokens could you afford for LLMs to influence online discussions, how many residential IPs could you afford, how much data could you buy about users to target each one specifically? Rich people will clearly have an advantage there.

Basically, if the cost of code goes towards zero, other factors will play a larger role.

> I think a lot of those people are consistent!

Only if they're consistently applying the rules to others but not themselves. Otherwise "permanent and total control over any idea they have" means they could never base anything on other people's ideas.


It's silly to say a human writing a piece of software is laundering their knowledge of existing software, even if they're trying to make a competitor to a specific thing. Legally and morally.

It's just a silly to say it's laundering when a machine does it.


>> abusing the copyright laws and the patent laws have been the most significant blockers of technical progress during the last few decades

> Can you give examples?

This is a subject so vast that giving examples requires a book-length text. IIRC at least one or two books have actually been written about this, but I am too lazy to search now for their titles.

I am more familiar with what happened in cryptography, where many algorithms have begun to be used only after the 20 years or more required for their patents to expire, while as long as patents remained valid, inferior solutions were used, wasting energy and computing time.

Regarding copyrights, I know best my own activity, but I am pretty certain that this anecdotal experience is representative for many programmers.

During the first decades of computer programming, until the seventies, there were a lot of discussions about software reuse as the main factor that could improve programming productivity, and about which features of programming languages and tools could increase the amount of reuse, like modularity.

However, all those discussions were naive: the amount of reuse later remained much lower than predicted, and the causes were not technical but legal, namely the copyright laws. Open-source programs have become the main weapon against the copyright laws and are what enables the reuse of software nowadays.

However the value of software reuse has never been understood by the management of many companies. In decades of working as a programmer, I have wasted a lot of time with writing programs in such a manner so that whoever was my employer could claim the copyright for them.

There were plenty of opportunities when I could have used open-source programs, but I could not use them as there was someone who insisted that the product must contain "software IP" owned by the company. Therefore I had to waste time by rewriting something equivalent with what I could have used instantaneously, but different enough to be copyrightable.

There were also other cases that were even more annoying, when I had to waste time rewriting programs that I had already written in the past, but in a different way so that there would be no copyright infringement. Sometimes the old programs had been written while I was employed elsewhere; other times they were programs written for myself, in my own time and on my own computers. In such cases, I could not use my own programs, as the employer would then claim copyright on them, so I would lose ownership and would not be able to use them in the future for my own needs.

There are many projects where I have wasted more time avoiding copyrights than solving problems. I believe that there must be many others who must have had similar experiences.

So I welcome the copyright-washing AI coding assistants, which can be employed successfully in such cases in order to avoid the wasteful duplication of work.


It all boils down to some people thinking they should be able to use other people's work for free.

> patents

Patents, unlike copyright, are not automatic. Which indicates that the people who expended their limited lifetime to invent the algorithms explicitly did not want you using them, at least not unless you came to an agreement with them first.

---

re rewriting:

There's your real problem. Copyright should belong to the people doing the actual work, not owners/employers who perform no useful work.

If that was the case, the person who did the original work would have no reason to prevent you from using it, as long as he could also benefit from the fruits of your combined labor. For him, the work was already done, it would be extra reward. For you, it would be profitable as long as his reward was less than the cost of you doing it from scratch. You'd most likely meet somewhere in the middle.

Same situation when rewriting your own work.

As often happens, a system was put in place for good reasons. Rich people found a way to exploit it. Now, instead of trying to fix the system, you're arguing to remove it entirely, not realizing you'll be worse off in the end. LLM companies want to replace all programmers by using their own work against them. This is not for your benefit, it's for theirs.

As I often say, what should be protected isn't creativity or expression but work. People should benefit from their work and it should not be used against them. It should also not be possible for someone to benefit without doing useful work.

---

Would you work for a company which develops software to detect homosexuals using public cameras and eye tracking? What about a company discovering and selling Android exploits to governments? Does it matter which governments? What about a company which tracks employee movements and productivity to such a level they have to pee in bottles to meet quotas?

The world is full of these examples but at least you had the choice of not helping them. Now you don't.

The people who own them are some of the most anti-social people on the planet and you think they should be able to use our work as they wish...


Nice, -4 points, somebody, many somebodies in fact, took that personally and yet were unable to express where they disagree in a comment.

Look, if you think I am wrong, you can surely put it into words. OTOH, if you don't think I am wrong but feel that way, then it explains why I see no coherent criticism of my statements.


When your comment is about how you can’t take your counterparty seriously and they’re a joke, you’re incentivizing people who disagree to just downvote and move on.

The signal you’re sending is that you are not open to discussing the issue.


It's a fallacy. Someone being utterly wrong, and me dismissing them for it, does not logically make my claim easily dismissible.

Yea, that’s exactly what I’m talking about.

I don't think modified by a human is enough. If you take licensed text (code or otherwise) and manually replace every word with a synonym, it does not remove the license. If you manually change every loop into a map/filter, it does not remove the license. I don't think any amount of mechanical transformation, regardless if done by a human or machine erases it.

There's a threshold where, if you modify it enough, it is no longer recognizable as a modification of the original and you might get away with it, unless you confess what process you used to create it.
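To make the point concrete, here is a hypothetical illustration (my own toy example, not from any real codebase) of the loop-to-map/filter transformation mentioned above. Both functions are trivially derived from the same logic by a purely syntactic rewrite; neither version is an independent work:

```python
# Original form: an explicit loop.
def squares_of_evens_loop(nums):
    result = []
    for n in nums:
        if n % 2 == 0:
            result.append(n * n)
    return result

# Mechanically transformed form: the same logic as map/filter.
# Nothing about this rewrite adds original expression.
def squares_of_evens_mapfilter(nums):
    return list(map(lambda n: n * n, filter(lambda n: n % 2 == 0, nums)))

print(squares_of_evens_loop([1, 2, 3, 4]))       # [4, 16]
print(squares_of_evens_mapfilter([1, 2, 3, 4]))  # [4, 16]
```

The two produce identical output for every input, which is exactly why a court would treat one as a derivative of the other.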

This is different from learning from the original and then building something equivalent from scratch using only your memory, without constantly looking back and forth between your copy and the original.

This is how some companies do "clean room reimplementations": one team looks at the original and writes a spec, and another team, which has never seen the original code, implements an entirely standalone version.

And of course there are people who claim this can be automated now[0]. This one is satire (read the blog), but it is possible if the law is interpreted the way LLM companies want, and there are reports from people who were willing to spend money to test it that the website works as advertised.

[0]: https://malus.sh/


You only need to feed the docs and tests to an LLM to get a "clean room" re-implementation that can then be relicensed.

That wasn't tested legally.

If they were somehow ruled to be infringements, millions of separate cases would already be needed, so it is already past the point of enforcement.

These sorts of things are almost never tested legally and it seems even less likely now.


My chemistry teacher told us how once when he ignited helium in a test tube, the tube broke and he ended up with pieces of glass embedded in his skin. The students had face masks and he was looking the other way "just in case" for this "safe" experiment but he could have easily been blinded.

Things can always go wrong. We probably shouldn't strive for 100% safety, because then we'd spend our lives in a padded cell. But we also shouldn't assume things are safe just because they're common or routine.


he did not ignite helium

The triple-alpha process of a star does seem unlikely in the classroom setting.

https://en.wikipedia.org/wiki/Triple-alpha_process


Sorry, meant to say hydrogen

This ruling is IMO/IANAL based on lawyers and judges not understanding how LLMs work internally, falling for the marketing campaign calling them "AI" and not understanding the full implications.

LLM-creation ("training") involves detecting and compressing patterns in the input. Inference generates statistically probable output based on similarities to the patterns found in the "training" input. Computers don't learn or have ideas; they always operate on representations. It's nothing more than any other mechanical transformation, and it should not erase copyright any more than synonym substitution does.
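As a minimal sketch of what I mean (a toy bigram model, nothing like a real transformer, and entirely my own illustration): the "parameters" below are just counts compressed out of the training text, and "inference" is sampling from the resulting distribution. Every output token is traceable to the input:

```python
import random

# "Training": compress the text into next-word counts.
training_text = "the cat sat on the mat the cat ran"
words = training_text.split()

counts = {}
for prev, nxt in zip(words, words[1:]):
    counts.setdefault(prev, []).append(nxt)

# "Inference": sample a statistically probable continuation.
# The sample is a mechanical function of the training data.
def next_word(prev):
    return random.choice(counts[prev])

random.seed(0)
print(next_word("the"))  # one of "cat" / "mat", weighted by frequency
```

Real models interpolate between vastly more patterns, but the pipeline is the same shape: counts in, probable continuations out.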


>LLM-creation ("training") involves detecting/compressing patterns of the input.

There's a pretty compelling argument that this is essentially what we do, and that what we think of as creativity is just copying, transforming, and combining ideas.

LLMs are interesting because that compression forces distilling the world down into its constituent parts and learning about the relationships between ideas. While it's absolutely possible (or even likely for certain prompts) that models can regurgitate text very similar to their inputs, that is not usually what seems to be happening.

They actually appear to be little remix engines that can fit the pieces together to solve the thing you're asking for, and we do have some evidence that the models are able to accomplish things that are not represented in their training sets.

Kirby Ferguson's video on this is pretty great: https://www.youtube.com/watch?v=X9RYuvPCQUA


So? Why should it be legal?

If people find this cool and wanna play with it, they can, just make sure to only mix compatible licenses in the training data and license the output appropriately. Well, the attribution issue is still there, so maybe they can restrict themselves to public domain stuff. If LLMs are so capable, it shouldn't limit the quality of their output too much.

Now for the real issue: what do you think the world will look like in 5 or 10 years if LLMs surpass human abilities in all areas revolving around text input and output?

Do you think the people who made it possible, who spent years of their life building and maintaining open source code, will be rewarded? Or will the rich reap most of the benefit while also simultaneously turning us into beggars?

Even if you assume 100% of the people doing intellectual work now will convert to manual work (i.e. there's enough work for everyone) and robots don't advance at all, that will drive the value of manual labor down a lot. Have you gamed it out in your head and concluded that life will somehow be better for you, let alone for most people? Or have you not thought about it at all yet?


> Do you think the people who made it possible, who spent years of their life building and maintaining open source code, will be rewarded?

I think they should be rewarded more than they currently are. But isn't the GNU General Public License basically saying you can use such source code without giving any reward whatsoever?

But I see your point. The reward for open source developers is the public recognition for their work. LLMs can take that recognition away.


The best answer to those issues is still Basic Income.

UBI only means you won't starve or die of exposure. It doesn't mean that people who are already rich today won't become so obscenely rich tomorrow they are above the law or can change the law (and decide who gets medical treatment or even take your UBI away).

fortunately, you aren't only operating on representations, right? lemme check my Schopenhauer right quick...
