Hacker News | EMM_386's comments

This is getting really out of control at the moment and I'm not exactly sure what the best way to fix it is, but this is a very good post in terms of expressing why this is not acceptable and why the burden is shifting onto the wrong people.

Will humans take this to heart and actually do the right thing? Sadly, probably not.

One of the main issues is that pointing to your GitHub contributions and activity is now part of the hiring process. So people will continue to try to game the system by using LLMs to automate that whole process.

"I have contributed to X, Y, and Z projects" - when they actually have little to no understanding of those projects or exactly how their PR works. It was (somehow) accepted and that's that.


I see the problem every day and am just playing devil's advocate, but the post doesn't really do a good job explaining the "why".

They hint at Django being a different level of quality compared to other software, wanting to cultivate community, and go slowly.

It doesn't explain why LLM usage reduces quality or why they can't have a strong community with LLM contributions.

The problem is not good developers using LLMs. They review the code, they implement best practices, they understand the problems and solutions. The problem is bad developers contributing - just as it always has been. LLMs enable bad developers to contribute more - thus an influx of crap contributions.


The last section focuses on how to use LLMs to make contributions:

> Use an LLM to develop your comprehension.

I really like that, because it gets past the simpler version that we usually see, "You need to understand your PR." It's basically saying you need to understand the PR you're making, and the context of that PR within the wider project.


I think they explain "why" very clearly. They say the problem is people who don't understand their own contributions.

A decade or more of people copy-pasting rote solutions from StackOverflow only supports the notion that many people will forego comprehension to foster the illusion of competent productivity.

This ain't an AI problem, it's a people problem that's getting amplified by AI.


It was interesting the other day tracing the lineage of Aaron Swartz -> Library Genesis / Sci-Hub -> LLM vendors relying on that work to train their models and sell it back to us all with no royalties or accountability to the original authors of all this painstakingly researched, developed, and recorded human knowledge they’re making billions on.

They are not making billions... they are burning billions.

That's true. Except that you can have agents doing it 24/7 with no human input. The number of repos/PRs is limited only by GPUs.

Hey, thank you for the compliment!

> Will humans take this to heart and actually do the right thing? Sadly, probably not.

I would like to think that individuals who are interested in joining an OSS community will.


> One of the main issues is that pointing to your GitHub contributions and activity is now part of the hiring process.

If I were hiring at this moment, I'd look at the ratio of accepted to rejected PRs from any potential candidate. As an open source maintainer, I look at the GitHub account that's opening a PR. If they've made a long string of identical PRs across a wide swath of unrelated repos, and most of those are being rejected, that's a strong indicator of slop.

Hopefully there will be a swing back towards quality contributions being the real signal, not just volume of contributions.
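The accepted-to-rejected ratio idea could be sketched against the GitHub search API. This is a rough illustration, not a vetted screening tool: the field names (`state`, `merged_at`) mirror what the API returns for pull requests, but the exact shape of search results varies, so treat the fetch helper as an assumption to verify against the API docs.

```python
import json
import urllib.request

def merge_ratio(prs):
    """Share of a user's closed PRs that were actually merged.

    `prs` is a list of dicts, each with a "state" key and a "merged_at"
    value that is non-None only for merged PRs. Returns None when the
    user has no closed PRs yet (no signal either way).
    """
    closed = [p for p in prs if p["state"] == "closed"]
    if not closed:
        return None
    merged = [p for p in closed if p.get("merged_at")]
    return len(merged) / len(closed)

def fetch_user_prs(user):
    """Fetch up to 100 of a user's PRs via the GitHub search API.

    Unauthenticated and heavily rate-limited; real use would paginate
    and send a token. Field mapping is an assumption - check the docs.
    """
    url = (
        "https://api.github.com/search/issues"
        f"?q=type:pr+author:{user}&per_page=100"
    )
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["items"]
```

A low `merge_ratio` across many unrelated repos would be exactly the "long string of identical PRs, mostly rejected" pattern described above.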


Your ratio idea presumes a lot about the maintainers or the nature of the disagreements. I recently sent a handwritten PR to fix a bug in a well-respected project, which involved switching from API A to B. The maintainer was uncomfortable with using B (although I had tested it) and suggested that I call A in a loop, which seemed more dangerous to me. In the end my PR was closed and the bug is still somewhat unresolved.

Should that affect our hiring? In an ideal world, no. He had his opinion and I have mine, and I do reflect that I should've asked if I could've added integration testing to assuage his fears regarding B.

The real problem is the fact that we as an industry have celebrated using casual volunteer work as a hiring indicator and devalued our own labor to a degree unseen anywhere else. The GitHub activity grid turned us all into cattle and should be seen as a paramount violation of ethics, ranking alongside the invention of leaded gas and the VW emissions scandal.


I now want to create a public index of “slop” contributors. People need to know their “heroes”.

> Will humans take this to heart and actually do the right thing? Sadly, probably not.

Don’t blame the people, blame the system.

Identifying the problem is just the first step. Building consensus and finding pragmatic solutions is hard. In my opinion, a lot of technical people struggle with the second sentence. So much of the ethos in our community is “I see a problem, and I can fix it on my own by building [X].” I think people are starting to realize this doesn’t scale. (Applying the scaling metaphor to people problems might itself be a blindspot.)


You can blame both! The people are definitely not helping.

What kinds of actionable plans ever result from blaming people (as a category)? Where will it get you? Expecting some people to behave differently... just "because"? What kinds of plans flow downstream from blaming human nature? What's the plan? Does it help you somehow, practically? Or is it mostly about feeling better somehow?

If the plan is persuasion, putting blame aside goes a long way.

If you want to make change based in the real world, you could do worse than reading and absorbing "Thinking in Systems: A Primer" by Donella Meadows.


Obviously the solution is better AI PR reviewers with more context for FOSS projects /s

And I’m 100% sure there are dozens of startups working on that exact problem right this second.


> But you REALLY need to know your stuff to begin with for them to be of any use. Those who think they will take over are clueless.

Or - there are enough people who know their stuff that the people who don't will be replaced and they will take over anyway.


> there are enough people who know their stuff

unless the bar for "know their stuff" is very, very low - this is not the case in the near future


> It cannot, however, synthesize new facts by combining information from this corpus.

That would be like saying studying mathematics can't lead to someone discovering new things in mathematics.

Nothing would ever be "novel" if studying the existing knowledge could not lead to novel solutions.

GPT 5.2 Thinking is solving Erdős Problems that had no prior solution - with a proof.


The Erdős problem was solved by interacting with a formal proof tool, and the problem was trivial. I also don't recall whether this was the problem someone had already solved prior but not reported, but that does not matter.

The point is that the LLM did not model maths to do this; it made calls to a formal proof tool that did model maths, and was essentially working as the step function of a search algorithm, iterating until it found the zero of the function.

That's clever use of the LLM as a component in a search algorithm, but the secret sauce here is not the LLM but the middleware that operated both the LLM and the formal proof tool.

That middleware was the search tool that a human used to find the solution.

This is not the same as a synthesis of information from the corpus of text.


> An automated way to achieve this would be awesome.

The author can easily do this by creating a simple memory tool call, announcing it in the prompt to the LLM, and having it call the tool.

I wrote an agent harness for my own use that allows add/remove memories and the AI uses it as you would expect - to keep notes for itself between sessions.
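An agent-harness memory tool of the kind described can be very small. This is a hedged sketch, not the commenter's actual implementation: the class, file path, and tool-schema names are all illustrative, and the schema below follows the common JSON tool-definition shape that most LLM APIs accept.

```python
import json
from pathlib import Path

class MemoryStore:
    """Tiny persistent memory an agent can call as a tool:
    add/remove notes that survive between sessions, backed by a
    JSON file (path is arbitrary)."""

    def __init__(self, path="agent_memory.json"):
        self.path = Path(path)
        self.notes = (
            json.loads(self.path.read_text()) if self.path.exists() else []
        )

    def add(self, note: str) -> str:
        self.notes.append(note)
        self._save()
        return f"remembered: {note}"

    def remove(self, note: str) -> str:
        if note in self.notes:
            self.notes.remove(note)
            self._save()
            return f"forgot: {note}"
        return "no such note"

    def _save(self):
        self.path.write_text(json.dumps(self.notes))

# Tool definition announced to the model in the prompt
# (names and shape are illustrative, not any vendor's exact schema):
MEMORY_TOOL = {
    "name": "memory",
    "description": "Add or remove a persistent note for future sessions.",
    "input_schema": {
        "type": "object",
        "properties": {
            "action": {"type": "string", "enum": ["add", "remove"]},
            "note": {"type": "string"},
        },
        "required": ["action", "note"],
    },
}
```

The harness routes the model's `memory` tool calls to `add`/`remove`; because the store is a plain file, notes persist across sessions exactly as described.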


What is wrong with "claude --chrome"?


The claude --chrome command has a few limitations:

1. it exposes low-level tools which make your agent interact directly with the browser, which is extremely slow, VERY expensive, and less effective, as the agent ends up dealing with UI mechanics instead of thinking about the higher-level goals/intents

2. it makes Claude operate the browser via screenshots and coordinate-based interaction, which does not work for tasks like data extraction where it needs to be able to attend to the whole page - the agent has to repeatedly scroll and read one little screenshot at a time, and it often misses critical context outside of the viewport. It also makes the task more difficult as the model has to figure out both what to do and how to do it, which means that you need to use larger models to make this paradigm actually work

3. because it uses your local browser, it also means that it has full access to your authenticated accounts by default which might not be ideal in a world where prompt-injections are only getting started
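The text-first alternative behind point 2 can be sketched with the standard library alone: flatten the page's HTML to visible text (plus link targets) so the model attends to the whole document in one pass instead of scrolling screenshots. This is a minimal illustration of the idea, not any vendor's extraction pipeline; real tools typically work from the accessibility tree instead.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Flatten a page to visible text plus link targets, so an LLM can
    read the whole document at once instead of viewport by viewport."""

    SKIP = {"script", "style"}  # invisible content the model never needs

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.parts.append(f"[link: {href}]")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def page_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)
```

A few kilobytes of flattened text is also far cheaper in tokens than a series of screenshots of the same page.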

if you actively use the --chrome command we'd love to hear your experience!


I am sure they measured the difference, but I am wondering why reading screenshots + coordinates is more efficient than selecting ARIA labels? https://github.com/Mic92/mics-skills/blob/main/skills/browse.... The JavaScript snippets should at least be more reusable if you want to semi-automate websites with memory files.


claude --chrome works, but as the OP mentions: they can do it 20x faster, by passing in higher-level commands.


You can do this.

At least to a level that gets you way past HTTP Bearer Token Authentication where the humans are upvoting and shilling crypto with no AI in sight (like on Moltbook at the moment).


Claude generated the statements to run against Supabase and the person getting the statements from Claude sent it to the person who vibe-coded Moltbook.

I wish I was kidding but not really - they posted about it on X.


Claude is very good at writing SQL. You still need to review and understand it.

I recently started a new Supabase project and used Claude to write all migrations related to RLS and RBAC.


What are we discussing here?

The tools or the models? It's getting absurdly confusing.

"Claude Code" is an interface to Claude, Cursor is an IDE (I think?! VS Code fork?), GitHub Copilot is a CLI or VS Code plugin to use with ... Claude, or GPT models, or ...

If they are using "Claude Code" that means they are using Anthropic's models - which is interesting given their huge investment in OpenAI.

But this is getting silly. People think "CoPilot" is "Microsoft's AI" which it isn't. They have OpenAI on Azure. Does Microsoft even have a fine-tuned GPT model or are they just prompting an OpenAI model for their Windows-builtins?

When you say you use CoPilot with Claude Opus people get confused. But this is what I do everyday at work.

shrug


This is great.

When I work with AI on large, tricky code bases I try to do a collaboration where it hands off to me things that may result in a large number of tokens (excess tool calls, imprecise searches, verbose output, reading large files without a range specified, etc.).

This will help narrow down exactly which tasks to still handle manually to best keep within token budgets.
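The hand-off heuristic above can be sketched as a simple gate in front of file reads. Everything here is an assumption for illustration: the ~4-characters-per-token rule of thumb is a rough heuristic (real tokenizers differ), and the refusal message is just one way to signal the agent to delegate.

```python
# Rough rule of thumb: ~4 characters per token for English text/code.
CHARS_PER_TOKEN = 4

def estimate_tokens(text: str) -> int:
    """Cheap token estimate; a real tokenizer would be more accurate."""
    return len(text) // CHARS_PER_TOKEN

def read_within_budget(text: str, budget_tokens: int) -> str:
    """Return the text if it fits the budget, otherwise a refusal the
    agent can act on (e.g. ask the human for just the relevant range)."""
    tokens = estimate_tokens(text)
    if tokens <= budget_tokens:
        return text
    return (
        f"[file is ~{tokens} tokens, over the {budget_tokens}-token "
        "budget; hand this off to the human or request a line range]"
    )
```

Wired into a tool layer, this turns "read the whole 20k-line file" into an explicit hand-off decision instead of a silent token burn.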

Note: "yourusername" in install git clone instructions should be replaced.


I've been trying to get token usage down by instructing Claude to stop being so verbose (saying what it's going to do beforehand, saying what it just did, spitting out pointless file trees) but it ignores my instructions. It could be that the model is just hard to steer away from doing that... or Anthropic want it to waste tokens so you burn through your usage quickly.


Simply assert that:

you are a professional (insert concise occupation).

Be terse.

Skip the summary.

Give me the nitty-gritty details.

You can send all that using your AI client settings.
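Those assertions can be bundled into one reusable system-prompt string, which is what most AI client settings ultimately send. A minimal sketch (the occupation placeholder and wording follow the list above; nothing here is a vendor-specific API):

```python
def terse_system_prompt(occupation: str) -> str:
    """Bundle the terseness assertions into one system prompt,
    ready to paste into a client's system-prompt setting."""
    return (
        f"You are a professional {occupation}. "
        "Be terse. "
        "Skip the summary. "
        "Give me the nitty-gritty details."
    )
```

Set once in the client's settings, it applies to every query without spending tokens restating it per message.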


I had a similar problem: when Claude Code (or Codex) is running in a sandbox, I wanted to put a cap on large contexts, or at least get notified about them -

especially because once x0K words are crossed, the output gets worse.

https://github.com/quilrai/LLMWatcher

Made this Mac app for the same purpose. Any thoughts would be appreciated.


Would you mind sharing more details about how you do this? What do you add to your AI prompts to make it hand those tasks off to you?


Hahahah just fixed it, thank you so much!!!! Think of extending this to a prompt admin - I'm sure there is a lot of trash that the system sends on every query, and I think we can improve this.


The flickering issue due to the Ink library has been a headache for a long time, but they are slowly making progress on this.

https://github.com/anthropics/claude-code/issues/769


