
Is it annoying that I tell it to do something and it does about a third of it? Absolutely.

Can I get it to finish by asking it, over and over, to code-review its PR, or with some other generic prompt that weeds out the skips and the scaffolding? Also yes.

Basically, these things just need a supervisor looking at the requirements and the test results and evaluating the code in a loop. Sometimes that's a human; it can also absolutely be an LLM. Having a second LLM with limited context asking questions of the worker LLM works, even more so when the outer loop is driven by code and not just a prompt.
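For concreteness, here's a minimal sketch of that code-driven outer loop in Python. It assumes Claude Code's "claude -p" print mode as the agent interface (any agent CLI or API would do), and "make test" is a placeholder for whatever your real test command is:

    import subprocess

    def agent(prompt: str) -> str:
        # One fresh invocation per call, so the reviewer never inherits
        # the worker's chat history. "claude -p" is Claude Code's
        # non-interactive print mode; substitute your own agent here.
        out = subprocess.run(["claude", "-p", prompt],
                             capture_output=True, text=True)
        return out.stdout.strip()

    def tests_pass() -> bool:
        # Placeholder test command; use whatever your project runs.
        return subprocess.run(["make", "test"]).returncode == 0

    def supervise(requirements: str, max_rounds: int = 5) -> bool:
        agent("Implement this:\n" + requirements)
        for _ in range(max_rounds):
            if not tests_pass():
                agent("The test suite is failing. Fix it.")
                continue
            findings = agent(
                "Review the current diff against these requirements. "
                "List any skipped work, stubs, or scaffolding:\n"
                + requirements)
            if not findings:
                return True  # reviewer found nothing left to flag
            agent("Address these review findings:\n" + findings)
        return False

The point isn't this exact script, it's that the pass/fail gate and the retry live in code, so the generic "go review your PR" nudge happens automatically instead of me typing it over and over.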

I guess this is another example - I literally have not experienced what you described in... several weeks, at least.

I often ask for big things.

For example, I'm working on some virtualization things where I want a machine to be provisioned with a few options of Linux distros and BSDs. In one prompt I asked for that whole list to be provisioned so that a certain SSH test would pass; it worked on it for several hours, and now we're in the code review loop. At first it gave up on the BSDs and I had to poke it to actually finish with an idea it had already had; now I'm asking it to find bugs, and it's highlighting many mediocre code decisions it made. I haven't even tested it yet, so I'm not sure whether it's lying about anything working.


I usually talk with the agent back and forth for 15 minutes, explicitly asking, "What corner cases do we need to consider? What blind spots do I have?" Then, when I feel like I've brain-vomited everything, I paste in some non-sensitive context and ask it for a CLAUDE.md/AGENTS.md, and that's sufficient to one-shot 98% of cases.
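The file that falls out of that conversation doesn't have to be fancy. A made-up fragment, just to show the shape (every detail below is invented for illustration):

    # CLAUDE.md
    ## Project
    CLI that syncs a local directory to S3-compatible storage.
    ## Hard constraints
    - Python 3.11; stdlib plus boto3 only, no new dependencies.
    - All network I/O goes through storage.py.
    ## Corner cases we agreed on
    - Empty directories are skipped, not uploaded as zero-byte keys.
    - Filenames with spaces and unicode must round-trip.
    ## Known blind spots
    - Interrupted uploads: write a manifest and resume from it.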

Yeah, I usually ask what open questions it has, rather than just asking whether it thinks it's ready to implement.

The thing I've learned is that it doesn't do well at the big things (yet).

I have to break large tasks into smaller tasks, and limit the context and scope.

This is the thing that both Superpowers and Ralph [0] do well when they're orchestrating; the plans are broken down enough so that the actual coding agent instance doesn't get overwhelmed and lost.
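Roughly the shape I mean, as a sketch: the agent helper stands in for one fresh model invocation (here a "claude -p" call), and all the prompts are invented:

    import subprocess

    def agent(prompt: str) -> str:
        # Stand-in for one fresh model call with an empty context.
        out = subprocess.run(["claude", "-p", prompt],
                             capture_output=True, text=True)
        return out.stdout.strip()

    def run_plan(goal: str) -> None:
        # One call produces a plan of small, self-contained steps...
        plan = agent(
            "Break this goal into small independent steps, one per "
            "line. Each must make sense without the others:\n" + goal)
        for step in filter(str.strip, plan.splitlines()):
            # ...then each step runs in its own fresh instance, so no
            # single context has to carry the whole task.
            agent("Overall goal: " + goal +
                  "\nDo only this step: " + step)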

It'll be interesting to see what Claude Code's new 1M-token context window does to this. I'm not sure if the "stupid zone" is due to approaching the token limit, or to inherent growth in the complexity of the context.

[0] These are the two that I've experimented with; there are others.


It's like a little kid: you tell it to do the dishes, and it does half of them and then runs away.

Ah, so cool. Yeah, that is definitely bigger than what I ask for. I'd say the bigger risk I'm dealing with right now is that, while it passes all my very strict linting and static analysis toolsets, I neglected to put detailed layered-architecture guidelines in place, so my code files are approaching several hundred lines each now. I don't actually know whether the "most efficient file size" for an agent is the same as for a human, but I'd like the files to be shorter so I can understand them more easily.

Tell it to analyze your codebase for best practices and suggest fixes.

Tell it to analyze your architecture, security, documentation, etc. Install Claude to review GitHub pull requests, and prompt it to review each one for all of these things.
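A cheap local version of that, as a sketch: it assumes the GitHub CLI's "gh pr diff" and Claude Code's "claude -p" print mode, and the prompt list is just an example to extend:

    import subprocess

    REVIEW_PROMPTS = [
        "Review this diff for best-practice violations.",
        "Review this diff for security issues.",
        "Review this diff for missing or stale documentation.",
        "Does this diff respect the layered architecture rules?",
    ]

    def review_pr(pr_number: int) -> None:
        # Fetch the PR's diff via the GitHub CLI.
        diff = subprocess.run(["gh", "pr", "diff", str(pr_number)],
                              capture_output=True, text=True).stdout
        for prompt in REVIEW_PROMPTS:
            # One fresh, narrowly scoped invocation per concern.
            res = subprocess.run(
                ["claude", "-p", prompt + "\n\n" + diff],
                capture_output=True, text=True)
            print("--- " + prompt)
            print(res.stdout.strip() + "\n")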

Just keep expanding your imagination about what you can ask it to do. Think of it more like designing an organization: pin down the important things, provide code review and guardrails where it needs them, and let it work where it doesn't.
