Hacker News | aszen's comments

Claude Code already fans out and sandboxes context by calling subagents, so I'm not sure this approach brings much benefit there. A complex search strategy only makes sense if the search is slow and compute-intensive.


I find Claude subagents are incredibly slow, and the vast majority of that is inference time between tool calls.


Coding agents prefer to do iterative search; I have yet to see them create a complex search script. They try different search commands in parallel, evaluate the results, and then refine or dive deeper.

This approach usually works great, but I can see many use cases where a smarter search strategy may make sense, especially to optimize context.
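For illustration, the try-in-parallel, evaluate, refine loop described above might look roughly like this. It is only a sketch: the in-memory `FILES` corpus, `search`, `iterative_search`, and the `follow_calls` refinement rule are all hypothetical stand-ins, not how any particular coding agent is implemented.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "repo": path -> file contents.
FILES = {
    "auth/login.py": "def login(user):\n    return check_password(user)\n",
    "auth/crypto.py": "def check_password(user):\n    return verify_hash(user)\n",
    "docs/readme.md": "How to login to the app\n",
}

def search(term):
    """Return the set of paths whose contents contain `term`."""
    return {path for path, text in FILES.items() if term in text}

def iterative_search(terms, refine, max_rounds=3):
    """Run the current searches in parallel, then let `refine` pick the next terms."""
    seen = set()
    hits = set()
    for _ in range(max_rounds):
        # Skip terms already tried, and drop duplicates while keeping order.
        terms = list(dict.fromkeys(t for t in terms if t not in seen))
        if not terms:
            break
        seen.update(terms)
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(search, terms))
        round_hits = set().union(*results)
        hits |= round_hits
        terms = refine(round_hits)
    return hits

def follow_calls(paths):
    """Toy refinement rule: search next for any function called in a return statement."""
    next_terms = []
    for path in paths:
        for line in FILES[path].splitlines():
            if "return " in line and "(" in line:
                next_terms.append(line.split("return ")[1].split("(")[0])
    return next_terms

found = iterative_search(["login"], follow_calls)
```

Starting from `"login"`, the second round picks up `auth/crypto.py` through the `check_password` call. A "smarter" strategy would replace `follow_calls` with something that also prunes hits before they reach the context.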


Wasn’t there work being done where a model could remove irrelevant context? Maybe these complex search scripts can help the model get the relevant files, and then it can remove the irrelevant context.


By slowing down engineers, with AI agents adding multiple code reviews on top. Also by encouraging engineers to do manual testing themselves to better understand the product.


Claude Code is not infra; the model is the infra. They changed settings to make their models faster and probably cheaper to run too. Honestly, with adaptive thinking it no longer matters what model it is if you can dynamically make it do less or more work.


Same here. Reviewing gets harder too, and multitasking kills any kind of productivity if you then need to review the code.

My approach these days is to do one change at a time, until I can fully merge it with confidence.


This is quite interesting, will try it. I kind of expect this to be done continuously as the code base changes.


This article doesn't mention the moat of data gathering: frontier AI labs have a huge advantage in curating proprietary datasets from actual usage of their platforms.

This in turn allows them to optimize their models for the long tail of tasks, where open-weight models can't compete.

Another factor is that pure intelligence isn't enough; how the model communicates is a huge plus. An enterprise used to talking to Claude all day won't find it easy to switch to another model.


Seems like you are testing LLMs' generic abilities rather than your actual agent logic.

LLMs are like vendor code: you don't need to test them yourself, people have already created benchmarks for that.
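One way to test the agent logic on its own is to swap the real model for a scripted stub, so the tests only exercise the loop around it. A minimal sketch, assuming a simple tool-calling loop; `run_agent`, `ScriptedModel`, and the action dictionaries are hypothetical and not taken from any specific framework:

```python
def run_agent(model, task, tools, max_steps=5):
    """Hypothetical agent loop: ask the model for an action, run the tool, repeat."""
    history = [("task", task)]
    for _ in range(max_steps):
        # Assumed action shape: {"tool": name, "arg": value} or {"final": answer}.
        action = model(history)
        if "final" in action:
            return action["final"]
        tool = tools.get(action["tool"])
        if tool is None:
            # Report the bad tool name back to the model instead of crashing.
            history.append(("error", f"unknown tool {action['tool']}"))
            continue
        history.append(("result", tool(action["arg"])))
    raise RuntimeError("agent exceeded max_steps")

class ScriptedModel:
    """Stub that replays fixed actions instead of calling a real LLM."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.calls = []  # snapshot of the history seen on each call

    def __call__(self, history):
        self.calls.append(list(history))
        return self.actions.pop(0)

model = ScriptedModel([
    {"tool": "nope", "arg": "x"},    # unknown tool: the loop should recover
    {"tool": "upper", "arg": "hi"},  # known tool: result goes into the history
    {"final": "done"},
])
answer = run_agent(model, "demo task", {"upper": str.upper})
```

The assertions then target things like error recovery and step limits, which are properties of the agent code rather than of the model.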


No they haven’t. The benchmarks suck, because they are cheap knockoffs instead of comprehensive experiments.

LLMs are poorly tested by vendors. They literally can’t afford to test them, so they force us to do it.


If you buy real handcrafted scarves, they are both thinner and warmer than anything factory-made because of their choice of pashmina wool.


So the new implementation always operates at the line level, replacing one or more lines. That's not ideal for some refactorings like rename, where search and replace is faster.

Edit: Checking ohmypi, the model has access to str replace too, so this is just an edit tool.
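To make the rename case concrete, here is a rough sketch of the two edit styles side by side. `replace_lines` and `str_replace` are hypothetical helpers written for this comparison and are not the actual tool signatures from ohmypi or any other agent:

```python
def replace_lines(text, start, end, new_lines):
    """Line-level edit: replace 1-indexed lines start..end (inclusive)."""
    lines = text.split("\n")
    if not (1 <= start <= end <= len(lines)):
        raise ValueError("line range out of bounds")
    lines[start - 1:end] = new_lines
    return "\n".join(lines)

def str_replace(text, old, new):
    """String edit: replace every occurrence of `old`, failing if none is found."""
    if old not in text:
        raise ValueError("old string not found")
    return text.replace(old, new)

source = "total = compute(x)\nprint(total)\nlog(total)"

# Rename via line edits: every affected line has to be rewritten in full.
by_lines = replace_lines(
    source, 1, 3,
    ["result = compute(x)", "print(result)", "log(result)"],
)

# Rename via search and replace: one short call covers all occurrences.
by_string = str_replace(source, "total", "result")
```

Both produce the same text, but the line-level version has to emit all three lines again. The plain string version is naive, though: it would also rewrite `total` inside longer identifiers such as `subtotal`.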

