
> things that used to be cached for 1 hour now only being cached for 5 min.

Doesn't this only apply to subagents, which don't have much long-lived context anyway?


AFAIK the way caching works is at API key level, which will be shared across the main/parent agent and all subagents.

Note that the model API is stateless - there is no connection being held open for the lifetime of any agent/subagent, so the model has no idea how long any client-side entity is running for. All the model sees over time is a bunch of requests (coming from mixture of parent and subagents) all using the same API key, and therefore eligible to use any of the cached prompt prefixes being maintained for that API key.

Things like subagent tool registration are going to remain the same across all invocations of the subagent, so those would come from cache as long as the cache TTL is long enough.
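For illustration, this is roughly what it looks like with Anthropic's Messages API: cache breakpoints are set per content block via `cache_control`, and any later request under the same API key whose prefix matches can hit the cache within the TTL. A minimal sketch of the payload shape (the model name, system prompt, and task text are made-up placeholders; the field names follow Anthropic's documented prompt-caching format):

```python
# Sketch of a request payload that opts a long, stable prefix into
# prompt caching. The system prompt and model name are invented
# placeholders; only the payload shape matters here.
payload = {
    "model": "claude-opus-4",  # placeholder model id
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "You are a subagent. <long, stable instructions...>",
            # Everything up to and including this block is cacheable.
            # A later request (parent or subagent, same API key) with an
            # identical prefix can reuse it within the cache TTL.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [
        {"role": "user", "content": "task-specific input goes here"}
    ],
}
```

The response then reports cache usage per request (e.g. `usage.cache_creation_input_tokens` vs `usage.cache_read_input_tokens`), which is how you can tell whether subagent invocations are actually hitting the cache.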


Maybe this? From the article:

> Opus 4.7 is substantially better at following instructions. Interestingly, this means that prompts written for earlier models can sometimes now produce unexpected results: where previous models interpreted instructions loosely or skipped parts entirely, Opus 4.7 takes the instructions literally. Users should re-tune their prompts and harnesses accordingly.


Possible, but very unlikely.

One of the hard rules in my harness is that it has to provide a summary before performing a specific action. There is zero ambiguity in that rule: it is terse, and it is specific.

In the last 4 sessions (of 4 total), it has tried to skip that step, and every time I pointed it out, it replied with something like the following.

> You're right — I skipped the summary. Here it is.

It is not following instructions literally. I wish it was. It is objectively worse.


Using hooks can help.
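For instance, Claude Code hooks live in .claude/settings.json, and a PreToolUse hook can block a tool call until a required step has happened. A rough sketch of such a config, built here in Python (the schema follows Claude Code's hooks settings as I understand them; the matcher and script path are hypothetical):

```python
import json

# Sketch of a .claude/settings.json fragment registering a PreToolUse
# hook. The matcher and the command script are invented placeholders.
settings = {
    "hooks": {
        "PreToolUse": [
            {
                "matcher": "Bash",  # run before any Bash tool call
                "hooks": [
                    {
                        "type": "command",
                        # If this script exits with code 2, the tool call
                        # is blocked and stderr is fed back to the model.
                        "command": "./check-summary-was-given.sh",
                    }
                ],
            }
        ]
    }
}

print(json.dumps(settings, indent=2))
```

The point is that a hook is enforced by the harness, not merely requested in the prompt, so the model can't "forget" it the way it forgets instructions.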

Not sure it is better at following instructions. One of the first issues I had with it was it doing the very thing it was specifically forbidden from doing. When called out, it said: "oh sorry, I had a note that I should not do it in my MEMORY but I did it anyway".

Also, what is $20,000 today can be $2000 next year. Or $20...

See e.g. https://epoch.ai/data-insights/llm-inference-price-trends/


Or $200,000 for consumers when they have to make a profit

Good point. This is why consumer phones have got much worse since 2005 and now cost millions of dollars.

Now do uber rides

With consumer phones you're not telling your customers "spend $200,000 with us to try and find holes before the bad guys do". Commercial SAST tools have been around for 20 years and the pricing hasn't moved in all that time. With AI tools you've got a combination of the perfect hostage situation ("pay for our stuff before others find bad things about your product") and a desperate need to create the illusion of some sort of revenue stream, so I doubt prices will be dropping any time soon.

If I want to buy a smartphone today that is positioned at the same market level as what I was buying for around $500 seven or eight years ago, I now have to spend well over $1000, a price increase of 2 to 3 times.

So your example is not well chosen.

Over the last decade, price increases have affected many computing and electronics devices, though for most of them the increases have been smaller than for smartphones.


If you want the level of storage, screen resolution and camera quality as a $500 phone from 8 years ago, you can get that for $250 today.

Of course their marketing team tries to convince you to spend more money. That doesn't mean you have to.


With the chip shortage the way it is, I'm a little concerned that my next phone will be worse and more expensive...

Yeah and to give a more recent example, it's exactly like how RAM, storage, and other computer parts have gotten much cheaper over the last 3 years... oh wait.

This might be unconstitutional?


Averages tell you nothing about the average citizen.

Also, there are other measurements, like inequality, healthcare costs, social security...


Averages tell us the general availability of wealth. To give you some perspective, most European countries are poorer than the poorest U.S. state, which is Mississippi. There just aren't as many high-paying jobs. And these figures encompass everything, including what the government spends on health care and welfare programs and education. So Europeans as a whole get much less per capita from both government and private sector. When it comes to purchasing power parity, which takes into account cost-of-living differences, the gap isn't as big as above, but it's still pretty significant.


Average tells you nothing; you should use the median or a graph.


Median data is better, yes, but it's harder to get. Average correlates quite strongly with median.
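That correlation breaks down exactly in the high-inequality case, though: a few large values can pull the mean far above the median. A quick illustration with made-up incomes:

```python
from statistics import mean, median

# Hypothetical incomes: nine people at 30k, one outlier at 1M.
incomes = [30_000] * 9 + [1_000_000]

print("mean:  ", mean(incomes))    # pulled way up by the single outlier
print("median:", median(incomes))  # what the typical person earns
```

Here the mean is over four times the median, so per-capita figures alone can paint a rosier picture than most people actually experience.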


This observation makes sense, because all current models probably use some kind of sparse attention architecture.

So the closer the two related pieces of information are to each other in the input context, the larger the chance their relationship will be preserved.
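A toy illustration of why locality matters: under sliding-window (local) attention, a token can only attend directly to the previous few tokens, so a relationship spanning a long distance has to survive multiple hops through intermediate layers. A pure-Python sketch of such a mask (the window size is arbitrary; real architectures typically mix local and global attention layers):

```python
def sliding_window_mask(n_tokens, window):
    """mask[i][j] is True when token i may attend to token j:
    causal (j <= i) and within the local window."""
    return [
        [0 <= i - j < window for j in range(n_tokens)]
        for i in range(n_tokens)
    ]

mask = sliding_window_mask(12, window=4)

# Nearby pair: token 5 can attend to token 3 directly.
assert mask[5][3]
# Distant pair: token 10 cannot attend to token 0 in a single layer;
# their relationship has to be relayed through intermediate tokens.
assert not mask[10][0]
```

Each hop through intermediaries is a chance for information to be lost, which matches the observation that nearby facts are related more reliably than distant ones.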


He's trying to make it sound that way, but in the legal domain, the devil lies in the details.

It seems that the government wanted to use Claude for mass analysis of commercially obtained data on Americans, and Anthropic wouldn't let them (source: https://www.theatlantic.com/technology/2026/03/inside-anthro... ).

The DoD kept asking for contract changes that would make at least the legalese somewhat more permissive, but Anthropic stood their ground.

Sam Altman probably let them do that, while using language like "we have technical means of oversight and the same red lines as Anthropic". But in reality they will allow the DoD to do what Anthropic didn't.

See this for more information: https://www.lesswrong.com/posts/PBrggrw4mhgbksoYY/a-tale-of-...


> Very often, after a correction, it will focus a lot on the correction itself making for weird-sounding/confusing statements in commit messages and comments.

I've experienced that too. Usually when I request a correction, I add something like "Include only production-level comments (not changes)". Recently I also added a special instruction for this to CLAUDE.md.


For some time now, Claude Code's plan mode has also written out a file with the plan, which you can probably edit etc. It's located in ~/.claude/plans/ for me. Actually, there's a whole history of plans there.

I sometimes reference some of them to build context, e.g. after a few unsuccessful tries to implement something, so that Claude doesn't try the same thing again.


Can you compare it to Opus 4.6 with thinking disabled? It seems to have very impressive benchmark scores. Could also be pretty fast.


Added a thinking-disabled Opus 4.6 timing. It took 1m 4s – coincidentally the same as 5.3-codex-low.

