
> things that used to be cached for 1 hour now only being cached for 5 min.

Doesn't this only apply to subagents, which don't have much long-lived context anyway?


AFAIK the way caching works is at API key level, which will be shared across the main/parent agent and all subagents.

Note that the model API is stateless - there is no connection being held open for the lifetime of any agent/subagent, so the model has no idea how long any client-side entity is running for. All the model sees over time is a bunch of requests (coming from mixture of parent and subagents) all using the same API key, and therefore eligible to use any of the cached prompt prefixes being maintained for that API key.

Things like subagent tool registration are going to remain the same across all invocations of the subagent, so those would come from cache as long as the cache TTL is long enough.
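For illustration, this is roughly what it looks like with Anthropic's Messages API: cache breakpoints are set per content block via `cache_control`, and any later request under the same API key whose prefix matches can hit the cache within the TTL. A minimal sketch of the payload shape (the model name, system prompt, and task text are made-up placeholders; the field names follow Anthropic's documented prompt-caching format):

```python
# Sketch of a request payload that opts a long, stable prefix into
# prompt caching. The system prompt and model name are invented
# placeholders; only the payload shape matters here.
payload = {
    "model": "claude-opus-4",  # placeholder model id
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "You are a subagent. <long, stable instructions...>",
            # Everything up to and including this block is cacheable.
            # A later request (parent or subagent, same API key) with an
            # identical prefix can reuse it within the cache TTL.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [
        {"role": "user", "content": "task-specific input goes here"}
    ],
}
```

The response then reports cache usage per request (e.g. `usage.cache_creation_input_tokens` vs `usage.cache_read_input_tokens`), which is how you can tell whether subagent invocations are actually hitting the cache.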


Maybe this? From the article:

> Opus 4.7 is substantially better at following instructions. Interestingly, this means that prompts written for earlier models can sometimes now produce unexpected results: where previous models interpreted instructions loosely or skipped parts entirely, Opus 4.7 takes the instructions literally. Users should re-tune their prompts and harnesses accordingly.


Possible, but very unlikely.

One of the hard rules in my harness is that it has to provide a summary before performing a specific action. There is zero ambiguity in that rule: it is terse, and it is specific.

In the last 4 sessions (of 4 total), it has tried to skip that step, and every time I pointed it out, it replied with something like the following.

> You're right — I skipped the summary. Here it is.

It is not following instructions literally. I wish it was. It is objectively worse.


Using hooks can help.
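For instance, Claude Code hooks live in .claude/settings.json, and a PreToolUse hook can block a tool call until a required step has happened. A rough sketch of such a config, built here in Python (the schema follows Claude Code's hooks settings as I understand them; the matcher and script path are hypothetical):

```python
import json

# Sketch of a .claude/settings.json fragment registering a PreToolUse
# hook. The matcher and the command script are invented placeholders.
settings = {
    "hooks": {
        "PreToolUse": [
            {
                "matcher": "Bash",  # run before any Bash tool call
                "hooks": [
                    {
                        "type": "command",
                        # If this script exits with code 2, the tool call
                        # is blocked and stderr is fed back to the model.
                        "command": "./check-summary-was-given.sh",
                    }
                ],
            }
        ]
    }
}

print(json.dumps(settings, indent=2))
```

The point is that a hook is enforced by the harness, not merely requested in the prompt, so the model can't "forget" it the way it forgets instructions.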

Not sure it is better at following instructions. One of the first issues I had with it was it doing the very thing it was specifically forbidden from doing. When called out, it said: "oh sorry, I had a note that I should not do it in my MEMORY but I did it anyway".

Also, what is $20,000 today can be $2000 next year. Or $20...

See e.g. https://epoch.ai/data-insights/llm-inference-price-trends/


Or $200,000 for consumers when they have to make a profit

Good point. This is why consumer phones have got much worse since 2005 and now cost millions of dollars.

Now do uber rides

With consumer phones you're not telling your customers "spend $200,000 with us to try and find holes before the bad guys do". Commercial SAST tools have been around for 20 years and the pricing hasn't moved in all that time. With AI tools you've got a combination of the perfect hostage situation ("pay for our stuff before others find bad things about your product") and a desperate need to create the illusion of some sort of revenue stream, so I doubt prices will be dropping any time soon.

If I want to buy a smartphone today that is positioned at the same market level as what I was buying for around $500 seven or eight years ago, I now have to spend well over $1000, a price increase of 2 to 3 times.

So your example is not well chosen.

Over the last decade, price increases have affected many computing and electronics devices, though for most of them the increases have been smaller than for smartphones.


If you want the level of storage, screen resolution and camera quality as a $500 phone from 8 years ago, you can get that for $250 today.

Of course their marketing team tries to convince you to spend more money. That doesn't mean you have to.


With the chip shortage the way it is, I'm a little concerned that my next phone will be worse and more expensive...

Yeah and to give a more recent example, it's exactly like how RAM, storage, and other computer parts have gotten much cheaper over the last 3 years... oh wait.

This might be unconstitutional?


Averages tell you nothing about the average citizen.

Also, there are other measurements, like inequality, healthcare costs, social security...


Averages tell us the general availability of wealth. To give you some perspective, most European countries are poorer than the poorest U.S. state, which is Mississippi. There just aren't as many high-paying jobs. And these figures encompass everything, including what the government spends on health care and welfare programs and education. So Europeans as a whole get much less per capita from both government and private sector. When it comes to purchasing power parity, which takes into account cost-of-living differences, the gap isn't as big as above, but it's still pretty significant.


Average tells you nothing; you should use the median or a graph.


Median data is better, yes, but it's harder to get. Average correlates quite strongly with median.
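That correlation breaks down exactly in the high-inequality case, though: a few large values can pull the mean far above the median. A quick illustration with made-up incomes:

```python
from statistics import mean, median

# Hypothetical incomes: nine people at 30k, one outlier at 1M.
incomes = [30_000] * 9 + [1_000_000]

print("mean:  ", mean(incomes))    # pulled way up by the single outlier
print("median:", median(incomes))  # what the typical person earns
```

Here the mean is over four times the median, so per-capita figures alone can paint a rosier picture than most people actually experience.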


This observation makes sense, because all current models probably use some kind of sparse attention architecture.

So the closer the two related pieces of information are to each other in the input context, the larger the chance their relationship will be preserved.
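A toy illustration of why locality matters: under sliding-window (local) attention, a token can only attend directly to the previous few tokens, so a relationship spanning a long distance has to survive multiple hops through intermediate layers. A pure-Python sketch of such a mask (the window size is arbitrary; real architectures typically mix local and global attention layers):

```python
def sliding_window_mask(n_tokens, window):
    """mask[i][j] is True when token i may attend to token j:
    causal (j <= i) and within the local window."""
    return [
        [0 <= i - j < window for j in range(n_tokens)]
        for i in range(n_tokens)
    ]

mask = sliding_window_mask(12, window=4)

# Nearby pair: token 5 can attend to token 3 directly.
assert mask[5][3]
# Distant pair: token 10 cannot attend to token 0 in a single layer;
# their relationship has to be relayed through intermediate tokens.
assert not mask[10][0]
```

Each hop through intermediaries is a chance for information to be lost, which matches the observation that nearby facts are related more reliably than distant ones.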


He's trying to make it sound that way, but in the legal domain, the devil lies in the details.

It seems that the government wanted to use Claude for mass analysis of commercially obtained data on Americans, and Anthropic wouldn't let them (source: https://www.theatlantic.com/technology/2026/03/inside-anthro... ).

The DoD kept asking for contract changes that would make at least the legalese somewhat more permissive, but Anthropic stood their ground.

Sam Altman probably let them do that, while using language like "we have technical means of oversight and the same red lines as Anthropic". But in reality they will allow the DoD to do what Anthropic didn't.

See this for more information: https://www.lesswrong.com/posts/PBrggrw4mhgbksoYY/a-tale-of-...


> Very often, after a correction, it will focus a lot on the correction itself making for weird-sounding/confusing statements in commit messages and comments.

I've experienced that too. Usually when I request a correction, I add something like "Include only production-level comments (not changes)". Recently I also added a special instruction for this to CLAUDE.md.


For some time now, Claude Code's plan mode has also written out a file with the plan, which you can probably edit etc. It's located in ~/.claude/plans/ for me. Actually, there's a whole history of plans there.

I sometimes reference some of them to build context, e.g. after a few unsuccessful tries to implement something, so that Claude doesn't try the same thing again.


Can you compare it to Opus 4.6 with thinking disabled? It seems to have very impressive benchmark scores. Could also be pretty fast.


Added a thinking-disabled Opus 4.6 timing. It took 1m 4s – coincidentally the same as 5.3-codex-low.

