I can appreciate the effort put into the goal of optimization shared in the post...

kelnos · 2025-09-06T21:14:40 1757193280

Agreed. When you have multiple developers working on the same code, you end up with overlapping test coverage as time goes on. You also end up with test coverage that was initially written with good intentions, but ultimately you'll later find that some of it just isn't necessary for confidence, or isn't even testing what you think it is.

Teams need to periodically audit their tests, figure out what covers what, figure out what coverage is actually useful, and prune stuff that is duplicative and/or not useful.

OP says that ultimately their costs went down: even though using Claude to make these determinations is not cheap, they're saving more than they're paying Claude by running fewer tests (they run tests on a mobile device test farm, and I expect that can get pricey). But ultimately they might be able to save even more money by ditching Claude and deleting tests, or modifying tests to reduce their scope and runtime.

And at this point in the sophistication level of LLMs, I would feel safer about not having an LLM deciding which tests actually need to run to ensure a PR is safe to merge. I know OP says that so far they believe it's doing the right thing, but a) they mention their methodology for verifying this in a comment here[0], and I don't agree that it's a sound methodology[1], and b) LLMs are not deterministic and repeatable, so it could choose two very different sets of tests if run twice against the exact same PR. The risk of that happening may be acceptable, though; that's for each individual to decide.

[0] https://news.ycombinator.com/item?id=45152504

[1] https://news.ycombinator.com/item?id=45152668