> At the time of writing, the fix has not yet reached stable releases.
Why was this disclosed before the hole was patched in the stable release?
It's only been 18 days since the bug was reported to upstream, which is much shorter than typical vulnerability disclosure deadlines. The upstream commit (https://github.com/gnachman/iTerm2/commit/a9e745993c2e2cbb30...) has way less information than this blog post, so I think releasing this blog post now materially increases the chance that this will be exploited in the wild.
Update: The author was able to develop an exploit by prompting an LLM with just the upstream commit, but I still think this blog post raises the visibility of the vulnerability.
There are recognized exceptions to disclosure embargoes: when you believe the vulnerability is being exploited in the wild, or when the fix is already public (e.g. as a git commit), which makes it possible to produce an exploit quickly. In those cases the community generally prefers publishing the vulnerability.
My only caveat would be that in some security fixes, the pure code delta is not always indicative of the full exploitation method. But LLMs could interpolate from there, depending on context.
> The author of iTerm2 initially didn’t consider it severe enough to warrant an immediate release, but they now seem to have reconsidered.
It's funny that we still have the same conversation about disclosure timelines. 18 days is plenty of time, the commit log is out there, etc.
The whole "responsible disclosure" thing is in response to people just publishing 0days, which itself was a response to vendors threatening researchers when vulns were directly reported.
That's the wrong way to look at things. Just because the CIA can learn your location (if they want to), would you share your live location with everyone on the internet?
An LLM is a tool, but people still need to know the what, the where, and the how.
Not sure if that's a great example. If there's a catastrophic vulnerability in a widely used tool, I'd sure like to know about it even if the patch is taking some time!
The problem with this is that credible information like "there's a bug in widely used tool X" will soon (if not already) be enough to trigger massive token expenditure by various other parties, who will then also discover the bug, so this will often effectively amount to disclosure.
I guess the only winning move is to also start using AI to rapidly fix the bugs and have fast release cycles... Which of course has a host of other problems.
I think in the context of these it's more of "we've discovered a bug", which gives you more information than "there is a bug". The main difference is that the former implies not only that there is a bug, but that LLMs can find it.
If you're a random person on the Internet, I can indeed not do much with that information.
But if you're a security research lab that a competing lab can ballpark the funding of and the amount of projects they're working on (based on industry comparisons, past publications etc.), I think that can be a signal.
That's the wrong argument: since it's not just available to "the CIA" but to every rando under the sun, people should be notified immediately if "tracking" them is possible, and mitigation measures should become standard practice.
There are many attackers who are just going to feed every commit of every project of interest into their LLMs and tell them "determine if this is patching an exploit, and if so, write the exploit". They don't need targeting clues. They're already watching everything coming out of those projects.
Do not make the mistake of modeling the attackers as "some guy in a basement with a laptop who decided just today to start attacking things". There are nation-state attackers. There are other attackers less well funded than that who still wouldn't particularly blink at the plan I described above. Putting out the commit was sufficient to tell them, even today, exactly what the exploit was, and the cheaper AI time gets, the less targeting info they're going to need, as they just grab everything.
I suggest modeling the attackers like a Dark Google. Think of them as well-funded, with lots of resources, and this is their day job, with dedicated teams and specialized positions and a codebase for exploits that they've been working on for years. They're not just some guy who wants to find an exploit maybe and needs huge hints about what commit might be an issue.
>Do not make the mistake of modeling the attackers as "some guy in a basement with a laptop who decided just today to start attacking things". There are nation-state attackers.
The parent's point is that just because those capable attackers can exploit it anyway doesn't mean it should be handed on a silver platter to every script kiddie and guy in a basement with a laptop. The former are a much smaller group than the latter.
> LLM is a tool, but people still need to know — what where how.
And the moment the commit lands upstream, they know what, where, and how.
The usual approach here is to backchannel patched versions to the distros and end users before the commit ever goes into upstream. Obviously, this runs counter to some folks' expectations about how open source releases work.
So this bug just proves my thesis about shortening update windows.
You may need Claude Mythos to find a hard-to-discover bug in a 30-year-old open source codebase, but that bug will eventually be patched, and that patch will eventually hit the git repo. This lets smaller models rediscover the bug a lot more easily.
I won't be surprised if the window between a git commit and active port scans shrinks to hours or maybe even minutes in the next year or two.
This is where closed source SaaS has a crucial advantage. You don't get the changelog, and even if you did, it wouldn't be of much use to you after the fix is deployed to production.
"Please don't complain about tangential annoyances—e.g. article or website formats, name collisions, or back-button breakage. They're too common to be interesting."
Maybe they did get their models to test their pages, but they didn't tell their models to pretend that they're browsing on mobile using a 3G connection.
I think most people can speak faster than 120 WPM. For example this site says I speak at 343 WPM https://www.typingmaster.com/speech-speed-test/, and I self-measure 222 WPM on dense technical text.
I think (without having done extensive research) that some sort of Apple hardware is your best bet right now. Apple hasn’t raised RAM upgrade prices [1] (although to be fair their RAM upgrades were hugely inflated before the crunch) and their high memory bandwidth means they do inference faster than most consumer GPUs.
I have an M4 MacBook Air with 24 GB RAM and it doesn’t feel sufficient to run a substantial coding model (in addition to all my desktop apps). I’m thinking about upgrading to an M5 MacBook Pro with much more RAM, but I think the capabilities of cloud-hosted models will always run ahead of local models and it might never be that useful to do local inference. In the cloud you can run multiple models in parallel (e.g. to work on different problems in parallel) but locally you only have a fixed amount of memory bandwidth so running multiple model instances in parallel is slower.
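The bandwidth point can be made concrete with a back-of-envelope estimate. This is a sketch under illustrative assumptions (the ~120 GB/s bandwidth figure and the 13 GB model size are made-up round numbers, not measurements): decoding one token reads roughly all the model weights once, so decode speed is bounded by memory bandwidth divided by model size, and two instances sharing the same bandwidth each run at about half speed.

```python
# Back-of-envelope decode speed for a memory-bandwidth-bound LLM.
# All numbers below are illustrative assumptions, not measurements.

def decode_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Each generated token streams roughly all model weights from memory
    once, so decode speed is bounded by bandwidth / model size."""
    return bandwidth_gb_s / model_size_gb

# Assumed: a laptop with ~120 GB/s and a ~13 GB (e.g. ~20B @ 4-bit) model.
single = decode_tokens_per_sec(120, 13)

# Two instances contending for the same bandwidth each get about half.
parallel_each = decode_tokens_per_sec(120 / 2, 13)

print(f"one instance:  ~{single:.1f} tok/s")
print(f"two instances: ~{parallel_each:.1f} tok/s each")
```

Cloud providers don't face this trade-off the same way: each parallel request can land on separate hardware with its own bandwidth.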
The decision to pass all params as a JSON string to --params makes it unfriendly for humans to experiment with, although Claude Code managed to one-shot the right command for me, so I guess this is fine. This is an intentional design per https://justin.poehnelt.com/posts/rewrite-your-cli-for-ai-ag...
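To illustrate why JSON-in-a-flag favors scripts and agents over humans, here is a minimal sketch (the `mytool` command name and the parameter names are hypothetical, not from the linked post): a program can serialize and shell-quote the payload in two lines, whereas a human has to hand-balance quotes and braces.

```python
# Sketch: generating a --params invocation programmatically.
# "mytool" and the parameter names are hypothetical examples.
import json
import shlex

params = {"width_mm": 80, "height_mm": 40, "holes": [3, 5, 8]}

# json.dumps builds the payload; shlex.quote makes it safe to paste
# into a shell, which is exactly the part humans get wrong by hand.
cmd = f"mytool generate --params {shlex.quote(json.dumps(params))}"
print(cmd)
```

Round-tripping through `json.dumps`/`shlex.quote` also means an agent can emit the command without worrying about the shell's quoting rules.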
Side note, a lot happens at C3 other than the talks! Art, electronic gizmos and demos of all kinds, people hacking in realtime on projects, impromptu meetups, and bumping techno music :) I'd encourage people to attend in person if they get a chance; just watching the talks online is only a fraction of the experience.
Claude 4.6 Opus and Gemini 3.1 Pro can to some degree, although the 3D models they produce are often deficient in some way that my eval didn't capture.
My eval used OpenSCAD simply due to familiarity and not having time to experiment with build123d/CadQuery. There is an academic paper where they were successful at fine-tuning a small VLM to do CadQuery: https://arxiv.org/pdf/2505.14646
Great work - it looks like a building block toward integration testing of 3D-model composition. I have been looking for a solution that would allow testing how a component fits into its surrounding components. My use case would be to create a parametric boat hull and then add components that could be tested for fit in the arrangement.
The simulator lets the LLM request renders from different angles/times, so the LLM can get visual feedback. For failures, the simulator also returns status codes like `object_fell` or `mount_initially_collided_with_object` depending on what happened. You can see what the tool call looks like by looking at the Transcript tab, e.g. here https://kerrickstaley.com/ai-cad-design-mount-viz/gso__mug__...
> The initial collision is because the mount was positioned at the same height as the mug's body center (z=-22), causing overlap. I need to lower the mount significantly so the mug starts above it and drops into the cradle.
> I'll also remove the end cap to avoid it blocking the mug's descent.
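The feedback loop the simulator enables can be sketched as follows. This is a hypothetical outline, not the project's actual code: the status codes `object_fell` and `mount_initially_collided_with_object` come from the comment above, but the function names, return shape, and loop structure are all assumptions.

```python
# Hypothetical sketch of the render/simulate feedback loop described above.
# Only the status-code strings come from the source; everything else
# (function names, dict shape, loop structure) is assumed for illustration.

def run_simulation(source: str) -> dict:
    """Placeholder for the real simulator: returns a failure status code
    (e.g. "object_fell", "mount_initially_collided_with_object") or "ok",
    plus rendered views the LLM can inspect."""
    return {"status": "object_fell", "renders": ["front.png", "top.png"]}

def design_loop(initial_source: str, revise, max_iters: int = 5) -> str:
    """Iterate: simulate the design, and on failure feed the status code
    and renders back to the LLM (via `revise`) for another attempt."""
    source = initial_source
    for _ in range(max_iters):
        result = run_simulation(source)
        if result["status"] == "ok":
            break
        source = revise(source, result)
    return source
```

The key design point is that the model gets structured failure reasons, not just images, so it can reason about *why* a design failed even when its visual perception misses details.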
Ah yes, that matches my observations. It kinda sees that the stuff it is looking for is there, but does not see enough detail to notice that not only is there an end cap in the way, but the mug is also rotated the wrong way to sit in the holder.
It feels like the "r's in strawberry" effect where the models do not have enough introspection into the raw input data.