> At the time of writing, the fix has not yet reached stable releases.
Why was this disclosed before the hole was patched in the stable release?
It's only been 18 days since the bug was reported to upstream, which is much shorter than typical vulnerability disclosure deadlines. The upstream commit (https://github.com/gnachman/iTerm2/commit/a9e745993c2e2cbb30...) has way less information than this blog post, so I think releasing this blog post now materially increases the chance that this will be exploited in the wild.
Update: The author was able to develop an exploit by prompting an LLM with just the upstream commit, but I still think this blog post raises the visibility of the vulnerability.
There are recognized exceptions to disclosure embargoes: when you believe the vulnerability is being exploited in the wild, or when the fix is already public (e.g. as a git commit), which makes it possible to produce an exploit quickly. In those cases the community generally prefers publishing the vulnerability.
My only caveat would be that in some security fixes, the pure code delta is not always indicative of the full exploitation method. But LLMs could interpolate from there, depending on context.
> The author of iTerm2 initially didn’t consider it severe enough to warrant an immediate release, but they now seem to have reconsidered.
It's funny that we still have the same conversation about disclosure timelines. 18 days is plenty of time, the commit log is out there, etc.
The whole "responsible disclosure" thing is in response to people just publishing 0days, which itself was a response to vendors threatening researchers when vulns were directly reported.
That's the wrong way to look at things. Just because the CIA can learn your location (if they want to), would you share your live location with everyone on the internet?
An LLM is a tool, but people still need to know the what, the where, and the how.
Not sure if that's a great example. If there's a catastrophic vulnerability in a widely used tool, I'd sure like to know about it even if the patch is taking some time!
The problem with this is that credible information like "there's a bug in widely used tool X" will soon (if not already) be enough to trigger massive token expenditure by various other parties, who will then also discover the bug, so this will often effectively amount to disclosure.
I guess the only winning move is to also start using AI to rapidly fix the bugs and have fast release cycles... Which of course has a host of other problems.
I think in the context of these it's more of "we've discovered a bug", which gives you more information than "there is a bug". The main difference is that the former implies not only that there is a bug, but that LLMs can find it.
If you're a random person on the Internet, I can indeed not do much with that information.
But if you're a security research lab that a competing lab can ballpark the funding of and the amount of projects they're working on (based on industry comparisons, past publications etc.), I think that can be a signal.
That's the wrong argument: since it's not just available to "the CIA" but to every rando under the sun, people should be notified immediately if "tracking" them is possible, and mitigation measures should become standard practice.
There are many attackers who are just going to feed every commit of every project of interest into their LLMs and tell them "determine if this is patching an exploit, and if so, write the exploit". They don't need targeting clues. They're already watching everything coming out of those projects.
Do not make the mistake of modeling the attackers as "some guy in a basement with a laptop who decided just today to start attacking things". There are nation-state attackers. There are other attackers less well funded than that who still wouldn't particularly blink at the plan I described above. Putting out the commit was sufficient to tell them, even today, exactly what the exploit was, and the cheaper AI time gets, the less targeting info they're going to need, as they just grab everything.
I suggest modeling the attackers like a Dark Google. Think of them as well-funded, with lots of resources, and this is their day job, with dedicated teams and specialized positions and a codebase for exploits that they've been working on for years. They're not just some guy who wants to find an exploit maybe and needs huge hints about what commit might be an issue.
>Do not make the mistake of modeling the attackers as "some guy in a basement with a laptop who decided just today to start attacking things". There are nation-state attackers.
The parent's point is that just because those capable attackers can exploit it anyway doesn't mean it should be handed on a silver platter to every script kiddie and guy in a basement with a laptop. The former are a much smaller group than the latter.
> LLM is a tool, but people still need to know — what where how.
And the moment the commit lands upstream, they know what, where, and how.
The usual approach here is to backchannel patched versions to the distros and end users before the commit ever goes into upstream. Obviously, this runs counter to some folks' expectations about how open source releases work.
So this bug just proves my thesis about shortening update windows.
You may need Claude Mythos to find a hard-to-discover bug in a 30-year-old open source codebase, but that bug will eventually be patched, and that patch will eventually hit the git repo. This lets smaller models rediscover the bug a lot more easily.
I won't be surprised if the window between a git commit and active port scans shrinks to hours or maybe even minutes in the next year or two.
This is where closed source SaaS has a crucial advantage. You don't get the changelog, and even if you did, it wouldn't be of much use to you after the fix is deployed to production.
"Please don't complain about tangential annoyances—e.g. article or website formats, name collisions, or back-button breakage. They're too common to be interesting."
Maybe they did get their models to test their pages, but they didn't tell their models to pretend that they're browsing on mobile using a 3G connection.
I think most people can speak faster than 120 WPM. For example this site says I speak at 343 WPM https://www.typingmaster.com/speech-speed-test/, and I self-measure 222 WPM on dense technical text.
I think (without having done extensive research) that some sort of Apple hardware is your best bet right now. Apple hasn’t raised RAM upgrade prices [1] (although to be fair their RAM upgrades were hugely inflated before the crunch) and their high memory bandwidth means they do inference faster than most consumer GPUs.
I have an M4 MacBook Air with 24 GB RAM and it doesn’t feel sufficient to run a substantial coding model (in addition to all my desktop apps). I’m thinking about upgrading to an M5 MacBook Pro with much more RAM, but I think the capabilities of cloud-hosted models will always run ahead of local models and it might never be that useful to do local inference. In the cloud you can run multiple models in parallel (e.g. to work on different problems in parallel) but locally you only have a fixed amount of memory bandwidth so running multiple model instances in parallel is slower.
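The bandwidth point can be made concrete with a back-of-envelope estimate. This is a sketch under illustrative assumptions (the ~120 GB/s bandwidth figure and the 13 GB model size are made-up round numbers, not measurements): decoding one token reads roughly all the model weights once, so decode speed is bounded by memory bandwidth divided by model size, and two instances sharing the same bandwidth each run at about half speed.

```python
# Back-of-envelope decode speed for a memory-bandwidth-bound LLM.
# All numbers below are illustrative assumptions, not measurements.

def decode_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Each generated token streams roughly all model weights from memory
    once, so decode speed is bounded by bandwidth / model size."""
    return bandwidth_gb_s / model_size_gb

# Assumed: a laptop with ~120 GB/s and a ~13 GB (e.g. ~20B @ 4-bit) model.
single = decode_tokens_per_sec(120, 13)

# Two instances contending for the same bandwidth each get about half.
parallel_each = decode_tokens_per_sec(120 / 2, 13)

print(f"one instance:  ~{single:.1f} tok/s")
print(f"two instances: ~{parallel_each:.1f} tok/s each")
```

Cloud providers don't face this trade-off the same way: each parallel request can land on separate hardware with its own bandwidth.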
The decision to pass all params as a JSON string to --params makes it unfriendly for humans to experiment with, although Claude Code managed to one-shot the right command for me, so I guess this is fine. This is an intentional design per https://justin.poehnelt.com/posts/rewrite-your-cli-for-ai-ag...
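To illustrate why JSON-in-a-flag favors scripts and agents over humans, here is a minimal sketch (the `mytool` command name and the parameter names are hypothetical, not from the linked post): a program can serialize and shell-quote the payload in two lines, whereas a human has to hand-balance quotes and braces.

```python
# Sketch: generating a --params invocation programmatically.
# "mytool" and the parameter names are hypothetical examples.
import json
import shlex

params = {"width_mm": 80, "height_mm": 40, "holes": [3, 5, 8]}

# json.dumps builds the payload; shlex.quote makes it safe to paste
# into a shell, which is exactly the part humans get wrong by hand.
cmd = f"mytool generate --params {shlex.quote(json.dumps(params))}"
print(cmd)
```

Round-tripping through `json.dumps`/`shlex.quote` also means an agent can emit the command without worrying about the shell's quoting rules.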
Side note, a lot happens at C3 other than the talks! Art, electronic gizmos and demos of all kinds, people hacking in realtime on projects, impromptu meetups, and bumping techno music :) I'd encourage people to attend in person if they get a chance; just watching the talks online is only a fraction of the experience.
Claude 4.6 Opus and Gemini 3.1 Pro can to some degree, although the 3D models they produce are often deficient in some way that my eval didn't capture.
My eval used OpenSCAD simply due to familiarity and not having time to experiment with build123d/CadQuery. There is an academic paper where they were successful at fine-tuning a small VLM to do CadQuery: https://arxiv.org/pdf/2505.14646
Great work - it looks like a building block toward integration testing of 3D-model composition. I have been looking for a solution that would allow testing how a component fits into its surrounding components. My use case would be to create a parametric boat hull and then add components that could be tested for fit in the arrangement.
The simulator lets the LLM request renders from different angles/times, so the LLM can get visual feedback. For failures, the simulator also returns status codes like `object_fell` or `mount_initially_collided_with_object` depending on what happened. You can see what the tool call looks like by looking at the Transcript tab, e.g. here https://kerrickstaley.com/ai-cad-design-mount-viz/gso__mug__...
> The initial collision is because the mount was positioned at the same height as the mug's body center (z=-22), causing overlap. I need to lower the mount significantly so the mug starts above it and drops into the cradle.
> I'll also remove the end cap to avoid it blocking the mug's descent.
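The feedback loop the simulator enables can be sketched as follows. This is a hypothetical outline, not the project's actual code: the status codes `object_fell` and `mount_initially_collided_with_object` come from the comment above, but the function names, return shape, and loop structure are all assumptions.

```python
# Hypothetical sketch of the render/simulate feedback loop described above.
# Only the status-code strings come from the source; everything else
# (function names, dict shape, loop structure) is assumed for illustration.

def run_simulation(source: str) -> dict:
    """Placeholder for the real simulator: returns a failure status code
    (e.g. "object_fell", "mount_initially_collided_with_object") or "ok",
    plus rendered views the LLM can inspect."""
    return {"status": "object_fell", "renders": ["front.png", "top.png"]}

def design_loop(initial_source: str, revise, max_iters: int = 5) -> str:
    """Iterate: simulate the design, and on failure feed the status code
    and renders back to the LLM (via `revise`) for another attempt."""
    source = initial_source
    for _ in range(max_iters):
        result = run_simulation(source)
        if result["status"] == "ok":
            break
        source = revise(source, result)
    return source
```

The key design point is that the model gets structured failure reasons, not just images, so it can reason about *why* a design failed even when its visual perception misses details.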
Ah yes, that matches my observations. It kinda sees that the stuff it is looking for is there, but does not see enough detail to notice that not only is there an end cap in the way, but the mug is also rotated the wrong way to sit in the holder.
It feels like the "r's in strawberry" effect where the models do not have enough introspection into the raw input data.