So humans become just providers of those 6-digit codes? That’s already the main problem I have with most agents: I want them to perform a very easy task: « fetch all receipts from websites x, y, and z and upload them to the correct expense in my expense-tracking tool ». AI is perfectly capable of performing this. But because every website requires SSO + 2FA, with no way to turn it off, I effectively have to watch them do it, and my whole existence can be summarized as: « look at your phone and input the 6 digits ».
The thing I want AI to be able to do on my behalf is manage those 2FA steps, not add more.
This is where the Claw layer helps — rather than hoping the agent handles the interruption gracefully, you design explicit human approval gates into the execution loop. The Claw pauses, surfaces the 2FA prompt, waits for input, then resumes with full state intact. The problem IMTDb describes isn't really 2FA; it's agents that have a hard time suspending and resuming mid-task cleanly. But that's today; tomorrow, that's an unknown variable.
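To make that concrete, here's a minimal sketch of such a gate (all names like `HumanGate` and `run_step` are hypothetical, not any real framework's API): the executor blocks on a human-supplied value and then resumes with its working state intact.

```python
import queue

# Hypothetical approval gate: the agent blocks on a human-supplied
# value (here, a 2FA code) and resumes with state intact.
class HumanGate:
    def __init__(self):
        self._inbox = queue.Queue()

    def provide(self, value):
        # Called from the UI / phone prompt side.
        self._inbox.put(value)

    def wait_for(self, prompt, timeout=300):
        print(f"[needs human] {prompt}")
        return self._inbox.get(timeout=timeout)  # agent blocks here

def run_step(step, state, gate):
    if step["kind"] == "2fa":
        state["otp"] = gate.wait_for("Enter the 6-digit code")
    else:
        state.setdefault("log", []).append(step["kind"])
    return state

gate = HumanGate()
gate.provide("123456")  # in reality this arrives later, from the human
state = run_step({"kind": "2fa"}, {}, gate)
```

The point is only that suspend/resume becomes an explicit, designed-for step rather than an exception the agent stumbles through.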
It's technically possible to use 2FA (e.g. TOTP) on the same device as the agent, if appropriate in your threat model.
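For illustration, TOTP is simple enough that an agent-side process could mint codes from a shared secret with the standard library alone; here's a stdlib-only sketch of RFC 6238 (whether colocating the secret with the agent is acceptable is exactly the threat-model question):

```python
import base64
import hashlib
import hmac
import struct
import time

# Minimal RFC 6238 TOTP (HMAC-SHA1, 30-second steps by default).
def totp(secret_b32, at=None, digits=6, step=30):
    key = base64.b32decode(secret_b32, casefold=True)
    counter = int(at if at is not None else time.time()) // step
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

The base32 secret below is the RFC test-vector key (ASCII "12345678901234567890"); at T=59 seconds it yields the published value 94287082 for 8 digits.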
In the scenario you describe, 2FA is enforcing a human-in-the-loop test at organizational boundaries. Removing that test will need an even stronger mechanism to determine when a human is needed within the execution loop, e.g. when making persistent changes or spending money, rather than copying non-restricted data from A to B.
Reading through the discussion I was also thinking of the other fly.io blog post about their setup with macaroon tokens, and how easily you can reduce their blast radius by adding more caveats. Feels like you could build out some kind of capability system with that to mitigate some of these risks.
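As a toy illustration of why caveats only ever narrow a token: macaroons chain HMACs, so any holder can attenuate a token but never widen it without the root key. This is a sketch of the core trick, not fly.io's actual implementation (real macaroons add first/third-party caveat semantics on top):

```python
import hashlib
import hmac

# Mint a token: signature = HMAC(root_key, identifier).
def mint(root_key, identifier):
    sig = hmac.new(root_key, identifier, hashlib.sha256).digest()
    return {"id": identifier, "caveats": [], "sig": sig}

# Attenuate: fold the new caveat into the signature. The old
# signature is consumed, so the caveat can't be stripped later.
def attenuate(token, caveat):
    sig = hmac.new(token["sig"], caveat, hashlib.sha256).digest()
    return {"id": token["id"], "caveats": token["caveats"] + [caveat], "sig": sig}

# Verify: recompute the chain from the root key, checking that every
# caveat is satisfied along the way.
def verify(root_key, token, satisfied):
    sig = hmac.new(root_key, token["id"], hashlib.sha256).digest()
    for caveat in token["caveats"]:
        if not satisfied(caveat):
            return False
        sig = hmac.new(sig, caveat, hashlib.sha256).digest()
    return hmac.compare_digest(sig, token["sig"])

t = mint(b"root-key", b"agent-session")
t = attenuate(t, b"app = expense-tracker")
t = attenuate(t, b"action = read")
```

Handing an agent a token caveated down to "read receipts from these three sites" is a much smaller blast radius than handing it your session cookie.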
Regarding sexism: most tournaments in chess (including the world championship) are fully open and thus gender neutral: anyone can participate regardless of sex/gender and competes on equal footing.
Women-only categories were created to give women visibility, because most were not reaching advanced levels in the open format.
Some women choose to compete with men (Judit Polgár being a somewhat recent example) but most go straight to the women only tournaments to have a shot.
The men-vs-women « bias » is not unproven; they literally had to create entire categories of competition to account for it.
That’s true for “tips and tricks” knowledge like “which model is best today” or “tell the model you’ll get fired if the answer is wrong to increase accuracy” that pops up on Twitter/X. It’s fleeting, makes people feel like “experts”, and doesn’t age well.
On the other hand, deeply understanding how models work and where they fall short, how to set up, organize, and maintain context, and which tools and workflows support that tends to last much longer. When something like the “Ralph loop” blows up on social media (and dies just as fast), the interesting question is: what problem was it trying to solve, and how did it do it differently from alternatives? Thinking through those problems is like training a muscle, and that muscle stays useful even as the underlying technology evolves.
> what problem was it trying to solve, and how did it do it differently from alternatives?
Sounds to me like accidental complexity. The essential problem is to write good code for the computer to do its task.
There's an issue if you're (general you) more focused on fixing the tool than on the primary problem, especially when you don't know if the tool is even suitable.
It does seem like things are moving very quickly, even deeper than what you're saying. Less than a year ago, LangChain, model fine-tuning, and RAG were the cutting edge and the “thing to do”.
Now because of models improving, context sizes getting bigger, and commercial offerings improving I hardly hear about them.
I’m not sure destroying other people’s property is the best way to make them sympathetic to your cause.
I don’t own a Ring camera (or any similar device), but the idea that someone could spend time unnoticed on my porch, messing with my stuff, right where my daughter likes to play on weekends, makes my skin crawl.
If that happened to me, I’d probably just double down on security to be honest. Knowing that some people actually feel it's the right thing to do makes me wonder if I shouldn't start today.
To be clear, I have no issue with someone peacefully informing people in their neighborhood about the potential dire consequences of enabling "share images of my doorbell with the government or other private agencies", that's all fine to me. But if you feel the need to impose your views by harassing me about it or by breaking the law to get your point across, you won't get an ally in me.
It's always the same. Go back and think about the history you read and the stories you've loved. Were you upset when the Rebels destroyed the Empire's property? Should they not have blown up the Death Star? Should they have gone through "proper channels"? Go look at any revolution that you side with; tell me they didn't destroy property. I understand you're comfortable, but there are literally minorities, oftentimes US citizens, getting rounded up and denied their rights. So you can sit idly by and criticize those that fight this system. However, you are so obviously on the wrong side of history, and you would recognize it in any other era except your own.
> I’m not sure destroying other people’s property is the best way to make them sympathetic to your cause.
We're in a slow moving civil war at this point. Looking for sympathy stopped making sense a long time ago. You're either pro humanity or pro property tbh
>We're in a slow moving civil war at this point [...] You're either pro humanity or pro property tbh
You don't realize this type of thinking is exactly what contributes to the "civil war"? Same with all this virtue signaling where if you're even slightly for some sort of immigration enforcement you're labeled as not being "pro humanity" or whatever, and then a populist gets in power because the other side's rallying cry is "there's no illegal on stolen land". In the wake of the killing of Renée Good, Trump's approval on immigration was 48% approve to 52% disapprove. In the same survey, who do you think voters trusted more on immigration? Still Republicans, 44% to 33%.
> You don't realize this type of thinking is exactly what contributes to the "civil war"?
Of course. But we need meaning and values in our lives, both of which have been absent from politics my entire life. At some point we're due for course correction, or I can't bear to live here anymore.
> if you're even slightly for some sort of immigration enforcement you're labeled as not being "pro humanity" or whatever, and then a populist gets in power because the other side's rallying cry is "there's no illegal on stolen land".
Both of these people are liberals detached from reality. The opposing side would stand for better material conditions for everyone.
If you're not going to ally with the people fighting the surveillance systems that are currently being used by the secret police to disappear and kill people, what does that make you? My cause doesn't need your sympathy; it needs to stop this horror. I'm not quite saying "with or against", but you are saying "against."
>If you're not going to ally with the people fighting the surveillance systems that are currently being used by the secret police to disappear and kill people what does that make you.
1990s Ireland:
A: "hey guys, maybe it's a bad idea to set off bombs in public places to promote Irish independence. You won't get an ally in me."
B: "If you're not going to ally with the people fighting British that are currently subjugating the Irish what does that make you. My cause doesn't need your sympathy it needs to stop this horror. I'm not quite saying "with or against" but you are saying "against.""
It’s not that nobody cared, it’s that the cost of building and maintaining CLIs, relative to the usage they got, often didn’t make economic sense. In fact, this is the first time I’ve seen someone want to use Slack via a CLI, not a TUI, an actual CLI. APIs, on the other hand, had plenty of real usage and made business sense, so most services offered them.
With AI, two things have changed: (1) the cost of building a CLI on top of a documented API has dropped a lot, and (2) there’s a belief that “designed for agents” CLIs will enable new kinds of usage that weren’t practical before and that will move the needle on the bottom line.
There are plenty of “chill and peaceful” city and town builders that trade realism for prettier, more idealized places.
In more simulation-focused games, cycling and walking paths are often available, and you can use them, but they come with many of the same constraints they face in the real world. In practice, that means they are usually not efficient as the primary way to move large numbers of people across a large city.
Reading your comment, it sounds like you want a game that is realistic in most respects, but treats transportation differently, in a way that makes your preferred options the optimal strategy. That is going to be hard to find, since transportation is a core part of city-building sims, and developers tend to pick either realism or a more utopian/fantasy model rather than mixing both in a single game.
That's not what I want at all. I want a more realistic sim that deals with issues such as sprawl, food deserts, transportation elasticity of demand, mental health issues (and their impact on crime and productivity), and how a network-flow model of transportation and commuting contributes to all of this. Building a bunch of sprawling suburbs that feed into a dense downtown core should make your citizens' commute times shoot way up and lead to misery.
A well-built large city isn't just going to be 100% biking and walking paths, it's going to have streetcars, light rail transit, subways, and buses as well as roads with cars. The difference is that people shouldn't be forced to commute across the entire city to get to work because you decided to cram all of the commercial zoning into one downtown core.
> The difference is that people shouldn't be forced to commute across the entire city to get to work because you decided to cram all of the commercial zoning into one downtown core.
Isn't the point that they should be, if that's how I choose to build a city, and they don't have to be, if you choose to build it otherwise? The entire point of a sandbox city-builder is, I assume, that it's a sandbox, and not a dogmatic interpretation of a childish Reddit meme.
It was pointed out elsewhere in this thread that SimCity already distorts reality in an ideological way: it lets you have tons of traffic without worrying about parking. It just gives you magical free underground parking everywhere that you never have to think about, in order to avoid the usual suburban parking sprawl hellscape.
The point is to illustrate that SimCity isn't a blank-slate, value-free sandbox city-builder. It has rules and those rules have been made deliberately unrealistic in ways that favour North American style cities.
It's like a fluid dynamics sandbox that causes water to flow uphill rather than settling into the valleys.
Car-centric transportation is not efficient. Not remotely. Cars have absolutely terrible bandwidth, and the more you try to increase speed to bring places closer together, the more they balloon cities apart.
If you think SimCity and Cities: Skylines are realistic depictions, then ask yourself why SimCity famously has no visible parking whatsoever (or don't: the devs are on record saying they excluded it because it made the cities look terrible, there's no need to speculate here), or ask yourself why Cities: Skylines added car pokeballs (where drivers get out of the car and put the car in their pocket) or straight-up deletes cars when traffic gets too heavy.
> Reading your comment, it sounds like you want a game that is realistic in most respects, but treats transportation differently,
It's the opposite, no? Most city builders cheat to be able to be fun. Like, with the amount of roads one builds in SimCity, half the map would have had to be parking lots to account for that amount of traffic. But that's boring gameplay, so they remove that constraint to make a fun game. Aka you never have to deal with the consequences of making your city car-dependent.
Edit: See another comment from CalRobert about exactly this.
The original SimCity was perfection - you could build no roads and nothing but rail! ;)
Cities Skylines with all the DLC and the right transportation mods gets pretty “realistic” in that you can build a transit paradise but the car still exists.
[citation needed] that some combination of "New Urbanism, traditional neighbourhood design, streetcar suburbs, one-way streets, bike paths, walking paths, mixed-zone walkable villages (light commercial with residential), smaller single-family houses and duplexes, triplexes, houses behind houses." is not in fact optimal! (For certain objective functions)
I’m curious what’s behind the speed improvements. It seems unlikely it’s just prioritization, so what else is changing? Is it new hardware (à la Groq or Cerebras)? That seems plausible, especially since it isn’t available on some cloud providers.
Also wondering whether we’ll soon see separate “speed” vs “cleverness” pricing on other LLM providers too.
It comes from batching and multiple streams on a GPU. More people sharing 1 GPU makes everyone run slower but increases overall token throughput.
Mathematically it comes from the fact that this transformer block is this parallel algorithm. If you batch harder, increase parallelism, you can get higher tokens/s. But you get less throughput. Simultaneously there is also this dial that you can speculatively decode harder with fewer users.
It's true for basically all hardware and most models. You can draw a Pareto curve of throughput per GPU vs. tokens per second per stream. More tokens/s per stream, less total throughput.
See this graph for actual numbers: “Token Throughput per GPU vs. Interactivity” (gpt-oss 120B, FP4, 1K/8K; source: SemiAnalysis InferenceMAX™).
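A toy model (all constants invented) reproduces the shape of that curve: decode is largely memory-bandwidth-bound, so a step with batch size b costs roughly one fixed weight-read plus a small per-sequence cost, and b streams share that fixed cost:

```python
# Toy cost model: per-step latency = fixed weight-read time + small
# per-sequence cost. All numbers are made up for illustration.
def step_time_ms(b, fixed=20.0, per_seq=0.5):
    return fixed + per_seq * b

for b in (1, 8, 32, 128):
    st = step_time_ms(b)
    per_user = 1000 / st      # tokens/s seen by one stream
    per_gpu = b * 1000 / st   # total tokens/s the GPU produces
    print(f"batch={b:>3}  per-user={per_user:6.1f} tok/s  per-GPU={per_gpu:8.1f} tok/s")
```

As the batch grows, per-user tokens/s falls while per-GPU throughput climbs, which is exactly the Pareto tradeoff in the graph.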
> If you batch harder, increase parallelism, you can get higher tokens/s. But you get less throughput. Simultaneously there is also this dial that you can speculatively decode harder with fewer users.
I think you meant “total throughput” there, right? Because tok/s is a measure of throughput, so it's clearer to say you increase throughput/user at the expense of throughput/GPU.
I'm not sure about the comment about speculative decoding, though. I haven't served a frontier model, but generally I believe speculative decoding doesn't help beyond a few tokens, so I'm not sure you can “speculatively decode harder” with fewer users.
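A back-of-the-envelope calculation supports that: with the standard expected-tokens-per-verification formula (assuming an i.i.d. per-token acceptance probability alpha, which is a simplification), the gain from longer drafts flattens quickly:

```python
# Expected tokens emitted per verification step with draft length k and
# per-token acceptance probability alpha (geometric-series formula from
# the speculative decoding literature). Limit as k grows: 1/(1-alpha).
def tokens_per_step(alpha, k):
    return (1 - alpha ** (k + 1)) / (1 - alpha)

for k in (1, 2, 4, 8, 16):
    print(f"draft length {k:>2}: {tokens_per_step(0.7, k):.2f} tokens/step")
```

With alpha = 0.7 the curve approaches its ceiling of 1/(1-0.7) ≈ 3.3 tokens per step within a handful of drafted tokens, so drafting "harder" buys very little past that.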
There are a lot of knobs they could tweak. Newer hardware and traffic prioritisation would both make a lot of sense. But they could also lower batching windows to decrease queueing time at the cost of lower throughput, or keep the KV cache in GPU memory at the expense of reducing the number of users they can serve from each GPU node.
GB200 has 2.4x faster memory, which is exactly what they are saying the speedup is. I suspect they are just routing to GB200 (or TPU etc. equivalents).
FWIW I did notice _sometimes_ recently Opus was very fast. I put it down to a bug in Claude Code's token counting, but perhaps it was actually just occasionally getting routed to GB200s.
Dylan Patel did analysis that suggests lower batch size and more speculative decoding leads to 2.5x more per-user throughput for 6x the cost for open models [0]. Seems plausible this could be what they are doing. We probably won't get to know for sure any time soon.
Regardless, they don't need to be using new hardware to get speedups like this. It's possible you just hit A/B testing and not newer hardware. I'd be surprised if they were using their latest hardware for inference tbh.
Why does this seem unlikely? I have no doubt they are optimizing all the time, including inference speed, but why could this particular lever not entirely be driven by skipping the queue? It's an easy way to generate more money.
Yes it's 100% prioritization. Through that it's also likely running on more GPUs at once but that's an artifact of prioritization at the datacenter level. Any task coming into an AI datacenter atm is split into fairly fined grained chunks of work and added to queues to be processed.
When you add a job with high priority all those chunks will be processed off the queue first by each and every GPU that frees up. It probably leads to more parallelism but... it's the prioritization that led to this happening. It's better to think of this as prioritization of your job leading to the perf improvement.
Here's a good blog for anyone interested which talks about prioritization and job scheduling. It's not quite at the datacenter level, but the concepts are the same. Basically everything is thought of as a pipeline. All training jobs are low pri (they take months to complete in any case), customer requests are mid pri, and then there are options for high pri. Everything in an AI datacenter is thought of in terms of 'flow'. Are there any bottlenecks? Are the pipelines always full and the expensive hardware always 100% utilized? Are the queue backlogs big enough to ensure full utilization at every stage?
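The scheduling described above is essentially a priority queue over work chunks; a minimal sketch (tier names and chunk labels invented for illustration):

```python
import heapq
import itertools

# Training chunks are low priority, standard requests mid, paid "fast"
# requests high. Each free GPU pops the highest-priority chunk; the
# monotonic counter keeps ordering FIFO within a tier.
HIGH, MID, LOW = 0, 1, 2
_seq = itertools.count()
work_queue = []

def submit(chunk, priority):
    heapq.heappush(work_queue, (priority, next(_seq), chunk))

def next_chunk():
    return heapq.heappop(work_queue)[2] if work_queue else None

submit("training-shard-17", LOW)
submit("chat-req-a", MID)
submit("priority-req-b", HIGH)
```

Even though the training shard arrived first, the high-priority request jumps the queue, which is all "priority tier" pricing really buys you.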
Amazon Bedrock has a similar feature called "priority tier": you get faster responses at 1.75x the price. And they explicitly say in the docs "priority requests receive preferential treatment in the processing queue, moving ahead of standard requests for faster responses".
I wonder if they might have mostly implemented this for themselves to use internally, and it is just prioritization but they don't expect too many others to pay the high cost.
The idea is that, over time, the quality and accuracy of world-model outputs will improve. That, in turn, lets autonomous driving systems train on a large amount of “realistic enough” synthetic data.
For example, we know from experience that Waymo is currently good enough to drive in San Francisco. We don’t yet trust it in more complex environments like dense European cities or Southeast Asian “hell roads.” Running the stack against world models can give a big head start in understanding what works, and which situations are harder, without putting any humans in harm’s way.
We don’t need perfect accuracy from the world model to get real value. And, as usual, the more we use and validate these models, the more we can improve them, creating a virtuous cycle.
A significant portion of engineering time is now spent ensuring that yes, the LLM does know about all of that. This context can be surfaced through skills, MCP, connectors, RAG over your tools, etc. Companies are also starting to reshape their entire processes to ensure this information can be properly and accurately surfaced. Most are still far from completing that transformation, but progress tends to happen slowly, then all at once.
He intentionally disrupted all communications, including, but not limited to, emergency ones. In the process he tried to unilaterally control a public resource based on his own authoritarian view of what is "good" vs. "bad". He is lucky he is not in jail.