So humans become just providers of those 6-digit codes? That’s already the main problem I have with most agents: I want them to perform a very easy task: « fetch all receipts from websites x, y, and z and upload them to the correct expense in my expense-tracking tool ». AI is perfectly capable of performing this. But because every website requires SSO + 2FA, with no way to turn it off, I effectively have to watch them do it, and my whole existence can be summarized as: « look at your phone and input the 6 digits ».
The thing I want AI to be able to do on my behalf is manage those 2FA steps, not add more.
This is where the Claw layer helps — rather than hoping the agent handles the interruption gracefully, you design explicit human approval gates into the execution loop. The Claw pauses, surfaces the 2FA prompt, waits for input, then resumes with full state intact. The problem IMTDb describes isn't really 2FA; it's agents that have a hard time suspending and resuming mid-task cleanly. But that's today; tomorrow, that's an unknown variable.
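To make that concrete, here's a minimal sketch of such a gate (all names like `HumanGate` and `run_step` are hypothetical, not any real framework's API): the executor blocks on a human-supplied value and then resumes with its working state intact.

```python
import queue

# Hypothetical approval gate: the agent blocks on a human-supplied
# value (here, a 2FA code) and resumes with state intact.
class HumanGate:
    def __init__(self):
        self._inbox = queue.Queue()

    def provide(self, value):
        # Called from the UI / phone prompt side.
        self._inbox.put(value)

    def wait_for(self, prompt, timeout=300):
        print(f"[needs human] {prompt}")
        return self._inbox.get(timeout=timeout)  # agent blocks here

def run_step(step, state, gate):
    if step["kind"] == "2fa":
        state["otp"] = gate.wait_for("Enter the 6-digit code")
    else:
        state.setdefault("log", []).append(step["kind"])
    return state

gate = HumanGate()
gate.provide("123456")  # in reality this arrives later, from the human
state = run_step({"kind": "2fa"}, {}, gate)
```

The point is only that suspend/resume becomes an explicit, designed-for step rather than an exception the agent stumbles through.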
It's technically possible to use 2FA (e.g. TOTP) on the same device as the agent, if appropriate in your threat model.
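For illustration, TOTP is simple enough that an agent-side process could mint codes from a shared secret with the standard library alone; here's a stdlib-only sketch of RFC 6238 (whether colocating the secret with the agent is acceptable is exactly the threat-model question):

```python
import base64
import hashlib
import hmac
import struct
import time

# Minimal RFC 6238 TOTP (HMAC-SHA1, 30-second steps by default).
def totp(secret_b32, at=None, digits=6, step=30):
    key = base64.b32decode(secret_b32, casefold=True)
    counter = int(at if at is not None else time.time()) // step
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

The base32 secret below is the RFC test-vector key (ASCII "12345678901234567890"); at T=59 seconds it yields the published value 94287082 for 8 digits.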
In the scenario you describe, 2FA is enforcing a human-in-the-loop test at organizational boundaries. Removing that test will need an even stronger mechanism to determine when a human is needed within the execution loop, e.g. when making persistent changes or spending money, rather than copying non-restricted data from A to B.
Reading through the discussion I was also thinking of the other fly.io blog post about their setup with macaroon tokens, and how easily you can reduce their blast radius by adding more caveats. Feels like you could build out some kind of capability system with that to mitigate some of these risks.
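As a toy illustration of why caveats only ever narrow a token: macaroons chain HMACs, so any holder can attenuate a token but never widen it without the root key. This is a sketch of the core trick, not fly.io's actual implementation (real macaroons add first/third-party caveat semantics on top):

```python
import hashlib
import hmac

# Mint a token: signature = HMAC(root_key, identifier).
def mint(root_key, identifier):
    sig = hmac.new(root_key, identifier, hashlib.sha256).digest()
    return {"id": identifier, "caveats": [], "sig": sig}

# Attenuate: fold the new caveat into the signature. The old
# signature is consumed, so the caveat can't be stripped later.
def attenuate(token, caveat):
    sig = hmac.new(token["sig"], caveat, hashlib.sha256).digest()
    return {"id": token["id"], "caveats": token["caveats"] + [caveat], "sig": sig}

# Verify: recompute the chain from the root key, checking that every
# caveat is satisfied along the way.
def verify(root_key, token, satisfied):
    sig = hmac.new(root_key, token["id"], hashlib.sha256).digest()
    for caveat in token["caveats"]:
        if not satisfied(caveat):
            return False
        sig = hmac.new(sig, caveat, hashlib.sha256).digest()
    return hmac.compare_digest(sig, token["sig"])

t = mint(b"root-key", b"agent-session")
t = attenuate(t, b"app = expense-tracker")
t = attenuate(t, b"action = read")
```

Handing an agent a token caveated down to "read receipts from these three sites" is a much smaller blast radius than handing it your session cookie.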
Regarding sexism: most tournaments in chess (including the world championship) are fully open and thus gender neutral: anyone can participate regardless of sex/gender and competes on equal footing.
Women-only categories were created to give women visibility, because most were not reaching advanced levels in the open format.
Some women choose to compete with men (Judit Polgár being a somewhat recent example) but most go straight to the women only tournaments to have a shot.
The men-vs-women « bias » is not unproven; they literally had to create entire categories of competition to account for it.
That’s true for “tips and tricks” knowledge like “which model is best today” or “tell the model you’ll get fired if the answer is wrong to increase accuracy” that pops up on Twitter/X. It’s fleeting, makes people feel like “experts”, and doesn’t age well.
On the other hand, deeply understanding how models work and where they fall short, how to set up, organize, and maintain context, and which tools and workflows support that tends to last much longer. When something like the “Ralph loop” blows up on social media (and dies just as fast), the interesting question is: what problem was it trying to solve, and how did it do it differently from alternatives? Thinking through those problems is like training a muscle, and that muscle stays useful even as the underlying technology evolves.
> what problem was it trying to solve, and how did it do it differently from alternatives?
Sounds to me like accidental complexity. The essential problem is to write good code for the computer to do its task.
There's an issue if you're (general you) more focused on fixing the tool than on the primary problem, especially when you don't know if the tool is even suitable.
It does seem like things are moving very quickly, even deeper than what you're saying. Less than a year ago, LangChain, model fine-tuning, and RAG were the cutting edge and the “thing to do”.
Now because of models improving, context sizes getting bigger, and commercial offerings improving I hardly hear about them.
I’m not sure destroying other people’s property is the best way to make them sympathetic to your cause.
I don’t own a Ring camera (or any similar device), but the idea that someone could spend time unnoticed on my porch, messing with my stuff, right where my daughter likes to play on weekends, makes my skin crawl.
If that happened to me, I’d probably just double down on security to be honest. Knowing that some people actually feel it's the right thing to do makes me wonder if I shouldn't start today.
To be clear, I have no issue with someone peacefully informing people in their neighborhood about the potential dire consequences of enabling "share images of my doorbell with the government or other private agencies", that's all fine to me. But if you feel the need to impose your views by harassing me about it or by breaking the law to get your point across, you won't get an ally in me.
It's always the same. Go back and think about the history you read and the stories you've loved. Were you upset when the Rebels destroyed the Empire's property? Should they not have blown up the Death Star? Should they have gone through "proper channels"? Go look at any revolution that you side with; tell me they didn't destroy property. I understand you're comfortable, but there are literally minorities, oftentimes US citizens, getting rounded up and denied their rights. So you can sit idly by and criticize those that fight this system. However, you are so obviously on the wrong side of history, and you would recognize it in any other era except your own.
> I’m not sure destroying other people’s property is the best way to make them sympathetic to your cause.
We're in a slow moving civil war at this point. Looking for sympathy stopped making sense a long time ago. You're either pro humanity or pro property tbh
>We're in a slow moving civil war at this point [...] You're either pro humanity or pro property tbh
You don't realize this type of thinking is exactly what contributes to the "civil war"? Same with all this virtue signaling where if you're even slightly for some sort of immigration enforcement you're labeled as not being "pro humanity" or whatever, and then a populist gets in power because the other side's rallying cry is "there's no illegal on stolen land". In the wake of the killing of Renée Good, Trump's approval on immigration was 48% approve to 52% disapprove. In the same survey, who do you think voters trusted more on immigration? Still Republicans, 44% to 33%.
> You don't realize this type of thinking is exactly what contributes to the "civil war"?
Of course. But we need meaning and values in our lives, both of which have been absent from politics my entire life. At some point we're due for course correction, or I can't bear to live here anymore.
> if you're even slightly for some sort of immigration enforcement you're labeled as not being "pro humanity" or whatever, and then a populist gets in power because the other side's rallying cry is "there's no illegal on stolen land".
Both of these people are liberals detached from reality. The opposing side would stand for better material conditions for everyone.
If you're not going to ally with the people fighting the surveillance systems that are currently being used by the secret police to disappear and kill people, what does that make you? My cause doesn't need your sympathy; it needs to stop this horror. I'm not quite saying "with or against", but you are saying "against."
>If you're not going to ally with the people fighting the surveillance systems that are currently being used by the secret police to disappear and kill people what does that make you.
1990s Ireland:
A: "hey guys, maybe it's a bad idea to set off bombs in public places to promote Irish independence. You won't get an ally in me."
B: "If you're not going to ally with the people fighting British that are currently subjugating the Irish what does that make you. My cause doesn't need your sympathy it needs to stop this horror. I'm not quite saying "with or against" but you are saying "against.""
It’s not that nobody cared, it’s that the cost of building and maintaining CLIs, relative to the usage they got, often didn’t make economic sense. In fact, this is the first time I’ve seen someone want to use Slack via a CLI, not a TUI, an actual CLI. APIs, on the other hand, had plenty of real usage and made business sense, so most services offered them.
With AI, two things have changed: (1) the cost of building a CLI on top of a documented API has dropped a lot, and (2) there’s a belief that “designed for agents” CLIs will enable new kinds of usage that weren’t practical before and that will move the needle on the bottom line.
There are plenty of “chill and peaceful” city and town builders that trade realism for prettier, more idealized places.
In more simulation-focused games, cycling and walking paths are often available, and you can use them, but they come with many of the same constraints they face in the real world. In practice, that means they are usually not efficient as the primary way to move large numbers of people across a large city.
Reading your comment, it sounds like you want a game that is realistic in most respects, but treats transportation differently, in a way that makes your preferred options the optimal strategy. That is going to be hard to find, since transportation is a core part of city-building sims, and developers tend to pick either realism or a more utopian/fantasy model rather than mixing both in a single game.
That's not what I want at all. I want a more realistic sim that deals with issues such as sprawl, food deserts, transportation elasticity of demand, mental health issues (and their impact on crime and productivity), and how a network-flow model of transportation and commuting contributes to all of this. Building a bunch of sprawling suburbs that feed into a dense downtown core should make your citizens' commute times shoot way up and lead to misery.
A well-built large city isn't just going to be 100% biking and walking paths, it's going to have streetcars, light rail transit, subways, and buses as well as roads with cars. The difference is that people shouldn't be forced to commute across the entire city to get to work because you decided to cram all of the commercial zoning into one downtown core.
> The difference is that people shouldn't be forced to commute across the entire city to get to work because you decided to cram all of the commercial zoning into one downtown core.
Isn't the point that they should be, if that's how I choose to build a city, and they don't have to be, if you choose to build it otherwise? The entire point of a sandbox city-builder is, I assume, that it's a sandbox, and not a dogmatic interpretation of a childish Reddit meme.
It was pointed out elsewhere in this thread that SimCity already distorts reality in an ideological way: it lets you have tons of traffic without worrying about parking. It just gives you magical free underground parking everywhere that you never have to think about, in order to avoid the usual suburban parking sprawl hellscape.
The point is to illustrate that SimCity isn't a blank-slate, value-free sandbox city-builder. It has rules and those rules have been made deliberately unrealistic in ways that favour North American style cities.
It's like a fluid dynamics sandbox that causes water to flow uphill rather than settling into the valleys.
Car-centric transportation is not efficient. Not remotely. Cars have absolutely terrible bandwidth, and the more you try to increase speed to bring places closer together, the more they balloon cities apart.
If you think SimCity and Cities: Skylines are realistic depictions, then ask yourself why SimCity famously has no visible parking whatsoever (or don't: the devs are on record saying they excluded it because it made the cities look terrible, there's no need to speculate here), or ask yourself why Cities: Skylines added car pokeballs (where drivers get out of the car and put the car in their pocket) or straight-up deletes cars when traffic gets too heavy.
> Reading your comment, it sounds like you want a game that is realistic in most respects, but treats transportation differently,
It's the opposite, no? Most city builders cheat to be able to be fun. Like, with the amount of roads one builds in SimCity, half the map would have had to be parking lots to account for that amount of traffic. But that's boring gameplay, so they remove that constraint to make a fun game. Aka you never have to deal with the consequences of making your city car-dependent.
Edit: See another comment from CalRobert about exactly this.
The original SimCity was perfection - you could build no roads and nothing but rail! ;)
Cities Skylines with all the DLC and the right transportation mods gets pretty “realistic” in that you can build a transit paradise but the car still exists.
[citation needed] that some combination of "New Urbanism, traditional neighbourhood design, streetcar suburbs, one-way streets, bike paths, walking paths, mixed-zone walkable villages (light commercial with residential), smaller single-family houses and duplexes, triplexes, houses behind houses." is not in fact optimal! (For certain objective functions)
I’m curious what’s behind the speed improvements. It seems unlikely it’s just prioritization, so what else is changing? Is it new hardware (à la Groq or Cerebras)? That seems plausible, especially since it isn’t available on some cloud providers.
Also wondering whether we’ll soon see separate “speed” vs “cleverness” pricing on other LLM providers too.
It comes from batching and multiple streams on a GPU. More people sharing 1 GPU makes everyone run slower but increases overall token throughput.
Mathematically it comes from the fact that this transformer block is this parallel algorithm. If you batch harder, increase parallelism, you can get higher tokens/s. But you get less throughput. Simultaneously there is also this dial that you can speculatively decode harder with fewer users.
It's true for basically all hardware and most models. You can draw a Pareto curve of throughput per GPU vs. tokens per second per stream. More tokens/s per stream, less total throughput.
See this graph for actual numbers: “Token Throughput per GPU vs. Interactivity” (gpt-oss 120B, FP4, 1K/8K; source: SemiAnalysis InferenceMAX™).
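A toy model (all constants invented) reproduces the shape of that curve: decode is largely memory-bandwidth-bound, so a step with batch size b costs roughly one fixed weight-read plus a small per-sequence cost, and b streams share that fixed cost:

```python
# Toy cost model: per-step latency = fixed weight-read time + small
# per-sequence cost. All numbers are made up for illustration.
def step_time_ms(b, fixed=20.0, per_seq=0.5):
    return fixed + per_seq * b

for b in (1, 8, 32, 128):
    st = step_time_ms(b)
    per_user = 1000 / st      # tokens/s seen by one stream
    per_gpu = b * 1000 / st   # total tokens/s the GPU produces
    print(f"batch={b:>3}  per-user={per_user:6.1f} tok/s  per-GPU={per_gpu:8.1f} tok/s")
```

As the batch grows, per-user tokens/s falls while per-GPU throughput climbs, which is exactly the Pareto tradeoff in the graph.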
> If you batch harder, increase parallelism, you can get higher tokens/s. But you get less throughput. Simultaneously there is also this dial that you can speculatively decode harder with fewer users.
I think you meant “total throughput” there, right? Because tok/s is a measure of throughput, so it's clearer to say you increase throughput/user at the expense of throughput/GPU.
I'm not sure about the comment about speculative decoding, though. I haven't served a frontier model, but generally I believe speculative decoding doesn't help beyond a few tokens, so I'm not sure you can “speculatively decode harder” with fewer users.
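A back-of-the-envelope calculation supports that: with the standard expected-tokens-per-verification formula (assuming an i.i.d. per-token acceptance probability alpha, which is a simplification), the gain from longer drafts flattens quickly:

```python
# Expected tokens emitted per verification step with draft length k and
# per-token acceptance probability alpha (geometric-series formula from
# the speculative decoding literature). Limit as k grows: 1/(1-alpha).
def tokens_per_step(alpha, k):
    return (1 - alpha ** (k + 1)) / (1 - alpha)

for k in (1, 2, 4, 8, 16):
    print(f"draft length {k:>2}: {tokens_per_step(0.7, k):.2f} tokens/step")
```

With alpha = 0.7 the curve approaches its ceiling of 1/(1-0.7) ≈ 3.3 tokens per step within a handful of drafted tokens, so drafting "harder" buys very little past that.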
There are a lot of knobs they could tweak. Newer hardware and traffic prioritisation would both make a lot of sense. But they could also lower batching windows to decrease queueing time at the cost of lower throughput, or keep the KV cache in GPU memory at the expense of reducing the number of users they can serve from each GPU node.
GB200 has 2.4x faster memory, which is exactly what they are saying the speedup is. I suspect they are just routing to GB200 (or TPU etc. equivalents).
FWIW I did notice _sometimes_ recently Opus was very fast. I put it down to a bug in Claude Code's token counting, but perhaps it was actually just occasionally getting routed to GB200s.
Dylan Patel did analysis that suggests lower batch size and more speculative decoding leads to 2.5x more per-user throughput for 6x the cost for open models [0]. Seems plausible this could be what they are doing. We probably won't get to know for sure any time soon.
Regardless, they don't need to be using new hardware to get speedups like this. It's possible you just hit A/B testing and not newer hardware. I'd be surprised if they were using their latest hardware for inference tbh.
Why does this seem unlikely? I have no doubt they are optimizing all the time, including inference speed, but why could this particular lever not entirely be driven by skipping the queue? It's an easy way to generate more money.
Yes it's 100% prioritization. Through that it's also likely running on more GPUs at once but that's an artifact of prioritization at the datacenter level. Any task coming into an AI datacenter atm is split into fairly fined grained chunks of work and added to queues to be processed.
When you add a job with high priority all those chunks will be processed off the queue first by each and every GPU that frees up. It probably leads to more parallelism but... it's the prioritization that led to this happening. It's better to think of this as prioritization of your job leading to the perf improvement.
Here's a good blog for anyone interested which talks about prioritization and job scheduling. It's not quite at the datacenter level, but the concepts are the same. Basically everything is thought of as a pipeline. All training jobs are low pri (they take months to complete in any case), customer requests are mid pri, and then there are options for high pri. Everything in an AI datacenter is thought of in terms of 'flow'. Are there any bottlenecks? Are the pipelines always full and the expensive hardware always 100% utilized? Are the queue backlogs big enough to ensure full utilization at every stage?
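The scheduling described above is essentially a priority queue over work chunks; a minimal sketch (tier names and chunk labels invented for illustration):

```python
import heapq
import itertools

# Training chunks are low priority, standard requests mid, paid "fast"
# requests high. Each free GPU pops the highest-priority chunk; the
# monotonic counter keeps ordering FIFO within a tier.
HIGH, MID, LOW = 0, 1, 2
_seq = itertools.count()
work_queue = []

def submit(chunk, priority):
    heapq.heappush(work_queue, (priority, next(_seq), chunk))

def next_chunk():
    return heapq.heappop(work_queue)[2] if work_queue else None

submit("training-shard-17", LOW)
submit("chat-req-a", MID)
submit("priority-req-b", HIGH)
```

Even though the training shard arrived first, the high-priority request jumps the queue, which is all "priority tier" pricing really buys you.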
Amazon Bedrock has a similar feature called "priority tier": you get faster responses at 1.75x the price. And they explicitly say in the docs "priority requests receive preferential treatment in the processing queue, moving ahead of standard requests for faster responses".
I wonder if they might have mostly implemented this for themselves to use internally, and it is just prioritization but they don't expect too many others to pay the high cost.
The idea is that, over time, the quality and accuracy of world-model outputs will improve. That, in turn, lets autonomous driving systems train on a large amount of “realistic enough” synthetic data.
For example, we know from experience that Waymo is currently good enough to drive in San Francisco. We don’t yet trust it in more complex environments like dense European cities or Southeast Asian “hell roads.” Running the stack against world models can give a big head start in understanding what works, and which situations are harder, without putting any humans in harm’s way.
We don’t need perfect accuracy from the world model to get real value. And, as usual, the more we use and validate these models, the more we can improve them, creating a virtuous cycle.
A significant portion of engineering time is now spent ensuring that yes, the LLM does know about all of that. This context can be surfaced through skills, MCP, connectors, RAG over your tools, etc. Companies are also starting to reshape their entire processes to ensure this information can be properly and accurately surfaced. Most are still far from completing that transformation, but progress tends to happen slowly, then all at once.
He intentionally disrupted all communications, including, but not limited to, emergency ones. In the process he tried to unilaterally control a public resource based on his own authoritarian view of what is "good" vs. "bad". He is lucky he is not in jail.