Early on, these code agents wouldn't do basic hygiene things, like checking whether the code compiled, avoiding hallucinated modules, or writing unit tests. And people would say they sucked ....
But if you just asked them to do those things: "After you write a file lint it and fix issues. After you finish this feature, write unit tests and fix all issues, etc ..."
Well, then they did that, and it was great! Later, the default prompts of these systems included enough verbiage to do that, so you could get lazy again. Plus the models are being optimized to know to do some of these things, and to avoid some bad code patterns from the start.
But the same applies to performance today. If you ask it to optimize for performance, to use a profiler, to analyze the algorithms and systematically try various optimization approaches ... it will do so, often with very good results.
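To make the "use a profiler" instruction concrete, here's a minimal Python sketch of the measure-before-optimizing loop you can ask the agent to run. The function names are illustrative, not from any particular agent's output:

```python
import cProfile
import io
import pstats

def slow_sum_of_squares(n):
    # Deliberately naive: builds an intermediate list before summing.
    return sum([i * i for i in range(n)])

def profile_call(fn, *args):
    """Run fn under cProfile and return (result, top-5 stats report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = fn(*args)
    profiler.disable()
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
    return result, buf.getvalue()

result, report = profile_call(slow_sum_of_squares, 100_000)
print(report)  # hot spots show up here; optimize those, then re-measure
```

The point is that the agent follows this loop happily once told to: profile, change one thing, profile again.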
Yep. Claude Code is best thought of as an overachieving junior / mid. It can run off and do all sorts of work on its own, but it doesn't have great instincts and it can't read your mind about what you want.
Use it as if you're the tech lead managing a fresh hire. Tell it clearly what you want it to focus on and you get a much better result.
Except it often is the case that when you break down what humans are doing, there are actual concrete tasks. If you can convert the tacit knowledge to decision trees and background references, you likely can get the AI to perform most non-creative tasks.
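As a toy illustration of converting tacit knowledge into a decision tree: suppose the unwritten rule is "when do we escalate a support ticket?" Once written down explicitly, a script (or an AI) can follow it. All rule names and thresholds here are hypothetical:

```python
# Tacit "when do we escalate?" knowledge made explicit as a decision tree.
# Thresholds and labels are illustrative, not from any real process.
def escalate_ticket(severity: str, customer_tier: str, age_hours: float) -> str:
    if severity == "outage":
        return "page-oncall"
    if customer_tier == "enterprise" and age_hours > 4:
        return "escalate-to-lead"
    if age_hours > 24:
        return "escalate-to-lead"
    return "normal-queue"

assert escalate_ticket("outage", "free", 0.5) == "page-oncall"
assert escalate_ticket("bug", "enterprise", 6) == "escalate-to-lead"
assert escalate_ticket("bug", "free", 2) == "normal-queue"
```

The hard part is the elicitation, not the encoding: once the tree exists, executing it is the easy, automatable step.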
I half agree. But two points: 1) if you can formalize your instructions ... then future instances can be fully automated. 2) You are still probably having the AI perform many sub-tasks. AI skeptics regularly fall into this god-of-the-gaps trap. You aren't wrong that human-augmented AI isn't 100% AI ... but it still is AI augmentation, and again, that sets the stage for point 1 - enabling full automation later, over long enough timescales.
Formal instructions paired with tables are almost as rigid as code. Btw, the traditional engineering disciplines have a lot of strict math and formulas. Neither electrical nor mechanical engineering runs on instructions alone.
The non-software engineering disciplines I'm thinking of rely on blueprints, schematics, diagrams, HDLs, and tables much more than human language formal instructions. More so than software engineering.
Disagree, they rely on both equally, not much more on one of them. Consider the process of actually building a large structure with only a set of such diagrams. The diagrams primarily cover nouns (what, where, using what), whereas the human language formal instructions cover the verbs (how, why, when). You can't build anything with only one of the two.
And sure, the human language formal instructions often appear inside tables or diagrams, but that doesn't make them any less formal instructions.
This is based on having worked with companies that do projects in the 10-figure range.
Normal isn't a myth. The mistake people make is taking the mode as normal, or worse, mistaking their own experience as normal. But humans generally do tend to have a range of common behaviors that a significant percentage of people fit into. And you can probably even predict it to a reasonable degree, if you have some other metadata to correlate with which sub-group they belong to.
Normal in the sense of "you can model a distribution of human behavioral processes or outcomes" that encompasses, say, 95% of humans in a given culture or geography is very much a thing you can do. And I'd go as far as to say a large chunk of the mental bandwidth of the average person is running those simulation models just to operate in a multi-human-agent world.
(If you want to say we observe bimodal or other multi-peaked distributions in practice rather than "normal" ones, I will strongly agree, but that usually isn't the objection when people say "normal is a myth")
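The "95% of humans" framing above maps directly onto standard distribution coverage. A minimal sketch using Python's stdlib, with a unit normal standing in for whatever behavioral metric you're modeling:

```python
from statistics import NormalDist

# Under a normal model, ~95% of outcomes fall within about +/-1.96
# standard deviations of the mean. For a bimodal population you'd fit
# a mixture of two such components instead, as the parenthetical notes.
behavior = NormalDist(mu=0.0, sigma=1.0)
coverage = behavior.cdf(1.96) - behavior.cdf(-1.96)
print(round(coverage, 2))  # ~0.95
```

The model choice (single peak vs. mixture) is the real argument; the coverage arithmetic is the same either way.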
ChatAI - show the top 50 online retailers by revenue in the US and note any that have credible news stories about quality control issues. Save all of them except StoreX and StoreY in the list you use for comparison shopping.
Or maybe another one: scan my credit card purchases for as far back as you have history and record all the stores.
Done. And plenty of third-party sites (Consumer Reports, Wirecutter, etc ...) will do this kind of thing too. And you could perhaps trust them transitively - either viewing their lists directly or just scraping the places they recommend.
And the average person doesn't need to figure this out ... skills encoding this will propagate.
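The purchase-scan idea a couple of comments up is mechanically trivial once the history is exported. A hypothetical sketch, with made-up transaction records standing in for a real credit card export:

```python
from collections import Counter

# Hypothetical (merchant, amount) records standing in for a card-history
# export; all names and amounts are illustrative.
transactions = [
    ("StoreX", 19.99), ("Acme", 5.00), ("Acme", 12.50),
    ("StoreY", 7.25), ("Bookshop", 30.00), ("Acme", 3.75),
]

blocked = {"StoreX", "StoreY"}  # the stores the user asked to exclude
visits = Counter(merchant for merchant, _ in transactions)
shortlist = sorted(m for m in visits if m not in blocked)
print(shortlist)  # ['Acme', 'Bookshop']
```

The hard part for the agent is access to the data and trust in the exclusion list, not the aggregation itself.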
All the hard work is always chasing down edge cases, scaling, operational issues, and other things that don't show up in the user-facing features. And speaking of features, the innovation in coming up with them, and in iterating to make them work against real customer experience, is a ton of value, even if copying the ideas that work is much easier later - which is why I generally prefer betting on an innovator with just enough traction to show they can stick with it. The best category leaders both innovate and steal/copy/buy all the innovation they aren't producing in house to maintain their lead.
Businesses crave both data for analysis and checkboxes getting checked for compliance's sake. If those don't align with the value of the work, you get the classic case of employees hating the "TPS reports" they are forced to produce. As an example, salespeople are notorious for basically never updating CRMs, and they have incentives to skew the specifics anyway.
> this is sub-par and neglects important aspects of your business
But that is exactly the right way to think about it. If you have an army of sub-par workers that aren't going to think deeply about their value to your business, but are really cheap (relative to human labor) - how do you make effective use of them? Thinking about AI agents as being high-competence and able to learn your intent is the wrong model at this point. Though they can be high-competence in very specific narrow niches.
Also consider that while the OP looks like a skilled, experienced individual, all too often the documentation is written not by someone with that context, but by someone unskilled and without real empathy for the reader. Quality is quite often very poor, to the point where, as shitty as genai can be, it is still an improvement. Bad UX and writing outnumber the good. The successes of big companies and the best-known government services are the exception.