Early on, these code agents wouldn't do basic hygiene things, like checking whether the code compiled, avoiding hallucinated modules, or writing unit tests. And people would say they sucked ....
But if you just asked them to do those things: "After you write a file lint it and fix issues. After you finish this feature, write unit tests and fix all issues, etc ..."
Well, then they did that, and it was great! Later, the default prompts of these systems included enough verbiage to do that, so you could get lazy again. Plus the models are being optimized to know to do some of these things, and to avoid some bad code patterns from the start.
But the same applies to performance today. If you ask it to optimize for performance, to use a profiler, to analyze the algorithms and systematically try various optimization approaches ... it will do so, often with very good results.
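To make the "use a profiler" instruction concrete, here's a minimal Python sketch of the measure-before-optimizing loop you can ask the agent to run. The function names are illustrative, not from any particular agent's output:

```python
import cProfile
import io
import pstats

def slow_sum_of_squares(n):
    # Deliberately naive: builds an intermediate list before summing.
    return sum([i * i for i in range(n)])

def profile_call(fn, *args):
    """Run fn under cProfile and return (result, top-5 stats report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = fn(*args)
    profiler.disable()
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
    return result, buf.getvalue()

result, report = profile_call(slow_sum_of_squares, 100_000)
print(report)  # hot spots show up here; optimize those, then re-measure
```

The point is that the agent follows this loop happily once told to: profile, change one thing, profile again.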
Yep. Claude Code is best thought of as an overachieving junior / mid. It can run off and do all sorts of work on its own, but it doesn't have great instincts and it can't read your mind about what you want.
Use it as if you're the tech lead managing a fresh hire. Tell it clearly what you want it to focus on and you get a much better result.
Except it often is the case that when you break down what humans are doing, there are actual concrete tasks. If you can convert the tacit knowledge to decision trees and background references, you likely can get the AI to perform most non-creative tasks.
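As a toy illustration of converting tacit knowledge into a decision tree: suppose the unwritten rule is "when do we escalate a support ticket?" Once written down explicitly, a script (or an AI) can follow it. All rule names and thresholds here are hypothetical:

```python
# Tacit "when do we escalate?" knowledge made explicit as a decision tree.
# Thresholds and labels are illustrative, not from any real process.
def escalate_ticket(severity: str, customer_tier: str, age_hours: float) -> str:
    if severity == "outage":
        return "page-oncall"
    if customer_tier == "enterprise" and age_hours > 4:
        return "escalate-to-lead"
    if age_hours > 24:
        return "escalate-to-lead"
    return "normal-queue"

assert escalate_ticket("outage", "free", 0.5) == "page-oncall"
assert escalate_ticket("bug", "enterprise", 6) == "escalate-to-lead"
assert escalate_ticket("bug", "free", 2) == "normal-queue"
```

The hard part is the elicitation, not the encoding: once the tree exists, executing it is the easy, automatable step.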
I half agree. But two points: 1) if you can formalize your instructions ... then future instances can be fully automated. 2) You are still probably having the AI perform many sub-tasks. AI skeptics regularly fall into this god-of-the-gaps trap. You aren't wrong that human-augmented AI isn't 100% AI ... but it still is AI augmentation, and again, that sets the stage for point 1 - enabling full automation later, over long enough timescales.
Formal instructions paired with tables are almost as rigid as code. Btw, the traditional engineering disciplines have a lot of strict math and formulas. Neither electrical nor mechanical engineering runs on instructions alone.
The non-software engineering disciplines I'm thinking of rely on blueprints, schematics, diagrams, HDLs, and tables much more than human language formal instructions. More so than software engineering.
Disagree, they rely on both equally, not much more on one of them. Consider the process of actually building a large structure with only a set of such diagrams. The diagrams primarily cover nouns (what, where, using what), whereas the human language formal instructions cover the verbs (how, why, when). You can't build anything with only one of the two.
And sure, the human language formal instructions often appear inside tables or diagrams, but that doesn't make them any less formal instructions.
This is based on having worked with companies that do projects in the 10-figure range.
Normal isn't a myth. The mistake people make is taking the mode as normal, or worse, mistaking their own experience as normal. But humans generally do tend to have a range of common behaviors that a significant percentage of people fit into. And you can probably even predict it to a reasonable degree, if you have some other metadata to correlate with which sub-group they belong to.
Normal in the sense of "you can model a distribution of human behavioral processes or outcomes" that encompasses, say, 95% of humans in a given culture or geography is very much a thing you can do. And I'd go as far as to say a large chunk of the mental bandwidth of the average person is running those simulation models just to operate in a multi-human-agent world.
(If you want to say we observe bimodal or other multi-peaked distributions in practice rather than "normal" ones, I will strongly agree, but that usually isn't the objection when people say "normal is a myth")
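The "95% of humans" framing above maps directly onto standard distribution coverage. A minimal sketch using Python's stdlib, with a unit normal standing in for whatever behavioral metric you're modeling:

```python
from statistics import NormalDist

# Under a normal model, ~95% of outcomes fall within about +/-1.96
# standard deviations of the mean. For a bimodal population you'd fit
# a mixture of two such components instead, as the parenthetical notes.
behavior = NormalDist(mu=0.0, sigma=1.0)
coverage = behavior.cdf(1.96) - behavior.cdf(-1.96)
print(round(coverage, 2))  # ~0.95
```

The model choice (single peak vs. mixture) is the real argument; the coverage arithmetic is the same either way.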
ChatAI - show the top 50 online retailers by revenue in the US and note any that have credible news stories about quality control issues. Save all of them except StoreX and StoreY in the list you use for comparison shopping.
Or maybe another one: scan my credit card purchases for as far back as you have history and record all the stores.
Done. And plenty of third-party sites (Consumer Reports, Wirecutter, etc ...) will do this kind of thing too. And you could perhaps trust them transitively - either viewing their lists directly or just scraping the places they recommend.
And the average person doesn't need to figure this out ... skills encoding this will propagate.
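The purchase-scan idea a couple of comments up is mechanically trivial once the history is exported. A hypothetical sketch, with made-up transaction records standing in for a real credit card export:

```python
from collections import Counter

# Hypothetical (merchant, amount) records standing in for a card-history
# export; all names and amounts are illustrative.
transactions = [
    ("StoreX", 19.99), ("Acme", 5.00), ("Acme", 12.50),
    ("StoreY", 7.25), ("Bookshop", 30.00), ("Acme", 3.75),
]

blocked = {"StoreX", "StoreY"}  # the stores the user asked to exclude
visits = Counter(merchant for merchant, _ in transactions)
shortlist = sorted(m for m in visits if m not in blocked)
print(shortlist)  # ['Acme', 'Bookshop']
```

The hard part for the agent is access to the data and trust in the exclusion list, not the aggregation itself.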
All the hard work is always chasing down edge cases, scaling, operational issues, and other things that don't show up in the user-facing features. And speaking of features, the innovation in coming up with them, and in iterating to make them work against real customer experience, is a ton of value, even if copying the ideas that work is much easier later - which is why I generally prefer betting on an innovator with just enough traction to show they can stick with it. The best category leaders both innovate and steal/copy/buy all the innovation they aren't producing in house to maintain their lead.
Businesses crave both data for analysis and checkboxes getting checked for compliance's sake. If those don't align with the value of the work, you get the classic case of employees hating the "TPS reports" they are forced to produce. As an example, salespeople are notorious for basically never updating CRMs, and they have incentives to skew the specifics anyway.
> this is sub-par and neglects important aspects of your business
But that is exactly the right way to think about it. If you have an army of sub-par workers that aren't going to think deeply about their value to your business, but are really cheap (relative to human labor) - how do you make effective use of them? Thinking about AI agents as being high-competence and able to learn your intent is the wrong model at this point. Though they can be high-competence in very specific narrow niches.
Also consider that while the OP looks like a skilled, experienced individual, all too often the documentation is written not by someone with that context, but by someone unskilled and without real empathy for the reader. Quality is quite often very poor, to the point where, as shitty as genai can be, it is still an improvement. Bad UX and writing outnumber the good. The successes of big companies and the best-known government services are the exception.