Hacker News | redox99's comments

I don't think there's much recursive improvement yet.

I'd say it's a combination of

A) Before, new model releases were mostly a new base model trained from scratch, with more parameters and more tokens. This takes many months. Now that RL is used so heavily, you can make endless tweaks to the RL setup and, in just a month, get a better model using the same base model.

B) There's more compute online

C) Competition is more fierce.


Not really. These subscriptions have clear, enforced 5-hour and weekly limits.

Wouldn't everything be on the internet archive? And common crawl?

Being on the internet archive and being able to pick up from a restored backup are two very different things

It's a wiki. Maybe you lose the edit history and stuff like that, but the actual content, which is what matters, should be very easy to recreate from those sources.

There's more compute now than before.

The agents are not that good yet, but with human supervision they are there already.

I've forked a couple of npm packages, and I have agents implement the changes I want and keep the forks in sync with upstream. Without agents I wouldn't have done that because it's too much of a hassle.


AGENTS.md is for that global view.

The 'global view' doc should be in DESIGN.md so that humans know to look for it there, and AGENTS.md should point to it. Similar for other concerns. Unless something really is solely of interest to robots, it shouldn't live directly in AGENTS.md AIUI.

You can't possibly cram everything into AGENTS.md. Also, LLMs still don't give equal weight to everything in their context, i.e. they still ignore instructions.

Am I stupid or do these agents regularly not read what’s in the agents.md file?

More recent models are better at reading and obeying constraints in AGENTS.md/CLAUDE.md.

GPT-5.2-Codex did a bad job of obeying my more detailed AGENTS.md files, but GPT-5.3-Codex very evidently follows them well.


Perhaps I’m not using the latest and greatest in terms of models. I tend to avoid using tools that require excessive customization like this.

I find it infinitely frustrating to attempt to make these piece-of-shit “agents” do basic things like running the unit/integration tests after making changes.


Opus 4.5 successfully ignored the first line of my CLAUDE.md file last week

Thank god it’s not just me. It really makes me feel insane reading some of the commentary online.

Each agent uses a different file, like claude.md etc (maybe you already knew that).

And it requires a bit of prompt engineering like using caps for some stuff (ALWAYS), etc.


You’re not stupid. But the agents.md file is just an md file at the end of the day.

We’ve been acting as if it’s assembly code that the agents execute without question or confusion, but it’s just some more text.


That’s not what Claude and Codex put there when you ask them to init it. Also, the global view is most definitely bigger than their tiny, lorem-ipsum-on-steroids context, so what do you do then?

You know you can put anything there, not just what they init, right? And you can reference other doc files.

I should probably stop commenting on AI posts because when I try to help others get the most out of agents I usually just get downvoted, like now. People want to hate on AI, not learn how to use it.


It's still not truly global, but that seems like a bit pie in the sky.

People still do useful work without a global view, and there's still a human in the loop with the same ole amount of global view as they ever had.


Gen AI for art was different because it would just output a final image with basically zero control for the artist. It would be like AI programming outputting a binary instead of source code.

A programming language does not need to have a for loop. In fact many don't.

Programming languages need to give the developer a way to iterate (map, fold, for-loop, whatever) over a collection of items. Over time we've come up with more elegant ways of doing this, but as a programmer, until LLMs, you've still had to be actively involved in the control logic. My point is that a developer's relationship with the code is very different now, in a way that wasn't true with previous low-to-high level language climbs.
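
For illustration only, here is a minimal Python sketch of the three iteration styles named above (for-loop, map, fold). The data and variable names are made up for the example and are not from the thread; `functools.reduce` stands in for "fold".

```python
from functools import reduce

# Hypothetical sample data, chosen only for illustration.
prices = [3, 5, 8]

# Explicit for-loop: the programmer writes the control logic by hand.
total_loop = 0
for p in prices:
    total_loop += p * 2

# map: apply a function to every item; the iteration itself is implicit.
doubled = list(map(lambda p: p * 2, prices))

# fold (functools.reduce in Python): collapse the collection into one value.
total_fold = reduce(lambda acc, p: acc + p * 2, prices, 0)
```

In all three cases the programmer still decides what happens to each item; only the amount of explicit control flow differs.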

I was thinking of something like SQL, which is declarative and you tell it what you want, not how to do it broadly speaking.
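
A rough sketch of that contrast, assuming a made-up `items` table and using Python's built-in `sqlite3` module: the loop spells out how to aggregate, while the SQL query only states what result is wanted.

```python
import sqlite3

# Hypothetical rows (name, quantity), for illustration only.
rows = [("widget", 3), ("gadget", 5), ("widget", 8)]

# Imperative: spell out how to group and sum, step by step.
totals = {}
for name, qty in rows:
    totals[name] = totals.get(name, 0) + qty

# Declarative: describe the result; the database decides how to compute it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, qty INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?)", rows)
sql_totals = dict(conn.execute("SELECT name, SUM(qty) FROM items GROUP BY name"))
conn.close()
```

Both approaches produce the same per-name totals here; the difference is who owns the control logic.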

If you can't deliver features faster with AI assistance then you're either using it wrong or working on very specialized software that AI can't handle yet.

I haven't seen any evidence yet that using AI is improving developer performance, just a bunch of people who "feel" like it does.

I'm still on the fence about codegen, but it's certainly helping explain code quickly without manually stepping through it, and it provides quick access to docs.

I've built a SaaS (with paying customers) in a month that would easily have taken me 6 months to build at this level of quality and features. AI wrote, I'd say, 99.9% of the code. Without AI I wouldn't even have done this because it would have been too large a task.

In addition, for my old product, which is 5+ years old, AI now writes 95%+ of the code for me. The programming itself now takes a small percentage of my time, freeing me up for other tasks.


No one serious is claiming 6x productivity improvements at close to equal quality.

This is proving GP's point that you're going off feels and/or exaggerating


Quality is better both from a user and a code perspective.

From a user perspective, I often implement a feature and then just throw it away, no worries, because I can reimplement it in an hour based on my findings. No sunk cost. Also, I can implement very small details that I'd otherwise have to backlog. This leads to a higher quality product for the user.

From a code standpoint I frequently do large refactors that also would never have been worth it by hand. I have a level of test coverage that would be infeasible for a one man show.


> I have a level of test coverage that would be infeasible for a one man show.

When a metric becomes a target, it ceases to be a good metric.


Cool. What's the product? Like, do you have a link to it or something?

It's boring, glorified CRUD for SMBs in a certain industry, focused on compliance and workflows specific to my country. Think your typical inventory, ticketing, CRM + industry-specific features.

Boring stuff from a programming standpoint but stuff that helps businesses so they pay for it.


Okay, but where's the product? You described the product, but didn't share it.

People in NYC use it because the alternatives are either slower or much more expensive. I'm sure they'd rather use a Waymo if it were as fast and cheap as the subway.


Using Lyft, Uber, or Waymo in San Francisco is slow, especially during peak times. Covering the same distance in SF by car takes 5-10 times as long as going across town in NYC by train. If you have to cross a bridge or tunnel, it's going to take even longer during peak times.

That's the whole problem. Car transportation simply doesn't scale, so there will never be an option to use Waymo that's as fast and cheap as the subway. It's worth calling out that an efficient train system is vital to keeping car traffic moving quickly, because once everyone is in a car, it's gridlock.


I think the point is doubting whether it is ever possible for Waymo to be as fast or cheap as public transport in NYC.

