The main thing that was dumbing me down (and burning me out) was having to babysit LLMs on anything except basic tasks if I care about code quality/structure/maintainability.
I love coding, it always felt like Legos for adults. Not that Legos aren't also Legos for adults.
But there's no fighting the fact that we won't be writing 99% of the code anymore, so I take pleasure in crafting the specs and requirements clearly. That's where I put the effort.
And then to avoid having to babysit the agents to get them to stick to the plan, I built a super robust external orchestrator that forces multiple review and fix rounds until I get the result I want.
It's a durable orchestration system for AI code generation. It solves the problem of not being able to trust LLMs to complete long-running, high-quality implementations unless you babysit them and monitor the process, which I think is the most exhausting part of coding with AI.
You start with a spec or programmatic task list and the engine runs the whole workflow: implementation, verification, review, fixes, and finalization.
It treats agentic coding like a durable CI-style process, with state, retries, reviewer feedback, commits, and auditability built in. It's externally orchestrated, meaning the agent isn't the one running the loop. Agents are simply used as tools, spawned in the loop as needed, with no awareness of the loop itself.
It's going to be open sourced soon, and it's not meant to replace your IDE or agentic harness of choice. You keep using Codex, Claude Code, OpenCode, Cursor, Pi, or whatever you want, and simply delegate the actual implementation to the engine through MCP, the CLI, and other integration points.
It supports any LLM provider, so you can, for example, have GPT 5.5 implementing and a mix of Opus 4.7 / DeepSeek V4 Pro / GPT 5.5 reviewing at every phase.
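To make the "externally orchestrated" part concrete, here's a rough Python sketch of the idea. It is not the engine's actual code or API: `PHASES`, `RunState`, `run`, and the JSON state file are names I'm making up for illustration, and the agent is assumed to be a plain callable that takes a phase and a task. The point is only that the loop, the retries, and the saved state live outside the agent.

```python
import json
from dataclasses import dataclass, field, asdict
from pathlib import Path
from typing import Callable

# Hypothetical phase names, taken from the workflow described above.
PHASES = ["implementation", "verification", "review", "fixes", "finalization"]


@dataclass
class RunState:
    task: str
    completed: list[str] = field(default_factory=list)  # phases already finished
    log: list[str] = field(default_factory=list)        # audit trail


def load_state(path: Path, task: str) -> RunState:
    """Resume from disk if an earlier run left state behind, else start fresh."""
    if path.exists():
        return RunState(**json.loads(path.read_text()))
    return RunState(task=task)


def save_state(path: Path, state: RunState) -> None:
    path.write_text(json.dumps(asdict(state)))


def run(task: str, agent: Callable[[str, str], str], state_path: Path,
        max_retries: int = 2) -> RunState:
    """Drive every phase from outside the agent; the agent never sees the loop."""
    state = load_state(state_path, task)
    for phase in PHASES:
        if phase in state.completed:
            continue  # finished in an earlier run, so do not redo it
        for attempt in range(max_retries + 1):
            try:
                output = agent(phase, task)
                break
            except Exception as exc:
                state.log.append(f"{phase}: attempt {attempt + 1} failed ({exc})")
                save_state(state_path, state)
                if attempt == max_retries:
                    raise  # out of retries; state on disk allows a later resume
        state.completed.append(phase)
        state.log.append(f"{phase}: {output}")
        save_state(state_path, state)  # checkpoint after every phase
    return state
```

Because the state is checkpointed after each phase, calling `run` again with the same state file skips whatever already finished, which is the "durable" part in miniature.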
Sounds great, and it matches the philosophy and approach I've been wanting to follow. Signed up, gave you a follow on Twitter, and am curious about the open source angle!
I've got a lot to say on the topic but I'll be making a video for launch showcasing everything.
For the open source angle, I think it's just a net positive for more people to have access to a way to build with LLMs without being exposed to AI-related burnout.
And for open-source projects using it, the engine can act as a quality gate for PRs by requiring contributors to go through a repo-defined implementation and review process.
Because I don't trust LLMs to fully implement what I want on the first try unless I babysit them. And it's the hand-holding that burns people out.
I use detailed specs to implement, but I don't maintain those specs as the source of truth afterwards. The code is indeed the source of truth.
I've built a library (and products on top) that takes in requirements (programmatic or various spec formats) and forces an externally orchestrated implement -> review -> fix loop that doesn't stop until all requirements are met.
So I'll write a detailed spec, then have GPT 5.5 implementing and a mix of Opus 4.7 / GPT 5.5 / DeepSeek V4 Pro reviewing at every phase until it produces the quality I want.
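Roughly, the implement -> review -> fix loop looks like the sketch below. This is an illustration under my own assumptions, not the library's real interface: `implement_review_fix`, the callable signatures, and the `max_rounds` cap are all hypothetical, and the implementer and reviewers stand in for calls to whichever models you pick.

```python
from typing import Callable

# Hypothetical signatures: each callable would wrap a call to some LLM provider.
Implementer = Callable[[list[str], list[str]], str]  # (requirements, feedback) -> code
Reviewer = Callable[[str, list[str]], list[str]]     # (code, requirements) -> unmet requirements


def implement_review_fix(requirements: list[str],
                         implementer: Implementer,
                         reviewers: dict[str, Reviewer],
                         max_rounds: int = 5) -> tuple[str, int]:
    """Loop until no reviewer reports an unmet requirement, or give up."""
    feedback: list[str] = []
    for round_no in range(1, max_rounds + 1):
        # The implementer sees the requirements plus last round's review feedback.
        code = implementer(requirements, feedback)
        unmet: set[str] = set()
        for review in reviewers.values():
            unmet.update(review(code, requirements))  # merge all reviewers' findings
        if not unmet:
            return code, round_no  # every reviewer is satisfied
        feedback = sorted(unmet)   # becomes the fix list for the next round
    raise RuntimeError(f"still unmet after {max_rounds} rounds: {feedback}")
```

The cap on rounds is my addition so the sketch can't spin forever. The one thing it does try to capture from the real setup is that the implementer and the reviewers are separate callables, so they can be different models.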
I can let it run overnight or just during the day while I'm doing stuff that doesn't burn me out and that I actually enjoy.
So, TL;DR: spec first for me, but not as the source of truth afterwards.
I tweeted about some implementation and review runs that used V4 Pro.
Even without the currently discounted pricing, the value is incredible.
It takes about twice as long as Opus 4.7 / GPT 5.5 to finish code reviews given an identical context, but at 1/10 the cost or less, there's just no comparison.
I'll also be fully open sourcing my orchestrator soon: https://engine.build