GPT-5 isn’t really a brand-new model in the way people think. From what I’ve seen, the goal was more about reducing costs and unifying the interface than releasing a totally different architecture. Under the hood it is still routing to models we already know, just picking what it thinks will give the “best” result for the request.
That can be fine for a lot of general use cases, but if you’re working in specific domains like coding agents or high-precision summarization, that routing can actually make results worse compared to sticking with a model you know performs well for your workload.
That's not what OpenAI is claiming. They claim there are two new flagship models and a router that decides between them:
"GPT‑5 is a unified system with a smart, efficient model that answers most questions, a deeper reasoning model (GPT‑5 thinking) for harder problems, and a real‑time router that quickly decides which to use"
When I first started trying to build games with LLMs I did what almost everyone does at the beginning. I went straight for a one-shot prompt to make a complete game. Just like many others, I expected the model to almost read my mind. I imagined it would somehow capture all the little connections and decisions happening in my head in a fraction of a second just from a few lines of text. Of course, that is not how they work.
After some time I began to understand the mechanics of how LLMs operate on a deeper level. That naturally led to the now fading term “prompt engineering”. These days people talk more about “context engineering” but the core idea is the same. We have to teach our own brain how the LLM works so we can structure the context in a way that lets it deliver maximum value.
With my current work on GameByte, where AI builds studio-quality mobile games from prompts, this understanding has become crucial. When you explain the problem in a way that matches how the model processes information, even something as short as “3D platformer game” in the system prompt can be enough. The model will then ask the right follow-up questions and move you toward your goal without constant manual steering.
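To make the idea concrete, here is a minimal sketch of that kind of context structure: a short, domain-scoped system prompt that tells the model to ask follow-up questions before generating anything. The wording and function name are illustrative, not GameByte's actual prompt.

```python
# Illustrative context structure: a short system prompt that scopes the
# domain and instructs the model to gather requirements first, instead
# of guessing at everything from one line of user text.

def build_context(game_type: str) -> list[dict]:
    system = (
        f"You are a game development assistant. The user wants a {game_type}. "
        "Before writing any code, ask the follow-up questions needed to pin "
        "down art style, controls, and the win condition."
    )
    return [{"role": "system", "content": system}]
```

Even a context this small changes the interaction: the model steers the conversation toward the missing decisions instead of hallucinating them.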
Another lesson is that all the old pain points developers faced before AI are still pain points for LLMs. Spaghetti code, excessively long files, poor documentation, lack of comments and missing test cases all reduce their effectiveness. This is why Amazon’s recent “Kiro” and the spec-driven development approach resonate so well. They are basically formalizing best practices that those of us building with LLMs have learned over time.
And finally, LLMs do not particularly enjoy editing someone else's messy code. Just like human developers, they perform much better when writing from scratch. If you clearly define the boundaries of the task and ask the model to start fresh, the results are often significantly better.
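One way to apply that in practice is to frame the task as "rewrite from scratch inside these boundaries" rather than "edit this file". A hedged sketch of such a prompt builder, with illustrative field names:

```python
# Sketch of a "fresh start" prompt: instead of handing the model a messy
# file to patch, state the module name, the exact public API it must
# keep, and the constraints, then ask for a clean rewrite.

def fresh_start_prompt(module: str,
                       public_api: list[str],
                       constraints: list[str]) -> str:
    api = "\n".join(f"- {sig}" for sig in public_api)
    rules = "\n".join(f"- {c}" for c in constraints)
    return (
        f"Rewrite the `{module}` module from scratch.\n"
        f"It must expose exactly this public API:\n{api}\n"
        f"Constraints:\n{rules}\n"
        "Do not reference the old implementation."
    )
```

The public API list is what makes this safe: it pins down the boundary so a from-scratch rewrite still plugs into the rest of the codebase.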
From my experience, the tool choice is actually secondary. What really drives success is:
1. How complex the tasks are
2. How clearly and precisely you prompt for each one
3. How sprawling and interdependent your codebase is
4. The underlying model you’re running on (GPT-5, Sonnet-4, etc.)
If those factors aren’t set up well, you’ll hit walls no matter which agent you pick. I’ve seen teams switch tools thinking “this one will finally work” only to hit the exact same issues, because the bottleneck was their workflow, not the AI’s raw ability.
Once you tune the prompts, scope tasks properly, and feed the model the right context, most modern coding agents perform surprisingly well. That’s why in our platform, some dev teams can ship entire game prototypes or complex features with LLMs, while others struggle to get a passing unit test out of the same tool.
Feels like you might be overthinking this. Most of the tools you listed already cover 90% of what you’re asking for. If you’re starting a SaaS from scratch and haven’t got a single user yet, sinking too much time into picking the “perfect” auth solution will slow you down more than making a “wrong” choice ever will. The bigger risk is not shipping at all.
I’ve worked with Supabase, Clerk, Keycloak, and Kratos on different projects. None of the open-source options truly deliver on “low management overhead”. You’ll always have to deal with updates, patches, and some manual babysitting.
If you refuse to compromise on your feature list, your realistic options shrink fast. In that case, Zitadel is a solid choice, but be ready for higher costs from day one. My advice is to trim the must-have list, go with a managed service, get real users, and revisit the decision when scale actually becomes a problem.