Hacker News | seunosewa's comments

Perhaps they can just make it an option??


How did you verify it?


Just send it bro

(but honestly, for a lot of websites and web apps you really can just send it; if people are honest with themselves, the stakes are very low for most of what they do)


Uhhh, this is my work, so… we didn't have a SEV? None of our thousands of paying customers reported the site was broken?


I find this absolutely wild. In my experience, Codex's code quality is still not as good as a human's, so letting Codex do something and not verifying / cleaning up behind it will most likely result in lower code quality and possibly subtle bugs.


For upgrading frameworks and the like, there usually aren't many architectural decisions to be made where you care about exactly how something is implemented. Here the OP could probably verify quite easily that the build works and produces all the expected artifacts.


Once you've gone through that, you might want to ask it to codify what it learned from you so you don't have to repeat it next time.


They introduced the low-limit warning for Opus on claude.ai.


Then I pass the review back to Claude Opus to implement it.


Just curious: is this a manual process, or have you automated these steps?


I have a `codex-review` skill with a shell script that uses the Codex CLI with a prompt. It tells Claude to use Codex as a review partner and to push back if it disagrees. They will sometimes go through 3 or 4 back-and-forth iterations before they find consensus. It's not perfect, but it does help, because Claude will point out the things Codex found and give it credit.
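Not the author's actual script, but the back-and-forth described above can be sketched as a loop. In this Python sketch the real Codex/Claude CLI invocations are hypothetical stubs (`reviewer` and `responder` would shell out to the respective CLIs); the point is the iterate-until-consensus structure with a round cap:

```python
from typing import Callable, Tuple

MAX_ROUNDS = 4  # the comment above mentions 3 or 4 iterations

def review_loop(
    diff: str,
    reviewer: Callable[[str, str], str],   # hypothetical: invokes Codex on the diff
    responder: Callable[[str], str],       # hypothetical: asks Claude to revise or push back
    max_rounds: int = MAX_ROUNDS,
) -> Tuple[int, str]:
    """Alternate review and response until the reviewer approves or rounds run out."""
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        feedback = reviewer(diff, feedback)
        if feedback.strip() == "LGTM":      # assumed approval token, not a real protocol
            return round_no, "consensus"
        diff = responder(feedback)          # Claude addresses the feedback (or argues back)
    return max_rounds, "no consensus"
```

In practice the stubs would be `subprocess` calls to the Codex CLI and Claude Code, and the approval check would parse whatever sentinel your prompt asks the reviewer to emit.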


Mind sharing the skill/prompt?


Not the OP, but I use the same approach.

https://gist.github.com/drorm/7851e6ee84a263c8bad743b037fb7a...

I typically use github issues as the unit of work, so that's part of my instruction.


zen-mcp (now called pal-mcp, I think); with it, Claude Code can actually just pass things to Gemini (or any other model).


Sometimes; it depends on how big the task is. I just find 5.2 so slow.


Most llms.txt files are very similar to the compressed docs.


What if they used the same compressed documentation in the skill? That would be just fine too.


Sure, but then it would be a trivial comparison; this is really about context vs. tool-calling.


Or just reducing the reasoning tokens.


The degradation may vary more within a single day than between measurements taken at the same time each day.


Sure, but it's still useful insight to see how it performs over time. Of course, cynically, Anthropic could game the benchmark by routing this benchmark's specific prompts to an unadulterated instance of the model.


Chatterbox-turbo is really good too. It has a version that uses Apple's GPU.

