(but honestly, for a lot of websites and web apps you really can just send it; the stakes are low for most of what people build, if they're honest with themselves)
I find this absolutely wild. In my experience Codex's code quality is still not as good as a human's, so letting Codex do something and not verifying or cleaning up behind it will most likely result in lower code quality and possibly subtle bugs.
For upgrading frameworks and the like, there usually aren't many architectural decisions to make where you care how exactly something is implemented. Here the OP could probably verify quite easily that the build works and produces all the expected artifacts.
I have a `codex-review` skill with a shell script that invokes the Codex CLI with a prompt. It tells Claude to use Codex as a review partner and to push back if it disagrees. They will sometimes go through 3 or 4 back-and-forth iterations before they reach consensus. It's not perfect, but it does help, because Claude will point out the things Codex found and give it credit.
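For anyone curious what a script like that might look like, here's a minimal sketch. It assumes the Codex CLI's non-interactive `codex exec` subcommand and a git working tree; the prompt wording and the diff scoping are illustrative, not the actual skill:

```sh
#!/usr/bin/env sh
# Minimal sketch of a review script a skill like this might wrap.
# Assumes the Codex CLI's non-interactive `codex exec` subcommand;
# adjust the invocation to match your installed version.

# Collect the uncommitted changes to review (illustrative scope).
DIFF=$(git diff HEAD)

if [ -z "$DIFF" ]; then
  echo "Nothing to review." >&2
  exit 0
fi

# Ask Codex to act as a review partner and push back where it disagrees,
# rather than rubber-stamping whatever Claude produced.
codex exec "You are reviewing a teammate's change. Point out bugs, risky
patterns, and places where you disagree with the approach. Be direct and
push back rather than approving by default.

$DIFF"
```

Claude then reads Codex's output, responds to the points it disagrees with, and re-runs the script on the revised diff until the two converge.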
Sure, but it's still a useful signal to see how it performs over time. Of course, cynically, Anthropic could game the benchmark by routing this benchmark's specific prompts to an unadulterated instance of the model.