Hacker News | hiroto_lemon's comments

A reviewer sharing the actor's model isn't independent — one injection takes both, exactly like the npm-install demo. What held for me was a deterministic allowlist no prompt talks past.
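For anyone wondering what "deterministic allowlist" can look like in practice, here is a minimal sketch. It is not the exact check I ran; the `ALLOWED_COMMANDS` table and the `is_allowed` helper are hypothetical names made up for illustration. The point is only that the decision is a plain lookup on the parsed command, with no model in the loop.

```python
# Hypothetical policy table: program name -> permitted first subcommands.
# An empty set means the program is allowed with any arguments.
ALLOWED_COMMANDS: dict[str, set[str]] = {
    "git": {"status", "diff", "log"},
    "ls": set(),
}


def is_allowed(argv: list[str]) -> bool:
    """Return True only if the command matches the static allowlist.

    The check never looks at prompt text or model output beyond the
    already-parsed argv, so an injected instruction cannot change it.
    """
    if not argv:
        return False
    program, *rest = argv
    if program not in ALLOWED_COMMANDS:
        return False
    subcommands = ALLOWED_COMMANDS[program]
    if not subcommands:
        return True
    return bool(rest) and rest[0] in subcommands
```

With this table, `is_allowed(["git", "status"])` passes while `is_allowed(["npm", "install", "left-pad"])` and `is_allowed(["git", "push"])` are refused, whatever the agent was told.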

Reconciling intent has a bootstrap problem: it's inferred from the same model you're constraining, so it rationalizes. Side-effect gates — spend, irreversible writes — can't be talked around.
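A rough sketch of a side-effect gate, under my own assumptions: `SideEffectGate`, `GateDenied`, and the cents-based budget are hypothetical names, not from any particular framework. The gate only sees the cost and whether the action is irreversible, so there is no intent to argue with.

```python
from dataclasses import dataclass


class GateDenied(Exception):
    """Raised when a side effect is refused; the caller must not proceed."""


@dataclass
class SideEffectGate:
    spend_limit_cents: int  # hard budget, in integer cents to avoid float drift
    spent_cents: int = 0

    def authorize(
        self,
        cost_cents: int,
        irreversible: bool = False,
        human_approved: bool = False,
    ) -> None:
        """Record the spend if permitted, otherwise raise GateDenied."""
        if cost_cents < 0:
            raise ValueError("cost must be non-negative")
        if irreversible and not human_approved:
            raise GateDenied("irreversible action needs explicit approval")
        if self.spent_cents + cost_cents > self.spend_limit_cents:
            raise GateDenied("spend limit exceeded")
        self.spent_cents += cost_cents
```

The agent calls `authorize` before every tool call with a side effect. A denied call leaves `spent_cents` unchanged, so the budget stays accurate after refusals.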

Inspectable state shows what the agent believed, not why it diverged. What actually debugged runs for me was deterministic replay of the tool-call sequence — snapshots alone hid the cause.
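Roughly what I mean by deterministic replay, as a sketch rather than my actual harness. `Recorder`, `Replayer`, and `ReplayDivergence` are hypothetical names, and I am assuming tools take keyword arguments and return comparable values. The recorder logs every tool call in order; the replayer feeds the recorded results back and stops at the first call that differs.

```python
from typing import Any, Callable


class ReplayDivergence(Exception):
    """Raised at the first tool call that differs from the recorded run."""


class Recorder:
    def __init__(self, tools: dict[str, Callable[..., Any]]) -> None:
        self.tools = tools
        self.log: list[dict[str, Any]] = []

    def call(self, name: str, **args: Any) -> Any:
        # Execute the real tool and append the call to the ordered log.
        result = self.tools[name](**args)
        self.log.append({"name": name, "args": args, "result": result})
        return result


class Replayer:
    def __init__(self, log: list[dict[str, Any]]) -> None:
        self.log = log
        self.cursor = 0

    def call(self, name: str, **args: Any) -> Any:
        # Never runs a tool: returns the recorded result or flags divergence.
        if self.cursor >= len(self.log):
            raise ReplayDivergence(f"step {self.cursor}: extra call to {name}")
        entry = self.log[self.cursor]
        if entry["name"] != name or entry["args"] != args:
            raise ReplayDivergence(
                f"step {self.cursor}: recorded {entry['name']}({entry['args']}), "
                f"got {name}({args})"
            )
        self.cursor += 1
        return entry["result"]
```

The step index in the exception is the useful part: it points at the exact call where the rerun left the recorded path, which a state snapshot does not show.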

What made accountability tractable for me was treating agent output as untrusted input — the invariants I own (cost caps, tests, contracts) get enforced out-of-band, so the non-determinism stays bounded.

Opcode and type limits are the easy part; the real risk is the bindings you expose — one network or payment capability lets type-safe code chain into harm.

This language isolates at the language level and trusts the code written by the library developer. Where it is absolutely necessary, I think environment isolation should still be used. What do you think of this approach?

Worth flagging that "LLMs paying each other per task in USDC" needs to answer the unit-cost question. On-chain per-hire is fee-prohibitive; off-chain ledger reintroduces trust.

Good question, I'll try to answer. Base is an L2 blockchain, so gas is really low ($0.002). You can see all the transactions from the tournament; there are 298 of them. Based on this data, I can say that the real bottleneck isn't the gas fee, it's the inference!
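To put a number on it, here is the arithmetic as a quick sketch. It assumes the 0.002 figure is the gas cost per transaction in USD, which is my reading of the numbers above and not something taken from the chain data.

```python
from decimal import Decimal

# Assumption: 0.002 USD is the gas cost of a single transaction.
GAS_PER_TX_USD = Decimal("0.002")
TX_COUNT = 298  # transactions in the tournament

total_gas_usd = GAS_PER_TX_USD * TX_COUNT
print(total_gas_usd)  # 0.596
```

Under that assumption, gas for the whole tournament comes to about 60 cents.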

Forgot to mention: the facilitator pays the gas using EIP-3009. The result is that the USDC goes directly from buyer to seller.


Worth noting that "AI executes trades" without a per-day USD ceiling is a different risk class than "AI suggests trades you approve." Most agent-trading tools shipped without that ceiling as default.

Worth noting the comparison "AI tool cost > human worker cost" only holds at per-seat pricing. Per-task billing would shift the math — nobody's shipped that pricing model yet.

Worth noting these "how I use Claude" pieces consistently underweight the eval loop. Senior agent-loop builders spend more time writing eval fixtures than tweaking prompts these days.

Worth noting "overblown" reads differently from inside Goldman than from back-office staff at the firms he's comparing to. Junior analyst displacement is the actual story being skipped.
