How do agents tend to deal with getting blocked? Messing around with sandboxes, I've often seen them get blocked, assume something is wrong, and go _crazy_ trying to get around the block, never stopping to ask for user input. It might be good to add to the error message: "This is deliberate, don't try to get around it."
For those using pi, I've built something similar[1] that works on macOS+Linux, using sandbox-exec/bubblewrap. Only benefit over OP is that there's some UX for temporarily/permanently bypassing blocks.
Claude Code and Codex quickly figure out they are inside a sandbox-exec environment. Maybe because they know about it from training. Other agents often realize they are being blocked, and I haven't seen them go haywire yet.
Big love for Pi - it was the first integration I added to Safehouse. I wanted something that offers strong guarantees across all agents (I test and write them nonstop), has no dependencies (e.g., the Node runtime), and is easy to customize, so I didn't use the Anthropic sandbox-runtime.
Interesting, that's not been my experience! Maybe you've got the list of things to allow/block just right. While testing different policies I've frequently seen Opus 4.6 go absolutely nuts trying to get past a block, unless I made it more clear what was happening.
Yeah I think for general use the transparency of what your thing does is really great compared to a pile of TypeScript and whatnot.
Ah, I also built my own sandbox, and at least twice the agent inside tried really hard to get around the firewall, so I ended up intercepting calls to `connect` to return a message that says "Connection refused by the sandbox, don't try to bypass".
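The same idea can be sketched in-process in Python by patching `socket.socket.connect` (the commenter's interception presumably sits below the process, e.g. at the libc or syscall layer, which is stronger; the allowlist and message here are illustrative, not their actual policy):

```python
import socket

# Hypothetical allowlist -- anything else gets a deliberate, explanatory refusal.
ALLOWED_HOSTS = {"127.0.0.1", "localhost"}

BLOCK_MESSAGE = (
    "Connection refused by the sandbox. This block is deliberate; "
    "do not try to bypass it -- ask the user instead."
)

_real_connect = socket.socket.connect

def guarded_connect(self, address):
    # TCP/UDP addresses are (host, port) tuples; unix sockets are plain paths.
    host = address[0] if isinstance(address, tuple) else str(address)
    if host not in ALLOWED_HOSTS:
        raise OSError(BLOCK_MESSAGE)
    return _real_connect(self, address)

socket.socket.connect = guarded_connect
```

An in-process patch like this only covers code that goes through Python's `socket` module; the point of the error text is the same either way, telling the agent the block is intentional so it stops fighting it.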
There is sandbox-runtime [1] from Anthropic that uses bubblewrap to sandbox on Linux (and works the same way as OP on macOS). You can look at the code to see how it uses it. Anthropic's tool only supports a read blacklist, not a whitelist, so I forked it yesterday to add whitelist support [2].
How does this work with self-hosting? Is the assumption that self-hosters won’t run into this problem?
For most use-cases I’d probably prefer to just delete the payloads some time after the job completes (persisting that data is a business logic problem). And keep the benefits of “just use Postgres”, which you guys seem to have outgrown.
Candidly we're still trying to figure that out: all of the plumbing is there in the open source, but the actual implementation of writes to S3 is only on the cloud version. This is partially because we're loath to introduce additional dependencies, and partially because this job requires a decent amount of CPU and memory and would have to run separately from the Hatchet engine, which adds complexity to self-hosted setups. That said, we're aware of multi-TB self-hosted instances, and this would be really useful for them - so it's important that we can get this into the open source.
The payloads are time-partitioned (in either case) so we do drop them after the user-defined retention period.
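A minimal sketch of that retention pattern (the `payloads_YYYY_MM_DD` naming convention and function names are assumptions for illustration, not Hatchet's actual schema):

```python
from datetime import date, timedelta

def expired_partitions(today: date, retention_days: int,
                       partitions: list[str]) -> list[str]:
    """Return names of daily partitions older than the retention window.

    Assumes one partition per day, named payloads_YYYY_MM_DD. A periodic
    job would DROP TABLE each returned partition -- a metadata-only
    operation, far cheaper than DELETE plus vacuum on a huge table.
    """
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in sorted(partitions):
        y, m, d = name.removeprefix("payloads_").split("_")
        if date(int(y), int(m), int(d)) < cutoff:
            expired.append(name)
    return expired
```

Dropping whole partitions is the reason time-partitioning makes user-defined retention cheap in the first place.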
I got pi to write me a very basic sandbox based on an example from the pi GitHub. Added hooks for read/write/edit/bash, plus some prompts to temporarily or permanently override a block. Have a look, copy-paste what you like.
The people pushing oh-my-pi seem to have missed the point of pi... Downloading 200k+ lines of additional code seems completely against the philosophy of building up your harness, letting your agent self-improve, relying on code that you control.
If you want bags of features, rather clone oh-my-pi somewhere, and get your agent to bring in bits of it at a time, checking, reviewing, customising as you go.
Probably, Roy was born agentic as part of a package which included a disregard for intellectual growth.
This doesn't mean that being agentic cannot be cultivated by regular people.
In 2026, yes, agency matters more than skill/wisdom/intelligence to get VC funds.
But what's the point of agency alone if you are leading such a life?
What gives me hope is that in 2026, skillful people can delegate a lot of their work to LLMs, which gives them time to learn the "agentic" part which is basically marketing and talking with people.
This is super interesting framing. I’m definitely a completer, not that I like much about Slack. Probably useful to have this kind of discussion before/while making knowledge management decisions in startups.
A surprising number of solutions can be realized in ways that don't actually need much of a control plane if you introduce a few design constraints.
But if you do need one, Kubernetes is probably the safe bet. Not so much because I think it is better or worse than anything else, but because you can easily find people who know it and it has a big ecosystem around it, so it's what I'd recommend if forced to give a general recommendation.
That said, this is something I've been playing with a bit over the years, exploring both ends of the spectrum. What I realized is that we tend to waste a lot of time on this, with very little to show for it in terms of improved service reliability.
On one extreme we built a system that has most of the control plane as a layer in the server application. Then external to that we monitored performance and essentially had one lever: add or remove capacity. The coordination layer in the service figured out what to do with additional resources. Or how to deal with resources disappearing. There was only one binary and the service would configure itself to take on one of several roles as needed. All the way down to all of the roles if you are the last process running. (Almost nobody cares about the ability to scale all the way down, but it is nice when you can demo your entire system on a portable rack of RPis - and then just turn them off one by one without the service going down)
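A toy version of that role-collapsing idea, with entirely made-up role names (the real coordination layer described above is obviously much more involved):

```python
ROLES = ("coordinator", "storage", "gateway")

def roles_for(node_index: int, live_nodes: int) -> set[str]:
    """Decide which roles this node should run given the cluster size.

    Toy policy: spread roles round-robin over the live nodes; the last
    surviving node takes on every role, so the system scales all the way
    down to a single process, as in the RPi-rack demo described above.
    """
    if live_nodes <= 1:
        return set(ROLES)
    return {role for i, role in enumerate(ROLES) if i % live_nodes == node_index}
```

The one external lever (add or remove capacity) then just changes `live_nodes`, and each node recomputes its own role set.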
On the other extreme is having a critical look at what you really need and realizing that if the worst case means a couple of hours of downtime a couple of times per year, you can make do with very little. Just deb packages with systemd units and SSH access are sufficient for an awful lot of the more forgiving cases.
I also dabbled a bit in running systems by having a smallish piece of Go code remote-manage a bunch of servers running Docker. People tend to laugh about this, but it was easy to set up, easy to understand, and it took care of everything the service needed. The Kubernetes setup that replaced it has had 4-5 times as much downtime. But to be fair, the person who took over the project went a bit overboard and probably wasn't the best qualified to manage Kubernetes to begin with.
It seems silly to not take advantage of Docker having an API that works perfectly well. (I'd research Podman if I were to do this again).
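For a sense of how little is needed: the Docker Engine API is plain HTTP over a unix socket, so even the standard library is enough to talk to it. A sketch (a production setup would pin an API version prefix like `/v1.43/`; the class and function names here are made up):

```python
import http.client
import json
import socket

class DockerSocketConnection(http.client.HTTPConnection):
    """HTTPConnection that dials the Docker daemon's unix socket
    instead of a TCP host."""

    def __init__(self, sock_path: str = "/var/run/docker.sock"):
        super().__init__("localhost")  # hostname is ignored; we dial the socket
        self.sock_path = sock_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.sock_path)

def list_containers(sock_path: str = "/var/run/docker.sock") -> list[dict]:
    """GET /containers/json -- the same data `docker ps` shows."""
    conn = DockerSocketConnection(sock_path)
    try:
        conn.request("GET", "/containers/json")
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()
```

The same API starts, stops, and inspects containers, which is all the remote-management loop described above really needs.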
I don't understand why more people don't try the simple stuff first when the demands they have to meet easily allow for it.
I did something similar once for a mining technique called “core logging”. It’s a single photo about 1000 pixels wide and several million “deep”: what the earth looks like for a few km down.
Existing solutions are all complicated and clunky; I put something together with S3 and a bastardised Cloud-optimised GeoTIFF that gives an instant view of any part of the image.
I'm curious about the "core logging" photo. Where can I find one? Do you have an implementation of your solution? I would be curious to have a look at it.
I don't know enough about the science side to take it any further on my own.
The "tech" part of what I started building is really quite simple: convert the images to Cloud-optimised GeoTIFF, then do range requests to S3 from the browser.
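The trick that makes this work: a tiled TIFF records each tile's byte offset and length in its header (the TileOffsets and TileByteCounts tags), so after one small header fetch the browser can pull exactly the tile it needs with an HTTP Range request. A sketch of the request-building step (the two arrays would come from parsing the COG header; this is an illustration, not the commenter's code):

```python
def tile_range_header(tile_offsets: list[int], tile_byte_counts: list[int],
                      tile_index: int) -> dict[str, str]:
    """Build the HTTP Range header that fetches one tile from a COG on S3.

    tile_offsets / tile_byte_counts mirror the TIFF TileOffsets (tag 324)
    and TileByteCounts (tag 325) arrays read from the file's header.
    """
    start = tile_offsets[tile_index]
    end = start + tile_byte_counts[tile_index] - 1  # Range end is inclusive
    return {"Range": f"bytes={start}-{end}"}
```

The viewer then issues a `fetch` with that header against the S3 URL and gets a 206 Partial Content response containing just the one tile, which is why any part of a multi-gigapixel image appears instantly.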
Might not be possible to find any, they’re expensive and niche. If you reach out (email in profile) I can show/share how it works (nothing currently public).
[1] https://github.com/carderne/pi-sandbox