I’d install it via cargo anyway and that would build it for arm64.
If the arm64 version were on Homebrew (I didn't check, but I assume not, since it's not mentioned on the page), I'd install it from there rather than via cargo.
I don't really install binaries from GitHub manually, but it's nice that the author provides binaries for several platforms for people who do like to install them that way.
Really? That is your response? This is a high-quality article from someone who spent a lot of time implementing a cool tool and also sharing its intricate inner workings. And your response is, "eh, there are no official binaries for my platform". Give them some credit! Be a little more constructive!
> And on top of it, if you develop for native macOS, there's no official tooling for visual verification. It's like 95% of development is web and LLM providers care only about that.
Thinking out loud here, but you could make an application that's always running, always has screen-sharing permissions, and exposes a lightweight HTTP endpoint on 127.0.0.1 that, when read, returns the latest frame to your agent as a PNG.
Edit: Hmm, not sure that'd be sufficient, since you'd want to click around as well.
Maybe a full-on macOS accessibility MCP server? Somebody should build that!
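To make the "latest frame over localhost HTTP" idea concrete, here's a minimal sketch. `capture_latest_frame()` is a hypothetical stand-in for real screen capture (on macOS you'd use something like ScreenCaptureKit or the `screencapture` utility); the route name and port binding are assumptions, not an existing tool's API.

```python
import http.server
import threading

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"

def capture_latest_frame() -> bytes:
    # Placeholder: a real implementation would grab the screen here.
    return PNG_MAGIC + b"...frame data..."

class FrameHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/frame":
            self.send_error(404)
            return
        frame = capture_latest_frame()
        self.send_response(200)
        self.send_header("Content-Type", "image/png")
        self.send_header("Content-Length", str(len(frame)))
        self.end_headers()
        self.wfile.write(frame)

    def log_message(self, *args):  # keep the demo quiet
        pass

def serve(port: int = 0) -> http.server.HTTPServer:
    # Bind to loopback only, so nothing off-machine can read the screen.
    server = http.server.HTTPServer(("127.0.0.1", port), FrameHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

An agent could then just GET `http://127.0.0.1:<port>/frame` whenever it wants to "look" at the screen; clicking around would still need something extra, like the accessibility APIs.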
Using ffmpeg's native network capabilities in a media-server use case, where you stream your input to it and then get multiple outputs (think HLS) streamed back, is not possible at this point in time. HTTP, FTP, and SFTP all have their limitations: some are outright broken for HLS use cases, others won't support seeking over a stream.
I would have very much loved to use the built-in capabilities instead of patching ffmpeg to add a VFS layer and spending a ton of time figuring out the build pipeline once you add all the codecs and hwaccels. I do hope to be able to change this in the future; I've identified several bugs that I intend to submit patches for.
This is not a special case. Everything you mentioned above can actually be achieved with the CLI. You can create listeners, configure pipelines, and configure sinks (granted, not ergonomically). Sinks can be HTTP POST, for example, and sources can be TCP listeners with protocols on top. You can also configure the buffering strategy for each pipeline.
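As a rough illustration of the listener-in, HTTP-out shape described above with stock ffmpeg: the `tcp://...?listen=1` input and the HLS muxer's `-method` option are real ffmpeg features, but the hostnames, ports, and codec choices here are made up for the example.

```python
# Build an ffmpeg command line: TCP listener as the source,
# HLS segments pushed out over HTTP as the sink.
def build_ffmpeg_cmd() -> list[str]:
    return [
        "ffmpeg",
        # Source: block until a client connects and streams media in.
        "-i", "tcp://0.0.0.0:9000?listen=1",
        # Pipeline: a single transcode (add more outputs for ABR ladders).
        "-c:v", "libx264", "-c:a", "aac",
        # Sink: HLS muxer uploading playlist + segments via HTTP PUT.
        "-f", "hls",
        "-method", "PUT",
        "http://media-origin.example:8080/live/out.m3u8",
    ]
```

You'd hand this list to `subprocess.run` (or just paste the equivalent shell command); whether this behaves well for a given HLS setup is exactly the kind of thing the parent comment found lacking.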
- Client makes request to server (which opens a bidirectional network socket)
- Server uses that bidirectional socket and spawns a local patched ffmpeg with VFS-like characteristics
- ffmpeg (using the client-server bidirectional socket) does input/output operations, treating the client filesystem as if it were local
Thus the client doesn't need to open any ports or expose its filesystem in a traditional mounting manner, and one server can handle the filesystems and requests of any number of clients.
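The steps above can be sketched as a toy protocol: one bidirectional socket, with the server side issuing "read this path" requests and the client side answering from its local filesystem. The project's actual wire format isn't described here, so the 4-byte length-prefixed framing and single-request flow below are assumptions purely for illustration.

```python
import socket
import struct

def send_frame(sock: socket.socket, payload: bytes) -> None:
    # Length-prefixed frame: 4-byte big-endian length, then the payload.
    sock.sendall(struct.pack(">I", len(payload)) + payload)

def recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed")
        buf += chunk
    return buf

def recv_frame(sock: socket.socket) -> bytes:
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)

def client_serve_one(sock: socket.socket, files: dict[str, bytes]) -> None:
    # Client side: answer a single READ request from its "filesystem"
    # (a dict here; real files in practice).
    path = recv_frame(sock).decode()
    send_frame(sock, files.get(path, b""))

def server_read(sock: socket.socket, path: str) -> bytes:
    # Server side: what the patched ffmpeg's VFS layer would call to
    # read a client file as if it were local.
    send_frame(sock, path.encode())
    return recv_frame(sock)
```

Since the client initiated the connection, all of this rides over one outbound socket: no ports opened on the client, no mounts, and the server can multiplex many such clients.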
I've explored SFTP, since ffmpeg has built-in support for it (-i sftp://...), but the support is quite buggy in the code; I hope to submit some patches upstream to be able to change that. FTP, in contrast, seemed much more stable, at least from reading the code, but it had other shortcomings that made it undesirable for my use case.
That was one motivation; the other was that it would require rewriting the arguments going into the server. What you're describing is essentially what ffmpeg-over-ip v4 (and its earlier versions!) was, and the constant feedback I heard was that sharing filesystems is too much work, SSH servers on Windows and macOS are a bad experience, and people want a bundled solution.
Forking ffmpeg was no easy task! It took forever to figure out the build process; eventually I caved and started using the Jellyfin build scripts, but those have the downside of being a few versions behind upstream HEAD.
Sharing filesystems is hard when you make users do it in advance.
I was thinking of the server end of an ffmpeg-over-ip system bringing up a FUSE filesystem backed by something similar to your VFS-served-by-the-client. Combine that either with argument rewriting, or chrooting into the FUSE filesystem.
As another commenter said, where's Plan 9 when you need it? If you go the FUSE route, there are existing 9P implementations for both the server and the FUSE client that you can use.
The use case for something like this is when you control both sides, server and client. There is some basic HMAC auth built into each request.
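"Basic HMAC auth built into each request" usually looks something like the following: both sides share a secret, the sender attaches a tag over the request bytes, and the receiver recomputes and compares it. The project's exact scheme (which fields are signed, which hash is used) isn't specified here, so this SHA-256-over-payload version is just an assumption.

```python
import hashlib
import hmac

def sign(secret: bytes, payload: bytes) -> str:
    # Tag the request payload with HMAC-SHA256 under the shared secret.
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()

def verify(secret: bytes, payload: bytes, tag: str) -> bool:
    # compare_digest avoids leaking the expected tag via timing differences.
    return hmac.compare_digest(sign(secret, payload), tag)
```

This authenticates requests between two parties that already share a secret; it does not encrypt anything, which is fine when you control both ends and the link, and insufficient otherwise.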
> I would recommend to sandbox if at all possible.
Since the server is a standard binary that doesn't need any special permissions, you could create the most locked-down user on your server, one that only has access to a limited set of files and the GPUs, and it'll work just fine. This is encouraged.
I am thinking of adding a Windows application with an installer and a tray icon that you can use for some basic settings, like changing the port or password, or toggling automatic startup.
For Linux, I am thinking of adding convenience helpers around systemd service installation.
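A helper like that would presumably drop a unit file along these lines; the binary path, config path, and user name below are placeholders, and the hardening directives are one way to get the locked-down-user setup mentioned above.

```ini
# /etc/systemd/system/ffmpeg-over-ip-server.service (hypothetical)
[Unit]
Description=ffmpeg-over-ip server
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/usr/local/bin/ffmpeg-over-ip-server --config /etc/ffmpeg-over-ip/server.conf
User=ffmpeg-over-ip
Restart=on-failure
# Lock the service down: read-only system, no home dirs, media paths only.
ProtectSystem=strict
ProtectHome=yes
ReadWritePaths=/var/lib/ffmpeg-over-ip

[Install]
WantedBy=multi-user.target
```

Then `systemctl enable --now ffmpeg-over-ip-server` would handle the automatic-startup part.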
Very cool. PeerTube supports remote runners [1] [2]; it might be worth a look for inspiration. As a distributed-compute enthusiast, I'm a big fan of this model for media processing.