I went through the setup process for Openclaw. By the end I felt like I had wrestled more with the setup than I would have if I'd just built the thing from scratch. So I pointed Pi at Nanoclaw, asked it to review the code, and had it build me a minimal clone. A few minutes later I had the core of something that is easier (for me) to maintain than some large, unknown, cumbersome system, or whatever Openclaw is.
Using .env files or injecting secrets at startup via a secret manager still risks leaking keys.
I vaguely recall an implementation that substitutes secret placeholders with real secrets only during outgoing calls to approved domains, which sounds better. Even then, you're still trusting an agent on your machine with command execution.
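A minimal sketch of that idea, assuming a simple egress wrapper (all the names here are hypothetical, and a real setup would sit behind a proper forward proxy rather than a helper function):

    # Substitute secret placeholders only on egress to approved domains,
    # so the agent itself never sees the raw keys.
    import os
    from urllib.parse import urlparse

    import requests

    APPROVED_DOMAINS = {"api.openai.com", "api.anthropic.com"}  # hypothetical allowlist
    SECRETS = {"{{OPENAI_KEY}}": os.environ["OPENAI_KEY"]}      # loaded out-of-band

    def egress(method: str, url: str, headers: dict, body: str = "") -> requests.Response:
        approved = urlparse(url).hostname in APPROVED_DOMAINS

        def fill(value: str) -> str:
            # Real secret for approved hosts; strip the placeholder otherwise.
            for placeholder, secret in SECRETS.items():
                value = value.replace(placeholder, secret if approved else "")
            return value

        return requests.request(method, url,
                                headers={k: fill(v) for k, v in headers.items()},
                                data=fill(body))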
I'm kind of curious what you do with it. I feel like the real value is in integrating it with everything, but even with Nanoclaw or something simpler, the majority of the worthwhile integrations are on the unsafe side.
Would love to hear your experience, as I'm planning to do exactly the same.
The most interesting thing for me is that I built an extension for Pi that has it recognize when it doesn't know how to do something I'm asking, and it then attempts to write its own extension and/or skill to enable that functionality. The best example so far: I told it to make a todo list for me, so it made a skill that uses a local file to track todos and follow up on them. I instructed it to make an LLM call to suggest the best follow-up timing for reminders.
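A rough sketch of what that skill boils down to (the names are my own, and the LLM call is stubbed since the real one depends on your setup):

    # Local-file todo skill: JSON storage plus an LLM-suggested follow-up time.
    import json
    from datetime import datetime, timedelta
    from pathlib import Path

    TODO_FILE = Path.home() / ".todos.json"

    def suggest_follow_up(task: str) -> datetime:
        # Stand-in for the LLM call that picks a sensible reminder time.
        return datetime.now() + timedelta(days=1)

    def add_todo(task: str) -> None:
        todos = json.loads(TODO_FILE.read_text()) if TODO_FILE.exists() else []
        todos.append({"task": task,
                      "follow_up": suggest_follow_up(task).isoformat()})
        TODO_FILE.write_text(json.dumps(todos, indent=2))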
So... the real value so far is that I find it fun? It isn't at the "life-changing, need to go make a tweet!!" level for me.
I think the developers of Pi and Openclaw are friends; not sure if it matters, but Pi also has its own small following. I agree with you that it's such an elegant project with an awesome, clean architecture. (Also see: oh my pi.)
The real lesson is that if you ignore security and data disasters, agentic AI is easier than anyone expected.
Yeah, it does seem a little fragile. I'm still battling to work out why it permanently pegs CPU at 100% on a VPS I tried using, literally just from the base install.
Dependencies introduce unnecessary LOC and features which are, more and more, just written by LLMs themselves. It is easier to just write the necessary functionality directly. Whether that is more maintainable or not is a bit YMMV at this stage, but I would wager it is improving.
What a bizarre comment. Take something like NumPy: it has a hard dependency on BLAS implementations, where numerical correctness is highly valued and a correct implementation requires deep thinking, for accuracy as well as for performance. It's written in a different language, again for performance, so an LLM would have to implement all of those things too. What's the utility in burning energy to regenerate all of this when implementations already exist?
Interesting thought (I think it's a better idea than ever to question assumptions these days), but IMO abstractions are as important as ever.
Maybe the smallest/most convenient packages (looking at you, is-even) are obsolete, but meaningful packages still abstract a lot of complexity that IMO isn't easy to one-shot with an LLM.
Concretely, when you use Django, underneath you have CPython, then C, then assembly, and finally machine code. I believe LLMs have been trained much better on each layer than on going end-to-end.
I consider packages with over 100k downloads production-tested. Sure, an LLM can roll some itself, but when the many edge cases appear (which public packages may already handle), you'll need to handle them yourself.
Don't base anything on download numbers alone. Not only are they easily gamed; it takes just a few small companies using a package, pushing commits individually, with CI triggering on every new commit, for that number to lose all meaning.
Vanity metrics should not be used for engineering decisions.
At times I wonder why a given TUI coding agent was written in JS/TS/Python; why not use Go if it's mostly LLM-coded anyway? But that's mostly my frustration at having to wait for npm to install a thousand dependencies, instead of getting one executable plus some config files. There are also support libraries, like terminal UI toolkits, that differ in quality between platforms.
> the few Go binaries I've used also installed a bunch of random stuff.
Same goes for Rust. Sometimes one package implicitly imports another at a different version, and poring over `cargo tree` output to resolve the issue just doesn't seem very appealing.
Well, you do need to vet dependencies, and I wish there were a way to exclude purely vibe-coded dependencies that no human has reviewed. But I do trust well-established, well-maintained, well-designed, human-developed libraries over AI slop.
Don't get me wrong, I'm not a luddite. I use Claude Code and Cursor, but the code either generates is nowhere near what I'd call good, maintainable code, and I end up rewriting/refactoring a big portion before it's in any halfway decent state.
That said, with the most egregious packages in the Node.js world, like left-pad, it was always better to build your own instead of taking the dependency.
I've been copy-pasting small modules directly into my projects. That way I can look them over to see if they're OK, and it saves me an install and possible future npm-jacking. There's a whole ton of small things that rarely need maintenance, and if they do, they're small enough that I can fix them myself. Worst case, I paste in the new version. (I press 'y' on GitHub and paste the permalink at the top of the file so I can find it again.)
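For illustration, the kind of header I mean (the repo, commit, and module here are made up):

    # Vendored from (permalink via GitHub's 'y' shortcut; hypothetical URL):
    # https://github.com/example/tiny-slug/blob/3f2c1ab/src/slugify.py
    # Reviewed by hand before pasting; small enough to patch locally if needed.
    import re

    def slugify(text: str) -> str:
        # Lowercase, collapse runs of non-alphanumerics into single hyphens.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")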
LLMs are great at the roll-your-own-crypto footgun: they'll tell you all the important things to remember, then ignore their own tips.
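A classic instance of that footgun, for the record: comparing secrets with == leaks timing information, and an LLM will happily recite the constant-time-comparison rule and then violate it. A minimal sketch of the safe version:

    # Constant-time token comparison; `==` would be a timing side channel.
    import hmac

    def verify_token(provided: str, expected: str) -> bool:
        return hmac.compare_digest(provided.encode(), expected.encode())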
Tokens are expensive and downloading is cheap. I think the opposite is probably true, really: more packages will be written specifically for LLMs to use, because calling a package's API costs fewer tokens than generating the implementation.
You have insane delusions about how capable LLMs are, but even assuming it's somehow true: downloading deps instead of hallucinating more code saves you tokens.
If average people try vibecoding their dependencies, they’ll fail, simple as that. We’ve already seen how that looks with the “web browsers” that have recently been vibecoded.
There's a new web browser project today that's a heck of a lot more impressive than the previous ones - ~20,000 lines of dependency-free Rust (though it uses system libraries for image and text rendering), does a good job of the Hacker News homepage: https://news.ycombinator.com/item?id=46779522
Thanks for the heads up, that does look much more interesting.
I don't think it really affects the point discussed above for now, because we were discussing average users, and by definition, the first person to code a plausible web browser with an agent isn't an average user - unless of course that can be reliably replicated with any average user.
But on that note, the takeaways in the post you linked are relevant, because the author bucked a few trends to do this, and concluded among other things that "The human who drives the agent might matter more than how the agents work and are set up, the judge is still out on this one."
This will obviously change, but the areas that LLMs need to improve on here are ones they're notoriously weak on, so it could take a while.
Docker Desktop has a pretty nice sandbox feature that will also store your CC (and other) credentials, so you don't have to re-auth every time you create a new container.
Funnily enough, we shipped the Docker Desktop VM a decade ago now (experience report at https://dl.acm.org/doi/10.1145/3747525). The embedded VM in DD is much more stripped down than the one in Claude Cowork (it's based on https://github.com/linuxkit/linuxkit), and it's more specialised for container workloads rather than just using bubblewrap for sandboxing (system services run in their own isolated namespaces).
Given how many products seem to be using this shipping-Linux-as-a-library-VM trick these days, it's probably a good time for an open source project to step up to supply a more reusable way of assembling this layer into a proper Mac library...
This is one of those announcements that actually just excites me as a consumer. We give our children HomePods as their first device when they turn 8 years old (Apple Watch at 10, laptop at 12), and in the 6 years I have been buying them, they have not improved one ounce. My kids would like to listen to podcasts, get information, etc. All stuff that a voice conversation with ChatGPT or Gemini can do today, but Siri isn't just useless; it's actually quite frustrating!
> Being these things are at their core probability machines, ...

How? Why?
Is Siri a probability machine? I didn't think it was an LLM at all right now. I thought it was some horrendous tree of switch statements, hence the difficulty of improving it.
Apple search is comically bad, though. Type in some common feature or app, and it will yield the most obscure header file inside the build deps directory of some Xcode project you forgot existed.
Not exactly the same, but kinda: my gen-1 Google Home just got Gemini and it finally delivers on the promise of ~10 years ago! It brought new life to the thing beyond playing music, setting timers, and occasionally answering really basic questions.
It remains to be seen what the existing HomePods will support. There's been a HomePod hardware update in the pipeline for quite some time, and it appears they are waiting for the new Siri to be ready.
It's not going to help them. For Siri to be really useful, it would need deep system integration, and an external model is not going to provide that. People didn't believe me when I said the same about Apple Intelligence with OpenAI.
I am currently employing a consultant for something. It's something I don't want to do myself, and they are doing what I need, but it's so painfully obvious they are just running everything through vanilla ChatGPT that it's almost funny at this point.
Any IDE-based editor feels like a stopgap to me. We may not be there yet, but I feel that in the future a "vibe coder" isn't going to look at much code at all. Much of what developers relying on Cursor, Windmill, Replit, etc. are doing is performative as it relates to code. There is just a lot of copy/pasting of console errors and asking for things one way or another.
Casual or "vibe" coding is all about the output. Doesn't work? Roll back. Works well? Keep going. Feeling gutsy? Single shot.
Vibe coding is just a prototyping tool / "dev influencer" gimmick. No one serious is using Cursor for vibe coding, nor will anyone serious ever vibe code. It's for AI-assisted development; in other words, a more powerful IntelliSense.
I vibed this puzzle game into existence with two breaks* from vibe coding midway through to get it out of a rut: https://love-15.com/
It builds for PC, web, iOS and Android.
It's a simple sliding-block puzzle game with a handful of additional game mechanics (which you can see if you go into settings and unlock all levels), saved progress and best times/move counts, a level editor, daily puzzles with shareable results, and theme selection.
I think I found the current limits of vibe coding. There's one bug that I know of which I don't think can be fixed with vibe coding, and so I haven't fixed it as this was largely an experiment to see how far you could get with vibe coding.
I've since inspected the code and I believe the code is just too bad for the LLM to get anywhere at this point. Looking at the git history - I had it commit every time a feature was complete and verified working by me - the code started OK but really went downhill as it got bigger, and it got worse faster over time.
(When I first broke from vibe coding it was hitting a brick wall on progress earlier than expected and I needed to guide it to break the project up into more files, which it is terrible at by the way; I think the one giant file was hitting context length limits, which were smaller at the time than they are now. The second break was at the end to get it over the finish line when it just could not fix some save bugs without introducing new ones, and I did just barely enough technical guidance to help it finish. In neither case did I write code, but I did read code in both cases.)
I felt the same way for a while, but I am really not so sure now. Cursor is definitely drawing on the influencer/growth well to drive some portion of these numbers.
It's a lot easier and more scalable to get 1,000 people "vibe coding" than it is to get 10 experienced engineers using you for autocomplete.
Cursor isn't for vibe coding. I use it. I ask the AI to do something I know how to do, but it can do it faster. I check the changes to make sure everything looks good.
But this sums up so well why I think the valuation is so riskily high. You're saying that right now IDE UX is so slow and bad that there are often changes you know how to make, but it would literally be too many keystrokes for you to want to make them yourself.
As far as I can tell if people like you just had a way to express code ideas with fewer keystrokes, a lot of Cursor's market would pretty much just dry up.
I am currently dealing with a relatively complex legal agreement. It's about 30 pages. I have a lawyer working on it who I consider the best in the country for this domain.
I was able to pre-process the agreement, clearly understand most of the major issues, and come up with a proposed set of redlines, all relatively easily. I then waited for his redlines and responded with questions about a handful of things he had missed.
I value a lawyer being willing to take responsibility for their edits, and he also has a lot of domain specific transactional knowledge that no LLM will have, but I easily saved 10 hours of time so far on this document.
I am a semi-retired blue collar electrician. Higher IQ, but lowly certifications (definitely not a lawyer).
Currently I have initiated a lawsuit in my US state's small-claims civil court over a relatively simple payment dispute. Without the ability to bounce legal questions, tactics, and procedural points off of Perplexity, I wouldn't have felt comfortable enough to represent myself in court.
Even if I were to need a lawyer on this simple case, the majority of the "leg work" has already been completed by free, non-paid LLMs.
My court date is early June; I'm both nervous and excited (for restitution)!
----
My brother is a judge, and I have been arguing for years that law clerking is probably in its last gasps as a career entry point; Chief Justice Roberts's end-of-2023 SCOTUS report, which argued that LLMs will make the judiciary more accessible to commoners, was a refreshing read to share among family members.
Personally, I would already rather have a jury of LLMs deciding most legal outcomes (albeit they would need to be impartially programmed, if that's even possible). It would definitely make for better democratic accessibility.
I found Bruce Schneier's recent article "Reimagining Democracy" [1] quite an interesting thought experiment (it's about his hosting intellectuals to discuss creating entirely new democracies using modern technologies). It would be super fair if a trusted AI government could lead to better democracies than "modern capitalism" can or has.
This is super interesting, because I've been in similar conflicts (as a renter trying to recover security deposits in court) and been screwed over by the lawyers I've retained (like, literally not even showing up in court). I probably could have done all this with an LLM myself. When the stakes are low, why not?
I've also had a lawyer not show up (me as criminal defendant), and then try to fleece me for more money (than initially agreed) because we (I) had to reschedule my court date — only to eventually reach a simple plea agreement which any public defender could have secured. LLMs didn't exist when this occurred, well over a decade ago.
> similar conflicts (as a renter trying to recover security deposits in court)
This is basically my current scenario. The landlord sold the rental I was living in, which I had pre-paid for an entire year, because the septic tank went out. We mutually agreed to end our lease... he then wrote me a check for the overpayment... then canceled the check (without even telling me). As an added bonus, he did nothing to fix the tank... then sold the disaster to somebody else (I found out only when the new owner showed up on my/his doorstep).
Not my first time in court, but is my first time as Plaintiff. I'm very excited to (potentially) get awarded TREBLE DAMAGES on my few-thousand-dollar initial claim/dispute.
The era of "a lawyer who represents himself has a fool for a client" is rapidly approaching its end, particularly within small-claims civil courts. I'd love to see entire branches of government replaced with machine-learnt judges.
I've already decided that if the Defendant (in my case) chooses to appeal to our higher court (i.e. not small claims, which he is entitled to do) I will retain an attorney, only because civil procedure is so nuanced.
But I'm trying first, and most of the legwork is already formulated.
I think it's the small TPM limits. I'll be way under the 10-30 requests per minute while using Cline, but input tokens appear to count toward the rate limit, so I'll find myself limited to one message a minute if I let the conversation go on too long; ironically, this is due to Gemini's long context window. AFAIK Cline doesn't currently offer an option to cap the context explosion below model capacity.
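What I'd want is something like client-side history trimming to stay under a TPM budget; a rough sketch (this is not a Cline feature, and the 4-chars-per-token estimate is a crude heuristic):

    # Keep only the most recent turns that fit under a token budget.
    TPM_BUDGET = 30_000

    def estimate_tokens(text: str) -> int:
        return len(text) // 4  # crude approximation

    def trim_history(messages: list[str], budget: int = TPM_BUDGET) -> list[str]:
        kept, used = [], 0
        for msg in reversed(messages):  # walk backwards from the newest turn
            cost = estimate_tokens(msg)
            if used + cost > budget:
                break
            kept.append(msg)
            used += cost
        return list(reversed(kept))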
To each their own.