> You need to think in terms of a probability of a successful hallucination or prompt injection.
I would venture to say that an ACID-compliant deterministic database has a 99.999999999999999999% chance of retrieving the correct information when queried with the correct SQL statement. An LLM, on the other hand, is more like 90%. LLMs, by their very design, are prone to hallucinate. I don't necessarily disagree with your sentiment, but the gap from 90% to 99.999999999999999999% is much greater than the 0% to 90% improvement...unless something materially changes about how an LLM works at a fundamental level.
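The parent comment's point can be put in numbers: per-step reliability compounds when an agent chains dependent steps, which is why the last few nines matter so much more than the first 90%. A minimal sketch (the figures are illustrative, not measurements):

```python
# Probability that every step in a chain of dependent steps succeeds,
# assuming independent per-step reliability. Numbers are illustrative.
def pipeline_success(per_step_reliability: float, steps: int) -> float:
    return per_step_reliability ** steps

# A 90%-reliable component chained over 10 steps succeeds only ~35% of
# the time, while a near-deterministic one barely degrades at all.
llm_chain = pipeline_success(0.90, 10)         # ≈ 0.349
db_chain = pipeline_success(0.999999999, 10)   # ≈ 0.99999999
```

This is the compounding that makes 90% per-call accuracy feel far worse than 90% in any multi-step workflow.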
> The whole point of OpenClaw is to run AI actions with your own private data, your own Gmail, your own WhatsApp, etc. There's no point in using OpenClaw with that much restriction on it.
Hard disagree. I have OpenClaw running with its own Gmail and WhatsApp on its own Ubuntu VM. I just used it to help coordinate a group travel trip. It posted a daily itinerary for everyone in our WhatsApp group and handled all of the "busy work" I hate doing as the person who books the "friend group" trip. Everything from "what time are we doing lunch at the beach club today?" to "what's the gate code to get into the Airbnb again?"
My next step is to have it act on my behalf: "message these three restaurants via WhatsApp and see which one has a table for 12 people at 8pm tonight". I'm not comfortable yet having it do that for me, but I'm getting there.
Point is, I get to spend more valuable time actually hanging out and being present with my friends. That's worth every dollar it costs me ($15/month Tmobile SIM card).
I believe you only need a unique phone number to create the account, then you can use WhatsApp Web as client. Be very careful with alternative clients, as I've had an account banned in the past for this (and therefore a phone number blacklisted), even without messaging anybody. I think that clients that run WhatsApp Web in a web view (like https://github.com/rafatosta/zapzap) are safe.
I think they started banning unauthorized API users around the time that "WhatsApp For Business" was introduced, because it was competing with that product. Unfortunately WhatsApp For Business is geared toward physical products and services with registered companies, so home automation and agents are left with no options.
I believe you can use a virtual/VoIP number (like Twilio or Google Voice), but I eventually want to fall back to SMS where WhatsApp can't be used, and some services identify "non-residential" SMS phone numbers (I've seen Google Voice numbers blocked, for example), so I wanted to prevent that from happening. Again, the key thing here for me is that my assistant appears to be a human.
The number of times I realized halfway through that I probably posted the wrong password, and so vigorously typed the 'delete' key to reset the input, is too damn high.
The "Just" in that sentence is wholly unjustified. There are plenty of CLI/TUI/console/shell shortcuts that are incredibly useful, yet they are wholly undiscoverable and do not work cross-platform, e.g. shell motions between macOS and reasonable OSes.
All the movement commands I know work the same in the terminal on a default install of macOS as it does in the terminal on various Linux distros I use.
Ctrl+A to go to beginning of line
Ctrl+E to go to end of line
Esc, B to jump cursor one word backwards
Esc, F to jump cursor one word forward
Ctrl+W to delete backwards until beginning of word
And so on
Both in current versions of macOS where zsh is the default shell, and in older versions of macOS where bash was the default shell.
Am I misunderstanding what you are referring to by shell motions?
Yea, but Ctrl+arrows to move the cursor between 'words' don't work - especially sad when SSH'ing in from Linux. It works fine when using the terminal on macOS - you just use Command+arrows.
It's built into the Unix terminal driver. Control-U is the default, but it can be changed with e.g. "stty kill". Libraries like readline also support it.
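The kill character lives in the termios settings that `stty` manipulates, so you can poke at it from Python too. A small sketch using a pseudo-terminal so it runs without a real controlling tty (Unix-like systems only; the Ctrl-X rebinding is just an example):

```python
import pty
import termios

# Open a pseudo-terminal pair so we don't need a real controlling tty.
master_fd, slave_fd = pty.openpty()

attrs = termios.tcgetattr(slave_fd)
kill_char = attrs[6][termios.VKILL]   # line-kill character; Ctrl-U (\x15) by default

# Rebind it to Ctrl-X - the programmatic equivalent of `stty kill ^X`.
attrs[6][termios.VKILL] = b'\x18'
termios.tcsetattr(slave_fd, termios.TCSANOW, attrs)
```

Line-editing libraries like readline layer their own bindings on top of this, which is why Ctrl-U works even in programs that never touch termios themselves.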
I have had a similar issue: I thought my computer had gone to sleep, so I started typing my password while the monitor woke up, only to realize it was just the screen that had turned off and the computer was already unlocked - so when I hit Enter, the password was sent into a Slack thread or DM instead.
But yeah, I never thought this was a problem anyone else dealt with. My passwords are all a variant of my own "master password", and sometimes I forget which session I'm in, so to save keystrokes I count backward to where I think the cursor should be.
LLMs by their nature are not goal-oriented (this is a fundamental difference between reinforcement learning and plain supervised neural networks, for example). So a human will have, let's say, the ultimate goal of creating value with the web application they create ("save me time!"). The LLM has no concept of that. It's trying to complete a spec as best it can with no knowledge of the goal. Even if you tell it the goal, it has no concept of the process to achieve it or to confirm the goal was attained - you have to tell it that.
I don't know; I would assume it works, but I would not expect it to be free of bugs. But that is the baseline for code: being correct - up to some bugs - is the absolute minimum requirement. Code quality starts from there: is it efficient, is it secure, is it understandable, is it maintainable, ...
So do you expect it not to be free of bugs because you've run comprehensive tests on it or read all of the code yourself, or are you just concluding that because you know it was generated by an LLM?
It has not been formally verified, which is essentially the only way to achieve defect-free code with reasonable confidence. Several studies have found roughly one to twenty bugs per thousand lines of code in any software. This project has several thousand lines of code, so I would expect several bugs if it were written by humans, and I have no reason to assume that large language models outperform humans in this respect - not least because they are trained on code written by humans and trained to generate code as humans write it.
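The arithmetic behind that expectation is simple. A back-of-the-envelope sketch using the commonly cited 1–20 defects per KLOC range (the 5,000-line figure is hypothetical; the thread doesn't state the project's actual size):

```python
# Rough expected-defect range from the commonly cited density of
# roughly 1-20 bugs per thousand lines of code (KLOC).
def expected_defect_range(lines_of_code: int, low: float = 1, high: float = 20):
    kloc = lines_of_code / 1000
    return low * kloc, high * kloc

# A hypothetical 5,000-line project lands somewhere between a handful
# of bugs and around a hundred.
low, high = expected_defect_range(5000)   # → (5.0, 100.0)
```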
But you said "it's not great code" and then said "I don't know", so your idea of it being "not great code" is purely speculative and totally unfounded.
No, my judgment of "not great code" is not based on what the code does - or whether it does so correctly - but on how the code is written. Those are independent things: you can have horrible code that does what it is supposed to do, but you can also have great code that just does the wrong thing [1].
[1] I would however argue the latter is rarer, as it requires competent developers; even so, this does not preclude some misunderstanding of the requirements.
It works really well, multiple people have been using it for a month or so (including me) and it's flawless. I think "not great" means "not very readable by humans", but it wasn't really meant to be readable.
I don't know if there are underlying bugs, but I haven't hit any, and the architecture (which I do know about) is sane.
> Its shocking some people don't give it any real instruction or way to check itself.
It's not shocking. The tech world is telling them that "Claude will write all of their app easily" with zero instructions/guidelines so of course they're going to send prompts like that.
I think how far you can get with limited-to-no instructions varies a lot depending on what you're doing... CRUD APIs, sure - especially if you have a well-defined DB schema and API surface/approach. Anything that might get complex, less so.
Two areas where I've really appreciated LLMs so far... one is being able to make web components that do one thing well in encapsulation. I can bring one into my project and just use it... AI can scaffold a test/demo app that exercises the component with ease, and testing becomes pretty straightforward.
The other for me has been bridging Rust to Wasm, and even FFI interfaces, so I can use underlying systems from Deno/Bun/Node with relative ease... it's been pretty nice all around, to say the least.
That said, this all takes work... lots of design work up front for how things should function, whether it's a UI component or an API backend library. From there, you have to add in testing, and some iteration to discover and ensure there aren't behavioral bugs in place. Actually reviewing code, and especially the written test logic. LLMs tend to over-test in ways that are excessive or redundant a lot of the time - especially when a longer test function effectively also tests underlying functionality that already has its own tests... cut them out.
There's nothing "free" and it's not all that "easy" either, assuming you actually care about the final product. It's definitely work, but it's more about the outcome and creation than the grunt work. As a developer, you'll be expected to think a lot more, plan and oversee what's getting done as opposed to being able to just bang out your own simple boilerplate for weeks at a time.
> this all takes work... lots of design work up front for how things should function... whether it's a ui component or an API backend library. From there, you have to add in testing, and some iteration to discover and ensure there aren't behavioral bugs in place.
That's the reality, but the marketing to top-level people (and mass media) is like the other poster stated, and that filters down to devs as well, causing this big gap in expectations going in.
It's surprising they don't learn better after their first hour or two of use. Or maybe they do know better but don't like the thing, so they deliberately give it rope to hang itself with, then blame overzealous marketing.
If you worked in a PE shop you'd know it's not risk-free, and that the PE firm also puts up its own money plus money raised through LPs (hence "leveraged").
> PE firm also puts their own money up plus money raised through LPs (hence “leveraged”)
This is not true. PE firm individuals put their own money in a fund, which also has LPs money. That fund is used to acquire businesses and in order to fund a transaction they use both equity (capital from that fund) and debt (loans from banks) to fund the transaction. The debt is the "leverage" part of the equation...hence leveraged.
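The equity-plus-debt split described above is just arithmetic. A toy sketch (all numbers are illustrative, not from the thread):

```python
# Illustrative LBO funding split: the fund's equity plus bank debt covers
# the purchase price, and the debt portion is the "leverage".
def lbo_funding(purchase_price: float, equity_fraction: float):
    equity = purchase_price * equity_fraction   # GP + LP capital from the fund
    debt = purchase_price - equity              # bank loans: the leverage
    return equity, debt

# A hypothetical $100M acquisition funded 40% with equity:
equity, debt = lbo_funding(100_000_000, 0.40)   # → ($40M equity, $60M debt)
```

The smaller the equity fraction, the more "leveraged" the buyout - and the more the debt amplifies both gains and losses on the fund's capital.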
Hey, Rafa here, another Rudel AI developer. The ultimate goal is to make developers more productive. Suddenly everyone was running dozens of sessions per day, producing 10X more code - we were seeing 10X more activity, but not necessarily 10X productivity.
With this data, you can measure whether you are spending too many tokens on sessions, how successful sessions are, and what makes them successful. Developers can also share individual sessions where they struggled with their peers, share learnings, and avoid errors that others have already hit.
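A hypothetical sketch of the kind of per-session metrics being described (the field names and numbers are assumptions for illustration, not Rudel AI's actual schema):

```python
# Toy per-session records: tokens spent and whether the session succeeded.
# Field names and values are hypothetical.
sessions = [
    {"tokens": 12_000, "succeeded": True},
    {"tokens": 45_000, "succeeded": False},
    {"tokens": 8_000, "succeeded": True},
]

total_tokens = sum(s["tokens"] for s in sessions)
successes = sum(1 for s in sessions if s["succeeded"])
success_rate = successes / len(sessions)
# Tokens spent per successful session - a rough cost-of-outcome signal.
tokens_per_success = total_tokens / successes if successes else float("inf")
```

Even this crude aggregation separates "10X more activity" (total tokens) from "10X productivity" (cost per successful outcome).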
That's because these layoffs aren't about AI. They're about firms that overhired, and Wall St is (finally) having a sobering moment about their (profit) growth potential.
On a more serious note, at this rate, probably 10 years. I guess it's similar to drinking. You can get drunk in 10 minutes and be hung over for much longer than that.
It turns out that org charts are VERY resistant to letting people go, even when the executives push very hard.