
> You need to think in terms of a probability of a successful hallucination or prompt injection.

I would venture to say that an ACID-compliant deterministic database has a 99.999999999999999999% chance of retrieving the correct information when queried with the correct SQL statement. An LLM, on the other hand, is more like 90%. LLMs, by their very design, are meant to hallucinate. I don't necessarily disagree with your sentiment, but the gap from 90% to 99.999999999999999999% is much greater than the improvement from 0% to 90%...unless something materially changes about how an LLM fundamentally works.
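A quick sketch of why that gap matters, using hypothetical per-step reliabilities (the 90% figure is from the comment above; the chain length is made up for illustration): when a workflow chains several dependent steps, per-step reliability compounds, so a 90%-reliable step degrades much faster than a near-certain one.

```python
# Hypothetical numbers, purely to illustrate how per-step reliability compounds.
def chain_success(p: float, n: int) -> float:
    """Probability that n independent steps, each succeeding with
    probability p, all succeed."""
    return p ** n

# A near-certain step barely degrades over a 10-step chain...
print(chain_success(0.999999, 10))  # ~0.99999
# ...while a 90%-reliable step fails most of the time over the same chain.
print(chain_success(0.9, 10))       # ~0.349
```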


Eh, if you hire a programmer to program things for you, you won’t get 99.9999999999% reliability either.

Getting LLMs to have a reliability rate that is on par or superior to human performance is very very achievable.


> The whole point of OpenClaw is to run AI actions with your own private data, your own Gmail, your own WhatsApp, etc. There's no point in using OpenClaw with that much restriction on it.

Hard disagree. I have OpenClaw running with its own gmail and WhatsApp running on its own Ubuntu VM. I just used it to help coordinate a group travel trip. It posted a daily itinerary for everyone in our WhatsApp group and handled all of the "busy work" I hate doing as the person who books the "friend group" trip. Things like "what time are doing lunch at the beach club today?" to "whats the gate code to get into the airbnb again?"

My next step is to have it act on my behalf: "message these three restaurants via WhatsApp and see which one has a table for 12 people at 8pm tonight". I'm not yet comfortable having it do that for me, but I'm getting there.

Point is, I get to spend more valuable time actually hanging out and being present with my friends. That's worth every dollar it costs me ($15/month Tmobile SIM card).


> handled all of the "busy work" I hate doing as the person who books the "friend group" trip

Why do you go on trips with your friends if you have to do all the work?


Do you need the simcard for WhatsApp?

I believe you only need a unique phone number to create the account; then you can use WhatsApp Web as the client. Be very careful with alternative clients, as I've had an account banned for this in the past (and therefore a phone number blacklisted), even without messaging anybody. I think clients that run WhatsApp Web in a web view (like https://github.com/rafatosta/zapzap) are safe.

I think they started banning unauthorized API users around the time that "WhatsApp For Business" was introduced, because it was competing with that product. Unfortunately WhatsApp For Business is geared toward physical products and services with registered companies, so home automation and agents are left with no options.


I believe you can use a virtual number/VoIP (like Twilio or Google Voice), but I eventually want to be able to use SMS where WhatsApp can't be used. I know some services identify "non-residential" SMS phone numbers (for example, I've seen Google Voice numbers blocked), so I wanted to prevent that from happening. Again, the key thing for me is that my assistant appears to be human.

The number of times I've realized halfway through that I probably typed my password into the wrong input, and so vigorously hit the 'delete' key to reset it, is too damn high

Just type Control-U once.

The Just in that sentence is wholly unjustified. There are plenty of cli/tui/console/shell shortcuts that are incredibly useful, yet they are wholly undiscoverable and do not work cross-platform, e.g. shell motions between macOS and reasonable OSes.

> shell motions between macOS and reasonable OSes

All the movement commands I know work the same in the terminal on a default install of macOS as it does in the terminal on various Linux distros I use.

Ctrl+A to go to beginning of line

Ctrl+E to go to end of line

Esc, B to jump cursor one word backwards

Esc, F to jump cursor one word forward

Ctrl+W to delete backwards until beginning of word

And so on

Both in current versions of macOS where zsh is the default shell, and in older versions of macOS where bash was the default shell.

Am I misunderstanding what you are referring to by shell motions?
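For reference, the motions listed above are readline's default emacs-mode bindings, which is why they behave the same in bash and zsh on both macOS and Linux. Spelled out explicitly in an `~/.inputrc` (purely illustrative; these are already the defaults), they look like:

```
# ~/.inputrc: readline emacs-mode defaults, written out explicitly
"\C-a": beginning-of-line
"\C-e": end-of-line
"\eb": backward-word
"\ef": forward-word
"\C-w": unix-word-rubout
```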


Yea, but ctrl + arrows to move cursor between ‘words’ don’t work, especially sad when SSH’ing in from linux. It works fine when using terminal on macOS - you just use command + arrows.

Works fine for me. Configure your shell.

These are emacs bindings of yore. On macOS and some Linux DEs they also work in UI text fields :)

What happens when you press home or end?

In iTerm at least it goes to the beginning or end of current line.

The number of times I’ve attempted to use Ctrl-U in a Python shell only to discover it doesn’t work…

Haven't seen this; shouldn't it always work on Unixy platforms? If Python is built with readline/editline it works, and if built without them, the terminal driver handles it anyway.

It’s an internal, custom, vaguely UNIX-like shell in Windows. Typically I’m running Python from bash; Ctrl-U works under bash, but not Python.

> e.g. shell motions between macOS and reasonable OSes.

I forgot about this since I started NixOS/home-manager everywhere.


That's great. I've been using terminals for 20+ years and never knew about Ctrl-U. Thanks! TIL.

It's built into the Unix terminal driver. Control-U is the default, but it can be changed with e.g. "stty kill". Libraries like readline also support it.

I only know this because of xkcd

The number of times I've posted my sudo password in a random slack channel instead of my terminal is not very high, but too damn high nonetheless

I have had a similar issue: I thought my computer had gone to sleep, so I started typing my password while the monitor woke up, only to realize that just the screen had turned off and the computer was already unlocked. When I hit enter, the password went into a Slack thread or DM instead.

Start your password with a forward slash :)

The trick is to use a plausible Slack message as your sudo password :)

“I quit!” even includes a special character

"I, uhh, need your thoughts." should have fewer consequences AND be more secure.

Did you just type in your password on HN?

Get out of my head, lol :)

But yeah, I never thought this was a problem anyone else dealt with. My passwords are all variants of my own "master password", and I sometimes forget which session I'm in, so to save keystrokes I count backward to where I think the cursor should be.


LLMs by their nature are not goal-oriented (this is a fundamental difference between reinforcement learning and plain neural networks, for example). So a human will have, let's say, the ultimate goal of creating value with a web application they create ("save me time!"). The LLM has no concept of that. It's trying to complete a spec as best it can with no knowledge of the goal. Even if you tell it the goal, it has no concept of the process needed to achieve the goal or to confirm it was attained - you have to tell it that.

> and that is not great code

When you say "is not great code" can you elaborate? Does the code work or not?


I don't know; I would assume it works, but I would not expect it to be free of bugs. That is just the baseline for code: being correct - up to some bugs - is the absolute minimum requirement. Code quality starts from there: is it efficient, is it secure, is it understandable, is it maintainable, ...

So do you expect it not to be free of bugs because you've run comprehensive tests on it, read all of the code yourself, or are you just concluding that because you know it was generated by an LLM?

It has not been formally verified, which is essentially the only way to achieve defect-free code with reasonable confidence. Several studies have found roughly between one and twenty bugs per thousand lines of code in typical software. This project has several thousand lines of code, so I would expect several bugs if it were written by humans, and I have no reason to assume that large language models outperform humans in this respect, not least because they are trained on code written by humans and have been trained to generate code as written by humans.
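A back-of-the-envelope version of that estimate (the one-to-twenty bugs per KLOC range is the figure cited above; the 5,000-line project size is a made-up stand-in for "several thousand lines"):

```python
# Hypothetical project size; the 1-20 bugs/KLOC range is from the studies cited.
def expected_bugs(lines_of_code: int, bugs_per_kloc: float) -> float:
    """Rough expected defect count at a given defect density."""
    return lines_of_code / 1000 * bugs_per_kloc

print(expected_bugs(5000, 1))   # 5.0, the optimistic end of the range
print(expected_bugs(5000, 20))  # 100.0, the pessimistic end
```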

But you said "it's not great code" and then said "I don't know", so your idea of it being "not great code" is purely speculative and totally unfounded.

No, my judgment of "not great code" is not based on what the code does - or whether it does so correctly - but on how the code is written. Those are independent things: you can have horrible code that does what it is supposed to do, but you can also have great code that simply does the wrong thing [1].

[1] I would however argue the latter is rarer, as it requires competent developers; even then, it does not preclude some misunderstanding of the requirements.


It works really well, multiple people have been using it for a month or so (including me) and it's flawless. I think "not great" means "not very readable by humans", but it wasn't really meant to be readable.

I don't know if there are underlying bugs, but I haven't hit any, and the architecture (which I do know about) is sane.


> Its shocking some people don't give it any real instruction or way to check itself.

It's not shocking. The tech world is telling them that "Claude will write all of their app easily" with zero instructions/guidelines so of course they're going to send prompts like that.


I think the implications of limited-to-no instructions vary widely depending on what you're doing... CRUD APIs, sure, especially if you have a well-defined DB schema and API surface/approach. Anything that might get complex, less so.

Two areas where I've really appreciated LLMs so far... one is being able to make web components that do one thing well, in encapsulation... I can bring one into my project and just use it... AI can scaffold a test/demo app that exercises the component with ease, and testing becomes pretty straightforward.

The other for me has been in bridging rust to wasm and even FFI interfaces so I can use underlying systems from Deno/Bun/Node with relative ease... it's been pretty nice all around to say the least.

That said, this all takes work... lots of design work up front for how things should function... whether it's a UI component or an API backend library. From there, you have to add in testing, and some iteration to discover and ensure there aren't behavioral bugs in place. Actually reviewing the code, and especially the written test logic. LLMs tend to over-test in ways that are excessive or redundant a lot of the time, especially when a longer test function effectively also tests underlying functionality that already has its own tests... cut those out.

There's nothing "free" and it's not all that "easy" either, assuming you actually care about the final product. It's definitely work, but it's more about the outcome and creation than the grunt work. As a developer, you'll be expected to think a lot more, plan and oversee what's getting done as opposed to being able to just bang out your own simple boilerplate for weeks at a time.


  > this all takes work... lots of design work up front for how things should function... whether it's a ui component or an API backend library. From there, you have to add in testing, and some iteration to discover and ensure there aren't behavioral bugs in place.
That's the reality, but the marketing to the top-level people (and mass media) is like the other poster stated, and that filters down to devs as well, causing this big gap in expectations going in

It's surprising they don't learn better after their first hour or two of use. Or maybe they do know better, but don't like the thing, so they deliberately give it rope to hang itself with, then blame overzealous marketing.

> And we have seen example after example of these LBO's ruining otherwise functioning businesses. It's happening. All over the place.

Your anecdotes and the anecdotes in the media are not statistical evidence that "this is happening all over the place".

Yes, PEs/LBOs deserve criticism, but "PE" and "LBO" aren't a one-size-fits-all situation.


Guy who works in the PE market here (not a PE shop myself) - this comment is correct.

Correct. One niggle: PE can also access private credit as part of the capital stack. One flavor of debt in the ice cream store.

If you worked in a PE shop you’d know it’s not risk-free and that the PE firm also puts their own money up plus money raised through LPs (hence “leveraged”)

Not sure your point, but...

> PE firm also puts their own money up plus money raised through LPs (hence “leveraged”)

This is not true. Individuals at the PE firm put their own money into a fund, which also holds LP money. That fund is used to acquire businesses: to finance a transaction, they use both equity (capital from that fund) and debt (loans from banks). The debt is the "leverage" part of the equation...hence "leveraged".
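A toy example with made-up numbers, just to show where the "leverage" sits in that structure (the 40/60 equity-debt split is hypothetical; real deals vary):

```python
# Hypothetical $100M buyout funded 40% with fund equity and 60% with bank debt.
purchase_price = 100_000_000
equity_from_fund = 0.40 * purchase_price       # firm's own commitment plus LP capital
debt_from_banks = purchase_price - equity_from_fund
leverage = debt_from_banks / equity_from_fund  # debt-to-equity, the "L" in LBO
print(leverage)  # 1.5
```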


My point is that it’s certainly not risk-free which was the claim of the comment I replied to.

So what conclusions have you drawn or could a person reasonably draw with this data?

Hey, Rafa here, another Rudel AI developer. The ultimate goal is to make developers more productive. Suddenly we had everyone running dozens of sessions per day, producing 10X more code; we were seeing 10X more activity, but not necessarily 10X productivity.

With this data, you can measure whether you are spending too many tokens on sessions, how successful sessions are, and what makes them successful. Developers can also share individual sessions where they struggled with their peers, spread learnings, and avoid errors that others have already hit.


Yes, what Rafa said... aaand we see who wastes the 200-buck Claude subscription by not using it

> Layoffs because of AI make no sense to me.

That's because these layoffs aren't about AI. They're about firms that overhired, and Wall St is (finally) having a sobering moment about their (profit) growth potential.


This line has been repeated ad nauseum for years at this point.

How long does a "sobering moment" last? Two years? Five? Ten?


The beating will continue until morale improves.

On a more serious note, at this rate, probably 10 years. I guess it's similar to drinking. You can get drunk in 10 minutes and be hung over for much longer than that.

It turns out that org charts are VERY resistant to letting people go, even when the executives push very hard.


A decision to cut 1600 jobs doesn't happen overnight.
