I like the idea, but how do you handle hallucinations? E.g. when the user asks about their AWS bill, how can they be sure the numbers they get are accurate?
Would be good if any user interface that uses these kinds of LLM solutions always included the raw data from the backing services on request. Like how in ChatGPT you can open and inspect its interaction with a plugin.
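Something like the Python sketch below (all field names made up) is roughly what I mean: the answer shown to the user carries the untouched request/response from the backing service, so a "show raw data" control can render it on demand.

```python
# Minimal sketch: return the raw backing-service payload alongside the
# LLM's summary so the UI can expose it on demand (names are illustrative).
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    service: str          # e.g. "aws-cost-explorer"
    request: dict         # exact request sent to the backing service
    raw_response: dict    # untouched response the LLM summarized

@dataclass
class AssistantAnswer:
    summary: str                                   # what the LLM tells the user
    tool_calls: list[ToolCall] = field(default_factory=list)

    def inspect(self) -> list[dict]:
        """What a 'show raw data' button would render."""
        return [
            {"service": c.service, "request": c.request, "response": c.raw_response}
            for c in self.tool_calls
        ]

answer = AssistantAnswer(
    summary="Your AWS bill for May is $1,234.56.",
    tool_calls=[ToolCall(
        service="aws-cost-explorer",
        request={"TimePeriod": {"Start": "2024-05-01", "End": "2024-06-01"}},
        raw_response={"Total": {"UnblendedCost": {"Amount": "1234.56", "Unit": "USD"}}},
    )],
)
print(answer.inspect())  # user can compare the raw numbers with the summary
```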
Release engineer here. You can do a bunch of cool tricks like go into `/expert` mode and view all the details. Join our slack workspace (https://release-ai.slack.com) and we have tonnes more tricks and techniques to share!
Release engineer here. That's an excellent question, and we worry about it all the time. The AI "seems" authoritative, but it can't even add 1+1 sometimes :crying-emoji:. We've tried to engineer the prompts and tooling so that it will say "I don't know" if it doesn't know. But we've still seen it say some crazy things, like "Your cluster is fine" when it clearly wasn't. :tongue-sticking-out-emoji: I guess the only real answer is you have to trust but verify.
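For what it's worth, the "say 'I don't know'" framing is usually just a system-prompt constraint. Release's exact wording isn't public, so this snippet is only an assumed illustration:

```python
# Assumed illustration of an "I don't know" system-prompt constraint;
# not the actual prompt Release uses.
SYSTEM_PROMPT = (
    "You are an infrastructure assistant. Only state facts you can back up "
    "with data returned by the tools you were given. If the tools do not "
    "return the data needed to answer, reply exactly: \"I don't know.\" "
    "Never guess numbers, resource names, or cluster states."
)
```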
If you want to use emoji text like that when communicating about positive things, that’s one thing. Some people will find it grating, but that’s up to you. But if you use it to talk about negative things like bugs, that will piss people off. There’s a time and a place for emojis, and communicating bad news is definitely not one of them. It gives the very strong impression you don’t take bugs seriously.
You need to engineer a system where, when the AI states something, it has to give a command that supports what it says and explain how the command shows it is true. At that point the command should actually be executed and its output or error fed back to the AI so that it can confirm the statement or correct it.
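Concretely, something like this Python sketch (with `ask_llm` as a stand-in for whatever model call you actually use): the model returns a claim plus a supporting command, the command really runs, and the real output goes back to the model for confirmation or correction.

```python
# Rough sketch of the verify loop described above: the model must pair every
# claim with a command that backs it up; we run the command and feed the
# output back so it can confirm or correct itself.
import json
import subprocess

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model call here")

def verified_answer(question: str) -> str:
    # 1. Ask for a claim plus a supporting command and an explanation of
    #    why the command's output would prove the claim.
    first = ask_llm(
        "Answer the question and return JSON with keys "
        "'claim', 'command', 'why_command_proves_it'.\n" + question
    )
    step = json.loads(first)

    # 2. Actually run the command and capture its output or error.
    result = subprocess.run(
        step["command"], shell=True, capture_output=True, text=True, timeout=30
    )

    # 3. Feed the real output back so the model can confirm or correct itself.
    return ask_llm(
        f"You claimed: {step['claim']}\n"
        f"The command `{step['command']}` returned (exit {result.returncode}):\n"
        f"{result.stdout or result.stderr}\n"
        "Confirm the claim if the output supports it, otherwise correct it."
    )
```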
It amazes me how they think a system with no feedback loop can always be accurate. Only perfect mathematics can work like that; any less-than-perfect system needs a feedback loop.
Excellent idea, and we do internally feed the answers back to the system to improve its own inputs and outputs. The funniest part of this experience has been finding cases where even humans were hallucinating: "Hey, I thought this was shut down?!" or "I can't find the bucket!" Even on a bad day, the humans are still ahead though.
Thanks for the answer. Yeah, that's pretty much what I expected would be the case. Speaking as another dev in the AI space, it seems like reliability and consistency are the hardest issues when it comes to making AI genuinely useful in production vs. just a neat toy, and there's no stock solution.
Tommy, CEO here. We also have some ideas on reporting hallucinations and automatically feeding the flagged wrong answers back into the prompts to reduce how often they happen. We have a few other approaches in mind and would welcome any suggestions folks have to help with this problem.
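One hedged sketch of how that feedback loop could be wired (all names here are hypothetical, not Release's actual implementation): user-flagged wrong answers are stored as corrections and prepended to later prompts.

```python
# Sketch of the "feed wrong answers back into the prompt" idea: keep a small
# store of user-flagged hallucinations and prepend them as corrections on
# future requests. All names are hypothetical.
from collections import deque

class HallucinationFeedback:
    def __init__(self, max_items: int = 20):
        # keep only the most recent corrections so the prompt stays small
        self.corrections = deque(maxlen=max_items)

    def report(self, question: str, wrong_answer: str, correction: str) -> None:
        self.corrections.append((question, wrong_answer, correction))

    def augment_prompt(self, base_prompt: str) -> str:
        if not self.corrections:
            return base_prompt
        notes = "\n".join(
            f"- When asked '{q}', do NOT answer '{w}'; the correct answer was '{c}'."
            for q, w, c in self.corrections
        )
        return f"{base_prompt}\n\nKnown past mistakes to avoid:\n{notes}"

feedback = HallucinationFeedback()
feedback.report(
    question="Is my cluster healthy?",
    wrong_answer="Your cluster is fine",
    correction="Two nodes were NotReady at the time",
)
print(feedback.augment_prompt("You are an infrastructure assistant."))
```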