I like the idea, but how do you handle hallucinations? E.g. when the user asks about their AWS bill, how can they be sure the numbers they get are accurate?
Would be good if any user interface that uses these kinds of LLM solutions always included the raw data from the backing services on request. Like how in ChatGPT you can open and inspect its interaction with a plugin.
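Something like the Python sketch below (all field names made up) is roughly what I mean: the answer shown to the user carries the untouched request/response from the backing service, so a "show raw data" control can render it on demand.

```python
# Minimal sketch: return the raw backing-service payload alongside the
# LLM's summary so the UI can expose it on demand (names are illustrative).
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    service: str          # e.g. "aws-cost-explorer"
    request: dict         # exact request sent to the backing service
    raw_response: dict    # untouched response the LLM summarized

@dataclass
class AssistantAnswer:
    summary: str                                   # what the LLM tells the user
    tool_calls: list[ToolCall] = field(default_factory=list)

    def inspect(self) -> list[dict]:
        """What a 'show raw data' button would render."""
        return [
            {"service": c.service, "request": c.request, "response": c.raw_response}
            for c in self.tool_calls
        ]

answer = AssistantAnswer(
    summary="Your AWS bill for May is $1,234.56.",
    tool_calls=[ToolCall(
        service="aws-cost-explorer",
        request={"TimePeriod": {"Start": "2024-05-01", "End": "2024-06-01"}},
        raw_response={"Total": {"UnblendedCost": {"Amount": "1234.56", "Unit": "USD"}}},
    )],
)
print(answer.inspect())  # user can compare the raw numbers with the summary
```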
Release engineer here. You can do a bunch of cool tricks like go into `/expert` mode and view all the details. Join our slack workspace (https://release-ai.slack.com) and we have tonnes more tricks and techniques to share!
Release engineer here. That's an excellent question, and we worry about it all the time. The AI "seems" authoritative, but it can't even add 1+1 sometimes :crying-emoji:. We've tried to engineer the prompts and tooling so that it will say "I don't know" if it doesn't know. But we've still seen it say some crazy things, like "Your cluster is fine" when it clearly wasn't. :tongue-sticking-out-emoji: I guess the only real answer is you have to trust but verify.
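For what it's worth, the "say 'I don't know'" framing is usually just a system-prompt constraint. Release's exact wording isn't public, so this snippet is only an assumed illustration:

```python
# Assumed illustration of an "I don't know" system-prompt constraint;
# not the actual prompt Release uses.
SYSTEM_PROMPT = (
    "You are an infrastructure assistant. Only state facts you can back up "
    "with data returned by the tools you were given. If the tools do not "
    "return the data needed to answer, reply exactly: \"I don't know.\" "
    "Never guess numbers, resource names, or cluster states."
)
```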
If you want to use emoji text like that when communicating about positive things, that’s one thing. Some people will find it grating, but that’s up to you. But if you use it to talk about negative things like bugs, that will piss people off. There’s a time and a place for emojis, and communicating bad news is definitely not one of them. It gives the very strong impression you don’t take bugs seriously.
You need to engineer a system where, when the AI states something, it has to give a command that supports what it says and explain how the command shows it is true. At that point the command should actually be executed and its output or error fed back to the AI so that it can confirm the statement or correct it.
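Concretely, something like this Python sketch (with `ask_llm` as a stand-in for whatever model call you actually use): the model returns a claim plus a supporting command, the command really runs, and the real output goes back to the model for confirmation or correction.

```python
# Rough sketch of the verify loop described above: the model must pair every
# claim with a command that backs it up; we run the command and feed the
# output back so it can confirm or correct itself.
import json
import subprocess

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model call here")

def verified_answer(question: str) -> str:
    # 1. Ask for a claim plus a supporting command and an explanation of
    #    why the command's output would prove the claim.
    first = ask_llm(
        "Answer the question and return JSON with keys "
        "'claim', 'command', 'why_command_proves_it'.\n" + question
    )
    step = json.loads(first)

    # 2. Actually run the command and capture its output or error.
    result = subprocess.run(
        step["command"], shell=True, capture_output=True, text=True, timeout=30
    )

    # 3. Feed the real output back so the model can confirm or correct itself.
    return ask_llm(
        f"You claimed: {step['claim']}\n"
        f"The command `{step['command']}` returned (exit {result.returncode}):\n"
        f"{result.stdout or result.stderr}\n"
        "Confirm the claim if the output supports it, otherwise correct it."
    )
```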
It amazes me how they think a system with no feedback loop can always be accurate. Only perfect mathematics can work like that; any less-than-perfect system needs a feedback loop.
Excellent idea, and we do internally feed the answers back to the system to improve its own inputs and outputs. The funniest part of this experience has been finding cases where even humans were hallucinating: "Hey, I thought this was shut down?!" or "I can't find the bucket!" Even on a bad day, the humans are still ahead though.
Thanks for the answer. Yeah, that's pretty much what I expected would be the case. Speaking as another dev in the AI space, it seems like reliability and consistency are the hardest issues when it comes to making AI genuinely useful in production vs. just a neat toy, and there's no stock solution.
Tommy, CEO here. We also have some ideas on reporting hallucinations and automatically feeding the flagged wrong answers back into the prompts to reduce how often they happen. We have a few other approaches in mind and would welcome any suggestions folks have to help with this problem.
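One hedged sketch of how that feedback loop could be wired (all names here are hypothetical, not Release's actual implementation): user-flagged wrong answers are stored as corrections and prepended to later prompts.

```python
# Sketch of the "feed wrong answers back into the prompt" idea: keep a small
# store of user-flagged hallucinations and prepend them as corrections on
# future requests. All names are hypothetical.
from collections import deque

class HallucinationFeedback:
    def __init__(self, max_items: int = 20):
        # keep only the most recent corrections so the prompt stays small
        self.corrections = deque(maxlen=max_items)

    def report(self, question: str, wrong_answer: str, correction: str) -> None:
        self.corrections.append((question, wrong_answer, correction))

    def augment_prompt(self, base_prompt: str) -> str:
        if not self.corrections:
            return base_prompt
        notes = "\n".join(
            f"- When asked '{q}', do NOT answer '{w}'; the correct answer was '{c}'."
            for q, w, c in self.corrections
        )
        return f"{base_prompt}\n\nKnown past mistakes to avoid:\n{notes}"

feedback = HallucinationFeedback()
feedback.report(
    question="Is my cluster healthy?",
    wrong_answer="Your cluster is fine",
    correction="Two nodes were NotReady at the time",
)
print(feedback.augment_prompt("You are an infrastructure assistant."))
```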