Took the time to read the 84-page report that was published last month and write up a summary. Pretty interesting bug, with a pretty significant impact.
It sounds like marketing, I know, but this was written by engineers and practitioners, and is a pretty comprehensive overview of how modern tech orgs are dealing with incidents these days.
> Humans aren't great at incident response, and we all hate waking up at 2am to resolve an issue.
Agree that most people hate being woken at 2am, but disagree that humans aren't great at incident response. Speaking generally, I think we're about as good as it gets when it comes to adaptability and the kind of reasoning that's necessary to investigate complex issues.
That said, I also think AI can play a massive role aiding humans, especially in undifferentiated tasks like checking deployments, code changes, past incidents, and when it comes to spotting patterns.
IMO the sweet spot is going to come from highly ergonomic AI products that enable collaborative incident response, rather than AI incident management or any other marketing BS.
The thing is, nobody needs to be woken up at 2am at Meta. They're big enough to have a follow-the-sun model where they rotate support around the world. They have a footprint almost everywhere.
As the person who originally wrote the Monzo Response project (https://github.com/monzo/response), I expect you'll find some traction in smaller orgs, but when folks start doing things at scale they'll hit an inflection point where running their own incident software/not having folks to log feature requests with will force them to pick something more off-the-shelf.
Basically, nobody runs their on-call system on open-source because it's mission critical. At a certain point, IM platforms hit the same level of criticality.
'Fun' is a little strong, but I really wish more companies adopted this kind of workflow for TF. Being forced into a world of full-TF or fully-manual sucks, and the lack of signposting when you're in a blended world of manual and managed resources really doesn't help.
Tweet author here, and founder of a startup. I know this isn't revolutionary, but I've iterated towards this state and now find it a critical part of my daily process.