This really resonates with my experience. Working at a major airline, I was the one who would pick the most difficult and risky projects. One was a quick implementation of a new payment provider for their website. That website sold millions of euros worth of tickets every day. Seconds after deployment, it turned out that I had failed to recognize the differences between the test and live environments: one of the crucial variables was blank in production. I could have anticipated this if I had spent more time preparing and reading documentation. Sales died completely, and my heart sank. After a lengthy rollback procedure that resulted in a few hours without sales, a massive surge of angry customers, and a loss of several million euros, I approached the CEO of the company. I still remember catching him in an elevator. I explained that this incident was all my fault and that I had failed to properly analyse the environment. I assured him that I was ready to bear the full consequences, including being fired. He burst into laughter and said something like this: "Why would I want to get rid of you? You made a mistake that you'll never make again. You are better at your job than you were yesterday!" This experience was formative for me on many levels, including what true leadership looks like. I have successfully completed many high-risk projects since then.
The language is so anodyne, and there’s just that bit of implausible detail in the story (approaching the CEO yourself when you’re the one who fucked up, and the parent claiming to be a “top performer” while saying “I made my company lose millions” in the same breath), that it makes me think this comment was written by an LLM, or is at least a fabrication.
The suspicious part for me would be the CEO laughing like it was nothing. Also yes, one would expect it to go the other way around: when you mess up big, someone comes to you. But the world is big and maybe it happened like this.
"> A young executive had made some bad decisions that cost the company several million dollars. He was summoned to Watson’s office, fully expecting to be dismissed. As he entered the office, the young executive said, “I suppose after that set of mistakes you will want to fire me.” Watson was said to have replied,
> “Not at all, young man, we have just spent a couple of million dollars educating you.” [1]"
A variant is in From the Earth to the Moon, where a junior engineer at Grumman confesses messing up vital calculations for the Lunar Lander to his boss, and finishes with "So… I guess I'll go clean out my desk." "What for?" "I figure you're gonna fire me now."
The boss's response makes a lot more sense than the usual fluff, though: "If I fire you now, the next guy to make a mistake won't admit it and we won't find out about it until it's too late."
I wonder how many of those stories are wishful thinking about how it should work rather than how it does work when a major screw-up has happened and some heads need to roll for the sake of it.
I wonder how the boss would explain it to his bosses/shareholders. Was that totally a known possible outcome that merely surfaced by chance and was subsequently handled without issues under his leadership, or...?
Apollo was very much pushing the envelope of bleeding-edge technology. While the bosses were probably not too happy, it was far from the only occurrence and didn't threaten the contract.
Whether the story is real or not aside, why would you not laugh it off? At that stage nothing can be changed: the money was lost and the bug was fixed. You can only look forward and plan for the future, and the guy is going to be paranoid in future deployments to make sure not to fuck up again.
In general yes, and people with enough Zen can do this. But if the CEO also has to face the board and the investors to explain the incident, he might not be in the mood to laugh.
> That website sold millions of euros worth of tickets every day.
The claim wasn't that a single airline sold a million dollars' worth per day, but that a third-party on-seller sold a million euros' worth of tickets a day.
Is that plausible?
Consider The City in the Sky:
Every day 100,000 flights criss-cross the globe with more than 1 million people in the air at any one time. Dallas Campbell and Dr Hannah Fry explore the world of aviation.
I'm a frequent flyer and I get the feeling that most airline ticket booking pages are broken in some way more than half the time. Maybe not often broken to the point that they're blank, but definitely broken to the point that booking a ticket isn't possible (I prefer blank, so that I don't waste like 30 minutes on not being able to book a ticket).
Also, most of the internet often seems broken. Oh hello, Nike webshop errors upon payment (on Black Friday), for which the helpdesk's solution is: just use the app.
What always got me is that, for at least the first several years, Google couldn't get their store page to handle the load when they were releasing a new phone and it'd be crapping out for days.
Steam still craps out during large sales. I really wonder how Valve calculates that it's fine to keep losing out on (cumulatively) hours of sales each time.
Whoa, and I always wondered why it's only me that seems to have to use the developer tools to enable that stupid submit button when I've filled out every field on the page correctly, shaking my head and wondering how normal people use the internet. I keep thinking it's got to be something about using Firefox instead of a big tech browser, or my mouse gestures extension, I don't know, but normal webshops are broken so often it's insane. Thanks for sharing that it's not just me!
I think the loss may not have been as much as you think; sure, nobody could buy tickets for a few hours, so theoretically the company lost millions in revenue during that time. But that assumes people wouldn't just try again later. Downtime does not, in practice, translate directly into losses, I think.
I mean, look at Twitter, which was famously down all the time back when it first launched due to its popularity and architecture. Did it mean people just stopped using Twitter? Some might have, but the vast majority didn't.
Downtime isn't catastrophic or company-ending for online services. It may be for things in space or for high-frequency trading software that can bankrupt the company, but that's why they have stricter checks and balances. That's in theory; in practice they're worse than most people's shitty CRUD webservices that were built with best practices learned from the space/HFT industries.
Even with HFT you’d have to have more than 50% of your trades go against you to lose any money, and you’ll probably have hedges, and losing some % of money will be within normal operation parameters. Shit happens! Links go down, hardware fails, bugs slip through no matter how diligent you are. (No I’m not looking to be hired by any HFT shops)
Being an airline boss, I would really have hoped the response would have been more in line with the ethos of a plane crash postmortem, i.e. find the systemic causes and fix those. Maybe you need a copilot when doing live deployments, and that copilot has the authority to stop the rollout, along with the usual devops guards.
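For what it's worth, one such guard could be a pre-deployment check that refuses to roll out when a required setting is blank, which is the failure described at the top of the thread. This is only a rough sketch: the variable names and the `find_blank_vars` / `preflight` helpers are made up for illustration, since the original comment doesn't say which variable was empty or how the airline's configuration was loaded.

```python
import os
from typing import List, Mapping, Sequence

# Hypothetical names: the thread does not say which variable was blank.
REQUIRED_VARS = ("PAYMENT_PROVIDER_URL", "PAYMENT_PROVIDER_API_KEY")


def find_blank_vars(
    env: Mapping[str, str], required: Sequence[str] = REQUIRED_VARS
) -> List[str]:
    """Return the required variables that are missing or whitespace-only."""
    return [name for name in required if not env.get(name, "").strip()]


def preflight(env: Mapping[str, str] = os.environ) -> None:
    """Stop the rollout before it starts if any required variable is blank."""
    blank = find_blank_vars(env)
    if blank:
        raise SystemExit("Refusing to deploy, blank or missing: " + ", ".join(blank))
```

Run as the first step of the live deployment, `preflight()` exits before anything is switched over, so a value that exists in test but is empty in production gets caught before customers see it. It doesn't replace the second person with stop authority; it just catches the dumbest class of environment difference automatically.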
This reminds me of that old joke that ends "Why would I fire you? We just spent millions training you!".
People who take on high-risk projects are underappreciated. But many managers prefer employees who can reliably deliver zero value over those with positive expected value but non-zero variance.
There must be more people like you in those major airlines, as those sites go down all the damn time. 6 hours..??? The Lufthansa desktop site didn't allow anyone to book anything for like 3 weeks straight; you had to use the app instead.
Your company should definitely have had a production-identical staging environment if an hour of downtime means millions lost :D
That development would be an obvious investment that pays for itself. I’m in banking, and terrified of making even a slightly complex deployment without validating it in production first. (Complex here meaning that it might depend not just on code changes, but also on the environment.)