The irony of githubstatus.com itself being hosted on a third-party (Atlassian Statuspage) is not lost on anyone who works in incident management. Your status page being up while your product is down is table stakes, not a feature.
What's more interesting to me is the pattern: second major outage in the same day, and the status page showed "All Systems Operational" for a good chunk of the first one. The gap between when users notice something is broken and when the status page reflects it keeps growing. That's a monitoring and alerting problem, not just an infrastructure one.
At some point the conversation needs to shift from "GitHub is down again" to "why are so many engineering orgs single-threaded on a platform they don't control and can't observe independently?" Git is distributed by design. Our dependency on a centralized UI layer around it is a choice we keep making.
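For what "observe independently" could look like in practice, here is a minimal sketch, not a real monitoring setup. It assumes the status page exposes the standard Statuspage JSON endpoint at `/api/v2/status.json`, and that a plain `git ls-remote` against a repo you care about is a good enough probe. The repo URL and the function names are hypothetical.

```python
import json
import subprocess
import urllib.request
from typing import Optional

# Assumption: the standard Statuspage v2 endpoint for githubstatus.com.
STATUS_URL = "https://www.githubstatus.com/api/v2/status.json"
# Hypothetical repo; substitute one your team actually depends on.
PROBE_REPO = "https://github.com/example-org/example-repo.git"


def fetch_status_indicator(url: str = STATUS_URL, timeout: float = 5.0) -> Optional[str]:
    """Return the page's self-reported indicator ("none", "minor", ...), or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)["status"]["indicator"]
    except (OSError, ValueError, KeyError, TypeError):
        return None


def probe_git(repo_url: str = PROBE_REPO, timeout: float = 15.0) -> bool:
    """Our own check: can we list the remote's HEAD over the git protocol?"""
    try:
        result = subprocess.run(
            ["git", "ls-remote", repo_url, "HEAD"],
            capture_output=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def classify(probe_ok: bool, indicator: Optional[str]) -> str:
    """Compare what we observe with what the status page claims."""
    if probe_ok:
        return "ok"
    if indicator == "none":
        # The page says all clear, but our probe failed: the gap described above.
        return "unreported-outage"
    if indicator is None:
        return "outage-status-unknown"
    return "reported-outage"


if __name__ == "__main__":
    print(classify(probe_git(), fetch_status_indicator()))
```

Run it from cron or a CI runner that doesn't itself depend on GitHub. An "unreported-outage" result is the case where you knew before the status page did. A single failed probe can also just be your own network, so treat it as a signal to look, not as proof.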
> The irony of githubstatus.com itself being hosted on a third-party (Atlassian Statuspage) is not lost on anyone who works in incident management. Your status page being up while your product is down is table stakes, not a feature
That's WHY it's hosted externally, so that if GitHub goes down the status page doesn't.