> The people who designed, built and are responsible for maintaining and updating the system should be on call if something goes wrong.

In my experience, the devs were somewhere on the US West Coast, and the SRE teams were geographically distributed to cover the 24 hour period during local daytime (nobody likes to be paged in the middle of the night). As an SRE in Zürich, I got paged in what was the middle of the night for the Kirkland people, dealt with the emergency (using the playbook), root-caused it (with the assistance of the playbook), and filed bugs for the dev team to look at when they woke up.

The systems stayed up and everyone could sleep at night: working as intended.



> and the SRE teams were geographically distributed to cover the 24 hour period during local daytime

Management problem number 1. These people should not be responsible for the running system.

> nobody likes to be paged in the middle of the night

Excellent motivation for the people that should be responsible for the running system to build quality software.


Why would randomly waking your engineers up in the middle of the night be an excellent motivation strategy?


It's an incentive to not release stuff that breaks in the middle of the night.

On the flip side, that can lead to slower releases or more expensive solutions.



