> If you think a playbook is bad, try being oncall for the first time for a massively (Google-scale) distributed system without a playbook.

You are not Google scale. Don't invent a pen that writes in space when a pencil would do the trick.



I'd argue 100% automation is the space pen, while runbooks are the pencil.


You do not need 100% automation. What you need is a systematic approach to handling problems followed by fixing the root cause.

Runbooks came from techops in broadcasting, power plant operations, etc., where there was a clear division between operators who pushed buttons, ran cables, etc., and those who decided which buttons to push and which cables to run. Dumb hands + runbooks created "smart hands".

If your SRE runs like that, it is not SRE.

Look at the incident handling:

1. Identify the issue

2. Implement a workaround to restore the service

3. Identify the root cause

4. Implement a fix for the root cause

5. Remove the workaround

Runbooks cover only steps 1 and 2.


> You are not Google scale.

I don't know, I think I was kind of Google scale when working as a Google SRE.


1) https://www.transposit.com/ is not Google.

2) Your impact on Google scale was nearly zero because you were one of a thousand SREs at Google.


There are only 500.


My bad


There’s no tolerance for rude and unsubstantiated comments here.


Did you just assume my scale?


And check your metaphors before using them.

The pencil was more expensive.
