DEV Community

Discussion on: Have you ever been on-call? What was it like?

Tommy Brunn

I have been on call, and for any system where the impact of being unavailable outside of office hours is significant enough to justify the cost, I would always advocate having an on-call rotation. There are really only two alternatives, and you can probably guess which one businesses would choose given the options.

  1. The service goes down on Friday night and doesn't come up again until Monday when people are coming back to work.
  2. Some poor developer gets called on their personal phone and guilt-tripped into spending their Friday night doing unpaid work to get the service back up. Most likely they won't be compensated beyond perhaps equal time off and maybe a free lunch.

If you do have an on-call rotation, though, it's important that the developers themselves have not just the responsibility for the system health, but also have the resources and mandate to ensure it. For example, if I get a false alarm on a Saturday, my top priority on Monday is going to be improving the alerting to make sure that it doesn't happen again.

It goes without saying that you should also compensate the developers for their time. At a previous workplace, we had what I considered a reasonable compensation structure: you were paid an hourly amount based on your salary for any time you were on call, plus a significantly higher hourly amount any time there was a turnout outside of office hours, and you got a day off after being on call for 7 days (a legal requirement in my country). I like this structure because it actually aligns incentives: I don't want to get woken up by alarms in the middle of the night, so I'm going to build my services to be resilient, and the company will encourage me to do that because they would rather not pay me a fairly substantial amount of money for constant on-call turnouts.

shelby spees (she/her)

it's important that the developers themselves have not just the responsibility for the system health, but also have the resources and mandate to ensure it.

100% this! I appreciate this post by Charity Majors (full disclosure, she's CTO at my company) on how managers can make on-call more humane by empowering devs to actually fix things.