How does your team handle 'on call' evening/weekend hours?

Jess Lee Aug 10, 2017

As Ben pointed out in his post, "I've been de facto on call 24 hours a day since starting dev.to but this weekend I'm going camping", we clearly don't have a fair schedule for handling late night or weekend emergencies, yet.

I know some teams rotate weekend duty and get a weekday off here or there, but I'd really like some real-world recommendations on what to do or not to do.

Our team runs a Tuesday to Tuesday weekly schedule for a primary and secondary on-call person. We have around 8 people on call, so we get a 2 week break in-between.

If we have something important on while on-call, other team members have been great at picking those days up.

It's also important to add, we are paid for being on call, whether we get a call or not.

If someone is on vacation, they are on vacation, no on-call duty for them :)

Well, I work remotely. It really depends on what the actual issue is, as our teams are split into different subcategories of what we do. Generally, if a machine fails and stops sending data, we all get a notification on our phones saying it's gone down. If no one has responded after 10 minutes and more machines start to go down, an alarm goes off on my phone that screams the alert in my face. If the time comes and we can't be available, we're able to set it so it won't alert us at all, but only if we're not near our computer.

My team had a frank discussion with management about overtime, and management decided that it could afford 10 hours of overtime per dev per year. This amounts to "we fix problems during regular business hours".

That's ok for us, because we're the government and our publicly facing software generally serves a group of people who have 10 days to get us the information. Assuming they don't wait until day 9 on a Saturday, slow weekend response is ok.

We do pay for a 24 hour help desk with a binder full of answers to common questions that people can call, which reduces the need for devs.

We talked about what we would do if 24/7, 365 responsiveness ever became necessary. Our feeling was that having enough devs to do 3 shifts, à la real 24-hour factory operations, was probably the optimal solution, but failing the budget for that, rotating 2 weeks on, 2 weeks off on-call duty seemed next best.

What team do you have? In my office, critical moments like that might happen when the server is down or there are bugs that keep a site from running properly. Usually, all members of the team are on hand when things like that happen. And when someone is on vacation, there will always be other team members who are willing to fill in.

We believe that a vacation is a necessity and makes someone more productive, so when members of our team are on vacation, we avoid disturbing them as much as possible. So take your 'me time' :)

Here are a few recommendations that have worked well in environments I've worked in:

  • Big +1 on what @val_baca said below. Determine alternate weeks and definitely have an on-call calendar that everyone's dates and rotations are on.
  • Documentation of critical systems and the troubleshooting steps to get things back to a good working state is key. This has saved hours of troubleshooting and downtime across multiple teams I've been in. Equally beneficial, the person who's out can relax a lot more knowing things are being handled in a somewhat organized fashion in case of emergency, and they won't come back to a worse fire than the original one.
  • List out the major holidays/travel times of the year that people usually want to be with friends and family, e.g. Christmas, New Year's, etc. Figure out who's going to cover each one. It's often the worst feeling when everyone on a team has unchangeable plans over a long weekend and someone gets stuck with being on-call at the last minute.
  • Track on-call coverage data and adjust accordingly. Equitable scheduling should ideally balance out all incidents, but if a team deploys heavily at the beginning of every month, and the same person is always on-call during that time, chances are they'll get a lot more calls during off-hours than everyone else in the rotation. Assess who's getting "on-call" burn-out, and swap around the responsibilities for a few weeks to help them recover.
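
The last point, tracking coverage, can start as a simple count of off-hours pages per person. Here is a minimal sketch, assuming a hypothetical page log of (person, timestamp) records and assuming "off-hours" means weekends plus anything outside 09:00 to 17:00 on weekdays. The names, dates, and hours are all made up for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical page log: (who was on call, when the page fired).
PAGES = [
    ("alice", datetime(2017, 8, 1, 2, 30)),   # Tuesday, 02:30
    ("alice", datetime(2017, 8, 2, 23, 10)),  # Wednesday, 23:10
    ("bob", datetime(2017, 8, 3, 14, 0)),     # Thursday, 14:00 (business hours)
    ("alice", datetime(2017, 8, 5, 11, 0)),   # Saturday, 11:00
    ("carol", datetime(2017, 8, 6, 4, 45)),   # Sunday, 04:45
]

def is_off_hours(ts, start_hour=9, end_hour=17):
    """Assumed definition: weekends, or weekdays outside start_hour..end_hour."""
    if ts.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return True
    return not (start_hour <= ts.hour < end_hour)

def off_hours_load(pages):
    """Count off-hours pages per person."""
    return Counter(person for person, ts in pages if is_off_hours(ts))

load = off_hours_load(PAGES)
for person, count in load.most_common():
    print(f"{person}: {count} off-hours page(s)")
```

If one name keeps landing at the top of that list, that is probably the person to swap out of the rotation for a few weeks.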

There's no magic recipe I guess, the trick is to have a big team working on rotation. There have been times I had to support two technologies, handling issues for both at the same time.

Managers at my workplace are on a rotating weekly schedule with just one individual on call after-hours. The person in possession of the on-call cell phone is responsible for triaging issues. If something can wait until the next business day, then a normal support ticket gets created on behalf of the caller with the incident particulars. On the other hand, if the matter is a true emergency, then our internal knowledge base contains a series of "who to call if X happens" lists to get the correct non-management people in play.

The call tree with "who to call if X happens" is very helpful for incident response!

Same as many here. On call roster is 1 week shifts of being secondary followed by 1 week shift of primary followed by a week off. It is really easy to manage via PagerDuty. They have calendar integrations and an easy way to manage overrides for particular days. It is important to have levels of on call so that people know that in an emergency there are other people to rely on as well.

We all switch around weekends a lot when people have different events to go to. I think it's best to allow flexibility but keep a solid schedule.

Also, pay people for the days they are on-call. These are extra hours that they have to work and they deserve compensation for it. It also makes it easier to switch out days since there is an incentive.

We started our on-call shifts about 2 weeks ago. Previously, I didn't agree with having on-call shifts, preferring a "mutual responsibility" approach. But I've started to feel that approach is bad for everyone, as it means nobody can ever "disconnect" from work. So we discussed what the on-call shift should look like: 2 days per shift, one week per shift, etc. We're a team of 10, but only 5 are on call for now. In the end we decided on 2-day shifts, as 1 week seemed too long for 1 person to take on.

We have a Python script that prints the on-call roster for the following week, but we plan to publish the schedule one month ahead, as it's easier to plan your days when you know which days you'll be on call that month.
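
For anyone who wants to try something similar, here is a rough sketch of a month-ahead roster with 2-day shifts. It is not the script mentioned above, just an illustration: the names, the start date, and the 30-day window are all assumptions.

```python
from datetime import date, timedelta
from itertools import cycle

def two_day_roster(people, start, days=30, shift_days=2):
    """Return (shift_start, shift_end, person) tuples covering `days` days."""
    roster = []
    rotation = cycle(people)  # simple round-robin through the list
    day = start
    end = start + timedelta(days=days)  # exclusive end of the window
    while day < end:
        # Clip the final shift so it never runs past the window.
        shift_end = min(day + timedelta(days=shift_days - 1),
                        end - timedelta(days=1))
        roster.append((day, shift_end, next(rotation)))
        day = shift_end + timedelta(days=1)
    return roster

# Hypothetical names; the post only says 5 of the 10 team members are on call.
PEOPLE = ["dev1", "dev2", "dev3", "dev4", "dev5"]
roster = two_day_roster(PEOPLE, start=date(2017, 8, 14))
for shift_start, shift_end, person in roster:
    print(f"{shift_start:%a %d %b} to {shift_end:%a %d %b}: {person}")
```

With 5 people and 2-day shifts, each person comes up again every 10 days, so the shift lands on different weekdays each time around.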

p/s: I'm currently on-call, that's why I came here

I'm currently on a team that supports a critical government app responsible for people's pay cheques. We have a phone that the on-call person carries, and they only get calls from the help desk when the help desk gets a call. We do two weeks on and have anywhere from 2 to 5 people on rotation depending on the current team size. There is extra pay for "standby" and also hourly payment for doing work outside of business hours. It works well and all the devs are on board.

At one of my last companies we used to get an extra day's pay in our paychecks every month, and we would rotate being "on call" so that every weekend it would be someone else's turn. At my last company we used to trade weekend work days for PTO (paid time off), so if you worked weekends for a few weeks straight you could save up days and get an extra vacation. One of our team members took about a month off with saved-up PTO.

Every dev goes on-call 24/7 for one week.

Every dev is on-call one week every N weeks, where N = devs on team.

Exceptions are made as necessary, usually just one-day swaps but occasionally full-week swaps are necessary.
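
The "one week every N weeks" rule with one-day swaps is easy to express as a lookup. Here is a minimal sketch, assuming a made-up team, a made-up start date, and a plain dict of date overrides for swaps. None of these details come from our actual setup.

```python
from datetime import date

def on_call_for(day, devs, epoch, overrides=None):
    """Return who is on call on `day`.

    devs: rotation order; each dev covers one week in every len(devs) weeks.
    epoch: first day of the first dev's first week (an assumed start date).
    overrides: optional {date: dev} map for one-day swaps.
    """
    overrides = overrides or {}
    if day in overrides:
        return overrides[day]
    weeks_elapsed = (day - epoch).days // 7
    return devs[weeks_elapsed % len(devs)]

# Hypothetical team and start date.
DEVS = ["ana", "ben", "cy", "dee"]
EPOCH = date(2017, 8, 7)            # a Monday
SWAPS = {date(2017, 8, 16): "dee"}  # dee covers one day of ben's week

print(on_call_for(date(2017, 8, 9), DEVS, EPOCH))          # week 0
print(on_call_for(date(2017, 8, 16), DEVS, EPOCH, SWAPS))  # swapped day
print(on_call_for(date(2017, 9, 4), DEVS, EPOCH))          # week 4 wraps around
```

A full-week swap in this sketch is just seven override entries, one per day.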

I don't have anything to add to the conversation, just wanted to point you in the direction of this publication which has an entire issue on "on call": increment.com/on-call/

We have no real system for this at my work. It's whoever is available, willing, and most capable that usually gets called.

I mainly sleep during my shift, mainly because I don't have work lol