How I Stopped On Call Support

Ryan Latta ・4 min read

For the first few jobs in my career, I lived with what many do: the dreaded on-call rotation. For the uninitiated, that means you work your typical day and then remain available to handle support issues through the night. One particular job left me with several weeks of late-night calls, and I couldn’t stand it anymore. So here’s the story of how I stopped doing on-call support from then on.

Why On Call?

Let me go ahead and put something somewhat controversial out there. On-call support happens because we admit something terrible can go wrong, and we cannot wait to fix it. The less mature the systems and code, the more on-call hurts. The more mature and stable, the less it hurts.

Now I’m not saying it can be eliminated in its entirety, but it can turn into something where many people who handle on-call rarely receive calls and can go about living their life with little chance of interruption.

Want a quick way to gauge the maturity of the organization you’re interviewing with? Ask them about on-call support and how many issues they receive.

Why Won’t The Damn Thing Run!?

Alright, so the story starts as a fairly large project launches to production for an alpha release.

It wasn’t going well.

We’d build features during the day, then a select few of us would get calls throughout the night as the DevOps group tried to get it running in production. I was one of those select few.

My first phone call would come in around 11 PM and last an hour or so. My second would come in around 3 or 4 AM and last another hour or two. Then I’d wake up at 7 and be at work by 9.

This pattern kept up for several weeks, and the lack of sleep was taking its toll. I couldn’t understand why we didn’t do this during the workday. After all, getting into production by our deadline was the most important thing, so why were we doing it when we had the least help?

I was assured by leadership that this would end, things would get better, and this would never happen again. I had very little confidence in those platitudes. So I took matters into my own hands.

So I said one day, “I’m going to keep helping, but every time you call, I’m going to drink a Manhattan. You get to decide how drunk you want me in production.” The leadership said I was not allowed to do that, to which I retorted that I would do whatever I wanted on my own time, and that they could choose either not to call me or to do this work during the day.

That night I got my call, and I said, “Hang on.” I went over, made a drink, and returned. “I can’t believe you were serious,” someone on the call said. I resolved the issue, and in another hour, I got another call.

I said, “Hang on.”

That was the last call that night, and the next day I had to go into a closed room to talk about my antics. I made my point as clearly as I could that there are just a few of us burning out at night trying to get the most important thing done while all day long, we do nothing to help.

I was told my help wouldn’t be needed again.

Management created a dedicated room where people from many teams would work on this during the day. People rotated in and out, and trying to do this at night came to a close.

Then What?

That experience gave me a sort of epiphany about how companies and groups tend to treat supporting their products: they treat it as a secondary concern to building new features.

How crazy is it to add features to a product that struggles to run as is?

Since leaving that company, every time someone at a new one brings up support, I list my rules.

  • After work hours, if my phone rings, it is my choice to answer it, and mine alone.
  • If I answer, I expect my manager to sit on the phone the entire time.
  • The next day, my work will focus on preventing this issue from ever happening again.

I get pushback when I ask managers to sit on the phone, usually along the lines of, “There’s nothing I can do to help.” I tell them that I want them to sit there awake as long as I do so that when I come to work exhausted and say this has to stop, they feel its importance.

In the seven years since I started this, companies I’ve joined have changed the way they handle on-call due to my rules. I won’t say that on-call isn’t needed, but treating it as something different than the rest of the job isn’t healthy or sustainable. My rules force that issue out.

For Your Consideration

You may find my rules too strict, and that’s alright. Consider, though, how your group thinks about after-hours support and how what happens when people are asleep feeds back into your regular workday. Most of the time, it doesn’t. Is the support demand growing? What is changing to make it that way, and what can you do to turn that trend around?

If nothing else, my rules force people to look at the choices they make during the day that lead to suffering at night.

What rules would you help create to make on-call something rarely used because your group cares so much about stability?

Discussion

Charanjit Chana

The idea of having someone 'on call' in the team I lead was floated but it never came to that. Perhaps we were lucky with the geographic location of our customers that they found issues during the day that we could resolve then and there for them.

But ultimately, we really improved our standing by introducing much better code coverage (long overdue). We also had extensive regression tests, which meant that if we hadn't taken responsibility during planning, development, testing or review, we could flag it as something that should be improved... Did that always work? Not enough.

In a job long ago, we rotated and I only ever got called once, about a project I knew nothing about! We managed to identify a point we could roll back to and took care of it the next day.

Looking to the future and putting myself in the shoes of a manager/lead again, I'm thinking that this is a good strategy for giving focus to the issue. If these calls are rare, perhaps it's OK to carry on as you are but if this is a daily/weekly thing then it definitely needs addressing. I'll need to think about it some more, perhaps over a Manhattan or two, but it's something I will seriously consider implementing myself.

Brad Dunn

This. Is. Awesome.
👏 👏 👏

Doaa Mahely

How crazy is it to add features to a product that struggles to run as is?

Your approach is entirely reasonable. I'm glad it paid off 🙌

Hippopothomas

I do like your approach. One person on our dev team is tasked to prioritise incident resolution / operations over development, along with being on call for that week.
However, I am worried about the wording "on my time". Is being on-call not honoured as pay / overtime / bonus? A friend's employer honours on-call nights with actual incidents with a day off to be taken within the next 14 days. Mine does at least pay on-call days extra.
In the country I reside in, employee protection laws require a certain period of rest (9 to 10 hours) between shifts. So if I get a call that is solved at 3 am, nobody can expect me in the office by 9 am.

Ryan Latta Author

The laws differ from country to country and by what kind of employee you are, and then there's whatever the companies themselves try to get away with.

Many places I know that build full on-call teams manage it a lot better than the cases when they throw people at it because things are going poorly.