Amburi Roy

On-Call 101: How to begin

Hi there! A few years back, I started doing on-call shifts to earn some extra euros. Since then, I've been working as a rotating "on-caller". I know that, at first, being on-call can look like an overwhelming responsibility. Whatever your motivation, it's completely fair to feel daunted by the scary night-call stories. Still, if you decide to take on the challenge, it's wise to be ready for it.

Here are 5 steps to begin your on-call journey:

  1. Know On-call Schedule and Protocol for Responding to Alerts
  2. Get access
  3. Play with tools
  4. Know the architecture of your system
  5. Keep your eyes and ears open

1) Know On-call Schedule and Protocol for Responding to Alerts

One of the most crucial aspects of being on-call is a clear understanding of the scheduling system and the protocol for responding to alerts. On-call duty means being available outside regular working hours when needed, which is demanding. That's why nobody should stay on-call continuously for a long period; a rotation is necessary, and a predefined system keeps the process clear. In my workplace, we use a third-party service called PagerDuty to manage our on-call schedules and notify us via phone calls. If you use a similar system, keep your phone number, email, and other contact details current in it. If you're expected to receive notifications by phone, save the number the alerts come from and mark it as an emergency contact. Keep your phone's notifications on so you can be reached if needed. Know when your turn is and adjust your calendar and appointments accordingly.
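To make "be aware of your turn" concrete, here is a minimal sketch of a round-robin rotation in Python. It is not how PagerDuty works internally; the names in the rotation, the start date, and the 7-day shift length are all assumptions made up for illustration.

```python
from datetime import date, timedelta


def on_call_for(day: date, rotation: list[str], start: date, shift_days: int = 7) -> str:
    """Return who is on call on `day` in a simple round-robin rotation."""
    if day < start:
        raise ValueError("day is before the rotation start")
    shifts_elapsed = (day - start).days // shift_days
    return rotation[shifts_elapsed % len(rotation)]


def upcoming_shift_starts(person: str, rotation: list[str], start: date,
                          today: date, count: int = 3, shift_days: int = 7) -> list[date]:
    """Return the start dates of the next `count` shifts for `person`.

    A shift that is already in progress on `today` is included.
    """
    if person not in rotation:
        raise ValueError(f"{person!r} is not in the rotation")
    starts = []
    # Index of the shift that contains `today` (or 0 if the rotation has not started).
    index = max(0, (today - start).days // shift_days)
    while len(starts) < count:
        if rotation[index % len(rotation)] == person:
            starts.append(start + timedelta(days=index * shift_days))
        index += 1
    return starts


if __name__ == "__main__":
    # Hypothetical team and start date, for illustration only.
    team = ["ana", "ben", "chen"]
    first_monday = date(2024, 1, 1)
    print(on_call_for(date(2024, 1, 10), team, first_monday))  # ben
    print(upcoming_shift_starts("chen", team, first_monday, date(2024, 1, 10), count=2))
```

A scheduling service does this for you (plus overrides and swaps), but working through it once helps you block the right weeks in your own calendar.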

2) Get access

Once you grasp how scheduling operates, let's delve a bit deeper. You need access! As the on-call person, getting access to the systems is your responsibility. Get in touch with the appropriate person to obtain it. I also suggest maintaining an access book: a list of URLs alongside usernames and password hints (writing passwords down directly isn't safe). When I began as an on-caller, I wasn't aware of our New Relic dashboard for monitoring servers. Even after I learned about it, I struggled to recall the URL and username during an ongoing incident. So be ready: bookmark useful links.
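An access book can be as simple as a text file, but here is a small Python sketch of the idea for those who prefer something searchable. The field names, the example.com URLs, and the usernames are placeholders I made up; only a password hint is stored, never the password itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEntry:
    name: str           # tool or dashboard name
    url: str            # where to log in
    username: str       # which account to use
    password_hint: str  # a hint only, never the actual password


def find_entries(book: list[AccessEntry], query: str) -> list[AccessEntry]:
    """Case-insensitive search over tool names and URLs."""
    needle = query.lower()
    return [e for e in book if needle in e.name.lower() or needle in e.url.lower()]


if __name__ == "__main__":
    # Hypothetical entries: the URLs and usernames are placeholders.
    book = [
        AccessEntry("New Relic", "https://newrelic.example.com", "oncall-team", "team vault, 'monitoring' folder"),
        AccessEntry("Grafana", "https://grafana.example.com", "me@example.com", "single sign-on"),
    ]
    for entry in find_entries(book, "grafana"):
        print(entry.name, entry.url, entry.username, entry.password_hint)
```

The point is less the code than the habit: during an incident you want one lookup, not a search through old chat messages.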

3) Play with tools

As soon as you gain access, start using those tools and dashboards to familiarise yourself with them. Learn how they work by reading tutorials, watching videos, and experimenting. When I began on-call, I wasn't familiar with how Grafana works either. Once, I was paged because a server was down, and I spent 2 hours before I realized that nothing was wrong with the code and the issue was somewhere else. After that, I started reading about our tools as well. To respond promptly, know your tools.

4) Know the architecture of your system

You now have access to all the tools and servers. Before you begin tinkering, be aware that you're working with something serious: production. If you make a mistake, it could impact your end users. Remember, everyone makes mistakes; that's natural and okay. To avoid frequent or repeated mistakes, though, it's crucial to comprehend the overall architecture of the system. Understand what goes where, and why it's there! Try to paint a clear picture of the design and make sense of its building blocks, such as Docker, Kubernetes, microservices, and REST APIs.
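One way to paint that picture is to write down which component depends on which, and then ask "if this breaks, what else is affected?". The sketch below does that in Python. The component names and the dependency map are invented for illustration; your real system will look different.

```python
def impacted_by(failed: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every component that directly or indirectly depends on `failed`."""
    impacted: set[str] = set()
    stack = [failed]
    while stack:
        current = stack.pop()
        for component, dependencies in depends_on.items():
            if current in dependencies and component not in impacted:
                impacted.add(component)
                stack.append(component)
    # The failed component itself is not reported, even if it sits in a cycle.
    impacted.discard(failed)
    return impacted


if __name__ == "__main__":
    # Hypothetical architecture: component -> components it depends on.
    architecture = {
        "web": ["api"],
        "api": ["db", "cache"],
        "worker": ["db"],
        "db": [],
        "cache": [],
    }
    print(sorted(impacted_by("db", architecture)))  # ['api', 'web', 'worker']
```

Even a hand-drawn version of this map tells you where to look first when an alert fires for one component.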

5) Keep your eyes and ears open

Even if you're not on-call, keep yourself engaged and stay informed about what's happening. What deployments have been carried out? Have there been any architectural changes? It's important to be aware of the alterations made to the system. I once encountered an issue caused by a server settings change that had taken place 2 months earlier.


Wrap-Up!

Once you're acquainted with the scheduling and response protocol, have the access you need, know your tools, and understand the system architecture, you're familiar with the system. At that point, you can consider yourself prepared for on-call duty.

Numerous people have had challenging experiences while on-call. I, too, have often felt doubtful and uncertain during ongoing incidents. These stories might sound discouraging, but on-call duty gives you a great opportunity to thoroughly grasp the software lifecycle and the entire system architecture. I strongly advise embracing on-call duty as a challenge!

Thanks for reading! 🤗

Top comments (2)

Phylis Jepchumba

Hey Destructive Mind,

Welcome to the community! 🎉 It's fantastic to have you here and to see you sharing your insights right off the bat. "How to begin your on-call journey" sounds like a really intriguing topic, and I can't wait to dive into your post. Looking forward to more of your contributions 😊.

Amburi Roy

Hi Phylis, thank you for the warm welcome! 😊