DEV Community

Discussion on: How do you deal with incidents?

Hunter Peress

Wow!! I've never dealt with anything on the two-week scale... scary! Most of my incidents were solvable in a day. I can remember 3 all-nighters I had to handle over 5 years. Def slept in after those 😂😉 But I'm a fan of continually improving, making the system more resilient, improving communication, and getting to root causes. Glad you like messes!!

Alan Barr

Yeah, fortunately it was only one all-nighter, plus rotating shifts across many different roles. There was a lot of frustration because many people depended on the system, even if it wasn't the best at what it does. The problem hasn't reappeared, but we never found a clear root cause either, beyond making sure we have xyz VM storage settings in place just in case, because this tech has a certain storage and processing story. I'm excited for this new Kubernetes world because resiliency and observability are more accessible, but I'm concerned about new, strange problems.