You just opened a pull request. Wrote a detailed description. Asked for reviews. The notification bell is ringing: your colleagues are commenting "LGTM". You hit the merge button, and 10 minutes later the error notifications start rolling in. You broke production.
That starts a cycle of doubt that follows most of your code submissions. Impostor syndrome tells you you're not qualified for the job. You think you're not as good as other software engineers. Other developers (mostly Twitter stars) seem error-proof.
Let me tell you something you already know: everybody makes mistakes. Senior engineers, junior engineers, CTOs. The difference between a good professional and an average one is how they deal with the incident. Here is what good engineers do:
- Focus on solving the issue instead of hunting for a culprit
- Know how to tell a critical problem from a minor one
- Don't hesitate to ask other developers for help
- Warn the team as soon as they notice a production issue (even if they caused it)
- Proactively write a post-mortem in a searchable form for future reference, covering both the technical details and the incident's potential business impact
You don't need to blame yourself. Production incidents rarely happen because of one person's code alone. A chain of causes triggers the problem, and those causes aren't only technical. Good engineering environments have the following:
- Strong testing culture (automated and manual)
- CI and CD tools integrated with code submissions
- Error monitoring tools with real-time notifications
- Healthy code review culture
- Good communication culture and tools
- A shared expectation that errors will happen and that people will make mistakes
Organizations and teams must expect incidents to happen and build a strong, healthy culture inside the engineering environment. This doesn't mean incidents should happen often, but when they do, they're an opportunity for the organization to learn how to be better prepared for the next one.