The career of a software engineer is filled with, and perhaps defined by, troubleshooting and problem solving. Being an effective troubleshooter can make you an incredible developer, as your team will look to you to solve difficult bugs or fix issues. I believe advanced troubleshooting skills form part of what makes a senior engineer great, but I think you can start working on those skills from day one. Over the years, as I've transitioned to more of what people might call a senior developer, I've picked up more and more tips that work for me when troubleshooting, and in this post I will lay them out for you.
A trap that we all fall into from time to time is rushing on a task. Maybe a deadline is approaching or you're feeling behind. Maybe you're an energetic junior and you feel like you have something to prove. Whatever the reason, we all get into situations where we get stuck, we fall behind, then we rush to catch up. But the rushing itself means our logical and methodical mind takes its hands off the wheel and panic has full control. To be fair, we don't do ourselves any favours by calling our work "sprints", when really it's more of a speed walk.
I genuinely believe a big skill of a software engineer is staying calm, collected and patient in the middle of a problem. Rushing causes mistakes, you miss things, and you really don't get anywhere any faster. I remember being a junior and, during sprint planning, being assigned tasks that I didn't know exactly how to do. I felt very uncomfortable about it: a panic set in and I dreaded working on them. Now, years later, I feel much calmer in the face of unknowns. I say things like "I don't know, let's pair on it and figure it out" to colleagues regularly, and I feel much more at ease than I did in those first few years.
So if you find yourself in problem solving mode, breathe, slow down and focus on the task. Usually there isn't a rush, so use the time you have to work effectively.
This brings me to my second bit of advice - act logically. Software and codebases are systems, and generally speaking most systems have a design to them. Perhaps the code is arranged into components, portions with different responsibilities like managers, controllers, data persistence and view rendering. Looking at a higher level, you might be reasoning about a collection of codebases in a microservices architecture.
Whatever the system or whatever level you're at, you'll have a unit that has inputs and outputs. When chasing bugs, it's good to act logically and rule out areas of the system unit by unit. If the unit you're looking at has correct inputs and outputs, good, move on to the next one. What is this unit talking to and what is talking to it? Follow that chain, working methodically through the system. If you fail to act logically and instead just check the places you're familiar with, or the first place you think the issue might be, you might get stuck pretty quickly.
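To make that concrete, here is a minimal sketch of ruling out units one at a time. Everything in it is made up for illustration: the three units (`parse_price`, `apply_discount`, `format_price`), the expected values and the helper `first_bad_unit` are hypothetical, and a real system would need far richer checks than these.

```python
from typing import Any, Callable, List, Optional, Tuple

# Each stage is (name, unit, check on that unit's output).
Stage = Tuple[str, Callable[[Any], Any], Callable[[Any], bool]]


def first_bad_unit(stages: List[Stage], value: Any) -> Optional[str]:
    """Feed a known input through the chain and return the name of the
    first unit whose output fails its check, or None if all pass."""
    for name, unit, output_ok in stages:
        value = unit(value)
        if not output_ok(value):
            return name
    return None


# Hypothetical chain: parse a price, take 10% off, format it.
def parse_price(text: str) -> float:
    return float(text.strip())


def apply_discount(price: float) -> float:
    return price * 1.10  # deliberate bug: adds 10% instead of removing it


def format_price(price: float) -> str:
    return f"{price:.2f}"


stages: List[Stage] = [
    ("parse_price", parse_price, lambda v: abs(v - 100.0) < 1e-9),
    ("apply_discount", apply_discount, lambda v: abs(v - 90.0) < 1e-9),
    ("format_price", format_price, lambda v: v == "90.00"),
]

culprit = first_bad_unit(stages, " 100 ")
print(culprit)
```

The first unit passes its check, so it is ruled out, and the search stops at the first unit whose output is wrong rather than at the place I happened to look first.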
Over time in the same system, you'll understand the units more and how they interact. After a while you'll be able to move quicker when ruling out parts as an issue comes up, zeroing in on the location of the issue much faster. The corollary of this is if you want to learn a system fast, then go solve a weird bug in it. It will take you on quite the tour. Generally speaking, the more you know about the codebase you spend your days in, the more effective a developer you'll be, so don't stick to your bit, have a look around from time to time.
A huge mindset win for me over the last couple of years has been to realise when I'm at a wall and stuck, and to ask myself what I can do to give myself another clue. This could be running portions of the code in isolation, logging out values, testing various bits and bobs and generally having a good experiment and explore.
No one action is designed to be the fix, but a result of an action could spark an idea for what to try next. This sort of experimental, exploratory approach to problem solving has been really effective, at least for me. By way of example, I was recently troubleshooting a new colleague's inability to install some internal packages via npm. The error logs weren't particularly helpful to us, but some logs did indicate some of the substeps the npm script was doing. So, I thought perhaps running one of the substeps, a git command, in isolation could give us a clue. Perhaps npm was hiding some output or behaviour from git. We ran git pull on one of the private packages and suddenly we were asked for an SSH passphrase, something that npm had prevented from coming through. A clue! After realising that SSH wasn't quite set up properly and fixing it, npm install succeeded. We didn't know it was SSH from the logs, we just had to try and see. Just explore. Generate clues.
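If you'd like to script that kind of "run one substep on its own" experiment, here is a rough sketch. The helper name `run_step` is my own invention, and the demo command is just a stand-in child Python process that fails on purpose so the example runs anywhere. For a command that might prompt for input, running it directly in a terminal, as we did with git pull, is the simpler option.

```python
import subprocess
import sys
from typing import List, Tuple


def run_step(cmd: List[str]) -> Tuple[int, str, str]:
    """Run a single command on its own and return (exit code, stdout, stderr)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


# Stand-in for a failing substep: writes a message to stderr and exits with 3.
demo_cmd = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('auth needed\\n'); sys.exit(3)",
]

code, out, err = run_step(demo_cmd)
print("exit code:", code)
print("stdout:", repr(out))
print("stderr:", repr(err))
```

Seeing the exit code and stderr of one step by itself is often enough of a clue to decide what to try next.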
Over time, you build up a sort of clue generating toolbox. This metaphorical toolbox contains all the things you can do to find clues or poke at a system. They range from writing failing tests, to adding log messages, all the way to inspecting network traffic.
Trying to list my personal toolbox would be tricky, as it is vast and full of years of experience. Yours is probably bigger than you realise, too. A carpenter who has forgotten about his screwdrivers and tries to use a hammer won't be very effective. Being aware of what is in your toolbox and adding to it over time will make you a much more effective developer.
As well as rushing, another trap during troubleshooting is only seeing failure, and those thoughts can be discouraging. However, there is a sage bit of wisdom / programming meme to call on here: A new error is progress. Similar to finding clues as described above, sometimes you fix something and you see a different error. It's very easy to be upset by this, thinking the issue remains unsolved or you're making it worse. However, more often than not, a fix in the software world needs multiple actions. As you solve one thing, the system will produce the next error for you. Be happy! The old error is gone! You've finished that bit, but now there is another thing to solve.
I think the subtext of this meme in programmer culture is that problem solving is often more of a battle with yourself and your emotions than with the system at hand. I think working on disciplining your emotions when problem solving is incredibly important. You might find yourself getting frustrated, feeling embarrassed, rushing, feeling upset, feeling incompetent, or feeling like an imposter. If this is you, I thoroughly recommend working on controlling these feelings. Knowing yourself and recognizing when it's happening to you, so you can work through it and keep working logically, is really important, and it's a skill you get better at over time. Even many years into my coding life, I still feel these things when I'm really stuck, but I try to snap out of it as soon as I can and stay focussed. It happens to the best of us, and it really is part of the journey of problem solving, and makes the feeling of getting to the solution a lot sweeter.
Finally, there will be some problems you're not going to be able to solve on your own. These are generally the problems where you have too many unknowns (either known or unknown) and you aren't able to follow the advice above. You can't move logically through a system because you don't understand it, or you can't give yourself clues because you don't have enough clue generating techniques in your toolbox for the system at hand. More often than not, this happens to developers who are working on new systems, languages or technologies that they haven't worked with before. Hopefully you are in a team setting where someone else, perhaps a longer serving employee or a team member with a different skill set, is able to help you. They can point you in the right direction, or remove the fog that is in the way of your path to the solution, and you can start to troubleshoot effectively again.
If you're on your own, then I find the way out is to read and read. Go back to the basics: what are the basics of the system, what are the key concepts, and what do I not know about what I'm looking at? I've been in situations trying to debug some code where I've literally googled every line of code or configuration and read the documentation about it. Somewhere along the way a clue generation idea came along, then a clue, and then I was off again. It's definitely slower, but sticking to the tips above got me through. And soon, you may be able to help the next person looking for help.
In summary, here are my tips:
- Slow down, you could be rushing
- Act logically, you could be missing parts of the system to check
- If you're at a wall, start exploring and see if you can find any clues
- Keep your thoughts positive, any progress is progress even if it's not the end
- Know when to ask for help
Troubleshooting and problem solving is definitely a skill that builds up over your career, but you must consciously work on it. Watch other developers work, pair with them, ask "how did you think to do that?" or watch them use their clue generating toolbox and add what you see to your own. Finally, these skills aren't just helpful for the day job, they generalize to a lot of problems in life which you could now approach with ease. It might ruin escape rooms though, sorry.