Bugs here, bugs there, bugs everywhere. Some are easy to fix, some are complex and obscure. So, is there a generic approach to fixing every bug? Probably not.
Anyway, there are four parts of fixing a bug:
- Understanding a bug;
- Identifying the cause of a bug;
- Planning a bug fix;
- Applying a bug fix.
It is crucial to have as much information as possible about a bug, even if some of that information ends up being irrelevant: for example, at what time, which user, on which platform, with which input got which output. Ideally, all the required information should already be in the bug report. If you constantly end up asking reporters for it, consider writing a list of information that every bug report must include.
Sometimes features end up reported as bugs, and that's okay: just as developers don't know every part of the application code, bug reporters might not know about every feature. That's why it's important for the bug report to explain the expected flow.
As mentioned above, not everyone knows everything, and features might end up reported as bugs. So if you think a bug report needs to be challenged, challenge it.
If you know when a bug happened, are there any logs that might help you? Remember to consider all log types: system logs (e.g., an out-of-memory (OOM) error occurred), application logs (e.g., a required third-party service was unavailable), third-party logs (e.g., an unoptimized database query caused slow execution), etc.
Try to reproduce the bug in your local environment. Once you reproduce it, attach a debugger and try to find where in the code the bug is. If needed, write down suspicious points and think them over until you find the cause.
If you can't reproduce the bug locally, it might be time to look at the production environment. Based on the suspicious points you identified, use remote debugging or write the information you need to the application logs (e.g., "if the logged-in user is me, log the information I need: input data, execution time, etc."). Once you have that production information, try to reproduce the bug locally again. Finally, if you didn't use remote debugging, remove the temporary logging code from production.
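As an illustration of the "if the logged-in user is me" idea, here is a minimal Python sketch. The names DEBUG_USER_ID, debug_for_user, and calculate_total are hypothetical; in a real application the user ID would come from your session or request context, and the log call would go through whatever logging setup you already have.

```python
import logging
import time

logger = logging.getLogger("bug_hunt")

# Hypothetical: the account you use to trigger the bug in production.
DEBUG_USER_ID = "me@example.com"


def debug_for_user(user_id, message, **details):
    """Temporary helper: log extra diagnostics for one user only. Remove after the fix."""
    if user_id == DEBUG_USER_ID:
        # WARNING so the record passes a typical production log-level filter;
        # adjust to match your own logging configuration.
        logger.warning("BUGHUNT %s %s", message, details)


def calculate_total(user_id, items):
    """Hypothetical function under suspicion; items are (price, quantity) pairs."""
    start = time.perf_counter()
    total = sum(price * qty for price, qty in items)
    debug_for_user(
        user_id,
        "calculate_total",
        input=items,
        total=total,
        elapsed_ms=round((time.perf_counter() - start) * 1000, 3),
    )
    return total
```

Because the check is a single comparison, the overhead for every other user is negligible, and keeping all temporary logging behind one helper makes it easy to find and remove later.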
- Talk to a duck
Explain the bug, along with all the steps and findings, to something that isn't a person, such as a rubber duck. The very act of explaining can bring up new ideas.
- Talk to other developers
Explain the bug, along with all the steps and findings, to other developers. They might give you advice or tips on what to check.
- Give up (temporarily)
Add detailed logging to all identified suspicious points (being careful not to impact application performance), write down all your findings, deploy the code with detailed logging to production, and wait for the bug to occur again.
Explain to your client that you are unable to reproduce the bug, that you have added detailed logging so you will know more when the bug occurs next time, and that you will try to fix it again then, with the new information.
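For the detailed-logging step, one way to keep the performance cost down is to guard expensive diagnostics behind the log level, so they are only computed when debug logging is actually enabled for that logger. A sketch using Python's standard logging module, with a hypothetical apply_discount function standing in for one of the suspicious points:

```python
import json
import logging

logger = logging.getLogger("suspicious_points")


def expensive_snapshot(state):
    """Serializing large objects can be slow, so only call this when DEBUG is on."""
    return json.dumps(state, sort_keys=True, default=str)


def apply_discount(order):
    """Hypothetical suspicious point: order has 'subtotal' and optional 'discount'."""
    # Skip building the snapshot entirely unless DEBUG is enabled for this logger.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("apply_discount input=%s", expensive_snapshot(order))

    discount = order.get("discount", 0)
    total = order["subtotal"] * (1 - discount)

    # Lazy %-style arguments: the message string is only built if the record is emitted.
    logger.debug("apply_discount discount=%s total=%s", discount, total)
    return total
```

In production you would enable DEBUG only for this logger, for example with logging.getLogger("suspicious_points").setLevel(logging.DEBUG), while the rest of the application stays at its usual level.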
With the new information, ask yourself whether the requested flow needs to be challenged. Maybe there is a reason the flow behaves the way it does. Maybe the problem is poor communication to the end user. Maybe changing the flow would cause security, performance, or other issues. Always check the history of the code where the bug occurs; there may be linked feature requests or explanations.
It's good practice to check how many entities were affected by a bug. The number might lead you to a different solution, and the client might want to know it anyway.
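Counting affected entities is often a single query once you can describe the bug's symptom in data terms. A minimal sketch using Python's built-in sqlite3 module, assuming a hypothetical orders table (customer_id, total, created_at as ISO-8601 text) where the bug stored negative totals while it was live:

```python
import sqlite3


def count_affected(conn, bug_start, bug_end):
    """Count orders and distinct customers matching the bug's symptom in its time window.

    Hypothetical symptom: the bug stored negative totals. ISO-8601 timestamps
    stored as text compare correctly with BETWEEN.
    """
    row = conn.execute(
        "SELECT COUNT(*), COUNT(DISTINCT customer_id) FROM orders "
        "WHERE total < 0 AND created_at BETWEEN ? AND ?",
        (bug_start, bug_end),
    ).fetchone()
    return {"orders": row[0], "customers": row[1]}
```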
If you can reproduce a bug with a test, plan to write the test.
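A regression test should use the exact input from the bug report, so it fails against the buggy code and passes after the fix. A minimal sketch with Python's unittest, using a hypothetical page_count function whose bug was dropping the last, partially filled page:

```python
import unittest


def page_count(total_items, page_size):
    """Hypothetical fixed function.

    The buggy version returned total_items // page_size, which dropped
    the last, partially filled page.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return -(-total_items // page_size)  # ceiling division on integers


class PageCountRegressionTest(unittest.TestCase):
    def test_partial_last_page_from_bug_report(self):
        # Hypothetical report: 11 items with 10 per page showed only 1 page.
        self.assertEqual(page_count(11, 10), 2)

    def test_exact_multiple_still_works(self):
        self.assertEqual(page_count(20, 10), 2)

    def test_no_items_means_no_pages(self):
        self.assertEqual(page_count(0, 10), 0)

    def test_invalid_page_size_is_rejected(self):
        with self.assertRaises(ValueError):
            page_count(5, 0)
```

Run it with python -m unittest and keep it after the fix, so the bug can't quietly come back.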
If you need to fix a bug ASAP, then plan to create a hotfix.
If the bug corrupted some data, analyze the impact and plan how to repair it. For example, you might have to create a background process that fixes all corrupted data, or you might need manual intervention.
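A background repair job usually works best in small batches and should be idempotent, so it can be interrupted and restarted without repairing the same row twice. A sketch with Python's sqlite3 module, assuming a hypothetical sign bug that stored order totals as negative numbers; the table, column names, and batch size are all assumptions:

```python
import sqlite3

BATCH_SIZE = 500  # Hypothetical; keep batches small so each transaction is short.


def repair_negative_totals(conn, batch_size=BATCH_SIZE):
    """Flip the sign of corrupted totals in batches and return the number repaired.

    Idempotent: a repaired row no longer matches 'total < 0',
    so the job can be stopped and restarted safely.
    """
    repaired = 0
    while True:
        ids = [
            row[0]
            for row in conn.execute(
                "SELECT id FROM orders WHERE total < 0 ORDER BY id LIMIT ?",
                (batch_size,),
            )
        ]
        if not ids:
            return repaired
        # Only '?' placeholders are interpolated; the values are bound safely.
        placeholders = ",".join("?" * len(ids))
        conn.execute(
            f"UPDATE orders SET total = -total WHERE total < 0 AND id IN ({placeholders})",
            ids,
        )
        conn.commit()  # commit each batch so progress survives interruptions
        repaired += len(ids)
```

Back up the affected data before running anything like this, and compare the number of repaired rows with the count of affected entities you collected earlier.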
If you added detailed logging to better identify the bug, plan to remove it once it's no longer needed.
- If needed, create and deploy a hotfix. A hotfix can be ugly, but ideally it should be accompanied by a test that confirms the hotfix works.
- Fix the bug according to your application coding standards and, if possible, write some tests for it. If you added some logging that is not needed anymore, remove it.
- If corrupted data exists, fix it.
These are the steps I often go through when working on bugs. Most of the time there is no need to go through all of them, but hey, here they are.
And there is still one important thing to mention:
Once you have all the information about the bug, somebody might need an estimate of how long it will take to fix. That's easy when the bug is obvious and (probably) easy to fix, but what about when it isn't? Here are my rules of thumb:
- Create a task for identifying the cause of the bug and time-box it to one day at most. If you can't reproduce the bug within one day, give up (temporarily) and create a follow-up ticket for when the bug occurs again.
- If you identify the cause of the bug within that day and can fix it in the rest of the day, fix it.
- If you can't easily fix the bug, create follow-up tasks based on your plan for a bug fix.
And that's it. I hope this article helps you in your future bug-fixing ventures.