Written by Dave Sweeton, Chief Technologist of Stout Systems
I do a lot of debugging in my work. Not only do I debug my own code, but I also help other developers with difficult-to-diagnose bugs.
Most bugs are pretty easy. You can look at the call stack of the error (if you have one), or look at the logs to find hints about the state of the application. Hopefully you have a nice set of steps to reproduce the error, either from your own testing or from the tester who found the bug!
Some bugs are too stubborn to be found by normal means, though. They don't leave any trace in the logs, or the trace they do leave doesn't make sense. (You find yourself saying, "There's no way that could happen!") They can't be reproduced by mere mortals, but Jenny in accounting can trigger them just by clicking a single button! You may even have a dreaded "Heisenbug," where the bug seems to disappear whenever you investigate it.
Common causes of impossible bugs are race conditions, where task B sometimes completes before task A and triggers a bug. Threading issues are also common, especially when threads share data. Even without shared data, most threading issues are some form of race condition.
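To make the task-ordering case concrete, here is a minimal Python sketch. The language and every name in it (`run_once`, `task_a`, `task_b`, the `config` key) are illustrative assumptions, not taken from any real application. Task A publishes a value and task B reads it; the two delay parameters stand in for however long each task's earlier work happens to take on a given run.

```python
import threading
import time


def run_once(a_delay: float, b_delay: float) -> str:
    """Run hypothetical tasks A and B concurrently and report what B saw."""
    state = {"config": None}
    observed = []

    def task_a() -> None:
        time.sleep(a_delay)               # simulated work before A publishes
        state["config"] = "loaded"

    def task_b() -> None:
        time.sleep(b_delay)               # simulated work before B reads
        observed.append(state["config"])  # bug: assumes A has already finished

    threads = [threading.Thread(target=task_a), threading.Thread(target=task_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    if observed[0] == "loaded":
        return "ok"
    return "bug: B ran before A finished"
```

When A usually wins, as in `run_once(0.0, 0.2)`, the code looks correct. The bug only appears on runs where B gets ahead, as in `run_once(0.2, 0.0)`, which is why it can be so hard to catch in normal testing.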
When faced with a bug that's resisting attempts to reproduce, one question I like to ask is, "If I can't make it better, can I make it worse?" Sometimes just thinking about the ways you might make it worse can help spur ideas on how to find it.
How do you make something worse? Depends on the circumstances, but here are a few ideas:
- Increase the load, by adding more simultaneous users or using an automated stress testing tool.
- Add programmatic delays to the code to change the timing of operations.
- Change the code to repeat operations. If a button click triggers code that sometimes has a bug, make the button click call that code multiple times, maybe in a loop.
- Similarly, you could automate running the buggy code (possibly with a unit or integration test) for thousands of iterations. If you get too far from the "normal" application context, though, you may not be able to reproduce the bug, but then you at least have a hint that the bug depends on the context your test lacks.
- Reduce network bandwidth and/or increase the latency. You can do this easily in a browser (for a web application) with the browser's developer tools.
- Try a different environment. Sometimes different time zones, different culture or locale settings, or different hardware can trigger strange bugs.
Try asking the "Can I make it worse?" question the next time you are faced with an impossible bug. Hopefully it will lead you to a Eureka! moment!
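As one possible illustration of the "add delays" and "repeat operations" ideas from the list, here is a small Python sketch. Everything in it is hypothetical: the `Counter` class, the `injected_delay` parameter, and the `lost_updates` helper are assumptions made up for the example, not part of any real codebase. The counter has an unsynchronized read-modify-write, which normally fails only rarely. Sleeping between the read and the write widens the window in which another thread can interfere, and calling the operation in a loop from several threads gives the bug many more chances to show up.

```python
import threading
import time


class Counter:
    """Hypothetical shared counter with an unsynchronized read-modify-write."""

    def __init__(self, injected_delay: float = 0.0) -> None:
        self.value = 0
        self.injected_delay = injected_delay

    def increment(self) -> None:
        current = self.value                 # read
        if self.injected_delay:
            time.sleep(self.injected_delay)  # debugging aid: widen the race window
        self.value = current + 1             # write; may overwrite another thread's update


def lost_updates(threads: int, calls_per_thread: int, injected_delay: float) -> int:
    """Hammer the counter from several threads and return how many increments vanished."""
    counter = Counter(injected_delay)

    def worker() -> None:
        for _ in range(calls_per_thread):    # repeat the suspect operation
            counter.increment()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return threads * calls_per_thread - counter.value
```

Calling `lost_updates(4, 50, 0.0)` may report few or no lost increments, depending on the interpreter and the machine. Calling `lost_updates(4, 50, 0.001)` should lose many of them on nearly every run, which turns a rare failure into one you can study. The delay is a temporary diagnostic, not something to leave in the code.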
Stout Systems is the software consulting and staffing company Fueled by the Most Powerful Technology Available: Human Intelligence®. We were founded in 1993 and are based in Ann Arbor, Michigan. We have clients across the U.S. in domains including engineering, scientific, manufacturing, education, marketing, entertainment, small business and robotics. We provide expert level software, Web and embedded systems development consulting and staffing services along with direct-hire technical recruiting and placements.