"It is a capital mistake to theorize before one has data." (Sherlock Holmes, in Arthur Conan Doyle's "A Scandal in Bohemia")
Observing the failure firsthand is essential. When you make assumptions about the problem's cause, you often end up trying to fix something unrelated to the actual bug. This not only results in an ineffective solution but also consumes valuable time and resources, possibly causing additional issues. Avoid this approach.
Instead, stop overthinking and start observing. This is among the most valuable pieces of advice you can give to anyone who is debugging.
Engineers are inherently analytical thinkers. They enjoy the intellectual challenge, which is why they chose engineering over physical labor. While engineers generate ingenious ideas, there are more ways for things to go wrong than even the most imaginative engineer can anticipate. So, why do we believe we can solve problems purely through thought? Because we are engineers, and thinking is more convenient than observing.
Observing is demanding. It is often, if not always, more complex than we would prefer. In the realm of software, observing entails setting breakpoints, inserting debug statements, monitoring program variables, and inspecting memory.
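As a minimal sketch of what inserting debug statements can look like, the Python example below logs the intermediate values of a small calculation instead of reasoning about what they ought to be. The function `average_latency`, the logger name, and the sample data are hypothetical, invented purely for illustration.

```python
import logging

logger = logging.getLogger("latency")  # hypothetical logger name


def average_latency(samples_ms):
    """Return the mean of the samples, logging what is actually seen."""
    total = 0.0
    for index, sample in enumerate(samples_ms):
        total += sample
        # Debug statement: record the real intermediate state on each pass.
        logger.debug("index=%d sample=%r running_total=%r", index, sample, total)
    count = len(samples_ms)
    logger.debug("count=%d total=%r", count, total)
    return total / count if count else 0.0


if __name__ == "__main__":
    # Enable DEBUG output only while investigating.
    logging.basicConfig(level=logging.DEBUG)
    print(average_latency([12.0, 15.5, 900.0]))
```

With debug logging enabled, the outlier sample is visible entering the running total, which is the kind of detail a guess tends to miss.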
Once your initial assumptions have been disproven, you still have to find the bug. You end up with the same workload as before, but now you have less time. This is far from ideal unless you subscribe to the notion that the sooner you fall behind, the more time you have to catch up. Therefore, here are some guidelines to help you prioritize observation over premature speculation.
It appears self-evident that to identify a failure, one must witness the failure occurring firsthand. When we observe a bug, what we are actually seeing is the aftermath of the failure. To effectively debug, it is imperative to scrutinize the failure in intricate detail. Many issues can be easily misconstrued if you cannot observe the entire sequence of events as they unfold. Without a comprehensive view, you may inadvertently address a problem you've merely speculated about, when in reality, a completely different element has malfunctioned. The true nature of the problem becomes evident only when someone observes it in action.
Make sure you understand precisely what is going wrong. In most cases, observing is significantly faster than taking shortcuts based on hasty guesses, because those shortcuts often lead to dead ends.
The amount of observation needed is usually quite limited. Each time you examine the system and watch it fail, you learn more about the nature of the malfunction, which tells you where to look next for more detail. Gradually, you accumulate enough detail to justify examining the system's design and pinpointing the root cause of the issue.
When should you transition from observation to analysis? Continue the process of observation until the visible failure narrows down to a manageable number of potential causes that warrant further examination.
There is something called the "streetlight effect", a metaphor for knowledge and ignorance. The streetlight effect, or the drunkard's search principle, is a type of observational bias that occurs when people search for something only where it is easiest to look.
In the realm of software development, the initial layer of built-in instrumentation typically involves compiling the code in debug mode, enabling you to observe the program's execution using a source code debugger.
Having a higher volume of status messages is beneficial, but it's essential to incorporate a mechanism that allows you to selectively enable or disable specific messages or message types. This flexibility enables you to focus your attention on the messages relevant to diagnosing a particular issue.
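One possible way to get this kind of selective control, sketched here in Python, is to give each subsystem its own named logger and set levels per logger. The subsystem names `app.network` and `app.database` and the helper `configure` are assumptions made for this example, not part of any particular product.

```python
import logging

# Hypothetical subsystem names; each gets its own named logger.
net_log = logging.getLogger("app.network")
db_log = logging.getLogger("app.database")


def configure(verbose_subsystems):
    """Quiet everything under "app", then enable DEBUG selectively."""
    logging.getLogger("app").setLevel(logging.WARNING)
    for name in ("app.network", "app.database"):
        # NOTSET makes a logger inherit the level of its parent, "app".
        level = logging.DEBUG if name in verbose_subsystems else logging.NOTSET
        logging.getLogger(name).setLevel(level)


if __name__ == "__main__":
    logging.basicConfig(format="%(name)s %(levelname)s %(message)s")
    configure({"app.network"})
    net_log.debug("retrying connection, attempt=%d", 2)  # emitted
    db_log.debug("query plan cached")                    # suppressed
```

The debug calls stay in the code permanently; only the configuration changes when a different part of the system needs attention.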
It's crucial to incorporate debugging considerations right from the outset of the design process. Ensure that instrumentation is a fundamental component of your product requirements. Incorporate hooks for instrumentation in every functional specification and API definition. Include the debug monitor and analysis filter as integral components of your standard utility toolkit.
Furthermore, aside from simplifying the eventual debugging process, contemplating the instrumentation needs also aids in designing the system more effectively and mitigating some of the potential bugs from arising in the first place.
If you weren't able to incorporate instrumentation during the initial development, at the very least, consider adding it afterward. Utilize a debugger to gain an internal perspective on your code.
If a bug is present in the code, you'll eventually need to recompile the software to rectify it. Consequently, you should also be open to recompiling the software to initially identify the bug. Create a debug version that allows you to access the source code and incorporate new debugging statements to inspect the crucial parameters you need to examine.
Heisenberg was one of the trailblazers of quantum physics. While studying the behavior of extremely light, minuscule subatomic particles, he recognized a fundamental principle: you can measure either a particle's position or its momentum, but the more accurately you determine one of them, the more you perturb the other.
A Heisenbug is a glitch with a peculiar characteristic: it responds to the act of observation. For instance, it might mysteriously vanish when the program is in debug mode.
The challenge arises from the fact that obtaining an accurate measurement is hindered because the tools used for observation are an integral part of the system. Your testing instruments inherently impact the system undergoing examination.
Even a debugger can introduce some degree of timing variation, and any form of instrumentation, to varying extents, influences the system's behavior. This is an unavoidable reality, and it's crucial to bear this in mind so that you're not caught off guard by these effects. Additionally, some methods of instrumentation are less intrusive than others.
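As one illustration of a less intrusive approach, the Python sketch below records events in a fixed-size in-memory buffer and postpones all formatting and output until after the run. The names `TRACE`, `trace`, and `dump_trace` and the capacity of 1000 are arbitrary choices for this example. It still perturbs timing to some degree, though typically less than printing each event as it happens.

```python
import time
from collections import deque

# Fixed-size in-memory trace; the oldest entries are discarded automatically.
TRACE = deque(maxlen=1000)  # capacity is an arbitrary choice


def trace(event, value=None):
    """Record an event cheaply; no formatting or I/O happens here."""
    TRACE.append((time.perf_counter_ns(), event, value))


def dump_trace():
    """Format the buffer after the fact, when timing no longer matters."""
    return [f"{stamp} {event} {value!r}" for stamp, event, value in TRACE]


if __name__ == "__main__":
    for i in range(3):
        trace("loop", i)
    print("\n".join(dump_trace()))
```

Because the buffer keeps only the most recent entries, it can stay enabled during long runs and be inspected once the failure has occurred.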
Remarkably, even minor alterations can perturb the system enough to obscure the bug entirely. Instrumentation is one such alteration. Therefore, after implementing instrumentation in a malfunctioning system, it's imperative to recreate the failure to ascertain that Heisenberg's principle is not inadvertently confounding your efforts.
Guessing can be a valuable tool, particularly when you have a strong grasp of the system. Your educated guesses might even come close to the mark. However, it's crucial to use guesswork as a means to narrow down your search. You must still validate your assumptions by witnessing the failure before attempting to rectify it.
When you encounter an unexpected bug, it's essential to reevaluate the certainties you hold dear. The degree of surprise you experience when something goes awry is directly linked to the level of trust and confidence you place in the code that's running. Therefore, when confronted with an "astonishing" failure, it's imperative to recognize that one or more of your foundational assumptions are incorrect.
In the face of an unforeseen bug, your task extends beyond mere resolution; you must also investigate why this issue went unnoticed until now. Ensure that whatever transpired, you have mechanisms in place to detect it if it occurs again.
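A simple detection mechanism of this kind is an explicit check on the assumption that turned out to be wrong. The Python sketch below uses a made-up inventory example; `check_stock`, `InvariantViolation`, and the logger name are hypothetical and stand in for whatever condition the real bug violated.

```python
import logging

logger = logging.getLogger("inventory")  # hypothetical logger name


class InvariantViolation(RuntimeError):
    """Raised when a condition we believed could never fail does fail."""


def check_stock(item, quantity):
    """Guard an assumption that a past bug showed to be unsafe."""
    if quantity < 0:
        # Record the evidence first, then fail loudly instead of continuing.
        logger.error("negative stock: item=%r quantity=%r", item, quantity)
        raise InvariantViolation(f"negative stock for {item!r}: {quantity}")
    return quantity
```

If the same condition ever recurs, the failure shows up at the point where the assumption breaks, with the relevant values logged, rather than surfacing later as a puzzling symptom.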
Furthermore, if the bug stems from misconceptions held by a team member, it's essential to engage in a collective discussion about the problem. If one person misunderstands, it's possible that others share the same misconception.
Hence, it's advisable not to place excessive trust in your initial guesses; often, they can lead you down the wrong path and prove to be significantly off the mark. If meticulous instrumentation fails to corroborate a particular assumption, it's time to step back and reconsider your guesswork.