Shai Almog

Posted on Feb 21, 2024 • Originally published at debugagent.com

Unleashing the Power of Git Bisect

#tutorial #git #github #gitlab

The Essence of Debugging with Git
Setting the Stage for Debugging
Advanced Usage and Tips
Unraveling a Regression Mystery
Final Word

We don't usually think of Git as a debugging tool. Surprisingly, Git shines not just as a version control system but also as a potent debugging ally when dealing with the tricky matter of regressions.

As a side note, if you like the content of this and the other posts in this series check out my Debugging book that covers this subject. If you have friends that are learning to code I'd appreciate a reference to my Java Basics book. If you want to get back to Java after a while check out my Java 8 to 21 book.

The Essence of Debugging with Git

Before we tap into the advanced aspects of git bisect, it's essential to understand its foundational premise. Git is known for tracking changes and managing code history, but the git bisect tool is a hidden gem for regression detection. Regressions are distinct from generic bugs: they signify a backward step in functionality, where something that once worked now fails. Pinpointing the exact change that caused a regression can be like finding a needle in a haystack, particularly in extensive codebases with long commit histories.

Traditionally, developers would run this binary search by hand: checking out different versions, testing them, and narrowing down the search scope. This method, while effective, is painstakingly slow and error-prone. Git bisect automates the search, transforming what used to be a marathon into a swift sprint.

Setting the Stage for Debugging

Imagine you're working on a project, and recent reports indicate a newly introduced bug affecting the functionality of a feature that previously worked flawlessly. You suspect a regression but are unsure which commit introduced the issue among the hundreds made since the last stable version.

Initiating Bisect Mode

To start, you'll enter bisect mode in your terminal within the project's Git repository:

git bisect start

This command signals Git to prepare for the bisect process.

Marking the Known Good Revision

Next, you identify a commit where the feature functioned correctly, often a commit tagged with a release number or dated before the issue was reported. Mark this commit as "good":

git bisect good a1b2c3d

Here, a1b2c3d represents the hash of the known good commit.

Marking the Known Bad Revision

Similarly, you mark the current version or a specific commit where the bug is present as "bad":

git bisect bad z9y8x7w

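z9y8x7w stands in for the hash of the bad commit, typically the latest commit in the repository where the issue is observed. It is only a placeholder: real Git hashes are hexadecimal, and a branch name, a tag, or HEAD works just as well.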

Bisecting to Find the Culprit

Upon marking the good and bad commits, Git automatically jumps to a commit roughly in the middle of the two and waits for you to test this revision. After testing (manually or with a script), you inform Git of the result:

If the issue is present: git bisect bad
If the issue is not present: git bisect good

Git then continues to narrow down the range, selecting a new commit to test based on your feedback.
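The manual loop described above can be tried safely in a throwaway repository. The following is a minimal sketch, not a recipe for a real project: the repository, the file feature.txt, and the commit that breaks it are all made up for illustration, and the grep check stands in for whatever test you would run by hand.

```shell
#!/bin/bash
# Hypothetical walkthrough: a throwaway repository with 8 commits, where
# commit 5 introduces the "bug" (feature.txt stops saying "works").
set -eu
export LC_ALL=C   # keep Git's messages in English so the match below works

repo=$(mktemp -d)
cd "$repo"
git init -q .
for i in {1..8}; do
  if [ "$i" -lt 5 ]; then echo "works" > feature.txt; else echo "broken" > feature.txt; fi
  echo "change $i" >> changes.txt
  git add -A
  git -c user.name="Demo" -c user.email="demo@example.com" commit -q -m "commit $i"
done

git bisect start
git bisect bad HEAD        # newest commit: the bug is present
git bisect good HEAD~7     # first commit: the feature still worked

# Test whichever commit Git checked out, report the verdict, and repeat
# until Git announces the first bad commit.
steps=0
result=""
until [[ "$result" == *"is the first bad commit"* ]]; do
  steps=$((steps + 1))
  if grep -q "works" feature.txt; then
    result=$(git bisect good)
  else
    result=$(git bisect bad)
  fi
done

# refs/bisect/bad points at the last commit marked bad, i.e. the culprit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
echo "Found '$first_bad' after $steps manual tests"
git bisect reset
```

In a real session you would replace the grep line with your own build-and-test routine and type git bisect good or git bisect bad yourself after each checkout.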

Expected Output

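After several iterations, Git will isolate the problematic commit. Intermediate steps print a progress line such as "Bisecting: 3 revisions left to test after this (roughly 2 steps)", and the final step reports the culprit with a message similar to:

abcdef1234567890 is the first bad commit
commit abcdef1234567890
Author: ...
Date:   ...

    Commit message of the problematic commit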

Reset and Analysis

Once the offending commit is identified, you conclude the bisect session to return your repository to its initial state:

git bisect reset

Notice that bisect isn't linear: it doesn't walk through the revisions one after another. Based on the good and bad markers, Git selects a commit approximately in the middle of the range for testing, then repeats the process on whichever half still contains the culprit. Because each test halves the search space, a range of N commits needs only about log2(N) tests instead of up to N, so far fewer revisions get checked and the process is much faster.
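To get a feel for how much the halving saves, here is a small simulation that never touches Git. It is a sketch under simplifying assumptions: the history is linear, commits are numbered, and the function name count_bisect_steps and the culprit position are invented for the example. Real Git picks its midpoint from the commit graph, so actual step counts can differ slightly.

```shell
#!/bin/bash
# Simulate bisecting a linear history of numbered commits without Git.
# first_bad is a hypothetical "unknown" that the search has to discover.

# Prints the number of commits tested before the first bad one is isolated.
count_bisect_steps() {
  local good=$1 bad=$2 first_bad=$3
  local steps=0 mid
  # Invariant: commit $good is known good, commit $bad is known bad.
  while [ $((bad - good)) -gt 1 ]; do
    mid=$(( (good + bad) / 2 ))
    steps=$((steps + 1))
    if [ "$mid" -ge "$first_bad" ]; then
      bad=$mid      # bug already present: the culprit is at or before mid
    else
      good=$mid     # still working: the culprit is after mid
    fi
  done
  echo "$steps"
}

# 1000 commits after a good commit 0, with the culprit at commit 637.
count_bisect_steps 0 1000 637   # prints 10
```

Testing the same range one commit at a time from the good end would take 637 checks before hitting the culprit.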

Advanced Usage and Tips

The magic of git bisect lies in its ability to automate the binary search algorithm within your repository, systematically halving the search space until the rogue commit is identified.

Git bisect offers a powerful avenue for debugging, especially for identifying regressions in a complex codebase. To elevate your use of this tool, consider delving into more advanced techniques and strategies. These tips not only enhance your debugging efficiency but also provide practical solutions to common challenges encountered during the bisecting process.

Script Automation for Precision and Efficiency

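Automating the bisect process with a script is a game-changer, significantly reducing manual effort and minimizing the risk of human error. The script should ideally perform a quick test that directly targets the regression and report the result through its exit code: 0 marks the commit as good, any code from 1 to 127 except 125 marks it as bad, and 125 tells Git the commit cannot be tested and should be skipped.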

Example: Imagine you're debugging a regression where a web application's login feature breaks. You could write a script that attempts to log in using a test account and checks if the login succeeds. The script might look something like this in a simplified form:

#!/bin/bash
# Attempt to log in and check for success
if curl -s http://yourapplication/login -d "username=test&password=test" | grep -q "Welcome"; then
  exit 0 # Login succeeded, mark this commit as good
else
  exit 1 # Login failed, mark this commit as bad
fi

By passing this script to git bisect run, Git automatically executes it at each step of the bisect process, effectively automating the regression hunt.
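Here is what that looks like end to end, as a self-contained sketch you can run in a scratch directory. Everything in it is hypothetical: the ten-commit repository, the login.html file, and the check_login.sh script are stand-ins for your project and for a real login test like the curl example above. It also uses the shorthand git bisect start <bad> <good>, which starts the session and marks both endpoints in one command.

```shell
#!/bin/bash
# Hypothetical demo: a throwaway repository with 10 commits, where commit 7
# replaces the "Welcome" greeting and thereby breaks the pretend login page.
set -eu

work=$(mktemp -d)
mkdir "$work/repo"
cd "$work/repo"
git init -q .
for i in {1..10}; do
  if [ "$i" -lt 7 ]; then echo "Welcome" > login.html; else echo "Error" > login.html; fi
  echo "change $i" >> changes.txt
  git add -A
  git -c user.name="Demo" -c user.email="demo@example.com" commit -q -m "commit $i"
done

# The test script lives outside the repository so checkouts never touch it.
cat > "$work/check_login.sh" <<'EOF'
#!/bin/bash
# Exit 0 (good) when the greeting is present, non-zero (bad) otherwise.
grep -q "Welcome" login.html
EOF

# Shorthand: start the session and name the bad and good endpoints at once.
git bisect start HEAD HEAD~9
# Git checks out each candidate and runs the script until the culprit is found.
git bisect run bash "$work/check_login.sh"

# refs/bisect/bad now points at the first bad commit.
culprit=$(git log -1 --format=%s refs/bisect/bad)
echo "First bad commit: $culprit"
git bisect reset
```

In a real project the script usually also has to build and start the application at each commit before it can test anything, which is where most of the run time goes.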

Handling Flaky Tests with Strategy

Flaky tests, which sometimes pass and sometimes fail under the same conditions, can complicate the bisecting process. To mitigate this, your automation script can include logic to rerun tests a certain number of times or to apply more sophisticated checks to differentiate between a true regression and a flaky failure.

Example: Suppose you have a test that's known to be flaky. You could adjust your script to run the test several times and treat the commit as "bad" only if the test fails in the majority of runs:

#!/bin/bash
# Run the flaky test three times
success_count=0
for i in {1..3}; do
  if ./run_flaky_test.sh; then
    success_count=$((success_count + 1))
  fi
done

# If the test succeeds twice or more, consider it a pass
if [ "$success_count" -ge 2 ]; then
  exit 0
else
  exit 1
fi

This approach reduces the chances that a flaky test will lead to incorrect bisect results.

Skipping Commits with Care

Sometimes, you'll encounter commits that cannot be tested due to reasons like broken builds or incomplete features. git bisect skip is invaluable here, allowing you to bypass these commits. However, use this command judiciously to ensure it doesn't obscure the true source of the regression.

Example: If you know that commits related to database migrations temporarily break the application, you can skip testing those commits. During the bisect session, when Git lands on a commit you wish to skip, you would manually issue:

git bisect skip

This tells Git to exclude the current commit from the search and pick a nearby commit to test instead. It's essential to skip commits only when absolutely necessary: if the skipped commits sit right next to the culprit, Git cannot single out one commit and instead reports the set of remaining candidates.
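Skipping can also be automated. When a script driven by git bisect run exits with code 125, Git treats the commit as untestable, just as if you had typed git bisect skip. The sketch below wraps that convention in a helper; the function name check_commit and the build and test commands passed to it are placeholders, not part of Git.

```shell
#!/bin/bash
# Sketch of a bisect helper that skips commits it cannot build.
# The build and test commands are hypothetical placeholders for your own.

# Returns 0 (good), 1 (bad), or 125 (skip: this commit cannot be tested).
check_commit() {
  local build_cmd=$1 test_cmd=$2
  if ! bash -c "$build_cmd"; then
    return 125   # git bisect run treats 125 like "git bisect skip"
  fi
  if bash -c "$test_cmd"; then
    return 0     # build and test passed: good commit
  fi
  return 1       # normalize any test failure to a plain "bad"
}

# Example usage inside a bisect script (commands are assumptions):
#   check_commit "make" "./run_tests.sh"
#   exit $?
```

Normalizing test failures to 1 is a deliberate choice: exit codes above 127 make git bisect run abort the whole session rather than mark the commit as bad.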

These advanced strategies enhance the utility of git bisect in your debugging toolkit. By automating the regression testing process, handling flaky tests intelligently, and knowing when to skip untestable commits, you can make the most out of git bisect for efficient and accurate debugging. Remember, the goal is not just to find the commit where the regression was introduced but to do so in the most time-efficient manner possible. With these tips and examples, you're well-equipped to tackle even the most elusive regressions in your projects.

Unraveling a Regression Mystery

We once had to use git bisect while working on a large-scale web application. After a routine update, users began reporting a critical feature failure: the application's payment gateway stopped processing transactions correctly, leading to a significant business impact.

We knew the feature worked in the last release but had no idea which of the hundreds of recent commits introduced the bug. Manually testing each commit was out of the question due to time constraints and the complexity of the setup required for each test.

Enter git bisect. The team started by identifying a "good" commit where the payment gateway functioned correctly and a "bad" commit where the issue was observed. We then crafted a simple test script that would simulate a transaction and check if it succeeded.

By running git bisect start, followed by marking the known good and bad commits, and executing the script with git bisect run, we set off on an automated process that identified the faulty commit. Git efficiently navigated through the commits, automatically running the test script on each step. In a matter of minutes, git bisect pinpointed the culprit: a seemingly innocuous change to the transaction logging mechanism that inadvertently broke the payment processing logic.

Armed with this knowledge, we reverted the problematic change, restoring the payment gateway's functionality and averting further business disruption. This experience not only resolved the immediate issue but also transformed our approach to debugging, making git bisect a go-to tool in our arsenal.

Final Word

The story of the payment gateway regression is just one example of how git bisect can be a lifesaver in the complex world of software development. By automating the tedious process of regression hunting, git bisect not only saves precious time but also brings a high degree of precision to the debugging process.

As developers continue to navigate the challenges of maintaining and improving complex codebases, tools like git bisect underscore the importance of leveraging technology to work smarter, not harder. Whether you're dealing with a mysterious regression or simply want to refine your debugging strategies, git bisect offers a powerful, yet underappreciated, solution to swiftly and accurately identify the source of regressions. Remember, the next time you're faced with a regression, git bisect might just be the debugging partner you need to uncover the truth hidden within your commit history.
