Louis Liu

Understanding Git Diff Results

I've often found the numbers that git diff prints when comparing two files confusing. What exactly do they represent? Recently, while working on a project to gather code review statistics from GitHub pull requests, I realized that understanding them is crucial. One thing I wanted to retrieve was lines-of-code (LOC) information. Initially, I struggled to obtain it through the GitHub API, but I suspected it was related to the numbers in git diff, which prompted me to dig into the code patches within pull requests.

Diff result

Each set of changes displayed in a git diff is called a "hunk." The concept isn't unique to Git. Let's take an example from one of my projects. I generated the output below by comparing two commits on the command line, and you can get the same result on GitHub.

diff --git a/src/models/event-data.ts b/src/models/event-data.ts
index f658fec..9e7f48e 100644
--- a/src/models/event-data.ts
+++ b/src/models/event-data.ts
@@ -6,10 +6,16 @@ import { EventDB, EventObject } from "../types/ranking-board"
 export class EventData {
   context: Context;
   db: EventDB;
+  dataFilePath: string;

   constructor(context: Context) {
     this.context = context;
     this.db = { ranking: [] };
+    this.dataFilePath = process.env.DATA_FILE_PATH || '';
+
+    if (this.dataFilePath == null) {
+      throw new Error('DATA_FILE_PATH is missing in the environment variable.')
+    }
   }

   async load(context?: Context) {
@@ -18,11 +24,10 @@ export class EventData {
     }

     const repo = new Repo(context as any);
-    const dataFilePath = 'data/ranking.json';

-    let contentResponse: OctokitResponse<any, number> = await repo.getContent(dataFilePath)
+    let contentResponse: OctokitResponse<any, number> = await repo.getContent(this.dataFilePath)
     let buffer = Buffer.from(contentResponse.data.content, 'base64');
-    let data = buffer.toString('ascii');
+    let data = buffer.toString('utf-8');

     this.db = JSON.parse(data);

@@ -45,6 +50,28 @@ export class EventData {
     console.log('type: ', eo.type);
     console.log('will save eo to data.json');
     console.log('>>>>> db is looks like:', this.db);
+
+    let message = `rank: ${eo.receiver} -> ${eo.points} point(s)`;
+
+    this.sync(message, 'main');
+  }
+
+  async sync(message: string, branch: string = 'main', context?: Context) {
+    if (context == null) {
+      context = this.context
+    }
+
+    const content = JSON.stringify(this.db);
+    const repo = new Repo(context as any);
+    const currentCommit = await repo.getCurrentCommit(branch);
+    const fileBlob = await repo.createBlob(content, 'utf-8');
+    const pathsForBlobs = [this.dataFilePath];
+    const newTree = await repo.createNewTree([fileBlob], pathsForBlobs, currentCommit.treeSha);
+    const newCommit = await repo.createCommit(message, newTree.sha, currentCommit.commitSha);
+
+    await repo.updateRef(branch, newCommit.data.sha);
+
+    console.log('database sync done.');
   }

   private add(eo: EventObject) {

Let's start with the header.

diff --git a/src/models/event-data.ts b/src/models/event-data.ts
index f658fec..9e7f48e 100644

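The first line tells us that the diff is in Git's format (--git) and names the two versions of the file being compared. The second line shows the abbreviated hashes of the file's contents before and after the change (f658fec..9e7f48e), followed by the file mode (100644, a regular non-executable file).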
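
To make the header layout concrete, here is a minimal sketch that pulls the paths, hashes, and mode out of those two lines. The names DiffHeader and parseDiffHeader are my own for illustration, and the simple patterns assume paths without spaces.

```typescript
// Hypothetical helper: only the two header lines themselves come from the article.
interface DiffHeader {
  oldPath: string;
  newPath: string;
  oldBlob: string;
  newBlob: string;
  mode: string | null;
}

function parseDiffHeader(diffLine: string, indexLine: string): DiffHeader | null {
  // "diff --git a/<path> b/<path>" (assumes no spaces in the paths)
  const diffMatch = /^diff --git a\/(\S+) b\/(\S+)$/.exec(diffLine);
  // "index <old hash>..<new hash> [<mode>]"
  const indexMatch = /^index ([0-9a-f]+)\.\.([0-9a-f]+)(?: (\d{6}))?$/.exec(indexLine);
  if (!diffMatch || !indexMatch) return null;
  return {
    oldPath: diffMatch[1],
    newPath: diffMatch[2],
    oldBlob: indexMatch[1],
    newBlob: indexMatch[2],
    mode: indexMatch[3] ?? null, // the mode is absent on some index lines
  };
}

const header = parseDiffHeader(
  "diff --git a/src/models/event-data.ts b/src/models/event-data.ts",
  "index f658fec..9e7f48e 100644",
);
```

For the header above, this yields the path src/models/event-data.ts on both sides, the hashes f658fec and 9e7f48e, and the mode 100644.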

--- a/src/models/event-data.ts
+++ b/src/models/event-data.ts

The next two lines name the files again, this time with symbols: the base file (---) on top and the compare file (+++) below. Lines that exist in the base file but not in the compare file are prefixed with - and are usually displayed in red. Lines that exist in the compare file but not in the base file are prefixed with + and are usually displayed in green.
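
One way to picture the - and + markers is to rebuild both sides of a hunk from its body. The sketch below is only an illustration (splitSides is a name I made up), and it expects hunk body lines, not the ---/+++ file header lines.

```typescript
// Rebuild the base-side and compare-side text from the lines of one hunk body.
function splitSides(hunkBody: string[]): { base: string[]; compare: string[] } {
  const base: string[] = [];
  const compare: string[] = [];
  for (const line of hunkBody) {
    const marker = line.charAt(0);
    const text = line.slice(1); // drop the one-character marker column
    if (marker === "-") {
      base.push(text); // only in the base file
    } else if (marker === "+") {
      compare.push(text); // only in the compare file
    } else {
      // Undecorated (context) line: present in both files.
      base.push(text);
      compare.push(text);
    }
  }
  return { base, compare };
}

const sides = splitSides([
  "     let buffer = Buffer.from(contentResponse.data.content, 'base64');",
  "-    let data = buffer.toString('ascii');",
  "+    let data = buffer.toString('utf-8');",
]);
```

Here sides.base ends with the 'ascii' line and sides.compare ends with the 'utf-8' line, while the undecorated line appears in both.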

@@ -6,10 +6,16 @@ import { EventDB, EventObject } from "../types/ranking-board"
 export class EventData {
   context: Context;
   db: EventDB;
+  dataFilePath: string;

   constructor(context: Context) {
     this.context = context;
     this.db = { ranking: [] };
+    this.dataFilePath = process.env.DATA_FILE_PATH || '';
+
+    if (this.dataFilePath == null) {
+      throw new Error('DATA_FILE_PATH is missing in the environment variable.')
+    }
   }

   async load(context?: Context) {

Now, let's figure out what those numbers mean.

The first line of the hunk is its header. @@ -6,10 +6,16 @@ says that the hunk shows 10 lines of the base file starting at line 6, and 16 lines of the compare file, also starting at line 6. We'll come back to the rest of the header later.

The rest of the hunk contains 10 undecorated lines, which are present in both the base file and the compare file, and 0 lines prefixed with -. 10 + 0 = 10, which gives us -6,10. It also contains 6 lines prefixed with +, which exist only in the compare file. 10 + 6 = 16, which gives us +6,16.
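
The arithmetic above can be checked mechanically. This is a small sketch, with parseHunkHeader and HunkRange as illustrative names of my own. It also handles the short form in which a count is left out, which in the unified format means a count of 1.

```typescript
interface HunkRange {
  oldStart: number;
  oldCount: number;
  newStart: number;
  newCount: number;
}

function parseHunkHeader(header: string): HunkRange | null {
  // "@@ -<oldStart>[,<oldCount>] +<newStart>[,<newCount>] @@ ..."
  const m = /^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@/.exec(header);
  if (!m) return null;
  return {
    oldStart: Number(m[1]),
    oldCount: Number(m[2] ?? "1"), // an omitted count defaults to 1
    newStart: Number(m[3]),
    newCount: Number(m[4] ?? "1"),
  };
}

const range = parseHunkHeader(
  '@@ -6,10 +6,16 @@ import { EventDB, EventObject } from "../types/ranking-board"',
);

// Line counts taken from the first hunk in this article.
const contextLines = 10;
const deletedLines = 0;
const addedLines = 6;

const consistent =
  range !== null &&
  contextLines + deletedLines === range.oldCount && // 10 + 0 = 10
  contextLines + addedLines === range.newCount; // 10 + 6 = 16
```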

The next hunk is a little bit confusing.

@@ -18,11 +24,10 @@ export class EventData {
     }

     const repo = new Repo(context as any);
-    const dataFilePath = 'data/ranking.json';

-    let contentResponse: OctokitResponse<any, number> = await repo.getContent(dataFilePath)
+    let contentResponse: OctokitResponse<any, number> = await repo.getContent(this.dataFilePath)
     let buffer = Buffer.from(contentResponse.data.content, 'base64');
-    let data = buffer.toString('ascii');
+    let data = buffer.toString('utf-8');

     this.db = JSON.parse(data);


If you open this commit on GitHub, you'll find that the first line of this hunk is line 24 in the new file, not 18. @@ -18,11 +24,10 @@ means that Git takes 11 lines from the base file (starting at line 18) and compares them with 10 lines from the compare file (starting at line 24). Remember the 6 additions in the first hunk? They push everything after them down by 6 lines: 18 + 6 = 24. If you're still confused, I hope this picture helps.

[Image: file comparing 1]
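
The same shift can be computed for every hunk: each start line in the compare file is the start line in the base file plus the net number of lines added by the hunks before it. Below is a minimal sketch (Hunk and expectedNewStarts are illustrative names), which assumes hunks with non-zero counts like the ones in this article.

```typescript
interface Hunk {
  oldStart: number;
  oldCount: number;
  newStart: number;
  newCount: number;
}

// For each hunk, predict where it starts in the compare file.
function expectedNewStarts(hunks: Hunk[]): number[] {
  let shift = 0; // net lines added by all earlier hunks
  return hunks.map((h) => {
    const expected = h.oldStart + shift;
    shift += h.newCount - h.oldCount;
    return expected;
  });
}

// The three hunk headers from this article.
const hunks: Hunk[] = [
  { oldStart: 6, oldCount: 10, newStart: 6, newCount: 16 },
  { oldStart: 18, oldCount: 11, newStart: 24, newCount: 10 },
  { oldStart: 45, oldCount: 6, newStart: 50, newCount: 28 },
];

const predicted = expectedNewStarts(hunks); // [6, 24, 50]
```

The predicted values match the + side of each header: the second hunk moves by the 6 lines added in the first, and the third by 6 - 1 = 5, because the second hunk removes one more line than it adds.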

@@ -45,6 +50,28 @@ export class EventData {
     console.log('type: ', eo.type);
     console.log('will save eo to data.json');
     console.log('>>>>> db is looks like:', this.db);
+
+    let message = `rank: ${eo.receiver} -> ${eo.points} point(s)`;
+
+    this.sync(message, 'main');
+  }
+
+  async sync(message: string, branch: string = 'main', context?: Context) {
+    if (context == null) {
+      context = this.context
+    }
+
+    const content = JSON.stringify(this.db);
+    const repo = new Repo(context as any);
+    const currentCommit = await repo.getCurrentCommit(branch);
+    const fileBlob = await repo.createBlob(content, 'utf-8');
+    const pathsForBlobs = [this.dataFilePath];
+    const newTree = await repo.createNewTree([fileBlob], pathsForBlobs, currentCommit.treeSha);
+    const newCommit = await repo.createCommit(message, newTree.sha, currentCommit.commitSha);
+
+    await repo.updateRef(branch, newCommit.data.sha);
+
+    console.log('database sync done.');
   }

   private add(eo: EventObject) {

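The last hunk is straightforward: it compares 6 lines from the base file with 28 lines from the compare file, and the difference is the 22 added lines. It starts at line 50 rather than 45 in the compare file because the earlier hunks added a net 5 lines (6 in the first hunk, and 2 - 3 = -1 in the second).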

[Image: file comparing 2]

Hunk header

Let's break down the hunk header:

@@ -6,10 +6,16 @@ import { EventDB, EventObject } from "../types/ranking-board"

@@ -18,11 +24,10 @@ export class EventData {

@@ -45,6 +50,28 @@ export class EventData {

The hunk header carries more than the line ranges. The text after the second @@ is a line that Git picks from the code above the hunk, here an import statement or the class declaration. It may not be the line you expect, because Git follows its own rules for choosing it (see Defining a custom hunk-header). I won't go into more detail, since it doesn't matter for our purposes.
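
As far as I understand Git's default behavior (when no language-specific pattern is configured), it walks backward from the line just before the hunk and takes the first line that begins with a letter, _ or $. The sketch below imitates that rule. findHunkHeading and the sample file contents are hypothetical, and Git's built-in and custom patterns can give different results.

```typescript
// Imitates the default heading choice: nearest earlier line starting with a letter, _ or $.
function findHunkHeading(fileLines: string[], hunkStart: number): string {
  // hunkStart is 1-based, so the line just before the hunk has index hunkStart - 2.
  for (let i = hunkStart - 2; i >= 0; i--) {
    if (/^[A-Za-z_$]/.test(fileLines[i])) return fileLines[i];
  }
  return ""; // no suitable line above the hunk
}

// Hypothetical file layout, loosely modeled on the example in this article.
const sampleFile = [
  "// a comment line", // line 1
  'import { EventDB, EventObject } from "../types/ranking-board"', // line 2
  "", // line 3
  "export class EventData {", // line 4
  "  context: Context;", // line 5
  "  db: EventDB;", // line 6
];

const headingBeforeClass = findHunkHeading(sampleFile, 4); // the import line
const headingInsideClass = findHunkHeading(sampleFile, 6); // "export class EventData {"
```

This would explain the headers in the example: a hunk that starts on the class declaration shows the import above it, because the search begins one line before the hunk, while hunks inside the class show the class declaration because the indented members are skipped.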

GitHub API

The GitHub API lets developers get the additions and deletions directly. Somehow I didn't figure that out at first, but thanks to that I learned how to read git diffs 😀.
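
For completeness, this is roughly how the additions and deletions could be counted from patch text by hand, which is the detour I took. It is only a sketch under the assumption that the patch has the same shape as the hunks shown above. countChanges is a made-up name, and the sample patch is a shortened, hypothetical hunk.

```typescript
// Count + and - lines inside hunks, ignoring file headers such as ---/+++.
function countChanges(patch: string): { additions: number; deletions: number } {
  let additions = 0;
  let deletions = 0;
  let inHunk = false;
  for (const line of patch.split("\n")) {
    if (line.startsWith("@@")) {
      inHunk = true; // a hunk header opens a hunk body
      continue;
    }
    if (line.startsWith("diff --git")) {
      inHunk = false; // a new file header closes the previous hunk
      continue;
    }
    if (!inHunk) continue; // skip index, --- and +++ lines
    if (line.startsWith("+")) additions++;
    else if (line.startsWith("-")) deletions++;
  }
  return { additions, deletions };
}

// Shortened, hypothetical hunk built from lines in this article.
const patch = [
  "--- a/src/models/event-data.ts",
  "+++ b/src/models/event-data.ts",
  "@@ -1,4 +1,3 @@",
  "     const repo = new Repo(context as any);",
  "-    const dataFilePath = 'data/ranking.json';",
  " ",
  "-    let data = buffer.toString('ascii');",
  "+    let data = buffer.toString('utf-8');",
].join("\n");

const totals = countChanges(patch); // { additions: 1, deletions: 2 }
```

Tracking whether we are inside a hunk matters: without it, the --- and +++ file header lines would be miscounted as a deletion and an addition.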

References:

Howto: Reading Git Diffs and Staging Hunks

Where does the excerpt in the git diff hunk header come from?
