
Git diff explained

Jennie (jennieji) ・4 min read

I use git diff almost every working day to verify code changes, review teammates' code, or trace history and find out what happened. However, I only learnt to read the raw git diff last year, when I tried to extract which functions, variables or classes were changed.

Besides the raw git diff that pops up automatically when you commit or merge in the command line tool, you can show it with git show <revision> or git diff <from_revision> <to_revision>.

The command line tool will print text like the following:

commit bd65fa0b7f6cfc8b77107f935c4583771dec2a01
Author: Jennie Ji <jennie.ji@hotmail.com>
Date:   Wed Jan 29 22:22:25 2020 +0800

    Update package info, config prettier, format code

diff --git a/packages/es-stats/src/getDeclarationNames.ts b/packages/es-stats/src/getDeclarationNames.ts
index 70161a6..2d9f1eb 100644
--- a/packages/es-stats/src/getDeclarationNames.ts
+++ b/packages/es-stats/src/getDeclarationNames.ts
@@ -8,14 +8,19 @@ import { MemberRef } from 'ast-lab-types';
  * @param node AST node object
  * @return A list of objects contain declaration name and alias
  */
-export default function getDeclarationNames(node: Node): Array<MemberRef> | null {
-  switch(node.type) {
+export default function getDeclarationNames(
+  node: Node
+): Array<MemberRef> | null {
+  switch (node.type) {
     case 'VariableDeclaration':
       return node.declarations.reduce((ret, node) => {
         if (node.id) {
           return ret.concat(getPatternNames(node.id));
         } else {
-          console.warn('getDeclarationNames - VariableDeclaration id not exist, node:', node);
+          console.warn(
+            'getDeclarationNames - VariableDeclaration id not exist, node:',
+            node
+          );
           return ret;
         }
       }, [] as Array<MemberRef>);
@@ -27,4 +32,4 @@ export default function getDeclarationNames(node: Node): Array<MemberRef> | null
       return null;
   }
   return null;
-}
\ No newline at end of file
+}

It starts with the commit SHA, author info, commit time and commit message, followed by one or more "diff" blocks.

Each "diff" block contains the changes of one file, and starts with this line:

diff --git a/relative_file_path b/relative_file_path

The a and b above stand for the file before and after the change, each followed by a path relative to the repo root. The a and b paths differ only when git decides that a removed file and an added file are similar enough to be treated as a rename.
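To make the shape of this line concrete, here is a minimal sketch of pulling both paths out of it with a regex. The names parseDiffGitLine and DiffHeaderPaths are hypothetical helpers made up for illustration, and the sketch assumes plain, unquoted paths that do not themselves contain " b/".

```typescript
// Hypothetical helper for illustration; not part of git or any library.
interface DiffHeaderPaths {
  before: string; // path printed after the "a/" prefix
  after: string; // path printed after the "b/" prefix
}

function parseDiffGitLine(line: string): DiffHeaderPaths | null {
  // Assumption: paths are unquoted and do not contain " b/" themselves.
  const match = /^diff --git a\/(.+) b\/(.+)$/.exec(line);
  if (!match) {
    return null; // not a "diff --git" line
  }
  return { before: match[1], after: match[2] };
}

const exampleHeader = parseDiffGitLine(
  'diff --git a/packages/es-stats/src/getDeclarationNames.ts b/packages/es-stats/src/getDeclarationNames.ts'
);
// For a plain content change both paths are the same.
console.log(exampleHeader);
```

When before and after differ, the block describes a rename.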

Git judges the similarity of 2 files via a percentage called the similarity index, which represents the percentage of unchanged lines in the file.

If the similarity index is larger than 50%, git thinks you renamed the file and made some small changes, and the changes of both files are combined into one diff block. The 50% threshold can be changed with the -M option (for example -M90%), as described in the git diff documentation.

Next you might see this line:

index 0000000..fd9414a 100644

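The 0000000..fd9414a in the middle is a pair of abbreviated SHA-1 hashes: the blob (file content) objects before and after the change. All zeros means the file does not exist on that side. The number at the end, 100644, is the file mode: 100644 is a regular non-executable file, and 100755 is an executable one.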
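As a rough sketch, the index line can be split into its parts like this. The first two fields are abbreviated blob hashes and the trailing number, when git prints it, is the file mode. The names parseIndexLine, IndexLine and isMissingBlob are hypothetical, chosen only for this example.

```typescript
// Hypothetical helper for illustration; not part of git or any library.
interface IndexLine {
  beforeHash: string; // abbreviated blob hash before the change
  afterHash: string; // abbreviated blob hash after the change
  mode: string | null; // file mode such as "100644", or null when not printed
}

function parseIndexLine(line: string): IndexLine | null {
  const match = /^index ([0-9a-f]+)\.\.([0-9a-f]+)(?: (\d{6}))?$/.exec(line);
  if (!match) {
    return null; // not an "index" line
  }
  return {
    beforeHash: match[1],
    afterHash: match[2],
    mode: match[3] ?? null,
  };
}

// An all-zero hash means there is no file content on that side.
function isMissingBlob(hash: string): boolean {
  return /^0+$/.test(hash);
}

const exampleIndex = parseIndexLine('index 0000000..fd9414a 100644');
console.log(exampleIndex);
```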

The index line is not so meaningful on its own... But sometimes git prints extra lines just before it, like this:

similarity index 86%
rename from docs/enums/git_changes_affected.git_operation.html
rename to docs/enums/git_changes_affected.git_operation-1.html

Or this:

deleted file mode 100644

Or this:

new file mode 100644

These tell us what operation the author did to the file: rename, delete, create, or just change the content.
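A minimal sketch of turning those header lines into an operation could look like the following. The function name detectOperation and the FileOperation labels are my own choices for illustration. The sketch treats any block without one of these markers as a plain content change.

```typescript
// Hypothetical labels and helper for illustration only.
type FileOperation = 'rename' | 'delete' | 'create' | 'modify';

// headerLines: the lines between "diff --git ..." and the first "@@" line.
function detectOperation(headerLines: string[]): FileOperation {
  for (const line of headerLines) {
    if (/^rename (from|to) /.test(line)) {
      return 'rename';
    }
    if (/^deleted file mode /.test(line)) {
      return 'delete';
    }
    if (/^new file mode /.test(line)) {
      return 'create';
    }
  }
  // No special marker found: assume only the content changed.
  return 'modify';
}

const renameExample = detectOperation([
  'similarity index 86%',
  'rename from docs/enums/git_changes_affected.git_operation.html',
  'rename to docs/enums/git_changes_affected.git_operation-1.html',
]);
console.log(renameExample); // "rename"
```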

Then there are two lines starting with --- and +++:

--- a/relative_file_path
+++ b/relative_file_path

Like a and b, - and + stand for before and after the change. The paths differ only if the file was "renamed", same as in the diff --git line.

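🤔 Not sure why git repeats the paths here; at first it only confused me. The difference shows up for "new" and "deleted" files: the diff --git line prints the same path for both a and b, so it cannot tell you whether the file exists on both sides, while here the missing side is printed as /dev/null (--- /dev/null for a new file, +++ /dev/null for a deleted one).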

After these, we find the code changes in the file, split into multiple blocks (git calls them hunks) if there are several disconnected changes.

Each change block starts with this unique pattern:

@@ -8,14 +8,19 @@

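Sometimes a piece of code follows it on the same line. Git adds it as context to show which function or section the block belongs to; it is not part of the change itself: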

@@ -8,14 +8,19 @@ export function example() {
  console.log('I am the starting line 8, not line above');

It tells us that on both the - and + sides this change block starts at line 8. On the - side the block covers 14 lines of code; on the + side it covers 19 lines.
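These four numbers are easy to extract with a regex. Below is a minimal sketch; parseHunkHeader and HunkHeader are hypothetical names for this example. It also handles the short form where git leaves out the line count when it is 1 (for example @@ -1 +1,2 @@).

```typescript
// Hypothetical helper for illustration; not part of git or any library.
interface HunkHeader {
  oldStart: number; // first line of the block on the - side
  oldLines: number; // number of lines the block covers on the - side
  newStart: number; // first line of the block on the + side
  newLines: number; // number of lines the block covers on the + side
  context: string; // optional code printed after the second "@@"
}

function parseHunkHeader(line: string): HunkHeader | null {
  const match = /^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@ ?(.*)$/.exec(line);
  if (!match) {
    return null; // not a hunk header
  }
  return {
    oldStart: Number(match[1]),
    // The count is omitted when it equals 1.
    oldLines: match[2] === undefined ? 1 : Number(match[2]),
    newStart: Number(match[3]),
    newLines: match[4] === undefined ? 1 : Number(match[4]),
    context: match[5],
  };
}

const exampleHunk = parseHunkHeader(
  "@@ -8,14 +8,19 @@ import { MemberRef } from 'ast-lab-types';"
);
console.log(exampleHunk);
```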

The starting line 8 is the line number of the first line of code under this pattern, not of the first line that starts with a - or + symbol. Which means in the following example:

@@ -8,14 +8,19 @@ import { MemberRef } from 'ast-lab-types';
  * @param node AST node object
  * @return A list of objects contain declaration name and alias
  */
-export default function getDeclarationNames(node: Node): Array<MemberRef> | null {
-  switch(node.type) {
+export default function getDeclarationNames(

The first - line and the first + line are both line 11.
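The counting rule can be sketched in code: unchanged lines advance both counters, - lines advance only the old counter, and + lines advance only the new one. The helper numberHunkLines and the NumberedLine shape are hypothetical names used only for this sketch.

```typescript
// Hypothetical helper for illustration; not part of git or any library.
interface NumberedLine {
  kind: 'context' | 'removed' | 'added';
  oldLine: number | null; // line number on the - side, if the line exists there
  newLine: number | null; // line number on the + side, if the line exists there
  text: string;
}

function numberHunkLines(
  oldStart: number,
  newStart: number,
  body: string[]
): NumberedLine[] {
  let oldLine = oldStart;
  let newLine = newStart;
  const result: NumberedLine[] = [];
  for (const raw of body) {
    const marker = raw.charAt(0);
    const text = raw.slice(1);
    if (marker === '\\') {
      // "\ No newline at end of file" is a note, not a line of the file.
      continue;
    }
    if (marker === '-') {
      result.push({ kind: 'removed', oldLine: oldLine++, newLine: null, text });
    } else if (marker === '+') {
      result.push({ kind: 'added', oldLine: null, newLine: newLine++, text });
    } else {
      // Unchanged line: it exists on both sides.
      result.push({ kind: 'context', oldLine: oldLine++, newLine: newLine++, text });
    }
  }
  return result;
}

const numbered = numberHunkLines(8, 8, [
  '  * @param node AST node object',
  '  * @return A list of objects contain declaration name and alias',
  '  */',
  '-export default function getDeclarationNames(node: Node): Array<MemberRef> | null {',
  '-  switch(node.type) {',
  '+export default function getDeclarationNames(',
]);
const firstRemoved = numbered.find((l) => l.kind === 'removed');
const firstAdded = numbered.find((l) => l.kind === 'added');
console.log(firstRemoved?.oldLine, firstAdded?.newLine); // 11 11
```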

Most of the information is quite straightforward, but to be able to use it in code, I had to check it line by line with the help of regex and convert it to JSON format like this. I used the path, operation and line numbers, together with the location info from the AST, to figure out which function, variable or class changed. Hope this helps :).
