DEV Community

Rafi

Git internals

We developers use git all the time. Git internals might feel like magic, but what git actually does is really simple. Let us peek under the hood to see how git works.

Let's create an empty folder and initialize a git repo in it by running

$ git init

This creates a .git folder inside your empty folder. Its structure looks like this (the exact contents vary slightly between git versions):

.git/
├── branches
├── config
├── description
├── HEAD
├── hooks
├── info
│   └── exclude
├── objects
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags


If you open up the HEAD file in your text editor, you will see the following text in it

 ref: refs/heads/master

which means that your current branch is master.
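
To make the format concrete, here is a minimal sketch in Python of how the text in HEAD could be interpreted. The function name parse_head and the dictionary it returns are made up for this illustration and are not part of git. It also handles the case where HEAD holds a bare commit hash, which is what git calls a detached HEAD.

```python
import re


def parse_head(head_text: str) -> dict:
    """Interpret the text stored in .git/HEAD (hypothetical helper)."""
    text = head_text.strip()
    if text.startswith("ref:"):
        ref = text[len("ref:"):].strip()
        # Branch refs live under refs/heads/; the rest is the branch name.
        prefix = "refs/heads/"
        branch = ref[len(prefix):] if ref.startswith(prefix) else None
        return {"kind": "symbolic", "ref": ref, "branch": branch}
    if re.fullmatch(r"[0-9a-f]{40}", text):
        # A bare commit hash means HEAD is detached (no branch checked out).
        return {"kind": "detached", "commit": text}
    raise ValueError(f"unrecognised HEAD content: {text!r}")


print(parse_head("ref: refs/heads/master\n"))
```

So the current branch is nothing more than the last part of the path written in HEAD.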

Now add a file and make the first commit by running

 $ echo "Hello" >> README.md
 $ git add .
 $ git commit -m "Initial commit"

Now run $ git log and you will get

commit acde617e8ab39bb157821d3bf84d04e157bff52c (HEAD -> master)
Author: username <test@email.com>
Date:   Wed Aug 05 18:43:48 2020 +0330

    Initial commit

Note:

  • The exact commit hash that you get will differ from what you see here, because the hash depends on your username, your email, and the time at which you make the commit.

And if you open up refs/heads/master (inside the .git folder), it will contain the text acde617e8ab39bb157821d3bf84d04e157bff52c.

In git, each commit is identified by a hash. The content of the file refs/heads/master means that master is pointing to the commit
acde617e8ab39bb157821d3bf84d04e157bff52c

 // TODO:

 1. Now make a second commit
 2. Check `$ git log` and the content of **refs/heads/master**

After you have made your first commit, if you inspect the contents of the .git folder again, you will see something new

.git/
├── branches
├── COMMIT_EDITMSG
├── config
├── description
├── HEAD
├── hooks
├── index
├── info
│   └── exclude
├── logs
│   ├── HEAD
│   └── refs
│       └── heads
│           └── master
├── objects
│   ├── ac
│   │   └── de617e8ab39bb157821d3bf84d04e157bff52c
│   ├── dc
│   │   └── 0a29da1d9b3f68dcd56af0e34f8df4fbf8b24f
│   ├── e9
│   │   └── 65047ad7c57865823c7d992b1d046ea66edf78
│   ├── info
│   └── pack
└── refs
    ├── heads
    │   └── master
    └── tags

There is a new file called index and some weird things inside the objects folder (we will ignore all the other new things for now).
When you run $ git add . git takes the changes that you have made and creates objects for them. The name of each object is determined by running your file content, prefixed with a short header that gives the object's type and size, through the SHA1 algorithm. SHA1 takes an input of any length and outputs a 160-bit digest, which git writes as a 40-character hexadecimal string.

Let's try to generate SHA1 of the file README.md. You can do that by running

$ git hash-object README.md

which will give you the output

e965047ad7c57865823c7d992b1d046ea66edf78
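
If you are curious where that string comes from, here is a small Python sketch that should reproduce it without calling git. It assumes the file holds exactly Hello followed by a newline (which is what the echo command above writes on most systems). The function name git_blob_hash is made up for this illustration.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA1 name git gives to a blob with this content (sketch)."""
    # Git hashes a header ("blob <size in bytes>" plus a NUL byte)
    # followed by the raw file content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# `echo "Hello"` writes the five letters plus a trailing newline.
print(git_blob_hash(b"Hello\n"))
```

Note that the file name is not part of the input, so two files with identical content end up sharing a single blob.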

So that is where the content of the file README.md is stored (inside the objects folder). The first two characters of the hash are used as the folder name and the remaining 38 as the file name. The file 65047ad7c57865823c7d992b1d046ea66edf78 is zlib-compressed, so it looks like binary data in a text editor. To see its content we can run

$ git cat-file -p e965047ad7c57865823c7d992b1d046ea66edf78

which outputs

Hello

That is the content of your README.md!
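
Roughly speaking, this is what happens to the object file when it is read: it gets decompressed with zlib and the header is split off from the body. The sketch below is a simplified illustration, not git's actual code. The name decode_loose_object is invented here, and the sample bytes are built in memory as a stand-in for the real file under .git/objects/e9/.

```python
import zlib


def decode_loose_object(raw: bytes) -> tuple[str, int, bytes]:
    """Decompress a loose object and split it into (type, size, body)."""
    data = zlib.decompress(raw)
    # The header and the body are separated by the first NUL byte.
    header, _, body = data.partition(b"\0")
    kind, _, size = header.partition(b" ")
    return kind.decode("ascii"), int(size), body


# Stand-in for the bytes of the object file (built here, not read from disk).
sample = zlib.compress(b"blob 6\0Hello\n")
print(decode_loose_object(sample))
```

To try it on a real repository, you could pass in the bytes read from the object file instead of the hand-built sample.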

But what are the other two objects?

There are two other objects in the objects directory.
Git has four types of objects: blob, tree, commit, and tag. A blob stores the content of a file; the object we just saw is a blob.
You can see the type of an object by running

$ git cat-file -t e965047ad7c57865823c7d992b1d046ea66edf78

which will print

blob

When you run

$ git cat-file -p acde617e8ab39bb157821d3bf84d04e157bff52c

you will get

tree dc0a29da1d9b3f68dcd56af0e34f8df4fbf8b24f
author username <test@email.com> 1597845162 +0530
committer username <test@email.com> 1597845162 +0530

Initial commit

That is our actual commit. It records the author, the committer, the commit message, and something called a tree, which is another git object.
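
The commit text has a simple shape: header lines of the form "key value", then a blank line, then the message. Here is a rough Python sketch that splits it apart. The name parse_commit is made up for this example, and the sketch ignores multi-line headers (such as signatures) that real commits can carry.

```python
def parse_commit(text: str) -> tuple[dict, str]:
    """Split pretty-printed commit text into (headers, message)."""
    head, _, message = text.partition("\n\n")
    headers = {}
    for line in head.splitlines():
        key, _, value = line.partition(" ")
        # A merge commit has several "parent" lines, so values go in lists.
        headers.setdefault(key, []).append(value)
    return headers, message.strip()


sample = (
    "tree dc0a29da1d9b3f68dcd56af0e34f8df4fbf8b24f\n"
    "author username <test@email.com> 1597845162 +0530\n"
    "committer username <test@email.com> 1597845162 +0530\n"
    "\n"
    "Initial commit\n"
)
headers, message = parse_commit(sample)
print(headers["tree"], message)
```

Our first commit has no parent line. Later commits gain one that points at the commit they were built on.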

Let's see what that tree object contains by running

$ git cat-file -p dc0a29da1d9b3f68dcd56af0e34f8df4fbf8b24f

It outputs

100644 blob e965047ad7c57865823c7d992b1d046ea66edf78    README.md

It has the name of the file and the hash of the blob that holds the file content. It is essentially a snapshot of what your working directory looked like at that commit.
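
What cat-file prints is a friendly rendering. On disk, each tree entry is stored as the mode, a space, the file name, a NUL byte, and then the hash as 20 raw bytes. Here is a minimal Python sketch that decodes such a body. The function name parse_tree is invented for this illustration, and the sample bytes are built by hand from the values above rather than read from the repository.

```python
def parse_tree(body: bytes) -> list[tuple[str, str, str]]:
    """Decode a raw tree body into (mode, hash, name) tuples."""
    entries = []
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\0", space)
        mode = body[pos:space].decode("ascii")
        name = body[space + 1:nul].decode("utf-8")
        # The object hash is stored as 20 raw bytes, not 40 hex characters.
        sha = body[nul + 1:nul + 21].hex()
        entries.append((mode, sha, name))
        pos = nul + 21
    return entries


blob = "e965047ad7c57865823c7d992b1d046ea66edf78"
raw_tree = b"100644 README.md\0" + bytes.fromhex(blob)
print(parse_tree(raw_tree))
```

A sub-folder would appear as one more entry whose hash points at another tree object instead of a blob.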

Let's sum up our understanding so far. When you add and commit a file in git, the content of the file is passed through SHA1 to get a 40-character string, and a blob object is created under that name to store the content. Git then creates a tree object, which is essentially what your working directory looked like at that point in time; the tree says which blob is associated with which file name. Finally, there is a commit object, which points to this tree and also holds the commit message, the author, and the committer (each with a name, an email, and a timestamp).
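
The summary above can be sketched end to end in a few lines of Python. This is only an illustration of how the three hashes nest inside each other, assuming the README.md content and the author line from this post. The helper name object_hash is made up, and real git also compresses each object and writes it to disk.

```python
import hashlib


def object_hash(kind: str, body: bytes) -> str:
    """Name an object by hashing its type/size header plus its body."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# 1. The blob holds the file content.
blob_hash = object_hash("blob", b"Hello\n")

# 2. The tree maps the file name to the blob (hash stored as 20 raw bytes).
tree_body = b"100644 README.md\0" + bytes.fromhex(blob_hash)
tree_hash = object_hash("tree", tree_body)

# 3. The commit points at the tree and adds the metadata.
who = "username <test@email.com> 1597845162 +0530"
commit_body = (
    f"tree {tree_hash}\nauthor {who}\ncommitter {who}\n\nInitial commit\n"
).encode("utf-8")
commit_hash = object_hash("commit", commit_body)

print(blob_hash, tree_hash, commit_hash, sep="\n")
```

Because each level embeds the hash of the level below it, changing a single byte in README.md changes the blob hash, then the tree hash, then the commit hash.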

 // TODO

 1. Now make another commit
 2. inspect the contents of the .git folder
 3. See what are the objects that are there in .git folder
 4. Look at content of objects


Branching

Now let's create a branch by running

$ git branch feature-1

Now let's take a look at the content of the .git folder

.git/
├── branches
├── COMMIT_EDITMSG
├── config
├── description
├── HEAD
├── hooks
├── index
├── info
│   └── exclude
├── logs
│   ├── HEAD
│   └── refs
│       └── heads
│           ├── feature-1
│           └── master
├── objects
│   ├── ac
│   │   └── de617e8ab39bb157821d3bf84d04e157bff52c
│   ├── dc
│   │   └── 0a29da1d9b3f68dcd56af0e34f8df4fbf8b24f
│   ├── e9
│   │   └── 65047ad7c57865823c7d992b1d046ea66edf78
│   ├── info
│   └── pack
└── refs
    ├── heads
    │   ├── feature-1
    │   └── master
    └── tags

Now there is a new file called refs/heads/feature-1, and if we take a peek at its content, it will be the hash of the commit from which you created the branch.
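
In other words, a branch is just a small text file holding a commit hash. As a rough Python sketch, listing branches could be as simple as reading every file under refs/heads. The function name list_branches is made up for this illustration, and the sketch builds a throwaway folder instead of touching a real repository. It also ignores the packed-refs file, where real repositories may store some refs.

```python
import tempfile
from pathlib import Path


def list_branches(git_dir: Path) -> dict[str, str]:
    """Map each branch name under refs/heads to the hash stored in its file."""
    heads = git_dir / "refs" / "heads"
    return {
        path.relative_to(heads).as_posix(): path.read_text().strip()
        for path in sorted(heads.rglob("*"))
        if path.is_file()
    }


commit = "acde617e8ab39bb157821d3bf84d04e157bff52c"
with tempfile.TemporaryDirectory() as tmp:
    # Mimic the layout shown above: two branch files with the same hash.
    heads = Path(tmp) / ".git" / "refs" / "heads"
    heads.mkdir(parents=True)
    for name in ("master", "feature-1"):
        (heads / name).write_text(commit + "\n")
    branches = list_branches(Path(tmp) / ".git")

print(branches)
```

Both branches map to the same hash, which is why creating a branch in git is so cheap: nothing is copied.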

Now if we check out the feature-1 branch by running

$ git checkout feature-1

The content of our HEAD file changes to

ref: refs/heads/feature-1

 // TODO

 1. Try creating a file refs/heads/feature-2
 2. Run git log
 3. Put the hash inside that file
 4. Try running git branch


Staging area

When you create a file, you are creating it in your local file system. After you are done, you add the file to git by running $ git add . which adds it to the staging area. Then, when you make a commit, the commit object is created from the files in the staging area.

So the question now is: where is this staging area? The answer is that it lives in the index file.
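
The index is a binary file, so opening it in a text editor will not show much. As a small sketch, assuming the documented layout in which the file starts with a 12-byte header (the signature DIRC, a version number, and the number of entries, all big-endian), the header can be read like this in Python. The function name read_index_header is made up, and the sample bytes are hand-built rather than read from a real .git/index.

```python
import struct


def read_index_header(data: bytes) -> tuple[int, int]:
    """Return (version, entry_count) from the first 12 bytes of an index."""
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a git index file")
    return version, count


# Hand-built header standing in for the start of .git/index:
# version 2 with a single entry (our README.md).
sample = struct.pack(">4sII", b"DIRC", 2, 1)
print(read_index_header(sample))
```

The entries that follow the header carry the path, mode, and blob hash of each staged file, along with some file system metadata.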

We can see the contents of the staging area by running

$ git ls-files --stage

which gives us

100644 e965047ad7c57865823c7d992b1d046ea66edf78 0 README.md

 // TODO

 1. Create a README2.md file
 2. Run git ls-files --stage and look at its content
 3. Run git add .
 4. Now run git ls-files --stage again


Note:

  1. We have skipped some details like tags, packing...
  2. The number 100644 is the file mode in octal: the leading 100 marks a regular file and 644 is its Unix permission bits
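
As a quick illustration of that last note, Python's standard stat module can take the mode apart. The function name describe_mode is made up for this sketch. The modes 040000 and 120000 in the usage note below are the ones git uses for sub-folders and symbolic links.

```python
import stat


def describe_mode(mode_text: str) -> dict:
    """Split an octal mode string such as "100644" into type and permissions."""
    mode = int(mode_text, 8)
    return {
        "regular_file": stat.S_ISREG(mode),
        "directory": stat.S_ISDIR(mode),
        "symlink": stat.S_ISLNK(mode),
        "permissions": oct(stat.S_IMODE(mode)),
    }


print(describe_mode("100644"))
```

Calling describe_mode("040000") or describe_mode("120000") flips the directory or symlink flag instead, and an executable file would show up as 100755.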

Thanks to Yancy Min for sharing their work on Unsplash

Latest comments (2)

Tulsi Prasad

Very insightful content!

So basically all that we edit and write in our files are stored inside a mere 40 char string encoded with SHA1? Or are there any limitations to this?

Thank you!

Rafi

Thank you @heytulsiprasad. There are some limitations of SHA1. Git is working on migrating to SHA256 for better hash security (lwn.net/Articles/811068/), but it will take some time.