Abhigyan Gautam

Posted on May 29, 2023

What actually happens when you git?

#git #versioncontrol

Git is by far one of the most useful tools if you are a software developer. Git is a version control system used by developers to keep track of changes made to a project's codebase. It allows multiple people to collaborate on the same codebase without overwriting each other's work.

But what’s even more interesting is how Git keeps track of every version of every file in a project. In this article we will take a look at how Git actually handles version control under the hood.

The .git folder

Whenever you initialise a project with git init, Git creates a hidden folder called .git. This is where Git stores information about the project: its configuration and every recorded version. Let’s initialise a folder with Git and look at its contents.

git init

These are the contents of the .git folder.

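config contains the repository-specific configuration, such as remotes, branch tracking settings and any options set locally with git config (for example user.name or user.email). Git rewrites this file when those settings change.
description contains a free-text description of the repository. It is only used by GitWeb.
HEAD holds a reference to the current branch. In a fresh repository this is master (or main, depending on the init.defaultBranch setting).
refs stores references to branches (refs/heads) and tags (refs/tags).
objects stores the Git objects themselves: the contents of every committed file, the directory listings (trees), the commits and so on.
hooks contains scripts that Git runs before or after certain commands, such as pre-commit or post-commit. A new repository only contains sample hooks ending in .sample, which Git ignores.
info contains extra repository information, such as the info/exclude file, a local ignore list that is not committed.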
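To make the HEAD and refs entries concrete, here is a minimal Python sketch of how a tool could read them. It assumes the plain-text formats Git uses for loose refs: HEAD holds either "ref: refs/heads/<branch>" or a raw commit hash, and each branch file under refs/heads holds a commit hash. The function name resolve_head is hypothetical, and the sketch ignores packed refs.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def resolve_head(git_dir: Path) -> Tuple[Optional[str], Optional[str]]:
    """Return (branch_ref, commit_hash) for the repository in git_dir.

    Simplified sketch: it ignores .git/packed-refs, where Git may also
    keep branch pointers.
    """
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        ref = head[len("ref: "):]  # e.g. "refs/heads/master"
        ref_file = git_dir / ref
        # Right after git init the branch file does not exist yet,
        # because there is no commit for it to point to.
        commit = ref_file.read_text().strip() if ref_file.exists() else None
        return ref, commit
    # Detached HEAD: the file holds a commit hash directly.
    return None, head


# Demo on a throwaway directory that mimics a freshly initialised .git.
with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp)
    (git_dir / "HEAD").write_text("ref: refs/heads/master\n")
    print(resolve_head(git_dir))  # ('refs/heads/master', None)
```

Returning None for the commit mirrors what you see right after git init: HEAD already names a branch, but refs/heads does not contain a file for it until the first commit.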

Git’s SHA1 hashing

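Git stores data as objects. File contents are stored as blob objects, and directories, commits and tags have object types of their own. Every object is named by its SHA-1 hash, which is 20 bytes long and written as 40 hexadecimal characters. Strictly speaking, Git hashes the content together with a short header giving the object type and size. We can generate hashes for any content. For example, the hash Git computes for “Superman” (plus the newline that echo appends) is 5f42cf3e4992beffcd80266227d529427adb7a2d. The same content always produces the same hash, so you can check this for yourself using: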

❯ echo "Superman" | git hash-object  --stdin
5f42cf3e4992beffcd80266227d529427adb7a2d

If we change the content, even by a single letter, we get a completely different hash:

❯ echo "SuperMan" | git hash-object  --stdin
3b552a73712ce7111a4aa6a600f19700ae378f7a

This is the basis of what Git does. Any change to the content produces a different hash, so Git can detect changes simply by comparing hashes.
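These hashes can be reproduced without Git. Below is a minimal Python sketch that assumes a repository using SHA-1, which is the default. Newer Git versions can also create SHA-256 repositories. For a blob, git hash-object hashes a header ("blob", a space, the content length in bytes and a NUL byte) followed by the content. The helper name hash_object is hypothetical.

```python
import hashlib


def hash_object(data: bytes, obj_type: str = "blob") -> str:
    """Return the object ID Git computes for data (SHA-1 repositories).

    Git hashes a header made of the object type, a space, the content
    length in bytes and a NUL byte, followed by the content itself.
    """
    header = f"{obj_type} {len(data)}\0".encode()
    return hashlib.sha1(header + data).hexdigest()


# echo appends a newline, so these correspond to the two commands above.
superman = hash_object(b"Superman\n")
super_man = hash_object(b"SuperMan\n")
print(superman)
print(super_man)
print(superman == super_man)  # False: one changed letter, a different hash

# An empty file always hashes to the same well-known ID.
print(hash_object(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the header includes the type, the same bytes stored as a blob and as a tree would get different IDs.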

A new commit

Now let’s add a file called testfile to this repo and commit it:

❯ touch testfile
❯ git add testfile
❯ git commit -m "added test file"

Now notice the commit hash in the output: c36ed1c. This identifies the commit object Git just created, not a blob. Git maintains versions using something similar to a file system, often described as a content-addressable filesystem. It stores file contents as blob objects, directory listings as tree objects and each commit as a commit object. The difference between a blob and a file is that a blob stores only the content. The file’s name and mode are recorded in the tree that points to the blob.
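A commit is itself an object, so its ID is the hash of its text. That text includes the tree ID, the author and committer with timestamps, and the message. This is why repeating these steps on your machine gives a hash other than c36ed1c. Below is a minimal Python sketch that assumes the SHA-1 object format and a simplified commit layout: at most one parent, identical author and committer, and a UTC timezone. The helper names, identity and timestamps are all hypothetical.

```python
import hashlib


def hash_object(data: bytes, obj_type: str) -> str:
    """SHA-1 of '<type> <size>' + NUL + data, as Git computes object IDs."""
    return hashlib.sha1(f"{obj_type} {len(data)}\0".encode() + data).hexdigest()


def commit_body(tree: str, author: str, timestamp: int, message: str,
                parent: str = "") -> bytes:
    """Assemble the text of a commit object.

    Simplified: at most one parent, author and committer identical,
    fixed +0000 timezone.
    """
    lines = [f"tree {tree}"]
    if parent:
        lines.append(f"parent {parent}")
    lines.append(f"author {author} {timestamp} +0000")
    lines.append(f"committer {author} {timestamp} +0000")
    # A blank line separates the headers from the message.
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()


# Hypothetical values: a real tree ID, identity and time will differ.
TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # the empty tree
AUTHOR = "Jane Doe <jane@example.com>"

first = hash_object(commit_body(TREE, AUTHOR, 1685318400, "added test file"), "commit")
later = hash_object(commit_body(TREE, AUTHOR, 1685318401, "added test file"), "commit")
print(first)
print(first == later)  # False: same files and message, one second later
```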

c36ed1c is just the first seven characters of the full hash. If we now look inside .git/objects, we see the following:

Here you can see the full hash stored under the folder c3. Git names the subdirectory after the first two characters of the hash and the file after the remaining 38. Together, the string starting with c3 and ending with 2b is the full hash of the commit object. But what about the other hashes? We did not create them. Well, we did create them… sort of. Let me explain.
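Here is a minimal Python sketch of that storage layout, assuming loose objects. Each object is written zlib-compressed, as its header plus content, to objects/<first two hex characters>/<remaining 38>. It writes to a throwaway directory rather than a real repository. The helpers write_loose_object and read_loose_object are hypothetical. Objects that Git has moved into packfiles (for example after git gc) are not covered.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path
from typing import Tuple


def write_loose_object(git_dir: Path, obj_type: str, data: bytes) -> str:
    """Store data the way Git stores a loose object; return its ID."""
    raw = f"{obj_type} {len(data)}\0".encode() + data
    oid = hashlib.sha1(raw).hexdigest()
    # The first two hex characters name the directory, the other 38 the file.
    path = git_dir / "objects" / oid[:2] / oid[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(raw))  # loose objects are zlib-compressed
    return oid


def read_loose_object(git_dir: Path, oid: str) -> Tuple[str, bytes]:
    """Return (type, content) for a loose object, like git cat-file does."""
    raw = zlib.decompress((git_dir / "objects" / oid[:2] / oid[2:]).read_bytes())
    header, _, content = raw.partition(b"\0")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(content):
        raise ValueError("corrupt object: size mismatch")
    return obj_type, content


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp)
    oid = write_loose_object(git_dir, "blob", b"Superman\n")
    print(oid[:2], oid[2:])                 # directory name, file name
    print(read_loose_object(git_dir, oid))  # ('blob', b'Superman\n')
```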

Type the following command in the root of the project.

❯ git cat-file -p c36ed1cabb3565974f8846391f6ed59959f2d02b

The -p flag pretty-prints the content of the object with that hash. In this case it is the commit object.

As you can see, this commit object contains our commit message, added test file, along with the author and committer. It also references another hash, c6dc3ef..., which is the tree object describing the contents of the project’s root directory.
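A commit object is plain text. Header lines such as tree, author and committer come first, then a blank line, then the message. Below is a minimal Python sketch of splitting it apart. It assumes the content has already been decompressed, for example the raw output of git cat-file commit <hash>. The name parse_commit and the sample values are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_commit(content: bytes) -> Tuple[Dict[str, List[str]], str]:
    """Split a commit object's content into header fields and message.

    Headers come first, one "key value" per line, then a blank line, then
    the message. A key may repeat (a merge commit has several parent lines).
    Simplified: multi-line headers such as signatures are not handled.
    """
    text = content.decode()
    header_block, _, message = text.partition("\n\n")
    headers: Dict[str, List[str]] = {}
    for line in header_block.splitlines():
        key, _, value = line.partition(" ")
        headers.setdefault(key, []).append(value)
    return headers, message


# Hypothetical commit content in the same shape as the output above.
sample = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author Jane Doe <jane@example.com> 1685318400 +0000\n"
    b"committer Jane Doe <jane@example.com> 1685318400 +0000\n"
    b"\n"
    b"added test file\n"
)
headers, message = parse_commit(sample)
print(headers["tree"])  # ['4b825dc642cb6eb9a060e54bf8d69288fbee4904']
print(message.strip())  # added test file
```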

Again, if we use git cat-file -p on this hash, we get the following:

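This time the output lists an entry for testfile: its mode (100644, a regular file), its type (blob) and its hash, e69de29.... That hash names the blob holding the file’s content. Because testfile is empty, it is the well-known hash of empty content. So now we have the file name (in the tree), the content of the file (in the blob) and the commit pointing at the tree, all stored in Git under their hashes.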
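The tree is where file names live. In raw form, before git cat-file -p pretty-prints it, each entry consists of five parts: a mode, a space, the file name, a NUL byte and the 20-byte binary ID of the blob or subtree. Below is a minimal Python sketch of that layout. It assumes a tree containing only files, with no subdirectories. The helpers build_tree and parse_tree are hypothetical.

```python
import hashlib
from typing import List, Tuple


def hash_object(data: bytes, obj_type: str) -> str:
    """SHA-1 of '<type> <size>' + NUL + data, as Git computes object IDs."""
    return hashlib.sha1(f"{obj_type} {len(data)}\0".encode() + data).hexdigest()


def build_tree(entries: List[Tuple[str, str, str]]) -> bytes:
    """Serialise (mode, name, hex_id) entries into raw tree content.

    Each entry is "<mode> <name>", a NUL byte, then the 20-byte binary ID.
    Simplified: entries are sorted by plain name, which matches Git only
    when there are no subdirectories.
    """
    out = b""
    for mode, name, oid in sorted(entries, key=lambda e: e[1]):
        out += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(oid)
    return out


def parse_tree(content: bytes) -> List[Tuple[str, str, str]]:
    """Inverse of build_tree: recover (mode, name, hex_id) entries."""
    entries, i = [], 0
    while i < len(content):
        nul = content.index(b"\0", i)
        mode, name = content[i:nul].decode().split(" ", 1)
        oid = content[nul + 1:nul + 21].hex()  # 20 raw bytes -> 40 hex chars
        entries.append((mode, name, oid))
        i = nul + 21
    return entries


EMPTY_BLOB = hash_object(b"", "blob")  # testfile was created empty
tree = build_tree([("100644", "testfile", EMPTY_BLOB)])
print(parse_tree(tree))  # [('100644', 'testfile', 'e69de29bb2d1d6434b8b29ae775ad8c2e48c5391')]
print(hash_object(tree, "tree"))  # ID of this tree object
```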

Making changes

Now let’s add some content to testfile: the simple text “this is a test file”. When we run git status, it reports the file as modified because its content no longer matches the version recorded in the index. To save time, Git first compares cached file metadata such as size and modification time, and rehashes the content only when needed. Note that Git does not hash individual lines: the smallest unit it hashes is a whole file. When you ask for line-by-line changes, for example with git diff, Git computes them by comparing two versions of the file.
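Git hashes whole files, not lines. A differing hash shows only that a file changed, and a line-by-line view has to be computed by diffing two snapshots. Below is a minimal Python sketch of both steps. It assumes you already know the blob ID recorded for the file, whereas real git status also relies on metadata cached in the index. The helpers blob_id, is_modified and line_diff are hypothetical, and Python’s difflib output only approximates git diff.

```python
import difflib
import hashlib


def blob_id(data: bytes) -> str:
    """Object ID Git would give this file content (SHA-1 repositories)."""
    return hashlib.sha1(f"blob {len(data)}\0".encode() + data).hexdigest()


def is_modified(current: bytes, recorded_id: str) -> bool:
    """True if the content no longer hashes to the recorded blob ID."""
    return blob_id(current) != recorded_id


def line_diff(old: bytes, new: bytes, name: str) -> str:
    """Compute a line-level diff on demand from two whole-file snapshots."""
    return "".join(difflib.unified_diff(
        old.decode().splitlines(keepends=True),
        new.decode().splitlines(keepends=True),
        fromfile=f"a/{name}",
        tofile=f"b/{name}",
    ))


committed = b""                     # testfile as first committed (empty)
working = b"this is a test file\n"  # testfile after editing

print(is_modified(working, blob_id(committed)))  # True
print(line_diff(committed, working, "testfile"))
```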

Let’s commit this change.

❯ git add testfile
❯ git commit -m "second commit"

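This time three new subdirectories are added under objects, one for each new object: a blob for the new content of testfile, a new tree that points to that blob, and the new commit.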
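Why three objects? Editing testfile produces a new blob. The root tree lists that blob’s ID, so the tree’s content and ID change too. The new commit is a third object pointing at the new tree. Unchanged files and directories would keep their existing objects, since identical content hashes to the same ID. Below is a minimal Python sketch of the first two links in that chain, assuming a tree that contains only testfile. The helpers hash_object and tree_with_one_file are hypothetical.

```python
import hashlib


def hash_object(data: bytes, obj_type: str) -> str:
    """SHA-1 of '<type> <size>' + NUL + data, as Git computes object IDs."""
    return hashlib.sha1(f"{obj_type} {len(data)}\0".encode() + data).hexdigest()


def tree_with_one_file(name: str, blob_id: str) -> bytes:
    """Raw content of a tree holding a single regular file (mode 100644)."""
    return f"100644 {name}".encode() + b"\0" + bytes.fromhex(blob_id)


# Before and after the edit: a new blob ID forces a new tree ID.
for content in (b"", b"this is a test file\n"):
    blob = hash_object(content, "blob")
    tree = hash_object(tree_with_one_file("testfile", blob), "tree")
    print(f"blob {blob[:7]}  ->  tree {tree[:7]}")
```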

Using git cat-file again, we can inspect these new objects.

This time, the commit has a field the first one lacked: parent. Commits form a graph in which each commit is a node pointing to its parent (a merge commit has two or more parents), so the previous commit becomes the parent of the new one. Here c36ed... is the parent of the commit f51e8..., and following these parent links from any commit reconstructs the project’s history.
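Every commit stores its parents’ IDs, so history is just a walk over those links. Below is a minimal Python sketch that assumes the commits have already been parsed into a mapping from commit ID to parent IDs. This is a hypothetical in-memory structure, not how Git stores commits. The traversal here is plain depth-first, visiting the first parent first. It does not reproduce the reverse chronological order git log uses by default.

```python
from typing import Dict, List


def history(commits: Dict[str, List[str]], start: str) -> List[str]:
    """List commits reachable from start by following parent links.

    commits maps a commit ID to its parent IDs (empty for a root commit,
    two or more for a merge). Each commit is listed once, in the order
    it is first visited.
    """
    order: List[str] = []
    seen = set()
    stack = [start]
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue  # reached again through another branch of a merge
        seen.add(oid)
        order.append(oid)
        # Push parents in reverse so the first parent is visited next.
        stack.extend(reversed(commits[oid]))
    return order


# Hypothetical abbreviated IDs standing in for the two commits above.
commits = {
    "f51e8": ["c36ed"],  # "second commit"
    "c36ed": [],         # "added test file" (root commit, no parent)
}
print(history(commits, "f51e8"))  # ['f51e8', 'c36ed']
```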

Conclusion

Now we know how Git keeps a record of every version of every piece of content in the project. Using these hashes and the links between commits, trees and blobs, Git can retrieve the exact content of any file at any commit, and compute the differences between versions when needed.

Thanks for reading! 👋

Cover photo by Praveen Thirumurugan

Top comments (2)

Thomas Broyer • Jun 1 '23

What's important I think is to understand Git stores "snapshots" of your working tree, and then computes the diffs when you need to think in terms of diffs (show me what this commit did, rebase that branch, cherry pick that commit, etc.)

Also, this storage as files named by the hashes of their content is no longer entirely accurate, as Git will actually optimize storage by "packing" things into archives (see git-scm.com/docs/git-gc which points to other plumbing commands), so you won't necessarily find the files in .git/objects (but can still access them with git cat-file)

9opsec • May 30 '23

Good info to know. I'm adding this to my notes about git for future reference.

DEV Community
