kataoka_nopeNoshishi

Posted on Feb 12, 2023

Make original Git by Rust! (Analyze section)

#git #beginners #rust #python

Hello Dev community!

I'm noshishi, an apprentice engineer in Tokyo.
This article is about understanding Git from the inside by building a simple program that can add and commit.

But it's a very long story, so I'll post the development section separately!

Foreword

The starting point is 'If I could understand git, I could make it?!!'

I took this opportunity to try out a new programming language, so I decided to use Rust this time. The repository I actually created is nss, my original Git. The code quality isn't quite there yet and many parts are still incomplete, but you can already work through a simple, linear local workflow with it!

If you give me a star, I'll be over the moon, and of course I'll be waiting for your contributions! Feel free to use this repository any way you like!

Please forgive us for not being able to explain some of the details in this article alone. Also, we use Rust for development, but Python for the stage where we uncover Git's internals!

Git Inside

First, let's unpack how Git handles data, based on the official documentation.
Git's command system is very complex.
But Git's data structures are very simple!

Where is repository

A repository is a directory under Git's control, and the .git folder that init or clone creates in that directory holds the repository's actual state.

Let's put an empty folder called project under Git's control.

$ pwd
/home/noshishi/project
$ ls -a
# nothing yet
$ git init
Initialized empty Git repository in /home/noshishi/project/.git/
$ ls -a
.git

This .git directory consists of the following.

.git
├── HEAD
├── (index)  // Not created by `init`!
├── config
*
├── objects/
└── refs/
    ├── heads/
    └── tags/

(Info)

Paths in a Git repository can be hard to tell apart at first glance, so we have appended / to directory names. We have also omitted parts that this article does not cover.

Object

Git manages versions using files called objects.

Objects are stored in .git/objects.

Types

There are four object types: blob, tree, commit, and tag.

Each type corresponds to the following data.

blob ... File data
tree ... Directory data
commit ... Metadata that manages the tree of the repository
tag ... Metadata for a specific commit (not covered in this article)

[Image: objects for first.txt in the project repository]

Structure

An object is a file, so just like any normal file, it has a file name (path) and the data stored in it.

File name (path)
The file name (path) is a 40-character string: the SHA-1 [2] hash of the object data.

The first two characters become the directory name, and the remaining 38 become the file name.
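A minimal Python sketch of this mapping, assuming a loose (unpacked) object; `object_path` is a hypothetical helper name, not part of Git. Objects that Git has packed into packfiles are not stored this way.

```python
from pathlib import Path

def object_path(git_dir: str, sha1_hex: str) -> Path:
    """Return where a loose object with this 40-character hex hash is stored."""
    if len(sha1_hex) != 40:
        raise ValueError("expected a full 40-character SHA-1 hex string")
    # first 2 hex characters -> directory, remaining 38 -> file name
    return Path(git_dir) / "objects" / sha1_hex[:2] / sha1_hex[2:]

print(object_path(".git", "f7f18b17881d80bb87f281c2881f9a4663cfcf84").as_posix())
# .git/objects/f7/f18b17881d80bb87f281c2881f9a4663cfcf84
```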

Data
Object data is compressed with zlib [1]. The decompressed data consists of two parts, header and content, separated by \0 (a null byte).

header is the object type, a space, and the size of content in bytes.

content holds the corresponding data in a format that depends on the type. (We will see the details later.)
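Putting header and content together, creating an object can be sketched as follows. This is my own minimal illustration (`encode_object` and `hash_object` are hypothetical names): the SHA-1 is computed over the uncompressed bytes, and only the stored file is compressed. Git may use a different zlib compression level, so the compressed bytes are not guaranteed to match Git's byte for byte, although they decompress to the same data.

```python
import hashlib
import zlib

def encode_object(obj_type: str, content: bytes) -> bytes:
    """Build the uncompressed object: b'<type> <size>\\0' + content."""
    header = f"{obj_type} {len(content)}".encode()
    return header + b"\x00" + content

def hash_object(obj_type: str, content: bytes) -> tuple[str, bytes]:
    """Return (hex SHA-1, zlib-compressed bytes) for an object.

    The hash is taken over the *uncompressed* data.
    """
    raw = encode_object(obj_type, content)
    return hashlib.sha1(raw).hexdigest(), zlib.compress(raw)

sha, stored = hash_object("blob", b"")
print(sha)  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the well-known empty blob)
```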

[Image: how a blob object is created]

Index (staging area)

The index that add writes to is the file .git/index.

Structure

The index records each file staged with add, together with its metadata. Each entry refers to the file's contents as they were at the moment of add.

Note that everything recorded in the index is recorded per file.
I will describe the metadata in detail later; the exact storage format is defined in index-format.

Hmm... feeling sleepy...

Wait!

Let's actually analyze the object and the index!

Analyze Object

Before starting the analysis, let's create a blob, a tree, and a commit.
All we need to do is add the files in project and commit them.

Create the following two files...

first.txt

Hello World!
This is first.txt.

second.py

def second():
    print("This is second.py")

Next, add and commit.

git add -A
git commit -m 'initial'

Then the contents of .git/objects are now as follows.

.git/
└── objects/
    ├── 48/
    |   └── c972ae2bb5652ada48573daf6d27c74db5a13f
    ├── af/
    |   └── 22102d62f1c8e6df5217b4cba99907580b51af
    ├── da/
    |   └── f3f26f3fa03da346999c3e02d5268cb9abc5c5
    └── f7/
        └── f18b17881d80bb87f281c2881f9a4663cfcf84

From now on, hash values in the text are abbreviated to their first 7 characters [3].

The corresponding data and hash values for each are summarized below.

Hash value	Object	Corresponding data
`f7f18b1`	`blob`	first.txt
`af22102`	`blob`	second.py
`daf3f26`	`tree`	project directory
`48c972a`	`commit`	commit version 1

The analysis below is done interactively in Python, an interpreted language.

blob

blob is an object corresponding to file data.
The image looks like this.

Data

First, let's look at f7f18b1, which corresponds to first.txt.

...Oops, I failed.

% python
>>> with open('.git/objects/f7/f18b17881d80bb87f281c2881f9a4663cfcf84', 'r') as f:
...     content = f.read()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xca in position 3: invalid continuation byte

Since the content is compressed, trying to read it as-is as a text string [4] fails.
Instead, we read it as binary.

>>> with open('.git/objects/f7/f18b17881d80bb87f281c2881f9a4663cfcf84', 'rb') as f: # read binary!
...     content = f.read()
>>> content
b'x\x01K\xca\xc9OR06d\xf0H\xcd\xc9\xc9W\x08\xcf/\xcaIQ\xe4\n\xc9\xc8,V\x00\xa2\xb4\xcc\xa2\xe2\x12\xbd\x92\x8a\x12=\x00\xfa-\r\x03'

This time the file is read successfully, as a byte string.

Now, decompress the content with zlib, as described in the official documentation.

>>> import zlib
>>> decompressed = zlib.decompress(content)
>>> decompressed
b'blob 31\x00Hello World!\nThis is first.txt.'
>>> decompressed.split(b'\0')
[b'blob 31', b'Hello World!\nThis is first.txt.']

We found that a blob consists of the following elements:

header ... blob 31
Null byte ... \x00 (hex notation)
content ... Hello World!\nThis is first.txt.
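The steps above can be wrapped into a small reader. This is a sketch with hypothetical names (`parse_object`, `read_object`); it uses `partition()` on the first null byte only instead of `split(b'\0')`, so content that itself contains `\x00` (as a tree's binary hashes can) stays intact, and it checks the size in the header.

```python
import zlib

def parse_object(raw_compressed: bytes) -> tuple[str, bytes]:
    """Decompress a loose object and return (type, content)."""
    data = zlib.decompress(raw_compressed)
    # split only at the FIRST null byte: header vs. content
    header, sep, content = data.partition(b"\x00")
    if not sep:
        raise ValueError("no null byte: not a Git object")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(content):
        raise ValueError(f"size mismatch: header says {size}, got {len(content)}")
    return obj_type, content

def read_object(path: str) -> tuple[str, bytes]:
    with open(path, "rb") as f:  # binary mode: the file is zlib data
        return parse_object(f.read())
```

Based on the session above, `read_object('.git/objects/f7/f18b17881d80bb87f281c2881f9a4663cfcf84')` should give `('blob', b'Hello World!\nThis is first.txt.')`.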

File name

We should check whether the hash value of the object is indeed correct.

The object's file name should be the SHA-1 hash of decompressed, so let's check it.

>>> import hashlib
>>> blob = b'blob 31\x00Hello World!\nThis is first.txt.'
>>> sha1 = hashlib.sha1(blob).hexdigest()
>>> sha1
'f7f18b17881d80bb87f281c2881f9a4663cfcf84'

Great, exact match!

How about another file

Let's also look at af22102, which corresponds to the other file, second.py.

>>> with open('.git/objects/af/22102d62f1c8e6df5217b4cba99907580b51af', 'rb') as f:
...     content = f.read()
>>> decompressed = zlib.decompress(content)
>>> decompressed
b'blob 44\x00def second():\n    print("This is second.py")'

>>> blob = b'blob 44\x00def second():\n    print("This is second.py")'
>>> sha1 = hashlib.sha1(blob).hexdigest()
>>> sha1
'af22102d62f1c8e6df5217b4cba99907580b51af'

It can be summarized as follows:

header ... blob 44
Null byte ... \x00
content ... def second():\n    print("This is second.py")

And the sha1 values (hash values) derived from the data also matched.

Supplemental
A blob itself does not hold the name of its file.

The name is managed by a different object: the tree.

Tree

tree is an object corresponding to directory data.
The image looks like this.

We will analyze it in the same way as for blob.

>>> with open('.git/objects/da/f3f26f3fa03da346999c3e02d5268cb9abc5c5', 'rb') as f:
...     content = f.read()
>>> decompressed = zlib.decompress(content)
>>> decompressed
b'tree 74\x00100644 first.txt\x00\xf7\xf1\x8b\x17\x88\x1d\x80\xbb\x87\xf2\x81\xc2\x88\x1f\x9aFc\xcf\xcf\x84100644 second.py\x00\xaf"\x10-b\xf1\xc8\xe6\xdfR\x17\xb4\xcb\xa9\x99\x07X\x0bQ\xaf'
>>> decompressed.split(b'\0')
[b'tree 74',
 b'100644 first.txt',
 b'\xf7\xf1\x8b\x17\x88\x1d\x80\xbb\x87\xf2\x81\xc2\x88\x1f\x9aFc\xcf\xcf\x84100644 second.py',
 b'\xaf"\x10-b\xf1\xc8\xe6\xdfR\x17\xb4\xcb\xa9\x99\x07X\x0bQ\xaf']

A tree has multiple entries, so it looks a bit more complicated.

The tree content is a repetition of mode [5], path, and hash: the metadata of each item in the directory.

Within an entry, the mode and path are followed by \0 and then the hash, but nothing separates the hash from the next entry.

So if you simply split on \0, the hash of one entry ends up glued to the mode and path of the next one.

First, let's check the first entry.
From the split, it looks like first.txt is stored there, right?

>>> temp = decompressed.split(b'\0')
>>> temp[1]
b'100644 first.txt'
>>> temp[2]
b'\xf7\xf1\x8b\x17\x88\x1d\x80\xbb\x87\xf2\x81\xc2\x88\x1f\x9aFc\xcf\xcf\x84100644 second.py'

To split temp[2] correctly, let's take out the first 20 bytes, since a SHA-1 hash is 20 bytes long.
Indexing and slicing a byte string work byte by byte.

>>> temp[2][0:20]
b'\xf7\xf1\x8b\x17\x88\x1d\x80\xbb\x87\xf2\x81\xc2\x88\x1f\x9aFc\xcf\xcf\x84'
>>> temp[2][0:20].hex()
'f7f18b17881d80bb87f281c2881f9a4663cfcf84'
>>> temp[2][20:]
b'100644 second.py'

Repeating the same process revealed the following.

header ... tree 74
Null byte ... \x00
content1 ... 100644 first.txt\x00f7f18b1...
content2 ... 100644 second.py\x00af22102...

How a tree stores its hashes is explained in Digression: Deciphering Tree!
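Instead of splitting on `\0`, the tree content can be walked entry by entry, using the fact that every hash is exactly 20 bytes. A minimal sketch (`parse_tree` is a hypothetical name; it assumes UTF-8 file names). It also survives a raw hash that happens to contain a `0x00` byte, which would break a plain `split(b'\0')`.

```python
def parse_tree(content: bytes) -> list[tuple[str, str, str]]:
    """Parse tree content (after the header) into (mode, name, hex_hash) tuples.

    Each entry is b'<mode> <name>\\0' followed by a raw 20-byte SHA-1.
    """
    entries = []
    i = 0
    while i < len(content):
        nul = content.index(b"\x00", i)          # end of "<mode> <name>"
        mode, name = content[i:nul].split(b" ", 1)
        raw_hash = content[nul + 1:nul + 21]     # exactly 20 bytes, may contain 0x00
        entries.append((mode.decode(), name.decode(), raw_hash.hex()))
        i = nul + 21                             # next entry starts right after the hash
    return entries
```

With the REPL session above, `parse_tree(decompressed.partition(b'\0')[2])` should list first.txt and second.py with their full hashes.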

Supplemental
A tree can contain not only blobs but also other trees.
That is how a directory inside a directory is represented.
Like a blob, a tree does not store its own directory name; that name is recorded in the entry of its parent tree.

Commit

A commit records the tree of the repository's root directory, together with metadata.
The image looks like this.

Let's analyze!

>>> with open('.git/objects/48/c972ae2bb5652ada48573daf6d27c74db5a13f', 'rb') as f:
...     content = f.read()
>>> decompressed = zlib.decompress(content)
>>> decompressed
b'commit 188\x00tree daf3f26f3fa03da346999c3e02d5268cb9abc5c5\nauthor nopeNoshishi <nope@noshishi.jp> 1674995860 +0900\ncommitter nopeNoshishi <nope@noshishi.jp> 1674995860 +0900\n\ninitial\n'
>>> decompressed.split(b'\0')
[b'commit 188',
 b'tree daf3f26f3fa03da346999c3e02d5268cb9abc5c5\nauthor nopeNoshishi <nope@noshishi.jp> 1674995860 +0900\ncommitter nopeNoshishi <nope@noshishi.jp> 1674995860 +0900\n\ninitial\n']

# a little bit more
>>> header, content = decompressed.split(b'\0')
>>> header
b'commit 188'
>>> content
b'tree daf3f26f3fa03da346999c3e02d5268cb9abc5c5\nauthor nopeNoshishi <nope@noshishi.jp> 1674995860 +0900\ncommitter nopeNoshishi <nope@noshishi.jp> 1674995860 +0900\n\ninitial\n'
>>> content.split(b'\n')
[b'tree daf3f26f3fa03da346999c3e02d5268cb9abc5c5',
 b'author nopeNoshishi <nope@noshishi.jp> 1674995860 +0900', 
 b'committer nopeNoshishi <nope@noshishi.jp> 1674995860 +0900', 
 b'', 
 b'initial',
 b'']

The stored data are as follows.

header ... commit 188
Null byte ... \x00
tree ... tree daf3f26f3fa03da346999c3e02d5268cb9abc5c5
author ... author nopeNoshishi <nope@noshishi.jp> 167...
committer ... committer nopeNoshishi <nope@noshishi.jp> 167...
message ... initial

You can see that it contains the hash of the tree we saw in the Tree chapter, information about the author (who wrote the change) and the committer (who made the commit), and the message.
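The split we did by hand can be turned into a small parser. A sketch under a simplifying assumption: every header line is a single `key value` line (real commits can carry multi-line headers such as signatures, which this ignores). `parse_commit` is a hypothetical name; values are lists because a key like `parent` can repeat.

```python
def parse_commit(content: bytes) -> tuple[dict[str, list[str]], str]:
    """Split commit content into header fields and the message.

    A blank line separates the headers from the message.
    """
    text = content.decode()
    head, _, message = text.partition("\n\n")
    fields: dict[str, list[str]] = {}
    for line in head.split("\n"):
        key, _, value = line.partition(" ")
        fields.setdefault(key, []).append(value)
    return fields, message
```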

Let's make another commit and analyze it again.
Edit first.txt as follows and add and commit again.

first.txt(version2)

Hello World!
This is first.txt.
Version2

git add first.txt
git commit -m 'second'

Then the contents of .git/objects are now as follows.

.git/
└── objects/
    ├── 3f/
    |   └── f934272  # new tree .. project repo version 2
    ├── 37/
    |   └── 349c9b0  # new commit .. "second"
    ├── 48/
    |   └── c972ae2  # old commit .. "initial"
    ├── af/
    |   └── 22102d6  # old blob .. second.py version 1
    ├── c8/
    |   └── 843b4db  # new blob .. first.txt version 2
    ├── da/
    |   └── f3f26f3  # old tree .. project repo version 1
    └── f7/
        └── f18b178  # old blob .. first.txt version 1

Let's look at the new commit...

>>> with open('.git/objects/37/349c9b05c73281008e7b6b7453b595bb034a52', 'rb') as f:
...     content = f.read()
... 
>>> decompressed = zlib.decompress(content)
>>> decompressed
b'commit 235\x00tree 3ff9342727caf81397740327aa406c1cc6d4408e\nparent 48c972ae2bb5652ada48573daf6d27c74db5a13f\nauthor nopeNoshishi <nope@noshishi.jp> 1675174139 +0900\ncommitter nopeNoshishi <nope@noshishi.jp> 1675174139 +0900\n\nsecond\n'

The stored data are as follows.

header ... commit 235
Null byte ... \x00
tree ... tree 3ff9342727caf81397740327aa406c1cc6d4408e
parent ... parent 48c972ae2bb5652ada48573daf6d27c74db5a13f
author ... author nopeNoshishi <nope@noshishi.jp> 167...
committer ... committer nopeNoshishi <nope@noshishi.jp> 167...
message ... second

The new commit stores the hash of the previous commit in its parent line.

Supplemental

Unlike a blob or a tree, a commit does not store any of the repository's actual file data. Instead, it holds metadata, starting with the root tree.

Key-Value Store

Some of you may already see where this is going.

If you unravel a commit, you get a tree, and if you unravel a tree, you get blobs.

Because each commit knows the hash of its parent commit, following these links reveals the version history.
This image shows the history leading to the current commit.

In other words, Git manages file versions by using each object's hash as the key to look it up.
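To make the key-value idea concrete, here is a toy in-memory store where the key is the SHA-1 of the value, plus a history walk that follows `parent` lines. Everything here is hypothetical and simplified: `ObjectStore` stands in for .git/objects, and the commits omit author and committer lines.

```python
import hashlib

class ObjectStore:
    """Hypothetical in-memory stand-in for .git/objects: hash -> raw object."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, obj_type: str, content: bytes) -> str:
        raw = f"{obj_type} {len(content)}".encode() + b"\x00" + content
        key = hashlib.sha1(raw).hexdigest()  # the key is derived from the value
        self.objects[key] = raw
        return key

    def get(self, key: str) -> tuple[str, bytes]:
        header, _, content = self.objects[key].partition(b"\x00")
        return header.split(b" ")[0].decode(), content

def history(store: ObjectStore, commit_hash: str) -> list[str]:
    """Follow 'parent' lines back to the root commit (first parent only)."""
    chain = []
    current = commit_hash
    while current:
        chain.append(current)
        _, content = store.get(current)
        parent = None
        for line in content.split(b"\n"):
            if not line:  # blank line: headers end, message begins
                break
            if line.startswith(b"parent "):
                parent = line[len(b"parent "):].decode()
                break
        current = parent
    return chain
```

For example, storing an empty tree and two commits where the second names the first as its parent, `history(store, second)` returns `[second, first]`.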

(Info)
The official documentation describes Git as a content-addressable file system.
A hash function is a one-way (non-invertible) transformation, so the original data cannot be restored from the hash value alone. Still, since the key (the hash) is derived entirely from the value (the object's contents), you might even call it a value-value store.

Summary

In a world without version control systems like Git, what would you do if you wanted to keep your current files and also work on something new with them?
One approach you might think of is to copy the files into another folder.
In fact, this seemingly crude method is surprisingly close to how Git manages versions under the hood.

(Info)
Git is a storage system that makes clever use of the OS file system.

Analyze Index

The index (staging area) may seem mysterious, but like objects, its design is very simple.
(On the other hand, it is a bit quirky to analyze. Taking the index apart ate up dozens of hours...)

Let's analyze .git/index as it stands after the second commit.

Specification

To analyze it, we first need to understand the design specification of the index.

Referring to Index format in the official documentation, we found the following specification.

Index Format
Header
    - 4 bytes   signature                   * DIRC
    - 4 bytes   index version               * 2 in this article (3 and 4 also exist)
    - 32 bits   number of entries in index  * Entries are the metadata for each file.

Entry
    - 32 bits   ctime seconds               * last time the file's metadata changed
    - 32 bits   ctime nanoseconds
    - 32 bits   mtime seconds               * last time the file's data changed
    - 32 bits   mtime nanoseconds
    - 32 bits   device id
    - 32 bits   inode
    - 32 bits   permission (mode)
    - 32 bits   user id
    - 32 bits   group id
    - 32 bits   file size
    - 160 bits  `blob` hash value
    - 16 bits   flags                       * low 12 bits: number of bytes in the file name
    - ?  bytes  filename                    * variable depending on file name
    - 1-8 bytes padding                     * variable depending on entry

... The same thing continues by number of entries ....

Index

Now that we have the specification, let's read the index in Python.

The index is not compressed, but we still read it in binary mode, as with objects, because all the metadata is stored as raw bytes.

>>> with open('.git/index', 'rb') as f:
...     index = f.read()
>>> index
b'DIRC\x00\x00\x00\x02\x00\x00\x00\x02c\xd9 \xf4\x05\xeb\x80\xb2c\xd9 \xf4\x05\xeb\x80\xb2\x01\x00\x00\x06\x00\xb8\'\x07\x00\x00\x81\xa4\x00\x00\x01\xf5\x00\x00\x00\x14\x00\x00\x00(\xc8\x84;M\xb8\x06\xe5\xd6Z\x12\xefV\xbfK\xeeQ\xe7\x15\'\x93\x00\tfirst.txt\x00c\xd6hv\x17\xa5\x05nc\xd6hv\x17\xa5\x05n\x01\x00\x00\x06\x00\xb8\'\x14\x00\x00\x81\xa4\x00\x00\x01\xf5\x00\x00\x00\x14\x00\x00\x00,\xaf"\x10-b\xf1\xc8\xe6\xdfR\x17\xb4\xcb\xa9\x99\x07X\x0bQ\xaf\x00\tsecond.py\x00TREE\x00\x00\x00\x19\x002 0\n?\xf94\'\'\xca\xf8\x13\x97t\x03\'\xaa@l\x1c\xc6\xd4@\x8e\xf2\xe4\xd7:\x95\xc1?\x18\xd3\xe9\x7f\x8fp\x9c$N\xc9dX\xa4'

It looks readable in places.
You can spot DIRC, first.txt, and second.py!

Since 32 bits is 4 bytes, each field can easily be pulled out with a 4-byte slice.

>>> index[0:4]
b'DIRC' # Index header -> DIRC
>>> index[4:8]
b'\x00\x00\x00\x02' # Index version => 2
>>> index[8:12]
b'\x00\x00\x00\x02' # number of entries => 2

The index manages metadata per file, so there are two entries: first.txt and second.py.

For the purpose of this article, the fields from ctime through group ID are not very important, except for the mode, so let's just take a quick look at them.

>>> index[12:16]
b'c\xd9 \xf4' # ctime
>>> index[16:20]
b'\x05\xeb\x80\xb2' # ctime nano
>>> index[20:24]
b'c\xd9 \xf4' # mtime
>>> index[24:28]
b'\x05\xeb\x80\xb2'  # mtime nano
>>> index[28:32]
b'\x01\x00\x00\x06' # dev id
>>> index[32:36]
b"\x00\xb8'\x07" # inode
>>> index[36:40]
b'\x00\x00\x81\xa4' # mode
>>> index[40:44]
b'\x00\x00\x01\xf5' # user id
>>> index[44:48]
b'\x00\x00\x00\x14' # group id

Here are the key points to look at.
First is the file size.

# file size
>>> index[48:52]
b'\x00\x00\x00('
>>> index[48:52][0]
0
>>> index[48:52][1]
0
>>> index[48:52][2]
0
>>> index[48:52][3]
40

So the file size is 40 bytes, which matches first.txt version 2.

Next is the hash value.

# hash
>>> index[52:72]
b"\xc8\x84;M\xb8\x06\xe5\xd6Z\x12\xefV\xbfK\xeeQ\xe7\x15'\x93"
>>> index[52:72].hex()
'c8843b4db806e5d65a12ef56bf4bee51e7152793'

The hash value matches the one for first.txt version 2!

And the size of the filename.

# filename size
>>> index[72:74]
b'\x00\t'
>>> index[72:74][0]
0
>>> index[72:74][1]
9

This length (in bytes) is very important: it is stored in the low 12 bits of the 16-bit flags field, and without it you would have to guess where the file name ends.

Now that we know the filename is 9 bytes, we can...

>>> index[74:83]
b'first.txt'

We can extract the file name exactly, without missing anything.

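Finally, the padding depends on the number of bytes used to represent the entry.
We need the number of padding bytes X such that the bytes up to the padding plus the X padding bytes add up to a multiple of 8. X is always at least 1, so the file name is always followed by at least one NUL byte.

Expressed as a formula, with X (padding), y (file name size), and a (remainder), where 62 is the size of the fixed-length part of the entry:

$$a = (62 + y) \bmod 8, \qquad X = 8 - a$$

In this case, the fields from ctime through the flags (file name size) take 62 bytes, and the file name is 9 bytes.

So a = 71 mod 8 = 7, and the padding is X = 8 - 7 = 1 byte.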

>>> index[83:84]
b'\x00'
>>> index[83:85]
b'\x00c' # The second byte is not a null byte: the next entry starts there!
>>> index[83:86]
b'\x00c\xd6'

The padding matches: the next entry's ctime starts right after this single byte.
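All of these slices can be combined into one loop. The sketch below is my own (`padding_len` and `parse_index` are hypothetical names) and assumes a version 2 index: it reads the header and entries with `struct`, skips the 1-8 padding bytes using the formula above, and ignores extensions such as TREE, the trailing checksum, and names of 0xFFF bytes or longer. Versions 3 and 4 lay entries out differently.

```python
import struct

FIXED_ENTRY_BYTES = 62  # ten 4-byte fields + 20-byte hash + 2-byte flags

def padding_len(name_len: int) -> int:
    """NUL bytes after the name so the entry length is a multiple of 8 (1 to 8)."""
    return 8 - (FIXED_ENTRY_BYTES + name_len) % 8

def parse_index(data: bytes) -> list[dict]:
    """Parse the header and entries of a version 2 .git/index."""
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC" or version != 2:
        raise ValueError("this sketch only handles version 2 index files")
    entries = []
    pos = 12
    for _ in range(count):
        # ctime s, ctime ns, mtime s, mtime ns, dev, ino, mode, uid, gid, size
        fields = struct.unpack(">10I", data[pos:pos + 40])
        sha1 = data[pos + 40:pos + 60].hex()
        (flags,) = struct.unpack(">H", data[pos + 60:pos + 62])
        name_len = flags & 0x0FFF  # low 12 bits of flags
        name = data[pos + 62:pos + 62 + name_len].decode()
        entries.append({"mode": oct(fields[6]), "size": fields[9], "sha1": sha1, "name": name})
        pos += FIXED_ENTRY_BYTES + name_len + padding_len(name_len)
    return entries
```

With the index above, `parse_index(index)` should report first.txt (size 40, mode `0o100644`) and second.py.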

Summary

In fact, add does not create any tree.
Trees are generated from the index when you commit.

The index plays an important role: it links each added file to its blob and determines which version of each file will be committed.

You may have heard that Git stores snapshots, not differences.
In other words, a file stays in the index, even when it has not changed, until it is explicitly removed.
And because every commit is built from this complete index, everything you commit can be restored.

(Info)
index is an important entity that holds the key to whether or not a file is subject to version control in Git.

Background of Command

Now that we know how Git handles data, let's take a quick look at how the commands behave.

Each command has many options that enable more complex behavior, but I only describe its basic role here.

add

add adds, removes, or updates the entries for the target files in the index.
At that moment, Git also creates a blob from each file's current (latest) contents.

The plumbing commands that make this happen are hash-object and update-index.
(I describe them in detail in the Plumbing commands chapter.)
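The object-writing half of add (what `hash-object -w` does) can be sketched in Python as below. This is a simplified illustration with a hypothetical name, `write_blob`: real Git writes through a temporary file and renames it, which this sketch skips.

```python
import hashlib
import zlib
from pathlib import Path

def write_blob(git_dir: str, data: bytes) -> str:
    """Store data as a loose blob under <git_dir>/objects and return its hash."""
    raw = b"blob " + str(len(data)).encode() + b"\x00" + data
    sha1 = hashlib.sha1(raw).hexdigest()
    path = Path(git_dir) / "objects" / sha1[:2] / sha1[2:]
    if not path.exists():  # identical content gives an identical hash: write once
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(zlib.compress(raw))
    return sha1
```

Because the path is derived from the content, adding an unchanged file again writes nothing new.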

commit

Git creates a tree for the repository directory from the index, and then creates a commit.
Once the commit has been created, Git updates the branch that HEAD points to so that it refers to the new commit.

The plumbing commands that accomplish this are write-tree, commit-tree, and update-ref.

Digression

Deciphering Tree

Let's look a little more closely at the bytes.

What is the maximum value a single (unsigned) byte can represent?
2^8 - 1 = 255. This is also the largest value two hexadecimal digits can represent (0xff).

>>> temp[2][0]
247 # = `\xf7`

I used the hex() method quickly above, but if you look at it one byte at a time...

>>> hash_str = ''
>>> for byte in temp[2][0:20]:
...     hash_str += format(byte, '02x')  # '02x' keeps the leading 0 of bytes below 0x10
...
>>> hash_str
'f7f18b17881d80bb87f281c2881f9a4663cfcf84'

I can get the hash value of the blob corresponding to first.txt as a string!

A hash is 40 characters long, and each character is one hexadecimal digit, worth 4 bits. So the trick is that one byte holds two characters, which is why a 40-character hash fits in 20 bytes.

A commit stores hash values as strings, but for some reason a tree stores them directly as raw bytes.

There was some discussion on Stack Overflow about why.

https://stackoverflow.com/questions/42009133/how-to-inflate-a-git-tree-object
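Converting between the two forms is a one-liner each way in Python. A small sketch with hypothetical helper names; the last function shows why the byte-by-byte loop needs `'02x'` rather than `'x'`.

```python
def hash_to_raw(sha1_hex: str) -> bytes:
    """40 hex characters (as in a commit) -> 20 raw bytes (as in a tree)."""
    raw = bytes.fromhex(sha1_hex)
    if len(raw) != 20:
        raise ValueError("a SHA-1 hash is exactly 20 bytes")
    return raw

def raw_to_hash(raw: bytes) -> str:
    """20 raw bytes -> 40 lowercase hex characters."""
    return raw.hex()

def raw_to_hash_manual(raw: bytes) -> str:
    """Byte-by-byte version: '02x' pads bytes below 0x10 to two digits."""
    return "".join(format(b, "02x") for b in raw)
```

With plain `'x'`, a byte such as `0x08` would become `'8'` instead of `'08'`, silently producing a hash shorter than 40 characters.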

HEAD and Branch

A branch marks a specific commit object.
Branches are stored under .git/refs/heads/.
You can easily see their contents with the Linux command cat.

Since we were working on the master branch earlier, we can look at .git/refs/heads/master and see ...

% cat .git/refs/heads/master
37349c9b05c73281008e7b6b7453b595bb034a52

It stores the hash of the last commit we made.

HEAD indicates which commit your file edits are based on.
HEAD can point directly to a commit object, but usually it points to one through a branch.
HEAD is stored in the file .git/HEAD.

The data is stored as follows.

% cat .git/HEAD
ref: refs/heads/master

It contains the path (relative to .git) where the master branch is stored.

If you want HEAD to point directly to a commit (a detached HEAD), use checkout to move HEAD.

% git checkout 37349c9b05c73281008e7b6b7453b595bb034a52
% cat .git/HEAD
37349c9b05c73281008e7b6b7453b595bb034a52
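Following HEAD to a commit hash can be sketched like this (`resolve_head` is a hypothetical name). Assumptions: only loose ref files are read. Git can also keep refs in .git/packed-refs, and in a brand-new repository the branch file does not exist yet, so `read_text()` would raise `FileNotFoundError` there.

```python
from pathlib import Path

def resolve_head(git_dir: str) -> str:
    """Return the commit hash HEAD refers to (loose refs only)."""
    head = (Path(git_dir) / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        # symbolic HEAD: e.g. 'ref: refs/heads/master' -> read that file
        ref_path = Path(git_dir) / head[len("ref: "):]
        return ref_path.read_text().strip()
    return head  # detached HEAD: the file holds the hash itself
```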

Plumbing commands

To manipulate Git at an even lower level, there is a command for every single action.
(These are god-like commands created by Mr. Linus for ordinary people like me.)

`cat-file`

This command shows the contents of an object.
We worked hard earlier to analyze objects by hand, but this single command does it all.

# See object type
% git cat-file -t af22102d62f1c8e6df5217b4cba99907580b51af # second.py
blob

# Output object content
% git cat-file -p af22102d62f1c8e6df5217b4cba99907580b51af # second.py
def second():
    print("This is second.py")

`hash-object`

This command computes the hash of file data and, with -w, stores it in .git/objects.

Let's create third.rs.

struct Third {
    message: String   
}

# calculate hash value
% git hash-object third.rs
4aa58eed341d5134f73f2e9378b4895e216a5cd5

# Create blob object
% git hash-object -w third.rs
4aa58eed341d5134f73f2e9378b4895e216a5cd5
% ls .git/objects/4a
a58eed341d5134f73f2e9378b4895e216a5cd5

`update-index`

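This command adds the target file to the index.
Note that, unless --info-only is given, it also stores the file's blob in .git/objects if it is not there yet (here the blob already exists because of hash-object -w).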

`ls-files`

This command provides a concise view of the contents of the index.

# see the latest index
% git ls-files
first.txt
second.py

# add third.rs to the index
% git update-index --add third.rs 
% git ls-files
first.txt
second.py
third.rs
% git ls-files -s
100644 c8843b4db806e5d65a12ef56bf4bee51e7152793 0       first.txt
100644 af22102d62f1c8e6df5217b4cba99907580b51af 0       second.py
100644 4aa58eed341d5134f73f2e9378b4895e216a5cd5 0       third.rs

`write-tree`

This creates tree objects from the contents of the index:
one for every directory, not just the repository root.

% git write-tree
109e41a859caa3e3b87e8f59744b0b1845efe275
% ls .git/objects/10 
9e41a859caa3e3b87e8f59744b0b1845efe275
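For a single directory, what write-tree produces can be sketched as serializing the entries in the layout we decoded earlier. Simplifications (labeled in the docstring): entries are sorted by name bytes only, whereas real Git compares a subdirectory's name as if it ended with '/', and subdirectories are not recursed into. `write_tree` here is a hypothetical function, not the Git command.

```python
import hashlib

def write_tree(entries: list[tuple[str, str, str]]) -> tuple[str, bytes]:
    """Serialize (mode, name, hex_hash) entries into tree content and hash it.

    Simplified: sorted by name bytes; no special ordering for subdirectories.
    """
    content = b"".join(
        f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(sha)
        for mode, name, sha in sorted(entries, key=lambda e: e[1].encode())
    )
    raw = f"tree {len(content)}".encode() + b"\x00" + content
    return hashlib.sha1(raw).hexdigest(), content
```

Feeding it the two entries from the Tree chapter should rebuild the same 74 bytes of content we decompressed there.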

`commit-tree`

This creates a commit from the hash of the (repository root) tree.

# Enter the hash value of the parent `commit` and the 
# hash value of the `tree` you just created
% git commit-tree -p 37349c9b05c73281008e7b6b7453b595bb034a52 -m 'third commit' 109e41a859caa3e3b87e8f59744b0b1845efe275
ddb3c0d94d860ff657e2cdb82f5513f7db2924f1
% ls .git/objects/dd 
b3c0d94d860ff657e2cdb82f5513f7db2924f1 # object is created
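The commit content commit-tree writes can be sketched from the layout decoded in the Commit chapter. Assumptions: the same identity is used as author and committer, the time zone defaults to +0900 as in the article, and `commit_tree` is a hypothetical function, not Git's implementation. As in the stored commits above, the message ends with a newline.

```python
import hashlib
import time
from typing import Optional, Sequence, Tuple

def commit_tree(tree: str, message: str, author: str,
                parents: Sequence[str] = (),
                timestamp: Optional[int] = None,
                tz: str = "+0900") -> Tuple[str, bytes]:
    """Build commit content and its SHA-1. `author` is 'Name <email>'."""
    ts = int(time.time()) if timestamp is None else timestamp
    if not message.endswith("\n"):
        message += "\n"  # stored messages end with a newline, e.g. b'...\n\ninitial\n'
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {author} {ts} {tz}")
    lines.append(f"committer {author} {ts} {tz}")  # same identity reused as committer
    content = ("\n".join(lines) + "\n\n" + message).encode()
    raw = f"commit {len(content)}".encode() + b"\x00" + content
    return hashlib.sha1(raw).hexdigest(), content
```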

`update-ref`

commit-tree alone does not make the new commit appear in the history.
This is because nothing refers to it yet, so no one can see the commit we have made.

# Because the git log follows the history sequentially 
# from the commit pointed to by HEAD, the commit you
# just created is not yet referenced.
% git log
commit 37349c9b05c73281008e7b6b7453b595bb034a52 (HEAD -> master)
Author: nopeNoshishi <nope@noshishi.jp>
Date:   Tue Jan 31 23:08:59 2023 +0900

    second

commit 48c972ae2bb5652ada48573daf6d27c74db5a13f
Author: nopeNoshishi <nope@noshishi.jp>
Date:   Sun Jan 29 21:37:40 2023 +0900

    initial

# Change the branch's references.
% git update-ref refs/heads/master ddb3c0d 37349c9 # new-hash old-hash
% git log
commit ddb3c0d94d860ff657e2cdb82f5513f7db2924f1 (HEAD -> master)
Author: nopeNoshishi <nope@noshishi.jp>
Date:   Thu Feb 2 21:17:24 2023 +0900

    third commit

commit 37349c9b05c73281008e7b6b7453b595bb034a52
Author: nopeNoshishi <nope@noshishi.jp>
Date:   Tue Jan 31 23:08:59 2023 +0900

    second
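The three-argument form used above (new value, then expected old value) behaves like a compare-and-swap. A minimal sketch with a hypothetical `update_ref`: it expects full 40-character hashes (Git itself resolves abbreviations like ddb3c0d), and it skips the lock file, packed-refs, and reflog handling that real Git performs.

```python
from pathlib import Path
from typing import Optional

def update_ref(git_dir: str, ref: str, new: str, old: Optional[str] = None) -> None:
    """Point `ref` (e.g. 'refs/heads/master') at `new`.

    If `old` is given, refuse to update unless the ref currently holds `old`.
    """
    path = Path(git_dir) / ref
    if old is not None:
        current = path.read_text().strip() if path.exists() else None
        if current != old:
            raise RuntimeError(f"{ref} is at {current}, expected {old}")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(new + "\n")  # a loose ref is just the hash plus a newline
```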

When building Git yourself, it is hard to jump straight to something as sophisticated as add or commit.
So in the development section, we will first implement the plumbing commands and then build add and commit on top of them.

Finally

Thank you for reading all the way to the end!!!
This is still a rough explanation, but I hope it contributes to your understanding.
If you like it, please star my repository!

Reference Site

Official Documentation

What you need

Listed here are the key elements in making git.

Binary

Byte

Bitwise operation

Number bases (n-ary notation) and character strings

String

Compression algorithms

Hash function

File system

Annotation

zlib

1: zlib is free software for compressing data losslessly. Its main compression algorithm, Deflate, is very interesting. (Official Site)

sha1

2: One of the well-known SHA-family hash functions; it produces a 160-bit (20-byte) hash value. Incidentally, the probability of an accidental SHA-1 hash collision is said to be astronomically small. (The Reality of SHA1)

hash number

3: When you specify a hash value directly in a Git command, a short prefix such as the first 7 characters is usually enough. As mentioned in note 2, hash collisions almost never happen, so even a short prefix can identify a specific object. It is similar to pressing Tab in a shell to get input completion.

compressed string

4: Compressed data is not stored as text in any character encoding, so it cannot be decoded as text (for example, as UTF-8).

mode

5: The mode (permission) can of course also be expressed in binary. And since only a few combinations occur, they can be represented by a small set of fixed values (such as 100644 in the tree above).
