CodingBlocks
Git from the Bottom Up – Blobs and Trees
It’s surprising how little we know about Git as we continue to dive into Git from the Bottom Up, while Michael confuses himself, Joe has low standards, and Allen tells a joke.
The full show notes for this episode are available at https://www.codingblocks.net/episode191.
News
Thanks for all the great feedback on the last episode and for sticking with us!
Directory Content Tracking
- Put simply, Git just keeps a snapshot of a directory’s contents.
- Git represents your file contents in blobs (binary large objects), which are held in a structure similar to a Unix directory, called a tree.
- A blob is named by the SHA-1 hash of the size and contents of the file.
- This guarantees that the contents behind a given blob ID never change.
- The same contents will ALWAYS be represented by the same blob no matter where it appears, be it across commits, repositories, or even the Internet.
- If multiple trees reference the same blob, it’s simply a hard link to the blob.
- As long as there’s one link to a blob, it will continue to exist in the repository.
- A blob stores no metadata about its content.
- This is kept in the tree that contains the blob.
- Interesting tidbit about this: you could have any number of files that are all named differently but have the same content and size and they’d all point to the same blob.
- For example, even if one file were named `abc.txt` and another were named `passwords.bin` in separate directories, they’d point to the same blob.
- This allows for compact storage.
Introducing the Blob
This is worth following along and trying out.
- The author creates a file and then calculates the ID of the file using `git hash-object filename`.
- If you were to do the same thing on your system, assuming you used the same content as the author, you’d get the same hash ID, even if you name the file differently than they did.
- `git cat-file -t hashID` will show you the Git type of the object, which should be blob.
- `git cat-file blob hashID` will show you the contents of the file.
- The commands above are looking at the data at the blob level, not even taking into account which commit contained it, or which tree it was in.
- Git is all about blob management, as the blob is the fundamental data unit in Git.
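As a rough illustration of what looking up an object by ID involves, the sketch below writes and reads a "loose" object in a throwaway directory. It assumes the loose-object layout (a zlib-compressed file at `objects/<first two hex characters>/<remaining 38>`) and ignores packfiles, which a real repository may also use. `write_object` and `read_object` are hypothetical helper names, not Git commands.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path
from typing import Tuple


def write_object(objects_dir: Path, obj_type: str, data: bytes) -> str:
    """Store data as a loose object and return its ID."""
    raw = obj_type.encode("ascii") + b" " + str(len(data)).encode("ascii") + b"\0" + data
    object_id = hashlib.sha1(raw).hexdigest()
    # Loose objects live at objects/<first 2 hex chars>/<remaining 38>.
    path = objects_dir / object_id[:2] / object_id[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(raw))
    return object_id


def read_object(objects_dir: Path, object_id: str) -> Tuple[str, bytes]:
    """Return (type, contents) for a loose object, given only its ID."""
    path = objects_dir / object_id[:2] / object_id[2:]
    raw = zlib.decompress(path.read_bytes())
    header, _, data = raw.partition(b"\0")
    obj_type, _, _size = header.partition(b" ")
    return obj_type.decode("ascii"), data


with tempfile.TemporaryDirectory() as tmp:
    objects_dir = Path(tmp) / "objects"
    object_id = write_object(objects_dir, "blob", b"Hello, world!\n")
    # Roughly what `git cat-file -t` and `git cat-file blob` report.
    print(object_id, read_object(objects_dir, object_id))
```

Note that the lookup needs nothing but the ID: no commit, no tree, and no file name.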
Blobs are Stored in Trees
- Remember there’s no metadata in the blobs, and instead the blobs are just about the file’s contents.
- Git maintains the structure of the files within the repository in a tree by attaching blobs as leaf nodes within a tree.
- `git ls-tree HEAD` will show the tree of the latest commit in the current directory.
- `git rev-parse HEAD` decodes the `HEAD` into the commit ID it references.
- `git cat-file -t HEAD` verifies the type for the alias `HEAD` (should be commit).
- `git cat-file commit HEAD` will show metadata about the commit including the hash ID of the tree, as well as author info, commit message, etc.
- To see that Git is maintaining its own set of information about the trees, commits and blobs, etc., use `find .git/objects -type f` and you’ll see the same IDs that were shown in the output from the previous Git commands.
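Since the file name and mode live in the tree rather than in the blob, it can help to see how a tree might be serialized. This is a minimal sketch, assuming the SHA-1 object format: each entry is `<mode> <name>`, a NUL byte, and then the 20 raw bytes of the referenced object's ID. Sorting entries by name is a simplification that holds for plain files (Git treats directory names slightly differently when sorting). The `greeting` file name follows the example in Git from the Bottom Up.

```python
import hashlib
from typing import List, Tuple


def object_id(obj_type: str, body: bytes) -> str:
    """Hash an object body with Git's "<type> <size>\\0" header."""
    header = obj_type.encode("ascii") + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries: List[Tuple[str, str, str]]) -> bytes:
    """Serialize (mode, name, hex_id) entries the way a tree object stores them."""
    body = b""
    # Simplification: sort by name only, which is enough for plain files.
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1].encode("utf-8")):
        body += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0"
        body += bytes.fromhex(hex_id)  # 20 raw bytes, not 40 hex characters
    return body


greeting_blob = object_id("blob", b"Hello, world!\n")
tree = object_id("tree", tree_body([("100644", "greeting", greeting_blob)]))
print("blob", greeting_blob)
print("tree", tree)
```

Because the tree's ID depends only on the modes, names, and IDs of its entries, the same directory contents always produce the same tree ID.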
How Trees are Made
- There’s a notion of an index, which is what you use to initially create blobs out of files.
- If you just do a `git add` without a commit, assuming you are following along here (jwiegley.github.io), `git log` will fail because nothing has been committed to the repository.
- `git ls-files --stage` will show your blob being referenced by the index.
- At this point the file is not referenced by a tree or a commit; it’s only in the `.git/index` file.
- `git write-tree` will take the contents of the index and write them to a tree, and the tree will have its own hash ID.
- If you followed along with the link above, you’d have the same hash from the `write-tree` that we get.
- A tree containing the same blobs and sub-trees will always have the same hash.
- The low-level `write-tree` command is used to take the contents of the index and write them into a new tree in preparation for a commit.
- `git commit-tree` takes a tree’s hash ID and makes a commit that holds it.
- If you wanted that commit to reference a parent, you’d have to manually pass in the parent’s commit ID with the `-p` argument.
- This commit ID will be different for everyone because it uses the name of the creator of the commit as well as the date when the commit is created to generate the hash ID.
- Now you have to overwrite the contents of `.git/refs/heads/master` with the latest commit hash ID.
- This tells Git that the branch named `master` should now reference the new commit.
- A safer way to do this, if you were doing this low-level stuff, is to use `git update-ref refs/heads/master hashID`.
- `git symbolic-ref HEAD refs/heads/master` then associates the working tree with the `HEAD` of `master`.
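To see why everyone ends up with a different commit ID even when the tree is identical, here is a hedged Python sketch of a commit object. It assumes the SHA-1 object format and a simplified body: the author and committer are the same identity, and the timezone is fixed at `+0000`. The names, email addresses, and timestamps are made up for illustration, and the tree ID used is Git's well-known empty-tree ID.

```python
import hashlib
from typing import Optional


def object_id(obj_type: str, body: bytes) -> str:
    """Hash an object body with Git's "<type> <size>\\0" header."""
    header = obj_type.encode("ascii") + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree_id: str, who: str, timestamp: int, message: str,
                parent_id: Optional[str] = None) -> bytes:
    """Build a simplified commit body: tree, optional parent, identity, message."""
    lines = ["tree " + tree_id]
    if parent_id is not None:
        lines.append("parent " + parent_id)  # what the -p argument supplies
    ident = "{} {} +0000".format(who, timestamp)
    lines.append("author " + ident)
    lines.append("committer " + ident)
    return ("\n".join(lines) + "\n\n" + message + "\n").encode("utf-8")


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

# Hypothetical identities and dates; only these differ between the two commits.
first = object_id("commit", commit_body(
    EMPTY_TREE, "Alice Example <alice@example.com>", 1700000000, "Initial commit"))
second = object_id("commit", commit_body(
    EMPTY_TREE, "Bob Example <bob@example.com>", 1700000060, "Initial commit"))

print(first)
print(second)
print("same tree, same commit ID?", first == second)
```

The tree ID is shared, but the author line and timestamp are part of the hashed content, so the two commit IDs differ.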
What Have We Learned?
- Blobs are unique!
- Blobs are held by Trees, Trees are held by Commits.
- `HEAD` is a pointer to a particular commit.
- Commits usually have a parent, i.e. previous, commit.
- We’ve got a better understanding of the `detached HEAD` state.
- What a lot of those files mean in the `.git` directory.
Resources We Like
- Things I wish everyone knew about Git (Part 1) (blog.plover.com)
- Git from the Bottom Up by John Wiegley (jwiegley.github.io)
- Why is Git … called Git?
Tip of the Week
- Have you ever heard the tale of … the forbidden files in Windows? Windows has a list of names that you cannot use for files. Twitter user @foone has done the unthinkable and created a repository of these files. What would happen if you checked this repository out on Windows?
- Thanks to Derek Chasse for this tip!
- When you use `mvn dependency:tree`, `grep` is your enemy. If you want to find out who is bringing in a specific dependency, you really need to use the `-Dincludes` flag.
- Thanks to @ttutko for this tip about redirecting output:
- `kafkacat 2>&1 | grep ""`. If you’re not familiar with that syntax, it just means redirect `STDERR` to `STDOUT` and then pipe the combined output to `grep`.
- Thanks Volkmar Rigo for this one!
- Dangit, Git!? Git is hard: messing up is easy, and figuring out how to fix your mistakes is impossible. This website has some tips to get you out of a jam. (DangitGit.com)
- How to vacay … step 1 temporarily disable your work email (and silence Slack, Gchat, whateves).
- On iOS, go to Settings -> Mail -> Accounts -> Select your work account -> Turn off the Mail slider.