kataoka_nopeNoshishi

Posted on Jan 5, 2023 • Edited on Feb 14, 2023

Understanding Git through images

#git #beginners #image #tutorial

Hello Dev community!

I am a newbie, still a few months into my career as a developer in Japan. I was inspired by Nico Riedmann's Learn git concepts, not commands, and I have summarized git in my own way. Of course, I supplemented it with reading the official documentation as well.
Understanding git from its system structure makes git more fun. I have recently become so addicted to git that I am in the process of creating my own git system.

Recently, I wrote how to make software like git!
Make original git

What is Git?
Start new work
Branch
Merge
Rebase
- Move the branch
- Deal with rebase conflicts
Keep local repositories up-to-date
Useful Functions
End
Reference

What is Git?

Manage versions and Distribute work

Git is a type of source code management system called a distributed version control system.
Git is a tool that facilitates development work by recording and tracking the change history (versions) of files, comparing past and current files, and making the changes clear.
The system also allows multiple developers to edit files at once, so the work can be distributed.

Using Git means

First, copy the files from a storage location that everyone can share (from now on referred to as the "remote repository") to your own computer (from now on referred to as the "local repository"), and then add or edit code and files there.
Then, update the shared files by registering your changes from the local repository to the remote repository.

Understanding by image

When dealing with Git, it is important to follow what moves from where to where, and how.
If you only type commands, you may not understand what is happening and end up using the wrong command.

(info)

When manipulating Git, try to imagine what is happening before and after the operation.

Start new work

Repositories

A repository in Git is a storage for files, which can be remote or local.

Remote Repository is a repository where the source code is placed on a server on the Internet and can be shared by everyone.
Local repository is a repository where the source code is located on your computer and only you can make changes.

Copy the repository and start working

First, prepare your own development environment.
All you need to do is decide in which directory you will work.
For example, your home directory is fine, or any directory you normally use.

Next, copy the files from the remote repository to your computer.
This is called clone.

The remote repository called project contains only first.txt, and this is the image when you clone the remote repository.
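Here is a minimal sketch of clone that you can try without a server, assuming Git is installed. A local bare repository stands in for the remote repository; the names seed and project.git, the user name and email, and the file content are placeholders made up for this example.

```shell
set -e
work=$(mktemp -d)   # throwaway directory so nothing real is touched
cd "$work"

# Stand-in for the remote repository: a bare repository named project.git
# that contains one commit with first.txt.
git init -q seed
cd seed
git config user.name "Example User"
git config user.email "user@example.com"
echo "first" > first.txt
git add first.txt
git commit -q -m "add first.txt"
git branch -M main
cd ..
git clone -q --bare seed project.git

# clone: copy the remote repository into a new local repository
git clone -q project.git project
ls project          # first.txt
```

With a real hosting service, the path project.git would be replaced by the URL of the remote repository.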

(info)

Of course, you may create a local repository first and connect it to a remote repository afterwards.
This is called initialize (git init) and allows you to convert a directory you are already working in into a repository.

(Supplemental) Working Directory

A working directory is not a special directory; it is simply the directory on your computer where you do your work.
It's easier to understand if you think of it as the directory that Git manages (in this case, project), connected to the Git staging area and the local repository.

Change and Add file

Changes to the source code reach the local repository by way of the working directory and then the staging area.
The actual editing happens in the working directory.

Let's create a new file called second.txt.

Next, move the modified file to the staging area.
This is called add.

It is a feature of Git that there is a cushion before changes are reflected in the local repository.
I will explain why this cushion exists in more detail later.

Then, we register the content of the staging area in the local repository.
This is called commit.

By the way, you can attach a message when you commit.
In this case, we added a file, so write git commit -m 'add second.txt'.
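The add and commit steps above can be tried as follows. This is a minimal sketch, assuming Git is installed; it runs in a throwaway directory, and the user name, email and file contents are placeholders.

```shell
set -e
cd "$(mktemp -d)"                  # throwaway directory
git init -q project
cd project
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
echo "first" > first.txt
git add first.txt
git commit -q -m "add first.txt"

echo "second" > second.txt         # work in the working directory
git status --short                 # ?? second.txt  (not tracked yet)
git add second.txt                 # working directory -> staging area
git status --short                 # A  second.txt  (staged)
git commit -q -m "add second.txt"  # staging area -> local repository
git log --oneline                  # two commits, newest first
```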

(info)

When you commit, a commit object is created in the repository.
Put simply, a commit object is data that holds the updater's information and the modified files.
(All data is saved: not just the differences, but the entire state of the files at that moment, called a snapshot.)
Please refer to Git Objects for more information about Git objects.

Adapt to remote repositories

Then, the work is done!
The last step is to reflect the changes in the local repository to the remote repository.
This is called push.

It may be easier to understand if you think of it as a commit to a remote repository.
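A minimal sketch of push, assuming Git is installed. A local bare repository named project.git stands in for the remote repository, and the remote name origin, the identity and the file content are placeholders chosen for this example.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare project.git     # stand-in for the remote repository

git init -q project
cd project
git config user.name "Example User"
git config user.email "user@example.com"
echo "first" > first.txt
git add first.txt
git commit -q -m "add first.txt"
git branch -M main

git remote add origin "$work/project.git"   # tell Git where the remote is
git push -q origin main            # local repository -> remote repository
git ls-remote origin               # the remote now has refs/heads/main
```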

View Differences

Differences between versions of the same file are called diff.
We can see exactly which parts of the file changed.

I won't go into the details of the commands, but here are three that I use frequently.
git diff to see the changes in the working directory before you add.
git diff --staged to see the changes that have been staged after add.
git diff <commit> <commit> to compare commits.
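The three forms of diff can be compared side by side with a minimal sketch like this one, assuming Git is installed. The repository, identity and file contents are placeholders.

```shell
set -e
cd "$(mktemp -d)"
git init -q project
cd project
git config user.name "Example User"
git config user.email "user@example.com"
echo "first" > first.txt
git add first.txt
git commit -q -m "add first.txt"

echo "edited" >> first.txt
git diff                # working directory vs staging area: shows "+edited"
git add first.txt
git diff                # prints nothing now, because the edit has been staged
git diff --staged       # staging area vs last commit: shows "+edited"
git commit -q -m "edit first.txt"
git diff HEAD~1 HEAD    # compare two commits
```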

(Aside) One step called staging area

As development work grows, we often make many changes in one working directory.
What happens if you put all the changes in a local repository at once?
In this case, when you read back through the commits, you may not be able to tell where a feature was implemented.

In Git, it is recommended to do one commit per feature.
This is why there is a staging area where you can subdivide the commit unit into smaller units.

The idea in Git is to stage only what is needed and commit it, then carry on with the rest of the work. This promotes efficient development in which the history of each implementation can be traced back.
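A minimal sketch of using the staging area to split one working session into one commit per feature, assuming Git is installed. The files a.txt and b.txt and the commit messages are made up for this example.

```shell
set -e
cd "$(mktemp -d)"
git init -q project
cd project
git config user.name "Example User"
git config user.email "user@example.com"
echo "first" > first.txt
git add first.txt
git commit -q -m "add first.txt"

# Two unrelated changes made in the same working directory
echo "feature A" > a.txt
echo "feature B" > b.txt

git add a.txt                      # stage only feature A
git commit -q -m "add feature A"
git add b.txt                      # then stage feature B
git commit -q -m "add feature B"
git log --oneline                  # one commit per feature
```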

Summary

The basic workflow is to clone once and then add, commit, and push for each unit of work.

(info)

clone: Make a copy from the remote repository to your development environment (local repository and working directory).
add: Add files from the working directory to the staging area and prepare them for commit.
commit: Register the file from the staging area to the local repository. At this time, a commit object is created.
push: Register changes from the local repository to the remote repository.

Branch

We create branches so that changing and adding files can proceed in several lines of work at once.
The files saved in the main branch are the ones in ongoing use.
The reason for separating branches is to work without affecting the source code that is currently running.

Create new branch

Let's create the branch called develop!
We can create a branch with git branch <new branch> or git checkout -b <new branch>.
The former just creates a branch; the latter creates a branch and moves you to it.
(Branches are maintained in the repository.)

The key point when generating branches is which branch to derive from.
We can specify the source as git checkout -b <new branch> <from branch>.
If we don't, the branch you are currently working on becomes the <from branch>.

(info)

A branch is actually a pointer to a commit (strictly speaking, it holds the hash of a commit object).
Generating a new branch means that the new branch points to the same commit that the from branch points to.
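You can check that a new branch is only a second pointer to the same commit with a minimal sketch like this, assuming Git is installed. The identity and file content are placeholders.

```shell
set -e
cd "$(mktemp -d)"
git init -q project
cd project
git config user.name "Example User"
git config user.email "user@example.com"
echo "first" > first.txt
git add first.txt
git commit -q -m "add first.txt"
git branch -M main

git branch develop                   # create only; HEAD stays on main
git rev-parse main develop           # prints the same hash twice
git checkout -q -b feature develop   # create from develop and move to it
git branch                           # lists develop, feature (current) and main
```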

Work in Branches

Moving to another branch is called checking out.
The pointer to the branch you are currently working on is called HEAD.
So, moving from the main branch to the develop branch means changing the HEAD.

Now both branches point to the commit named Atr3ul.
You just added second.txt by committing in the main branch, so both are one commit ahead of f27baz.
From here, let's say you change second.txt in the develop branch and make a new commit.

Then, as shown in the figure, the develop branch created a commit called m9sgle and pointed to that commit.

The current HEAD position (the branch you are working on) and how far each file has progressed (modified, staged, or committed) are called status.
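A minimal sketch of checking out a branch and committing on it, assuming Git is installed. The identity and file contents are placeholders, and the real hashes will differ from the ones in the figures.

```shell
set -e
cd "$(mktemp -d)"
git init -q project
cd project
git config user.name "Example User"
git config user.email "user@example.com"
echo "first" > first.txt
git add first.txt
git commit -q -m "add first.txt"
echo "second" > second.txt
git add second.txt
git commit -q -m "add second.txt"
git branch -M main
git branch develop

git checkout -q develop                 # checkout: HEAD now points to develop
echo "changed" >> second.txt
git commit -q -am "change second.txt"   # only develop moves forward
git status                              # reports "On branch develop"
git log --oneline --decorate --all      # shows where HEAD, develop and main point
```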

(info)

If you are familiar with object-oriented programming, you may understand the reason for the direction of the arrow on the commit.
It represents the relationship between a "parent" commit and a "child" commit.
The idea is parent ← child: the arrow shows how much the child (commit) born from the parent (commit) has grown (changed).

(Aside) Git-Flow and GitHub-Flow

The way branches are managed varies from one development team to another.
On the other hand, like programming naming conventions, there is a general model for how to grow branches in Git.
Here are two simple ones. I think it's enough to know that there is such a thing.

The "Git Flow" is a fairly complex and intricate structure.
I think it's a model of how Git should be used.

Definition of each branch.

master: Branch to release a product. No working on this branch.

develop: Branch to develop a product. When ready to release, branch off to release. No working on this branch.

feature: Branch for adding features, merged into develop when the feature is ready.

hotfix: For urgent post-release work (critical bug fixes, etc.), branch off from master, merge into master, and merge into develop.

release: For preparation of product release. Branch from develop with features and bug fixes to be released.
When ready for release, merge to master and merge to develop.

The "GitHub Flow" is a somewhat simplified model of the Git Flow.

As you can see, it consists of only master and feature.
The important difference is the cushion of pull requests (explained in the pull section below), which sits in front of the integration between branches.

Summary

Basically, since no work is done directly on main (master), we create a branch for each unit of work and make new commits there.

(info)

branch: New pointer to the commit
checkout: Move HEAD to change the branch to work on.

Merge

Integrating branches is called merge.
Basically, we merge into the main or develop branch.
Be careful not to mix up which branch is merging (absorbing) which.
Always move HEAD to the branch that will receive the changes (the branch you derived from), and run the merge from there, naming the branch to be absorbed.

I am currently working on the feature branch and have created the following third.txt.

third.txt



Hello, World! I'm noshishi, from Japan.
I like dancing on house music.

Then we add it and finish by committing.

Fast Forward

When the commit that the feature branch points to can be traced back to the commit that the develop branch points to, the develop branch can be fast-forwarded.

First, move to develop with checkout.

In this case, the develop branch has not progressed at all, so merging the feature branch simply moves the develop pointer forward.
Afterwards, the develop and feature branches point to the same commit.
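A minimal sketch of a fast-forward merge, assuming Git is installed and default merge settings. The identity and the content of first.txt are placeholders; third.txt uses the text from the article.

```shell
set -e
cd "$(mktemp -d)"
git init -q project
cd project
git config user.name "Example User"
git config user.email "user@example.com"
echo "first" > first.txt
git add first.txt
git commit -q -m "add first.txt"
git checkout -q -b develop
git checkout -q -b feature

printf '%s\n' "Hello, World! I'm noshishi, from Japan." \
              "I like dancing on house music." > third.txt
git add third.txt
git commit -q -m "add third.txt"

git checkout -q develop          # move HEAD to the branch that receives the changes
git merge feature                # reports "Fast-forward"; no new commit is created
git rev-parse develop feature    # both now point to the same commit
```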

No Fast Forward

What if the develop branch has progressed to a new commit by commit or merge?
This is called a no fast-forward situation.

In the develop branch, you have made changes to first.txt and have finished commit.
So the develop branch and the feature branch are completely split.

If you merge the feature branch into the develop branch, Git will compare the changes made on each branch.
If there are no conflicting edits, a merge commit is created immediately.
This is called an automatic merge.
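A minimal sketch of an automatic merge in the no fast-forward case, assuming Git is installed. Each branch changes a different file, so there is nothing to conflict; the identity and file contents are placeholders.

```shell
set -e
cd "$(mktemp -d)"
git init -q project
cd project
git config user.name "Example User"
git config user.email "user@example.com"
echo "first" > first.txt
git add first.txt
git commit -q -m "add first.txt"
git checkout -q -b develop
git checkout -q -b feature

echo "third" > third.txt           # feature adds third.txt
git add third.txt
git commit -q -m "add third.txt"

git checkout -q develop
echo "updated" >> first.txt        # develop changes first.txt
git commit -q -am "update first.txt"

git merge -q --no-edit feature     # no conflicting edits: a merge commit is created
git log --oneline --graph          # the history forks and joins again
```

The merge commit is the one with two parents: the previous tip of develop and the tip of feature.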

Deal with Conflicts

In the no fast-forward state, edits on the two branches that touch the same part of a file and cannot be combined automatically are called a conflict.
In this case, we must manually fix the conflicting content and commit.

In the develop branch, we created the following third.txt and committed.

third.txt



Hello, World! I'm nope, from USA.
I like dancing on house music.

In the develop branch, I'm nope, from USA.
In the feature branch, I'm noshishi, from Japan.
The content of the first line is in conflict.

If you do a merge at this time, a conflict will occur.
Git will ask you to commit after resolving the conflict.

(The branch we work on is the develop branch)

If you look at third.txt as instructed, you will see the following additions.

third.txt (after conflict)



<<<<<<< HEAD
Hello, World! I'm nope, from USA.
=======
Hello, World! I'm noshishi, from Japan.
>>>>>>> feature
I like dancing on house music.

The upper HEAD, separated by =======, represents the contents of the develop branch.
The lower side represents the feature branch.

You first consider which one to adopt, and this time decide to adopt the changes made in the feature branch.
The only operation then is to edit third.txt by hand (delete the unnecessary parts).

third.txt (after editing)



Hello, World! I'm noshishi, from Japan.
I like dancing on house music.

And the next thing you do is add and commit.
The conflict is resolved and a new merge commit is created.
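The whole conflict flow can be reproduced with a minimal sketch like this, assuming Git is installed. The identity and the content of first.txt are placeholders, and the resolution here overwrites the file instead of editing it by hand.

```shell
set -e
cd "$(mktemp -d)"
git init -q project
cd project
git config user.name "Example User"
git config user.email "user@example.com"
echo "first" > first.txt
git add first.txt
git commit -q -m "add first.txt"
git checkout -q -b develop
git checkout -q -b feature

printf '%s\n' "Hello, World! I'm noshishi, from Japan." \
              "I like dancing on house music." > third.txt
git add third.txt
git commit -q -m "add third.txt on feature"

git checkout -q develop
printf '%s\n' "Hello, World! I'm nope, from USA." \
              "I like dancing on house music." > third.txt
git add third.txt
git commit -q -m "add third.txt on develop"

git merge feature || true          # stops and reports a conflict in third.txt
cat third.txt                      # shows the <<<<<<< / ======= / >>>>>>> markers

# Resolve: keep the feature version (normally you edit the file by hand)
printf '%s\n' "Hello, World! I'm noshishi, from Japan." \
              "I like dancing on house music." > third.txt
git add third.txt
git commit -q -m "merge feature into develop"   # concludes the merge
```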

Conflicts are feared by beginners, but once you learn this, you will no longer be afraid.

(info)

After resolving the conflict, why don't you have to run merge again?
When you run merge, the develop branch enters a merging state. If there are no conflicts, the merged files are added and committed automatically; if there are, Git waits for you to do the add and commit yourself.
So the commit you make after resolving a conflict is not a special commit.
It is the same merge commit, and that is why it is called that.

Delete unnecessary branches

A branch that has been merged is basically no longer needed, so we will delete it.
To delete it, move from the branch you want to delete to another branch and run git branch -d <branch>.
You may think the commits on that branch are deleted too.
In fact, the commits are carried over to the branch it was merged into.
You can use git log to see both the commits you made on the deleted branch and the commits on the branch it was merged into.

(Aside) What is the branch

We said that a branch is a pointer to a commit, but it is also useful to think of it as carrying another important piece of information.
That is all the commits that have been made on that branch.

You can think of a branch as a collection of commits with a pointer to the latest commit in that collection. (Strictly speaking, only the pointer is stored; the latest commit can be traced back to the previous commits.)

The following diagram illustrates this.

So we can think of branches on a horizontal axis like Git Flow.
By the way, if you draw the above diagram with branches on the horizontal axis, it looks like this.

Summary

fast-forward merge

no fast-forward merge

no fast-forward merge with conflict

(info)

merge: To integrate (absorb) a working branch (such as feature) into a specific branch (such as main or develop). Unless it is a fast-forward, a new merge commit is created.

Rebase

Rebase is the process of integrating branches by changing the commit from which a branch is derived.
It is similar to merge, except that you run it from the branch you are working on (such as feature), and that branch itself is what gets moved.

Suppose you are working on the develop and feature branches.

Move the branch

You may want to bring the current commit on the develop branch into the feature branch.
To do that, you need to move the base of the feature branch from the gp55sw commit to the 3x7oit commit.

This can be done in one go by running git rebase develop from the feature branch.

This process is more like re-growing the feature branch from the latest commit on the develop branch than doing a merge.
The difference from merge is that the commits on the feature branch are not reused as they are; each one is re-created as a new commit on top of develop.

One reason for such a move is that develop can then be fast-forwarded, so merging is easy at any time.
The other reason is that the commits line up in a single row, so the commit history can be easily traced and the order in which files were updated is consistent.
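A minimal sketch of moving the feature branch with rebase, assuming Git is installed. The identity and file contents are placeholders, and the hashes you see will differ from the ones in the figures.

```shell
set -e
cd "$(mktemp -d)"
git init -q project
cd project
git config user.name "Example User"
git config user.email "user@example.com"
echo "first" > first.txt
git add first.txt
git commit -q -m "add first.txt"
git checkout -q -b develop
git checkout -q -b feature

echo "fourth" > fourth.txt
git add fourth.txt
git commit -q -m "add fourth.txt"        # commit on feature

git checkout -q develop
echo "updated" >> first.txt
git commit -q -am "update first.txt"     # develop moves ahead

git checkout -q feature
old=$(git rev-parse feature)             # remember the commit before the move
git rebase -q develop                    # re-grow feature from the tip of develop
git log --oneline --graph --all          # a single straight line
git rev-parse feature                    # a new hash, different from $old
```

After the rebase, merging feature into develop is a fast-forward.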

Deal with rebase conflicts

Of course, conflicts can happen in rebase too.
Suppose you added fourth.txt in the feature branch, but did not touch fourth.txt in the develop branch.
There is no conflict.

However, if the changes on the two branches overlap, as in the following case, a conflict will occur.

You can deal with it in the same way as you would with merge.
However, after you have checked the diff, edited the file and added it, you should finish your work with git rebase --continue.
You don't have to commit yourself; the commit is made automatically.

(info)

rebase: Move the commit from which a branch is derived to a new commit.

Keep local repositories up-to-date

After some local work, you may be faced with a situation where the remote repository has been updated by another developer.
In this case, you can use pull to bring the latest state of the remote repository into the local repository.

Branch and Repository

Each repository stores its own branches.
The local branches are where the actual work is done.

On the other hand, the local repository also holds copies of the branches in the remote repository.
These are called "remote tracking branches".
Each one is tied to a remote branch by its name, remotes/<remote name>/<branch> (for example remotes/origin/develop; this article shortens it to remotes/develop).

A remote tracking branch only monitors the remote repository.

Check the latest status

Suppose you have a situation where the develop branch in the remote repository is one step ahead of the remote tracking branch.

Reflecting the latest status of a branch in a remote repository on a remote tracking branch is called fetch.

Update to the latest status

If you want to have it reflected in your local branch, you can do a pull.
When you pull, the remote tracking branch in the local repository is updated first.
Then it is merged into the local branch.

This time, the remote develop branch was one commit ahead of the local develop branch, so the pull merged that commit into the local develop branch.
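The fetch and pull steps above can be sketched as follows, assuming Git is installed. A local bare repository stands in for the remote repository; the directory names alice and me, the identity and the file contents are made up for this example. Here the local branch has no commits of its own, so the merge inside pull is a fast-forward.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Another developer creates the project; project.git is the stand-in remote
git init -q alice
cd alice
git config user.name "Example User"
git config user.email "user@example.com"
echo "first" > first.txt
git add first.txt
git commit -q -m "add first.txt"
git branch -M develop
cd ..
git clone -q --bare alice project.git
git clone -q project.git me              # your local repository

# The other developer pushes one more commit to the remote
cd alice
echo "second" > second.txt
git add second.txt
git commit -q -m "add second.txt"
git push -q "$work/project.git" develop

cd ../me
git fetch -q origin                      # updates only origin/develop
behind=$(git rev-list --count develop..origin/develop)
echo "$behind"                           # 1: local develop is one commit behind
git pull -q --no-rebase origin develop   # fetch + merge into local develop
ls                                       # first.txt and second.txt
```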

Deal with pull conflicts

When a commit in the remote repository conflicts with a commit in the local repository, you face a conflict between the remote tracking branch and the local branch when you pull.
In the following case, the remotes/develop and develop branches are in conflict.

Since pull is fetch and merge, you can resolve it in the same way as a conflict in merge.
This time, develop merges remotes/develop, so the working branch is develop.
Open the file that caused the problem, fix it, and then add and commit.

(Aside) Identity of pull requests

Basically, the relationship between remote and local is to pull from the remote repository to the local repository and to push from the local repository to the remote repository.
However, GitHub and other services have a mechanism for sending a request before a branch in the remote repository is merged into a branch such as main.
This is because if a developer pushes directly to the main branch and updates the remote repository, nobody gets a chance to check the change and a major failure may occur.
A pull request inserts a step in which another developer (often a more senior one) reviews the code first.

(info)

pull: fetch + merge. pull is to reflect the state of the remote repository in the local repository.

Useful Functions

Correct the commit

Creating a new commit that cancels out a previous commit is called revert.
For example, suppose you added second.txt to your local repository with m9sgLe.

When you revert, that commit is cancelled by a new commit and second.txt is no longer in the latest state of the local repository.

The merit of revert is that the original commit stays in the history.
Distinguish this from reset, which is introduced next.
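A minimal sketch of revert, assuming Git is installed. The identity and file contents are placeholders.

```shell
set -e
cd "$(mktemp -d)"
git init -q project
cd project
git config user.name "Example User"
git config user.email "user@example.com"
echo "first" > first.txt
git add first.txt
git commit -q -m "add first.txt"
echo "second" > second.txt
git add second.txt
git commit -q -m "add second.txt"

git revert --no-edit HEAD     # new commit that undoes "add second.txt"
ls                            # first.txt
git log --oneline             # three commits: the original one is still there
```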

Delete the commit

Undoing the latest commit so that you can redo the work is called reset.

The --soft option takes you back to the state immediately after add.
The --mixed option takes you back to the state where you were working in the working directory.
The --hard <commit> option removes all commits after the commit you are returning to, discards the changes in the staging area and working directory, and moves HEAD to the specified commit.

Since reset removes commits from the branch history, it is recommended that you do not use it unless you have a good reason, especially with the --hard option.

If you want to get your commits back, you can use git reflog to find the commits you have removed.
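The three options can be compared with a minimal sketch like this, assuming Git is installed. The identity and file contents are placeholders, and the variable lost is introduced only for this example.

```shell
set -e
cd "$(mktemp -d)"
git init -q project
cd project
git config user.name "Example User"
git config user.email "user@example.com"
echo "first" > first.txt
git add first.txt
git commit -q -m "add first.txt"
echo "second" > second.txt
git add second.txt
git commit -q -m "add second.txt"

git reset -q --soft HEAD~1       # commit undone; second.txt is still staged
git status --short               # A  second.txt
git commit -q -m "add second.txt"

git reset -q --mixed HEAD~1      # commit undone; second.txt is unstaged but on disk
git status --short               # ?? second.txt
git add second.txt
git commit -q -m "add second.txt"

lost=$(git rev-parse HEAD)       # remember the commit we are about to drop
git reset -q --hard HEAD~1       # commit undone and second.txt removed from disk
git reflog                       # the dropped commits are still listed here
git cat-file -t "$lost"          # commit: the object has not been deleted yet
```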

Evacuate the work

Git will not let you move to another branch if your uncommitted changes would be overwritten, so you have to choose between committing and discarding them.
This is where stash comes in handy.
You can temporarily set aside the changes in the working directory and staging area.

When you want to move to another branch, run stash; when you return, use stash pop to retrieve the set-aside changes and resume work.
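A minimal sketch of stash and stash pop, assuming Git is installed. The identity and file contents are placeholders.

```shell
set -e
cd "$(mktemp -d)"
git init -q project
cd project
git config user.name "Example User"
git config user.email "user@example.com"
echo "first" > first.txt
git add first.txt
git commit -q -m "add first.txt"
git branch -M main
git branch develop

echo "work in progress" >> first.txt
git stash -q                  # set the uncommitted change aside
git status --short            # prints nothing: the working directory is clean
git checkout -q develop       # free to move to another branch
git checkout -q main          # ...and come back
git stash pop -q              # bring the change back
git status --short            #  M first.txt
```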

Bring the commit

Copying any commit onto the current branch as a new commit is called cherry-pick.
It is a very nice feature.

For example, this is used when you want to bring over only one feature that was implemented in a feature branch and use it for work in the current develop branch.
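A minimal sketch of cherry-pick, assuming Git is installed. The files a.txt and b.txt, the commit messages and the identity are made up for this example.

```shell
set -e
cd "$(mktemp -d)"
git init -q project
cd project
git config user.name "Example User"
git config user.email "user@example.com"
echo "first" > first.txt
git add first.txt
git commit -q -m "add first.txt"
git checkout -q -b develop
git checkout -q -b feature

echo "a" > a.txt
git add a.txt
git commit -q -m "add feature A"
echo "b" > b.txt
git add b.txt
git commit -q -m "add feature B"

git checkout -q develop
git cherry-pick feature~1     # copy only "add feature A" onto develop
ls                            # a.txt and first.txt, but no b.txt
```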

Mastering HEAD

I explained that HEAD is a pointer to the branch you are currently working on.
I also explained that a branch is a pointer to a commit.

See the figure below.

HEAD points to the develop branch, and the develop branch points to the commit eaPk76.
So, HEAD in this situation refers to the commit eaPk76.

Have you seen Git documentation or articles that use HEAD after a command?
For example, git revert HEAD.
This command works because HEAD can be used in place of a commit.
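You can follow the chain from HEAD to a branch to a commit with a minimal sketch like this, assuming Git is installed. The identity and file content are placeholders, and the hash will differ from eaPk76 in the figure.

```shell
set -e
cd "$(mktemp -d)"
git init -q project
cd project
git config user.name "Example User"
git config user.email "user@example.com"
echo "first" > first.txt
git add first.txt
git commit -q -m "add first.txt"
git branch -M develop

git symbolic-ref HEAD         # refs/heads/develop: HEAD points to a branch
git rev-parse develop         # the commit that develop points to
git rev-parse HEAD            # the same hash, reached through HEAD
git show -s --oneline HEAD    # HEAD used where a commit is expected
```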

End

Source code management without Git

Mercurial dates from the same period as Git.
Mercurial has a very simple command line interface (CLI), at the cost of some of the flexibility of Git.
Recently, Meta released a new source code management system based on Mercurial, called Sapling, as open source.
I would like to try it and write about my impressions.

Where is the remote repository

A hosting service is a service that rents out a server for a remote repository.
Typical examples are GitHub, Bitbucket, and AWS CodeCommit for private use.
Git and GitHub are completely different things.
By the way, we can also use our own servers for remote repositories.

Pointer

If you have been exposed to programming that deals directly with memory, such as the C programming language, you will probably have a feel for what a "pointer" is.
On the other hand, for a beginning programmer, it can seem very vague.

I said that commit objects are stored in the repository.
If there are many commit objects in the repository, how can you select the one you want?

We need a label (address) to locate a particular commit object.

A "pointer" is a piece of data that holds that label, so that we don't lose track of it.

The label, by the way, is a mysterious-looking string produced by a hash function.
If you are curious, please refer to How does Git compute file hashes?.
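As a small taste, here is a minimal sketch of how the label for a file's content is computed, assuming Git is installed with its default SHA-1 object format and that the sha1sum command is available. The variable names are made up for this example. Git hashes the content together with a short header ("blob", the size in bytes, and a NUL byte).

```shell
set -e
cd "$(mktemp -d)"             # run outside any repository

# Ask Git for the label of a file whose content is "hello" plus a newline
content_hash=$(printf 'hello\n' | git hash-object --stdin)

# Compute the same thing by hand: SHA-1 of "blob 6", a NUL byte, then the content
manual_hash=$(printf 'blob 6\0hello\n' | sha1sum | cut -d' ' -f1)

echo "$content_hash"          # ce013625030ba8dba906f756967f9e9ca394464a
echo "$manual_hash"           # the same string
```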

To further understand Git

There are many things I failed to mention in this article.

The core of Git is a simple key-value data store
Details of the Git objects that are the values
How the objects relate to each other

I hope to fully explore this someday.

References

Top comments (16)

Alessandro Candido • Jan 11 '23

Notice that, as you say in the beginning, Git is a distributed VCS, and not a centralized one.

So, there is no concept of a single remote repo, but instead you have a single local repo (the one you're working in), and many remotes, and they are all the same, from Git perspective.

So, you can have a remote on GitHub, one on GitLab, one on your own desktop at home, and one on your colleague's computer, and for as long as they are reachable you can push on, and pull from, any of them.

True that often there is a single instance on a hosted server, and everyone connects to that one, but this is not a Git concept, just a standard practice.

kataoka_nopeNoshishi • Jan 11 '23

Thanks for the comment!
As you point out, with Git, remote repository servers can be created without having to rely on a hosting service like github.
Official website

I wrote this article this way because I wanted to emphasize my point about managing versions in a distributed manner.
Thanks for the very great feedback!!!

olsard • Jan 6 '23

Thanks for your great work, with lots of handwritten examples!

bibisixtynine • Jan 10 '23

Great post! Thanks! Just one small remark: the animated GIFs are hard to follow on iOS Safari, because the frame rate is rather high for my slow brain, and the pause button... rewinds the GIF to the beginning.

Terenze • Jan 8 '23

Thank you Kataoka. This is such a valuable piece of work that unpacks Git for those that are new to it and brings a fresh perspective to its understanding. Agree that just learning a set of commands doesn’t help you fully appreciate the power of Git. Good luck with your own version.