Hello Dev community!
I am a newbie, still a few months into my career as a developer in Japan. I was inspired by Nico Riedmann's Learn git concepts, not commands, and I have summarized git in my own way. Of course, I supplemented it with reading the official documentation as well.
Understanding git from its system structure makes git more fun. I have recently become so addicted to git that I am in the process of creating my own git system.
Recently, I wrote about how to make software like Git!
Make original git
- What is Git?
- Start new work
- Keep local repositories up-to-date
- Useful Functions
What is Git?
Manage versions and Distribute work
Git is a type of source code management system called a distributed version control system.
Git is a tool that facilitates development work by recording and tracking the change history (versions) of files, comparing past and current files, and making the changes visible.
The system also allows multiple developers to edit files at once, so the work can be distributed.
Using Git means
First, copy the files from a storage location that is shared by everyone (from now on referred to as the "remote repository") onto your computer (from now on referred to as the "local repository"), and then add or edit code or files.
Then, register your changes from the local repository to the remote repository to update the shared files.
Understanding by image
When dealing with Git, it is important to keep track of what each operation moves, and from where to where.
If you only operate commands, you may not understand what is happening and use the wrong command.
When manipulating Git, try to imagine what is happening before and after the operation.
Start new work
A repository in Git is a storage location for files, which can be remote or local.
A remote repository is a repository where the source code is placed on a server on the Internet and can be shared by everyone.
A local repository is a repository where the source code is located on your computer and only you can make changes.
Copy the repository and start working
First, prepare your own development environment.
All you need to do is decide in which directory you will work.
For example, your home directory is fine, or any directory you normally use.
Next, copy the files from the remote repository and bring them to your computer.
This is called clone.
Suppose the remote repository called project contains only first.txt; this is the image when you clone the remote repository.
Of course, you may also create a local repository first and connect it to the remote repository afterward.
This is called initialize (git init), and it converts a directory you are already working in into a repository.
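Both ways of starting can be tried without a server. The following is only a sketch: it assumes a throwaway temporary directory, and a bare repository on the local disk (project.git, a made-up path) plays the role of the remote. The identity set with git config is also hypothetical.

```shell
set -e
work=$(mktemp -d)

# Hypothetical stand-in for the shared remote: a bare repository on the
# local disk that already contains first.txt.
git init -q "$work/seed"
cd "$work/seed"
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "first" > first.txt
git add first.txt
git commit -q -m "add first.txt"
git clone -q --bare "$work/seed" "$work/project.git"

# clone: copy the remote repository into a new local repository.
git clone -q "$work/project.git" "$work/project"
ls "$work/project"

# init: turn an existing directory into a repository,
# then register the remote by hand.
mkdir "$work/existing"
cd "$work/existing"
git init -q
git remote add origin "$work/project.git"
git remote -v
```

After the clone, the working directory already holds first.txt; after init, the directory is a repository that knows where the remote is but has not copied anything yet.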
(Supplemental) Working Directory
A working directory is not a special directory; it is simply the directory on your computer where you do your work.
It's easier to understand if you think of it as the place that connects the directory Git manages (in this case, project) with the Git staging area and the local repository.
Change and Add file
Changes to the source code reach the local repository by way of the working directory and the staging area.
The actual work happens in the working directory.
Let's create a new file called second.txt.
Next, move the modified file to the staging area.
This is called add.
It is a feature of Git that there is a cushion before changes are reflected in the local repository.
I will explain why this cushion exists in more detail later.
Then, we register the content of the staging area in the local repository.
This is called commit.
By the way, we can leave a comment when we commit.
In this case, we added a file, so write git commit -m 'add second.txt'.
When you commit, a commit object is created in the repository.
Simply put, a commit object is data that records who made the update and the state of the files.
(Git saves the entire state of the files at that moment as a snapshot, not just the differences.)
Please refer to Git Objects for more information about Git objects.
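To see the add and commit steps and the resulting commit object for yourself, here is a small sketch. It assumes a fresh repository in a temporary directory and a made-up identity; the file names follow the article.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q project
cd project
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "first" > first.txt
git add first.txt
git commit -q -m "add first.txt"

echo "second" > second.txt          # work in the working directory
git add second.txt                  # working directory -> staging area
git commit -q -m "add second.txt"   # staging area -> local repository

# The commit object records a tree (the snapshot), its parent commit,
# the author and committer, and the message.
git cat-file -p HEAD

# The snapshot lists every file, not only the one that changed.
git ls-tree --name-only HEAD
```

The last command prints both first.txt and second.txt, even though the second commit only added one file.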
Adapt to remote repositories
Now the work is done!
The last step is to reflect the changes in the local repository to the remote repository.
This is called push.
It may be easier to understand if you think of it as a commit to a remote repository.
The changes between two versions of the same file are called the diff.
With it, we can see exactly what changed in a file.
I won't go into the details of the commands, but here are three that I use frequently.
- git diff to see the changes in the working directory before you add them.
- git diff --staged to see the changes in the staging area after you add them.
- git diff <commit> <commit> to compare two commits.
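The three forms of diff can be compared side by side. This is a sketch in a throwaway repository with a made-up identity; the variable names are only there to capture what each command reports.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q project
cd project
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "first" > first.txt
git add first.txt
git commit -q -m "add first.txt"

echo "edited" >> first.txt
unstaged=$(git diff --name-only)           # working directory vs. staging area
git add first.txt
after_add=$(git diff --name-only)          # nothing is left unstaged
staged=$(git diff --staged --name-only)    # staging area vs. last commit
git commit -q -m "edit first.txt"

git diff HEAD~1 HEAD                       # compare two commits
echo "unstaged=[$unstaged] after_add=[$after_add] staged=[$staged]"
```

Plain git diff goes quiet as soon as the change is staged, which is why --staged is needed after add.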
(Aside) One step called staging area
As development work grows, we often make many changes in one working directory.
What happens if you put all the changes in a local repository at once?
In that case, when you read through the commits later, you may not be able to tell where a given feature was implemented.
In Git, it is recommended to make one commit per feature.
This is why there is a staging area, where you can subdivide your changes into smaller commit units.
The concept of Git is to stage only what is needed and commit it, then carry on with the rest of the work, so that development stays efficient and each implementation can be traced back through the history.
The basic workflow is to clone once and then add, commit, and push for each unit of work.
- clone: Make a copy of the remote repository in your development environment (local repository and working directory).
- add: Add files from the working directory to the staging area and prepare them for commit.
- commit: Register the files from the staging area in the local repository. At this time, a commit object is created.
- push: Register changes from the local repository to the remote repository.
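Put together, the whole cycle looks roughly like this. It is a sketch under assumptions: a bare repository on the local disk (project.git, a made-up path) stands in for the remote, and the identity is hypothetical. Cloning an empty remote prints a warning, which is expected here.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/project.git"             # hypothetical remote
git clone -q "$work/project.git" "$work/project"   # clone (remote is still empty)
cd "$work/project"
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "first" > first.txt
git add first.txt                      # working directory -> staging area
git commit -q -m "add first.txt"       # staging area -> local repository

branch=$(git symbolic-ref --short HEAD)
git push -q origin "$branch"           # local repository -> remote repository

# The remote now has the same commit as the local branch.
git --git-dir="$work/project.git" log --oneline "$branch"
```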
We create a branch so that files can be changed and added along several lines of work at the same time.
The files saved in the main branch are in ongoing use.
The reason for working on separate branches is to avoid affecting the source code that is currently running.
Create new branch
Let's create a branch called develop.
We can create a branch with git branch <new branch> or git checkout -b <new branch>.
The former just creates a branch; the latter creates a branch and moves you to it.
(Branches are maintained in the repository.)
The key point when creating a branch is which branch to derive it from.
We can specify the source with git checkout -b <new branch> <from branch>.
If we don't, the branch you are currently working on becomes the source.
A branch is actually a pointer to a commit (strictly speaking, it holds the hash of a commit object).
Creating a new branch means that the new branch points to the same commit that the source branch points to.
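That a new branch is just another pointer to the same commit can be checked directly. A minimal sketch, assuming a throwaway repository whose first branch is named main and a made-up identity:

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q project
cd project
git symbolic-ref HEAD refs/heads/main   # name the first branch main
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "first" > first.txt
git add first.txt
git commit -q -m "add first.txt"

git branch develop                    # create only; you stay on main
git checkout -q -b feature develop    # create from develop and move to it

# Every branch name is listed next to the commit hash it points to.
git for-each-ref --format='%(refname:short) %(objectname:short)' refs/heads
```

All three branches print the same hash, because no new commit has been made since they were created.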
Work in Branches
Moving to another branch is called checkout.
The pointer to the branch you are currently working on is called HEAD.
So, moving from the main branch to the develop branch means changing what HEAD points to.
Now both branches point to the same commit.
You just added second.txt by committing on the main branch, so main is one commit ahead of where it was before.
From here, let's say you change second.txt in the develop branch and make a new commit.
Then, as shown in the figure, a commit called m9sgle is created and the develop branch points to it.
The current HEAD position (the working branch), the stage each file has reached, and the state of the work in progress are together called the status.
If you are familiar with object-oriented programming, you may understand the reason for the direction of the arrows between commits.
Each arrow represents the relationship between a "parent" commit and a "child" commit.
The idea is parent ← child: the arrow shows how much the child (commit) born from the parent (commit) has grown (changed).
(Aside)Git-Flow and GitHub-Flow
How branches are managed varies from one development team to another.
On the other hand, like naming conventions in programming, there are general models for how to grow branches in Git.
Here are two simple ones. I think it's enough to know that such things exist.
Git Flow has a fairly complex and intricate structure.
I think of it as a model of how Git is meant to be used.
Definition of each branch:
- master: Branch for releasing the product. No work is done directly on this branch.
- develop: Branch for developing the product. When ready to release, merge into release. No work is done directly on this branch.
- feature: Branch for adding features. Merged into develop when the feature is ready.
- hotfix: For urgent post-release work (critical bug fixes, etc.). Branched off from master, merged into master, and merged into develop.
- release: For preparing a product release. Branched from develop with the features and bug fixes to be released. When ready for release, merged into master and into develop.
The "GitHub Flow" is a somewhat simplified model of the Git Flow.
As you can see, it consists of only the main (master) branch and feature branches.
The important difference is the cushion of pull requests (explained in the pull section below), which sits in front of the integration between branches.
Basically, since no work is done directly on main (master), we create a branch for each unit of work and make new commits there.
- branch: A new pointer to a commit.
- checkout: Move HEAD to change the branch to work on.
Integrating branches is called merge.
Basically, we merge into the branch that the work was derived from.
Be careful not to mix up which branch is merging (absorbing) which.
Always move HEAD to the branch you derived from first, and then run the merge from that branch.
Fast Forward
I am currently working on the feature branch and have created the following third.txt.
third.txt
Hello, World! I'm noshishi, from Japan.
I like dancing on house music.
Suppose you have run add and finished up to commit.
When the feature branch points to a commit that can be traced back to the commit the develop branch points to, the develop branch is in a fast-forward state.
First, move to the develop branch with checkout.
In this case, the develop branch has not progressed at all, so merging the feature branch simply moves the develop pointer forward to the latest commit.
Afterward, the develop and feature branches share the same commit.
No Fast Forward
What if the develop branch has progressed to a new commit through a commit or a merge?
This is called a no fast-forward situation.
Suppose that in the develop branch you have made changes to first.txt and have finished the commit.
The develop branch and the feature branch have now completely split.
If you try to merge the feature branch from the develop branch, Git will check the change histories of both branches against each other.
If there are no conflicting edits, a merge commit is created immediately.
This is called an auto merge.
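When the two branches have split but touch different files, the merge commit is created without any manual work. A sketch, again in a throwaway repository with a made-up identity:

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q project
cd project
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "first" > first.txt
git add first.txt
git commit -q -m "add first.txt"
git checkout -q -b develop

git checkout -q -b feature
echo "third" > third.txt
git add third.txt
git commit -q -m "add third.txt"

git checkout -q develop
echo "edited" >> first.txt
git add first.txt
git commit -q -m "edit first.txt"     # develop has now moved on as well

before=$(git rev-parse develop)
git merge -q --no-edit feature        # no conflicting edits: merge commit is made
git log --oneline --graph
```

The new commit on develop has two parents: the previous develop commit and the tip of feature.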
Deal with Conflicts
In a no fast-forward state, a clash between the changes made on the two branches is called a conflict.
In this case, we must fix the conflicting content by hand and then commit.
In the develop branch, we created the following third.txt.
third.txt (develop)
Hello, World! I'm nope, from USA.
I like dancing on house music.
The develop branch says "I'm nope, from USA." and the feature branch says "I'm noshishi, from Japan.", so the content of the first line is in conflict.
If you do a merge at this time, a conflict will occur.
Git will ask you to commit after resolving the conflict.
(The branch we work on is the develop branch.)
If you look at third.txt as instructed, you will see the following additions.
third.txt (after conflict)
<<<<<<< HEAD
Hello, World! I'm nope, from USA.
=======
Hello, World! I'm noshishi, from Japan.
>>>>>>> feature
I like dancing on house music.
The upper side, from HEAD to the ======= separator, represents the contents of the develop branch.
The lower side represents the contents of the feature branch.
You first consider which one to adopt, and this time you decide to adopt the changes made in the feature branch.
The only operation then is to edit third.txt by hand (delete the unnecessary parts).
third.txt (after editing)
Hello, World! I'm noshishi, from Japan.
I like dancing on house music.
The next thing you do is add and commit.
The conflict is resolved and a new merge commit is created.
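The conflict scenario can be rehearsed safely in a scratch repository. In this sketch the hand edit is simulated by overwriting third.txt with the version you decided to keep; the identity and paths are made up.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q project
cd project
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "first" > first.txt
git add first.txt
git commit -q -m "add first.txt"
git branch develop
git branch feature

git checkout -q feature
printf "Hello, World! I'm noshishi, from Japan.\nI like dancing on house music.\n" > third.txt
git add third.txt
git commit -q -m "add third.txt on feature"

git checkout -q develop
printf "Hello, World! I'm nope, from USA.\nI like dancing on house music.\n" > third.txt
git add third.txt
git commit -q -m "add third.txt on develop"

# The merge stops and reports the conflict instead of committing.
git merge feature || echo "merge stopped: resolve the conflict, then add and commit"
cat third.txt                          # shows the conflict markers

# Resolve by keeping the feature version (stands in for editing by hand).
printf "Hello, World! I'm noshishi, from Japan.\nI like dancing on house music.\n" > third.txt
git add third.txt
git commit -q -m "merge feature into develop"
git log --oneline --graph
```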
Conflicts are feared by beginners, but once you learn this, you will no longer be afraid.
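When you merge and resolve the conflict, you may wonder why you have to add and commit yourself.
Once you run merge, the develop branch enters a merging state, and if there are no conflicts, the new files are automatically added and committed.
So it is not a special commit; it is the same step, simply done by hand after the conflict is resolved.
That's why it's called a merge commit.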
Delete unnecessary branches
A branch that has been merged is basically no longer needed, so we delete it.
To do so, move from the branch you want to delete to another branch and run git branch -d <branch>.
You may think the commits on that branch are deleted as well.
In fact, the commits are carried over to the branch it was merged into.
You can use git log to see both the commits you made on the deleted branch and the commits on the branch it was merged into.
(Aside) What is the branch
We said that a branch is a pointer to a commit, but it also carries another important piece of information.
That is all the commits that have been made on that branch.
A branch can be seen as a collection of commits with a pointer to the latest commit in that collection. (Strictly speaking, the branch stores only the pointer, and the earlier commits are found by tracing back from that commit.)
The following diagram illustrates this.
So we can think of branches on a horizontal axis, as in Git Flow.
By the way, if you draw the above diagrams with branches on the horizontal axis, they look like this.
no fast-forward merge
no fast-forward merge with conflict
- merge: To integrate (absorb) a working branch (such as feature) into a specific branch (such as develop) and create a new commit.
Rebase is the process of integrating branches by changing the commit from which a branch is derived.
It is similar to merge, except that you run it from the branch you are working on rather than from the branch it was derived from.
Suppose you are working on the feature branch.
Move the branch
You may want to reflect the latest commit on the develop branch in the feature branch.
To do so, you need to move the base of the feature branch from the gp55sw commit to the latest commit on the develop branch.
This can be done in one step from the feature branch by running git rebase develop.
This process is more like re-growing the feature branch from the latest commit on the develop branch than doing a merge.
The difference from merge is that the commits on the branch are moved as a whole and recreated as new commits.
One reason for such a move is that the develop branch stays in a fast-forward state, so the feature branch is easy to merge at any time.
The other reason is that the commits line up in a single row, so the commit history is easy to trace and the order in which files were updated stays consistent.
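The effect of a rebase on the commit graph can be checked as follows. This is a sketch in a throwaway repository with a made-up identity; the hashes will not match gp55sw or any other name used in the figures.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q project
cd project
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "first" > first.txt
git add first.txt
git commit -q -m "add first.txt"
git checkout -q -b develop

git checkout -q -b feature
echo "fourth" > fourth.txt
git add fourth.txt
git commit -q -m "add fourth.txt"

git checkout -q develop
echo "edited" >> first.txt
git add first.txt
git commit -q -m "edit first.txt"     # develop moves ahead of feature's base

git checkout -q feature
old=$(git rev-parse feature)
git rebase -q develop                 # re-grow feature from develop's latest commit

git log --oneline --graph --all
echo "before: $old"
echo "after:  $(git rev-parse feature)"
```

The feature commit now has develop's latest commit as its parent and a new hash, and develop can absorb it with a fast-forward merge.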
Deal with rebase conflicts
Of course, a conflict can also occur during a rebase.
Suppose you worked on fourth.txt in the feature branch, but fourth.txt was not changed in the develop branch. In that case the rebase goes through without trouble.
However, if the changes on the two branches overlap, a conflict will occur.
You can deal with it the same way you would with merge.
However, after you have checked the diff and finished editing the file, add it and finish your work with git rebase --continue.
You don't have to commit; the commit is made automatically.
- rebase: Move the commit from which a branch is derived to a new commit.
Keep local repositories up-to-date
After some local work, you may find that the remote repository has been updated by another developer.
In this case, you can use pull to bring the latest information from the remote repository into the local repository.
Branch and Repository
Branches are stored in each repository.
The local branches are where the actual work is done.
In addition, the local repository holds copies of the branches in the remote repository.
Such a copy is called a "remote tracking branch".
It is a branch whose name is tied to the remote branch, in the form remotes/<branch>.
It only monitors the remote repository; you do not work on it directly.
Check the latest status
Suppose the develop branch in the remote repository is one step ahead of the remote tracking branch.
Reflecting the latest status of a branch in the remote repository on the remote tracking branch is called fetch.
Update to the latest status
If you want to have it reflected in your local branch as well, you can do a pull.
When you pull, the remote tracking branch in the local repository is updated first.
Then it is merged into the local branch.
This time, the remote had a commit that was one step ahead of the local develop branch, so the merge brings that commit into the local develop branch.
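The difference between fetch and pull shows up when you compare the local branch with the remote tracking branch. The sketch below assumes a bare repository on the local disk as the remote and two clones, mine and teammate (all made-up names). With a remote named origin, Git calls the tracking branch origin/develop, which corresponds to remotes/develop in the text.

```shell
set -e
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
work=$(mktemp -d)

# Hypothetical remote with one commit on develop.
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/develop
echo "first" > first.txt
git add first.txt
git commit -q -m "add first.txt"
git clone -q --bare "$work/seed" "$work/project.git"

git clone -q "$work/project.git" "$work/mine"
git clone -q "$work/project.git" "$work/teammate"

# A teammate moves the remote develop branch one commit ahead.
cd "$work/teammate"
echo "second" > second.txt
git add second.txt
git commit -q -m "add second.txt"
git push -q origin develop

cd "$work/mine"
git fetch -q origin                        # updates origin/develop only
local_after_fetch=$(git rev-parse develop)
tracking_after_fetch=$(git rev-parse origin/develop)

git pull -q --ff-only origin develop       # fetch, then merge into local develop
git log --oneline
```

After the fetch, only the tracking branch has moved; after the pull, the local develop branch has caught up and second.txt appears in the working directory.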
Deal with pull conflicts
When a commit in the remote repository conflicts with a commit in the local repository, you face a conflict between the remote tracking branch and the local branch when you pull.
In the following case, the remotes/develop and develop branches are in conflict.
Since the second half of pull is a merge, you can solve it in the same way as a merge conflict.
You are merging remotes/develop, so the working branch is develop.
Open the file that caused the problem, fix it, and then commit.
(Aside) Identity of pull requests
Basically, the relationship between remote and local is: pull from the remote repository to the local repository, and push from the local repository to the remote repository.
However, GitHub and other services have a mechanism for sending a request before a branch in the remote repository is merged into a branch such as main.
This is because if every developer pushed straight to the main branch, the remote repository would be updated without anyone checking the changes, and a major failure could occur.
A pull request inserts a step in which another developer, often a more senior one, reviews the code before it is merged.
- pull: Reflect the state of the remote repository in the local repository.
Correct the commit
Making a new commit that cancels out a previous commit is called revert.
For example, suppose you added second.txt to your local repository with a commit.
When you revert, that commit is undone and second.txt is no longer in the latest state of the local repository.
The merit of revert is that it leaves the original commit in the history.
Distinguish this from reset, which will be introduced later.
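Here is a small sketch of revert in a throwaway repository (made-up identity). It shows that the file disappears while the history keeps growing.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q project
cd project
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "first" > first.txt
git add first.txt
git commit -q -m "add first.txt"
echo "second" > second.txt
git add second.txt
git commit -q -m "add second.txt"

git revert --no-edit HEAD > /dev/null   # new commit that undoes the last one

ls                                      # second.txt is gone
git log --oneline                       # the original commit is still listed
```

The log now has three entries: the two original commits and the revert commit on top.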
Delete the commit
Undoing the latest commit so that you can work on it again is called reset.
- The --soft option takes you back to the stage immediately after add.
- The --mixed option takes you back to the stage where you were working in the working directory.
- The --hard <commit> option removes all commits after the commit you are returning to, discards the changes in the staging area and working directory, and moves HEAD to the specified commit.
Since reset removes commits from the branch history, it is recommended that you do not use it unless you have a good reason, especially with the --hard option.
If you want to get your commits back, you can use git reflog to find the commits you have removed.
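The three options can be compared by looking at where second.txt ends up after each reset. A sketch in a throwaway repository with a made-up identity; git status --porcelain prints "A " for a staged new file and "??" for an untracked one.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q project
cd project
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "first" > first.txt
git add first.txt
git commit -q -m "add first.txt"
echo "second" > second.txt
git add second.txt
git commit -q -m "add second.txt"

git reset -q --soft HEAD~1
soft_state=$(git status --porcelain)    # second.txt is still staged
git commit -q -m "add second.txt again"

git reset -q --mixed HEAD~1
mixed_state=$(git status --porcelain)   # second.txt is only in the working directory
git add second.txt
git commit -q -m "add second.txt once more"

lost=$(git rev-parse HEAD)
git reset -q --hard HEAD~1              # the commit and the file are both gone
ls
git reflog -n 3                         # the removed commit is still listed here
git reset -q --hard "$lost"             # bring it back by its hash

echo "soft:  $soft_state"
echo "mixed: $mixed_state"
```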
Evacuate the work
Git often will not let you move to another branch while you have uncommitted changes, so you have to choose between making a commit or discarding your changes.
This is where stash comes in handy.
It lets you temporarily evacuate the changes in the working directory and staging area.
When you want to move to another branch, use stash, and when you return, use stash pop to retrieve the evacuated changes and resume work.
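A round trip with stash looks like this. It is a sketch in a throwaway repository with a made-up identity; the variable clean only records what git status reports right after stashing.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q project
cd project
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "first" > first.txt
git add first.txt
git commit -q -m "add first.txt"
git branch develop

echo "work in progress" >> first.txt   # uncommitted change on main
git stash -q                           # evacuate it
clean=$(git status --porcelain)        # empty: the working directory is clean

git checkout -q develop                # free to move around
git checkout -q main

git stash pop -q                       # bring the change back
git status --short
```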
Bring the commit
Bringing any commit onto the current branch and creating a new commit from it is called cherry-pick.
It is a very nice feature.
It is used, for example, when you want to bring only a feature that was previously implemented in a feature branch into your work on the current develop branch.
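The following sketch picks one of two feature commits onto develop. The repository, identity, and file names (helper.txt, other.txt) are made up for illustration.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q project
cd project
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "first" > first.txt
git add first.txt
git commit -q -m "add first.txt"
git checkout -q -b develop

git checkout -q -b feature
echo "helper" > helper.txt
git add helper.txt
git commit -q -m "add helper"
echo "other" > other.txt
git add other.txt
git commit -q -m "add other"

git checkout -q develop
echo "edited" >> first.txt
git add first.txt
git commit -q -m "edit first.txt"

pick=$(git rev-parse feature~1)        # the "add helper" commit
git cherry-pick "$pick" > /dev/null    # copy only that commit onto develop

ls
git log --oneline
```

Only helper.txt arrives on develop; other.txt stays on the feature branch.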
I explained that HEAD is a pointer to the branch you are currently working on.
I also explained that a branch is a pointer to a commit.
See the figure below.
HEAD points to the develop branch, and the develop branch points to its latest commit.
So, HEAD in this situation refers to that commit.
Have you often seen Git documentation or articles that use HEAD after a command?
For example, git revert HEAD.
This command works because HEAD can stand in for a commit.
Source code management without Git
Mercurial has about as long a history as Git; both first appeared in 2005.
Mercurial has a very simple command line interface (CLI), at the cost of some of Git's flexibility.
Recently, Meta released a new Mercurial-based source code management system called Sapling as open source.
I would like to try it and write about my impressions.
Where is the remote repository
A hosting service is a service that rents out a server for a remote repository.
Typical examples are GitHub, Bitbucket, and AWS CodeCommit for private use.
Git and GitHub are completely different things.
By the way, as mentioned above, we can also use our own servers for remote repositories.
If you have been exposed to programming that deals directly with memory, such as in the C programming language, you will have some idea of what a "pointer" is.
For a beginning programmer, on the other hand, it seems very vague.
I said that commit objects are stored in the repository.
If there are many commit objects in the repository, how can you select the one you want?
We need a label (address) to locate a particular commit object.
A "pointer" is a piece of data that holds that label for us so that we don't lose track of it.
The label, by the way, is a mysterious-looking string produced by a hash function.
If you are curious, please refer to How does Git compute file hashes?.
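The label is less mysterious once you compute one by hand. The sketch below does it for a file (a blob object) rather than a commit, and assumes Git's default SHA-1 object format and that either sha1sum or shasum is installed; hello.txt is a made-up file.

```shell
set -e
work=$(mktemp -d)
cd "$work"
printf 'Hello, World!\n' > hello.txt

# Git's own answer.
git_hash=$(git hash-object hello.txt)

# The same value by hand: SHA-1 over the header "blob <size>", a NUL byte,
# and then the file content.
size=$(wc -c < hello.txt | tr -d ' ')
if command -v sha1sum > /dev/null 2>&1; then sha1="sha1sum"; else sha1="shasum -a 1"; fi
manual=$( { printf 'blob %s\0' "$size"; cat hello.txt; } | $sha1 | cut -d' ' -f1 )

echo "git:    $git_hash"
echo "manual: $manual"
```

Both lines print the same 40-character string, so the label depends only on the type, size, and content of the object.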
To further understand Git
There are many things I failed to mention in this article.
- The core of Git is a simple key-value type data store
- Details of the Git object that is the value
- How the objects relate to each other
I hope to fully explore this someday.
Top comments (15)
Notice that, as you say in the beginning, Git is a distributed VCS, and not a centralized one.
So, there is no concept of a single remote repo; instead you have a single local repo (the one you're working in) and many remotes, and they are all the same from Git's perspective.
So, you can have a remote on GitHub, one on GitLab, one on your own desktop at home, and one on your colleague's computer, and for as long as they are reachable you can push to, and pull from, any of them.
It is true that there is often a single instance on a hosted server that everyone connects to, but this is not a Git concept, just a standard practice.
Thanks for the comment!
As you point out, with Git, remote repository servers can be created without having to rely on a hosting service like GitHub.
I wrote this article this way because I wanted to emphasize my point about managing versions in a distributed manner.
Thanks for the very great feedback!!!