Learn how to mirror a git repository in a few simple steps
Background
Before I describe how to duplicate a git repository, let me explain the context. At Droids On Roids, we prefer using our own tools for CI/CD, VCS hosting and other processes. All our developers know them well, which speeds up development compared with using each client’s separate tools for every individual project.
Usually, it doesn’t matter what we are using internally. However, regarding the repository, clients may want to store the code on servers controlled by them. It’s perfectly understandable, since they usually also purchase the intellectual property, so they own the entire code.
Sometimes the hosting on the client side may not be perfect.
For example, it may use a different provider, e.g. GitLab instead of GitHub, which we use on a daily basis. Or it may even be a self-hosted Git server without any web interface, accessible using only a single key.
How can we perform a code review process or set up webhooks for CI on something like that? Let’s see what we can do!
The cases
Firstly, the necessity to store the code in a particular place does not mean that it is the only location.
We can still use our favorite GitHub for daily development and only duplicate (mirror) the code to the client’s repository.
To do that efficiently, let’s first answer a few questions.
- Do we need to mirror everything, or only particular branches and/or tags?
- What is the trigger for the mirror operation?
Usually, the client is only interested in some milestones/releases, rather than all the intermediate work. In such a case, we want to mirror only the main branch and maybe some tags.
Note that we are not taking any legal issues into account here. Check whether the contract places any restrictions on the storage location. Sometimes it may be forbidden to push the code to public clouds.
The scope
Mirroring everything is possible, but keep in mind that the target hosting configuration has to be compatible with the source.
For example, branches containing force-pushed commits on the source must not be protected on the target. Pushing constructs that are not used on the target side may also produce errors or warnings, e.g. Gerrit creates references (which do not necessarily point to branches or tags) that may not be understood by the likes of GitHub.
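To see whether the source contains such extra references before mirroring everything, you can list them first. The sketch below is only an illustration: list_extra_refs is a hypothetical helper name, and it assumes that "extra" means anything outside refs/heads/ and refs/tags/ (Gerrit, for instance, keeps its change references under refs/changes/).

```shell
# list_extra_refs URL
# Hypothetical helper: prints every reference advertised by the repository at
# URL that is neither a branch (refs/heads/) nor a tag (refs/tags/).
list_extra_refs() {
  # git ls-remote prints "<object id><TAB><ref name>", one reference per line.
  git ls-remote "$1" |
    awk '$2 != "HEAD" && $2 !~ /^refs\/(heads|tags)\// { print $2 }'
}
```

An empty output suggests that a full mirror contains only branches and tags. Anything listed deserves a check against what the target hosting accepts.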
The trigger
Mirroring is often triggered when some milestone is reached, such as the end of an iteration/sprint, or simply on a regular basis, e.g. weekly or daily. I would recommend running that operation in a separate, dedicated CI build rather than attaching it to a build you happen to have already.
For example, even if you have a “sprint” build releasing the product, you should not add mirroring as one of its stages/steps. Why?
Well, you have no control over the target repository. If a repo becomes unreachable, someone revokes the key or pushes the same tag you want to push, then your mirroring will fail, even if the artifact was released successfully.
Moreover, the local copy used to build the product may not check out the branch you want to mirror, but instead check out a particular commit as a detached HEAD.
For example, on Bitrise CI a trigger associated with a push (to branch) action checks out the pushed commit. That commit was at the tip of the branch at the point of triggering, but by the time of building (cloning) new commits may have appeared after it, because there might have been pushes in the meantime.
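If you are unsure what state a CI checkout is in, you can ask git directly. This is a small sketch, not anything specific to Bitrise: current_branch is a hypothetical helper name, and the optional argument is the path to the working copy (defaulting to the current directory).

```shell
# current_branch [REPO_DIR]
# Hypothetical helper: prints the name of the checked-out branch and returns 0,
# or prints nothing and returns a non-zero status when HEAD is detached.
current_branch() {
  git -C "${1:-.}" symbolic-ref --quiet --short HEAD
}
```

A non-zero status here means the build works on a bare commit, so pushing "the current branch" from that copy would not do what you expect.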
The code
Let’s start coding! Mirroring itself consists of two operations:
- Cloning (or preparing an already cloned copy).
- Pushing.
Getting access to the repositories (by activating SSH keys, for example) is out of scope here but, if needed, it has to be done before accessing the particular repository.
In the case of a (recommended) fresh build, cloning looks like this for a single branch:
git clone --bare --single-branch --no-tags --branch=<branch name> <source url> .
Or like this for the entire repository:
git clone --mirror <source url> .
Where <branch name> is the name of the branch you want to clone, and <source url> is the source (not the target!) repo URL. Note the trailing dot! It means the current directory.
Make sure that it is empty. Some CI platforms may create temporary files there or set the initial directory to something that is usually not empty, such as $HOME. In such cases, execute this command beforehand:
pushd "$(mktemp -d)"
It will create a temporary folder (with a unique, unoccupied name) and change the current directory to it (like the cd command).
Note also the --bare and --mirror options. They ensure that no working tree is created, so nothing is checked out (a checkout is useless when we don’t need to work directly on the repository content). That’s why a bare clone is faster than a normal one. If you also want to push the tags, just remove the --no-tags option.
In order to send the changes to the target repository, you can use the git push command. In the case of a single branch, omit the --tags option if you don’t need to push the tags:
git push --tags <target url> <branch name>
Or for mirroring the entire repo:
git push --mirror <target url>
Be careful: such a push will also (try to) replicate all forced pushes and branch deletions! Some data may be practically lost after that operation. Restrictions enforced by the hosting provider, such as protected branches or the inability to delete the default branch, will still apply.
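Because a mirror push can delete or overwrite references, it may be worth previewing it first. The sketch below relies on git push’s --dry-run option, which reports what would be updated without sending anything. preview_mirror_push is a hypothetical helper name, and it assumes you have already created the mirror clone in a local directory.

```shell
# preview_mirror_push MIRROR_DIR TARGET_URL
# Hypothetical helper: shows which references a mirror push from the local
# mirror clone in MIRROR_DIR would create, update or delete on TARGET_URL.
# Thanks to --dry-run, nothing is actually changed on the target.
preview_mirror_push() {
  git -C "$1" push --mirror --dry-run --porcelain "$2"
}
```

Once the listed changes look right, the same command without --dry-run performs the real push.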
Make sure that the user on whose behalf the push is performed has the appropriate permissions. They must be able to write to the repo but should not have admin privileges!
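Putting the single-branch variant from this section together, a minimal sketch could look like the following. It is one possible arrangement, not the only one: mirror_branch is a hypothetical helper name, the three arguments are placeholders for your own URLs and branch, and access to both repositories is assumed to be already configured.

```shell
# mirror_branch SOURCE_URL TARGET_URL BRANCH
# Hypothetical helper: clones a single branch of SOURCE_URL as a bare
# repository into a fresh temporary directory and pushes it to TARGET_URL.
# The body runs in a subshell, so its settings do not leak to the caller.
mirror_branch() (
  set -euo pipefail
  source_url=$1 target_url=$2 branch=$3

  # A unique, empty directory, removed again when the subshell exits.
  workdir=$(mktemp -d)
  trap 'rm -rf "$workdir"' EXIT

  # Bare clone of one branch, without tags and without a working tree.
  git clone --quiet --bare --single-branch --no-tags \
    --branch="$branch" "$source_url" "$workdir"

  # Push only that branch. To mirror tags as well, drop --no-tags above
  # and add --tags here.
  git -C "$workdir" push --quiet "$target_url" "$branch"
)
```

A dedicated CI build would then call it with its own values, for example mirror_branch "$SOURCE_URL" "$TARGET_URL" main, where both variables are assumed to be provided by the build configuration.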
Wrap up
A requirement to store the code in your client’s repository does not always mean you have to work in their repository. You can simply synchronize your internal repo with your client’s one using a few lines of shell script.
Keep in mind that the push may fail (which should not affect the rest of your workflow) or irreversibly overwrite data. I hope that, thanks to this article, you now know how to duplicate a git repository. Good luck!
The post appeared first on Droids On Roids.