In the earliest days of my career, I never used version control. At that time, Git didn't exist and most version control systems were clunky tools used mostly by corporate engineering teams. I was doing a lot of freelance consulting, so most of my projects were just me working alone. SFTP, occasional backups, and lots of *.bak files worked reasonably well. Still, I lost work or wished I could return to an earlier version after multiple refactors... especially since I was still learning.
When I joined my first team as a developer, I was introduced to "Subversion". This was a whole new world - the ability to see all changes, go back in time, and safely make changes was great. It changed the way I thought about code. The problem was that it was slooooow. I was working on a large project, and updates took quite a while.
Then, I joined a company as a Sr. Developer, and they were using this new thing called "Git". I wasn't familiar with it, and actually, my first commit to the main branch broke the app in production (cue heart attack)! I had no idea about merge conflicts or how to fix them. However, once I got the basics down, I became a huge fan. It was FAST. So much faster than Subversion, and really much easier to use. With just a few commands, I was fairly proficient, and for more complex things I could easily find help online.
Today, it is almost impossible to learn about software development without learning about Git. For many developers, the command git commit is a daily ritual. It's the heartbeat of countless projects, yet how many of us know the story behind this powerful tool? We've all heard bits and pieces, but let's take a deeper look at Git and how it came into being.
"I'm a very lazy person who likes to get credit for things other people actually do. In the Linux project, I started it, and I'm still considered the central point, but I'm not a good programmer. I'm a good manager. My only job is to say 'no' to people, and sometimes I accept patches and put them into my tree. I wanted a tool where I can do that. That was my only design criteria, and it turns out when you do that kind of tool, you can use it for a lot of other things too."
-- Linus Torvalds
To start with, what the heck is Git? At its core, Git is a distributed version control system (VCS). It lets multiple users track changes in source code during software development, maintaining a history of code changes, and ensuring traceability. It is like a shared digital diary for computer code, allowing many people to write in it while keeping a record of all changes made over time.
But unlike many version control systems that came before it, Git operates on a distributed model, meaning every developer's working copy of the code is also a repository with complete history and version tracking abilities, independent of network access. This means that developers can work locally on their machine, and then sync those changes later. This was a big change in how VCS affects the daily workflow of a developer.
One of Git's key advantages is its efficiency in handling changes. Instead of sending an entire copy of the codebase on every sync, Git works out which commits and files the other side already has and transmits only what is missing, in compressed form. This approach significantly reduces the amount of data transmitted and results in faster sync times, a real advantage for large projects.
"So I’d like to stress that while it really came together in just about ten days or so (at which point I did my first *kernel* commit using git), it wasn’t like it was some kind of mad dash of coding. The actual amount of that early code is actually fairly small, it all depended on getting the basic ideas right. And that I had been mulling over for a while before the whole project started. I’d seen the problems others had. I’d seen what I wanted to avoid doing."
-- Linus Torvalds
Flashback to 2005. The Linux kernel, an enormous open-source project, used a proprietary VCS called BitKeeper. However, due to a conflict between the community and the company behind BitKeeper, the free-of-charge status was revoked. Necessity is the mother of invention, and this incident paved the way for the creation of a new system. Linus Torvalds, the creator of the Linux kernel, took the reins. He aimed to create a tool that was:
- Distributed: No central repository would be required, unlike most other systems of the time.
- Compatible: It would support the distributed workflow that kernel developers had grown used to with BitKeeper.
- Secure: It would ensure the integrity of source code and protect against corruption, accidental or malicious.
So, after a few days of work, Torvalds had a working prototype of Git, and less than a month after he started, Git was managing the Linux kernel source code. On April 7, 2005, Linus Torvalds made the first-ever commit to Git. It wasn’t a grand feature; it was a README and a handful of small C source files. The first major version (v1.0) took longer and was released on December 21, 2005. Interestingly, version 2.0 did not arrive until almost a decade later, on May 28, 2014.
Git got the important things right in the beginning, and that set the stage for a valuable project.
'Well, it was obviously designed for our workflow, so that is part of it. I’ve already mentioned the whole “distributed” part many times, but it bears repeating. But it was also designed to be efficient enough for a biggish project like Linux, and it was designed to do things that people considered “hard” before git – because those are the things *I* do every day.'
-- Linus Torvalds
Concurrent Versions System (CVS) was one of the pioneering tools in the version control domain. So how does Git stand out when compared to such a veteran system?
- Distributed vs. Centralized: CVS follows a centralized model, meaning the version history is stored on a central server. If that server crashes without any backups, the project history is gone and only the files in each working copy survive. Git’s distributed approach ensures that every clone holds a local copy of the entire history, making it decentralized and significantly safer.
- Performance: Because Git's common operations are local, many tasks are faster in Git than in CVS. There's no need to communicate with a central server for every tiny operation.
- Branch Management: Branching in Git is a walk in the park. It's an integral part of the workflow. In CVS, branching is a cumbersome process, often avoided due to its complexity.
- Atomic Operations: Git operations, like commits, happen atomically. Either they succeed with all changes or fail without any. CVS commits files one at a time, so an interrupted commit can leave the repository holding only part of a change.
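- Data Integrity: Git names every object it stores by the SHA-1 hash of its contents, which protects the repository’s integrity. If a file or a piece of history is corrupted or altered, its hash no longer matches, so the problem is immediately noticeable.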
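To make the data integrity point concrete, here is a minimal Python sketch of how Git derives the ID of a file's contents (a "blob"): it hashes a short header, the word blob plus the size in bytes and a NUL byte, followed by the content itself. The function name git_blob_id is my own for illustration and is not part of Git. It covers only the classic SHA-1 object format.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to a blob with this content.

    The helper name is hypothetical; the header layout ("blob <size>\\0")
    follows Git's documented object format.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The empty blob has a well-known ID that shows up in many repositories.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Changing a single byte produces a completely different ID,
# which is how silent corruption gets noticed.
print(git_blob_id(b"hello\n") != git_blob_id(b"hellO\n"))  # True
```

If you have Git installed, running git hash-object on a file should print the same ID that this function returns for that file's bytes.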
"Just to pick an example: the concept of “merging” was generally considered to be something really quite painful and hard in most SCM’s. You’d plan your merges, because they were big deals. That’s not acceptable to me, since I commonly do tens of merges a day when in the merge window, and even then, the biggest overhead shouldn’t be the merge itself, it should be testing the result. The “git” part of the merge is just a couple of seconds, it should take me much longer just to write the merge explanation message."
-- Linus Torvalds
Git's inception was driven by the need for performance, but that didn’t stop the community from further refining it.
- Packfiles: Instead of keeping every single object (commit, tree, blob) as an individual file, Git packs them together into compact packfiles.
- Delta Compression: When packing, Git identifies the differences between versions and stores just the changes.
- Garbage Collection: Over time, some objects become obsolete. Git has a garbage collector that removes these unnecessary objects.
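The idea behind delta compression can be sketched in a few lines of Python. This is a toy illustration, not Git's actual pack format: it uses the standard library's difflib to describe a new version as "copy this range from the old version" and "insert these new bytes" instructions. The function names make_delta and apply_delta are made up for this example.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes) -> list:
    """Describe target as copy/insert instructions against base (toy format)."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged bytes: store only where they live in the base.
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            # New bytes: these have to be stored literally.
            ops.append(("insert", target[j1:j2]))
        # "delete" needs no instruction; those bytes are simply never copied.
    return ops


def apply_delta(base: bytes, ops: list) -> bytes:
    """Rebuild the target version from the base and the instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


old = b"line one\nline two\nline three\n"
new = b"line one\nline 2\nline three\nline four\n"
delta = make_delta(old, new)
print(apply_delta(old, delta) == new)  # True
```

Only the inserted bytes need to be stored alongside a few small copy instructions, which is why a long history of slightly different versions can take up far less space than the sum of its files.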
These optimizations, combined with many under-the-hood improvements, ensure that Git remains lightning-fast, even for mammoth repositories. Soon after its creation, developers around the world started contributing. Junio Hamano is one such name that stands out. Within just a few months of Git's inception, he took over its maintenance, ensuring it didn’t remain just a side project but evolved into a robust system.
Under Hamano's stewardship and with contributions from developers worldwide, Git grew into much more than committing and branching. Features that came to define the everyday workflow include:
- Staging Area: This intermediary area allows developers to assemble and review the contents of a commit before finalizing it.
- Remote Repositories: With platforms like GitHub and GitLab, remote repositories became a staple, facilitating collaboration among developers globally.
- Hooks: Custom scripts triggered by important actions, enhancing Git's capabilities and automation.
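As an illustration of hooks, here is a small sketch of a commit-msg hook written in Python. Git runs this hook with the path to the file holding the proposed commit message, and a non-zero exit status aborts the commit. The 72-character limit and the function names are assumptions made for this example, not rules that Git enforces.

```python
import sys

MAX_SUBJECT_LENGTH = 72  # hypothetical team convention, not a Git rule


def check_message(message: str) -> list[str]:
    """Return a list of problems found in a commit message (empty if fine)."""
    problems = []
    # Skip the comment lines Git adds to the message template.
    lines = [ln for ln in message.splitlines() if not ln.startswith("#")]
    subject = lines[0].strip() if lines else ""
    if not subject:
        problems.append("subject line is empty")
    elif len(subject) > MAX_SUBJECT_LENGTH:
        problems.append(
            f"subject line is longer than {MAX_SUBJECT_LENGTH} characters"
        )
    return problems


def main(argv: list[str]) -> int:
    # argv[1] is the path to the commit message file supplied by Git.
    with open(argv[1], encoding="utf-8") as handle:
        problems = check_message(handle.read())
    for problem in problems:
        print(f"commit-msg: {problem}", file=sys.stderr)
    return 1 if problems else 0  # non-zero tells Git to abort the commit


# Run as a hook only when called with exactly one argument, as Git does.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv))
```

To try it, the script would be saved as .git/hooks/commit-msg inside a repository and made executable. After that, Git runs it on every commit.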
It's hard to imagine Git without some of these features today, but think about how innovative and unique these were when they were created.
"I'm an egotistical bastard, so I name all my projects after myself. First Linux, now Git."
-- Linus Torvalds
The Git FAQ gives a longer account of the name: “The name 'git' was given by Linus Torvalds when he wrote the very first version. He described the tool as ‘the stupid content tracker’ and the name as (depending on your mood): random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of 'get' may or may not be relevant. Stupid. Contemptible and despicable. Simple. Take your pick from the dictionary of slang.”
"Git" is British slang for a silly or contemptible person. Silly? Irreverent? Absolutely! But it’s another testament to Torvalds’s personality and the informal, community-driven spirit of open-source software.
From its rapid inception to becoming the backbone of software versioning, Git’s journey is a testament to open-source power. Its success lies not just in its utility but in a global community that continues to nurture and refine it. Git has transitioned from a quick solution to a kernel project hiccup into an essential tool for individual developers and tech giants alike. Its distributed nature, performance optimizations, and robust branching capabilities make it superior to many version control predecessors.
So, as you git push your next big feature or troubleshoot with a git blame, take a moment to appreciate this tool's history, and perhaps, share its story. Because, in a way, every developer using Git today is part of its ongoing saga.