This week, as I continue my journey in open-source development, I came to understand a really powerful Git tool that I've been eager to learn for a while now.
In the context of the TIL Page Builder project, we aimed to refactor the codebase, following standard Git workflows while maintaining a clean and customized commit history. To make this possible, I went through the infamous git rebase and git commit --amend commands, and I will be sharing my experience of refactoring code and successfully rebasing onto the master branch of the project.
Table of Contents
1. Table of Contents
2. Analyzing the application 🧐
2.1. The Refactoring Process
3. Cleaning Commit History 🧹
3.1. What is rebasing?
3.2. Rebase in practice
4. Conclusion 🎇
Analyzing the application 🧐
Before making any refactoring changes to the code, it is essential to carefully go through the entire application, review the existing technical debt and tactically plan what changes would improve the maintainability of the code, without introducing new problems to deal with.
After a thorough analysis of the existing code and where I wanted to take the application, I came up with 4 atomic changes that would build on top of each other without affecting the existing behavior. 🐱💻
The Refactoring Process
Following the normal git workflow, I branched out to a topic branch, before starting my work.
git checkout -b refactoring
1. Rename File class to HtmlFile: The File class in my application was mainly responsible for holding the path and content of the HTML file to be generated, and for writing it to the instance path when its generate_html_file method was called. Since it mainly represented a virtual instance of an HTML file, it made more sense to call it HtmlFile.
2. Extract a class called CommandlineParser: I already had all the code to parse, hold and manage command-line arguments in a separate file, commandline.py. Since the functions were all closely related and worked towards a single purpose, it was clear that the entire functionality belonged in a single class. Apart from encapsulating everything in a class, I also made sure to follow the singleton design pattern, so the entire app could share the same instance of the parser.
import argparse


class CommandlineParser:
    _instance = None

    def __new__(cls):
        """Restrict the instantiation class to a single instance

        :return: A global point of access to the single instance
        """
        if cls._instance is None:
            cls._instance = super(CommandlineParser, cls).__new__(cls)
            cls._instance._initialize()
        return cls._instance

    def _initialize(self):
        # Create an ArgumentParser object
        self._parser = argparse.ArgumentParser(
            prog='TIL Page Builder',
            description='Converts text files for TIL posts to HTML files for publishing on the web.',
        )
        self._setup_arguments()

    def get_args(self):
        # Return the parsed command-line arguments
        return self._cl_args

    def _setup_arguments(self):
        ...
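To see why overriding __new__ gives the whole app one shared instance, here is a minimal, self-contained sketch. SingletonSketch and init_count are hypothetical names used only for illustration; they are not part of the project.

```python
class SingletonSketch:
    """Hypothetical stand-in for CommandlineParser (not the project's class)."""

    _instance = None

    def __new__(cls):
        # Create and initialize the instance only on the very first call
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._initialize()
        return cls._instance

    def _initialize(self):
        # Runs once, so setup work (like building a parser) is not repeated
        self.init_count = getattr(self, "init_count", 0) + 1


first = SingletonSketch()
second = SingletonSketch()

print(first is second)    # True: both names point at the same object
print(second.init_count)  # 1: _initialize ran only once
```

Because every call returns the same object, classes such as App and HtmlBuilder can each call the constructor and still read the same parsed arguments.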
3. Add an App class for the entry point: As I was aiming to follow a class-based approach for my app, why would I leave the main file out? I noticed I was using quite a few global variables, and their use could be eliminated by encapsulating the entry point functionality as well.
class App:
    def __init__(self):
        self.cl_args = CommandlineParser().get_args()
        self.html_builder = HtmlBuilder()

    def run(self):
        """Entry point for the app"""
        # Check for exit commands (Return if any branch is entered)
        if self.cl_args.version:
            print(f"{version.__name__}: {version.__version__}")
            exit(0)
        ... ...


if __name__ == "__main__":
    app = App()
    app.run()
Sweet!
4. Turn HtmlBuilder into a class: As the final part of the refactoring process, it was time to encapsulate the code that was doing the actual heavy lifting of the application. It was crystal clear that the entire HTML generation code belonged in the same class, and that's what I did. Not only did I encapsulate the functions in the class, I also split generate_html_for_file into 2 functions to make it leaner and easier to maintain.
class HtmlBuilder:
    def __init__(self):
        self._cl_args = CommandlineParser().get_args()
        self._output_path = self._cl_args.output  # Output directory for files
        self._document_lang = self._cl_args.lang  # Language used for the document

    def generate_html_for_file(self, file_path):
        ...

    def generate_files(self, files_to_be_generated):
        ...

    def generate_document(self, virtual_doc, page_title, lines, input_file_path):
        ...

    def _process_text_block_for_markdown(self, virtual_doc, text_block):
        ...

    def _is_title_present(self, lines):
        ...

    def _neutralize_newline_character(self, line):
        ...
Throughout the entire process, I made sure to commit my changes at each step. Here's what the history looked like after I was done.
Note: Ignore the last commit, as that was made to fix a typo.
Cleaning Commit History 🧹
Now that the coding part was done, it was time to put into practice whatever I had learned about git rebase.
What is rebasing?
git rebase and git commit --amend are powerful Git commands that allow you to rewrite Git history the way you want it to be.
The purpose of git rebase is similar to that of git merge, i.e., to integrate the changes made on your current branch into the target branch.
Merging is fine when work has only been done on your topic branch, because a fast-forward merge can be done.
But when multiple people are working on the project, the branch your topic branch is based on might have already progressed by several commits by the time you finish your work. In this case, Git performs a 3-way merge, adding an extra merge commit to your project history. Over time, this can lead to a messy, complicated, and hard-to-read commit history that you might not like.
git rebase provides a solution to this problem by replaying all the commits from your current branch on top of the HEAD of the branch you're trying to integrate your code into, one commit at a time. This makes the commit history appear as if you had committed directly on top of the HEAD of the target branch, in other words, a linear commit history.
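The replaying idea can be sketched with a toy model in Python. This is not how Git is implemented: Commit, commit, history and rebase below are hypothetical helpers, integer ids stand in for commit hashes, and the fork point is passed in by hand, whereas Git works it out for you.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional

_ids = count(1)  # simple counter standing in for commit hashes


@dataclass
class Commit:
    message: str
    parent: Optional["Commit"]
    id: int


def commit(message, parent=None):
    """Create a new commit on top of `parent` with a fresh id."""
    return Commit(message, parent, next(_ids))


def history(tip):
    """Return commit messages from oldest to newest."""
    messages = []
    node = tip
    while node is not None:
        messages.append(node.message)
        node = node.parent
    return list(reversed(messages))


def rebase(branch_tip, fork_point, new_base):
    """Replay the branch's own commits, oldest first, on top of new_base."""
    todo = []
    node = branch_tip
    while node is not fork_point:
        todo.append(node)
        node = node.parent
    tip = new_base
    for old in reversed(todo):
        # Each replayed commit is a NEW commit with the same message
        tip = commit(old.message, parent=tip)
    return tip


root = commit("initial")
master_tip = commit("master: fix docs", root)       # master moved ahead
t1 = commit("topic: rename File", root)             # topic branched from root
t2 = commit("topic: extract parser", t1)

new_tip = rebase(t2, fork_point=root, new_base=master_tip)
print(history(new_tip))
print(new_tip.id != t2.id)  # True: history was rewritten, not moved
```

The printed history is a single straight line ending in the two topic commits, and the replayed commits get new ids, which is why rebasing counts as rewriting history.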
Rebase in practice
Now that we know what a rebase does, I'd like to share how I used it to integrate my refactoring changes into the master branch of my project.
The first thing I did was run an interactive git rebase while on my refactoring branch.
git rebase master -i
It opened an editor that allowed me to dictate how I wanted those commits to apply.
I decided to squash all the commits into the first commit I made after branching off:
Once I saved and exited, it took me to a final review where I could edit the commit message for my squashed commit.
I let it stay the default, which was the combination of all 5 commit messages. After I saved and exited, the rebase was successful.
Let's see what the log looks like:
git log
Now, I could have changed the commit message while I was rebasing, but I decided to practice the git commit --amend command for that.
git commit --amend
This opened an editor that allowed me to change the commit message of the last commit on my branch, which was the squashed commit in this case.
I entered the new message and voilà, this is what the log looked like after a successful amend.
Notice that it's just 1 extra commit after the HEAD of master. This would allow us to do a fast-forward merge.
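Why does a single commit on top of master's HEAD allow a fast-forward? A rough way to think about it: a fast-forward is possible when the target branch's tip is an ancestor of the branch being merged. The sketch below is a simplified single-parent model with hypothetical names (Commit, can_fast_forward) and made-up commit messages; it ignores merge commits and is not Git's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Commit:
    message: str
    parent: Optional["Commit"] = None


def can_fast_forward(target_tip, branch_tip):
    """True if target_tip is reachable by walking back from branch_tip."""
    node = branch_tip
    while node is not None:
        if node is target_tip:
            return True
        node = node.parent
    return False


master = Commit("master tip")
squashed = Commit("squashed refactoring commit", parent=master)
print(can_fast_forward(master, squashed))    # True: master only needs to move forward

# If master had gained its own commit in the meantime, the branches diverge
diverged = Commit("new work on master", parent=master)
print(can_fast_forward(diverged, squashed))  # False: a rebase or merge commit is needed
```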
Finally, let's look at our master branch on GitHub.
BEHOLD THE RESULTS!
You might want to take a look at the graph as well.
git log --graph
It's like magic, isn't it? We went through so much stuff, and still the commit history looks super clean. That's the power of rebase.
Conclusion 🎇
Did you see how cool this command is? It was hard for me to believe such functionality exists. I hope this blog motivates you to make use of git rebase even more to make your lives easier.
One piece of advice: it's important to be careful when playing with commit history, as these are among the very few commands in Git that can make you lose your work!