Roy J. Wignarajah

Posted on Sep 23, 2023 • Edited on May 6, 2024

Markdown, First Contributions, and Pull Requests

#osd600

Links Mentioned

Project I worked on: https://github.com/mingming-ma/txt2html
Submitted issue: https://github.com/mingming-ma/txt2html/issues/6
Submitted pull request: https://github.com/mingming-ma/txt2html/pull/8
My project: https://github.com/rjwignar/ctil

Lab Exercise 2

In my Open Source Development class I did another Lab Exercise that involved me working on a classmate's project. The purpose of this exercise was to give us practice on things such as:

forking and cloning projects
making and working with branches to work on new features/fix bugs
contributing to code we didn't write
making pull requests
working with other developers on GitHub

Markdown

In my initial release of ctil, I added basic plain-text to HTML conversions. Markdown is a markup language used to add document formatting to plain text files. Markdown files have the extension .md, and although Markdown is simpler than HTML, it supports many of HTML's features, such as italics, bolding, and lists. I've worked with Markdown in the past, but not extensively, so I expect to refer to a Markdown quick reference sheet often during the semester.

In this lab exercise, I was tasked with adding two Markdown-related features to a classmate's text-to-HTML converter:

Support to convert Markdown (.md) files to HTML
If converting Markdown (.md) files, conversion of one Markdown feature to its respective HTML equivalent.

My First Open Source Contribution

For this lab exercise, I worked on mingming-ma's txt2html. txt2html is written in Python, and most of the program logic is organized into separate functions. The logic for finding plain-text .txt files lived in a function called process_folder, so to add Markdown support I extended process_folder to also check the input folder for Markdown .md files. The Markdown conversion feature I added was the conversion of italics to HTML. This allows the program to convert strings like *word* or _word_ to an italicized word in the HTML output.
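The two additions described above can be sketched roughly as follows. This is a minimal illustration, not the code from the Pull Request: the names SUPPORTED_EXTENSIONS, is_convertible, and convert_italic_word are hypothetical, and the choice of the <i> tag for the output is an assumption.

```python
import re
from pathlib import Path

# Hypothetical constant: the extensions the converter accepts.
SUPPORTED_EXTENSIONS = {".txt", ".md"}

# Matches a single word wrapped in one matching pair of * or _ markers.
ITALIC_WORD = re.compile(r"^([*_])([^*_\s]+)\1$")


def is_convertible(filename):
    """Return True if the file has a supported extension (.txt or .md)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def convert_italic_word(word):
    """Wrap *word* or _word_ in <i> tags (assumed tag); leave other input alone."""
    return ITALIC_WORD.sub(r"<i>\2</i>", word)


if __name__ == "__main__":
    print(is_convertible("notes.md"))
    print(convert_italic_word("*word*"))
```

Because the pattern uses a backreference, the opening and closing markers must be the same character, so a mismatched pair such as *word_ is left unchanged.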

The Contribution Process

One important thing I learned from this exercise was the Open Source contribution process on GitHub. The first step was to file an issue on our chosen project and ask to work on that issue ourselves. I don't know if this is standard for most projects, but it sounds like a good practice, although I imagine some projects have their own preferred way of accepting contributions.
The point of filing an issue is to describe a bug or desired feature and to explain what you plan to do about it. If the author assigns the issue to us, we know they are willing to consider our contribution, and we can begin working on it.

Forks, Clones, and Branches

The next step in the process was to start implementing the Markdown features in the chosen project. To do this, I had to create my own fork of txt2html, clone that fork to my local machine, and do my work on a new branch. In this class I was taught that, when working on a new issue or feature, it's good practice to work on a new, appropriately named branch rather than on the main branch. I can think of a couple of reasons this is important:

Multiple people can work on different features/issues simultaneously by using different branches created from the main branch
It's best not to commit in-progress features to the main branch in case the feature design changes or the feature itself is scrapped

Once the issue or feature has been fully worked on, its branch can be merged into the main branch.

Working on code that isn't yours

Working on txt2html was an interesting experience. txt2html is written in Python, a language I don't have much experience writing in. Python's syntax and methods were easy to get accustomed to, but a few times I caught myself adding semicolons (;) at the end of the lines I wrote.

Contributing to txt2html provided good practice for working on code that isn't mine. I've been taught the following key points when contributing to code that isn't mine:

Don't rewrite their code in your style, add new code in their style
Change as little original code as possible
- Don't touch code unrelated to your changes
- Don't fix bugs unrelated to your current work
- Unrelated bugs require their own Contribution Process
Write as little code as possible to get the feature working
Ensure your changes don't break the original code before committing
Stage your work in small commits that tell a story

Following the above points helps you write new code that stays as close as possible to the style of the original author(s). This matters because a consistent style gives current and future contributors a shared set of conventions to follow (naming, formatting).
The best way to do this is to read the existing code, understand how it's organized in terms of files, classes, and functions, and make sure you can run the program. In txt2html, I noticed the entire program was written in one Python file, with the logic separated into various functions. This informed how I made my additions: I put most of my new code in new functions and modified some original code to invoke them. I was able to commit my changes in small steps, which made it easier to figure out what to work on next to get my features working.

Another thing I learned was that open source is a team sport. If I don't understand part of the code, I'm encouraged to reach out for clarification. As the txt2html author is my classmate, I was able to easily reach him on our class's Slack group. However, I imagine this might be more difficult on other projects, where a Slack group is not always available.

Pull Requests

When a feature has been implemented or an issue fixed for an Open Source project, the contributor pushes their branch to their fork on GitHub and submits a Pull Request to the original repository. A Pull Request is a request to the project author(s) to pull and merge a contributor's changes into the main branch of their project. The Pull Request I made to txt2html is linked at the top of this post. I learned that a Pull Request should describe the issue addressed, the changes made, why certain choices were made, and any problems/bugs encountered. The point is to give the project owner(s) the information they need to understand what will change, and why, if the Pull Request is accepted.

I was able to open a Pull Request against txt2html. However, I noticed one bug in my work. The Markdown syntax for bold is similar to the syntax for italics (__word__ vs _word_). Since bold conversion hasn't been implemented yet, __word__ is treated as italics and comes out as an italicized word with leftover underscores, instead of a bold word. One solution I suggested was to implement Markdown bold support and to check words for bold syntax before checking for italics syntax. After discussing this with the author, we agreed that I would add To-Do comments to their program describing how to do this.
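The suggested ordering, bold first and italics second, could look something like the sketch below. It is an illustration of the idea rather than code from txt2html: the function name convert_emphasis and the <b>/<i> output tags are assumptions, and the patterns ignore edge cases such as underscores inside identifiers like snake_case_name.

```python
import re

# Bold: text wrapped in a matching pair of ** or __ markers.
BOLD = re.compile(r"(\*\*|__)(?=\S)(.+?)(?<=\S)\1")
# Italics: text wrapped in a matching pair of single * or _ markers.
ITALIC = re.compile(r"([*_])(?=\S)(.+?)(?<=\S)\1")


def convert_emphasis(text):
    """Convert bold markers first so italics never sees the doubled markers."""
    text = BOLD.sub(r"<b>\2</b>", text)
    return ITALIC.sub(r"<i>\2</i>", text)


if __name__ == "__main__":
    print(convert_emphasis("__word__ and _word_"))
```

Running the bold substitution first consumes both underscores of __word__, so by the time the italics pattern runs there are no doubled markers left for it to match incorrectly.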

I also noticed that my current implementation breaks each line of the file into words and then applies the italics conversion to each word. I initially did this because I thought applying the conversion to an entire line of text would be difficult. However, one consequence is that multi-word italics are ignored: a phrase like *this phrase* is split into *this and phrase*, neither of which has a matching pair of markers, so nothing is converted. I would like to modify my additions to enable multi-word italics detection in the future, once I become more comfortable doing RegEx substitutions on entire lines of text.
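To make the difference concrete, here is a small sketch comparing the word-by-word approach with a substitution over the whole line. The function names and the <i> tag are hypothetical, and this is only one possible way to write the pattern, not the approach used in the Pull Request.

```python
import re

# A single * or _ marker pair around text that contains no other markers.
# The lookarounds keep doubled markers (bold syntax) from being matched.
ITALIC = re.compile(r"(?<![*_])([*_])(?=\S)([^*_]+?)(?<=\S)\1(?![*_])")


def italics_per_word(line):
    """Apply the conversion to each space-separated word independently."""
    return " ".join(ITALIC.sub(r"<i>\2</i>", word) for word in line.split(" "))


def italics_per_line(line):
    """Apply the conversion once to the whole line."""
    return ITALIC.sub(r"<i>\2</i>", line)


if __name__ == "__main__":
    sample = "say *this phrase* now"
    print(italics_per_word(sample))  # the phrase is left unconverted
    print(italics_per_line(sample))  # the phrase is wrapped in <i> tags
```

The same pattern is used in both functions; only the unit it is applied to changes. Splitting first separates the opening marker from the closing one, which is why the per-word version misses the phrase.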

Reviewing Pull Requests

I am happy to share that mingming-ma was able to contribute initial Markdown support to ctil and submit a Pull Request.
Although submitting my own Pull Request involved a lot of work, I quickly learned that receiving a Pull Request can be just as intensive, since proposed changes must be reviewed before being accepted. When I first reviewed the Pull Request, I had a hard time understanding the changes that took place, at least until I reviewed each commit (another reason to stage small commits!). After my review, I noticed underscore italics conversion support (to detect and convert underscore italics syntax such as _word_) was missing. After consulting with mingming-ma on Slack, he was able to add this feature, and I approved the Pull Request.
