This week was my first attempt at contributing to an open source project that I had no part in creating. It was an interesting and insightful venture. The project is a Python-based text-to-HTML converter, and I had recently endeavoured to create a similar program myself, so it was very helpful to review another approach to the same problem. In reviewing this project I was able to see a great model of modular Python programming, with a more complex file structure that subdivided utilities, helper functions and various other groups of related functions. This is something I have a lot of experience with in C++, Java and JS, but haven’t delved too deep into while learning Python. In addition, I learned about some great Python modules that I was previously unfamiliar with. For example, the way the author used the Yattag module to build a virtual HTML document during the conversion process was very interesting.
Moreover, this was a great exercise in forking and cloning an outside repository in order to edit and suggest changes to the original program. Initially, I created an Issue in the original repository. After some conversation with the author and some helpful guidance on how to approach their file structure and overall coding style, I went ahead and forked the repo to my GitHub account and then cloned it to my local machine to work with it.
My contribution to their code was minimal, but hopefully effective (we are still in conversation to finalize the changes at the time of writing). My goal was to expand the existing program by adding markdown formatting and .md file recognition to the existing .txt conversion functionality. First off was a basic addition to the input file and directory verification process. This essentially meant providing another condition that identifies .md files in addition to the existing .txt file recognition. The more complex challenge was to extend the existing text parsing functionality to identify markdown formatting and convert it to HTML formatting. My initial attempt was to use a line-by-line regex replacement to find markdown bold and italics formatting ('*', '**' and '__') and replace it with <strong> and <em> HTML tags. While this worked in theory, my initial Pull Request brought to my attention that this method was not compatible with the way the Yattag module adds HTML tags to the virtual document. As I write this, I am working with the author to troubleshoot the problem and am digging deeper into the Yattag module to provide a more elegant way of converting markdown formatting into HTML tags.
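To make the two steps above more concrete, here is a minimal sketch of what they could look like. This is not the project's actual code: the function names is_supported_input and md_inline_to_html are made up for illustration, and I am assuming that '**' and '__' map to <strong> while a single '*' maps to <em>.

```python
import re
from pathlib import Path

# Hypothetical helper: accept .md files alongside the existing .txt check.
SUPPORTED_SUFFIXES = {".txt", ".md"}


def is_supported_input(path: str) -> bool:
    """Return True if the file extension is one the converter handles."""
    return Path(path).suffix.lower() in SUPPORTED_SUFFIXES


# Bold patterns run first so that '**' is not consumed by the single '*' rule.
# The mapping of markers to tags is an assumption made for this sketch.
INLINE_RULES = [
    (re.compile(r"\*\*(.+?)\*\*"), r"<strong>\1</strong>"),
    (re.compile(r"__(.+?)__"), r"<strong>\1</strong>"),
    (re.compile(r"\*(.+?)\*"), r"<em>\1</em>"),
]


def md_inline_to_html(line: str) -> str:
    """Replace markdown bold/italic markers in one line with HTML tags."""
    for pattern, replacement in INLINE_RULES:
        line = pattern.sub(replacement, line)
    return line


if __name__ == "__main__":
    print(is_supported_input("notes.md"))
    print(md_inline_to_html("a **bold** word and an *italic* one"))
```

The result is a plain string that already contains angle brackets, which matters for how it is later handed to Yattag.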
UPDATE 07/28/2023: Working with the author, I learned why the regex replacement misbehaved when combined with the Yattag Doc.text() function, which the program uses to format and add text content to the virtual document being built. Doc.text() escapes special characters such as '<' and '>', so when the text already contains HTML tags it is necessary to use doc.asis() instead, which adds an "as is" representation of the string to the virtual document. Without this, the angle brackets enclosing the HTML tags were written as '&lt;' and '&gt;' in the converted HTML doc, so the tags showed up as literal text instead of applying the desired formatting.