Amnish Singh Arora

Posted on Oct 30, 2023

Docusaurus Takeaways in Action

#opensource #python #learning #design

In my last blog post, I discussed Docusaurus: how I learnt to use it and what I learnt from its source code.

In this post, I'll discuss how I put what I learnt from the project into practice by adding a brand new feature, inspired by Docusaurus, to my own project, til-page-builder.

My last blog post covers in detail what the feature is and how Docusaurus inspired me to add it, so make sure to read that before continuing.

1. Filing Issues 📁
2. Working on Issues
       2.1. Adding necessary classes
       2.2. Adding a basic TOC implementation
       2.3. Adding unique ids to all headings while parsing
3. Filing follow up issues
4. Conclusion 🎇

Filing Issues 📁

The first step when implementing something new should be to file issues on GitHub or any other platform you prefer. Issues not only document your work, but also help you plan the design of your code.

Whenever you file an issue, you're essentially rethinking your approach, and many times you'll end up changing it. I also went through multiple iterations of my plan and ended up with the following issues.

Working on Issues

Now that I had my issues in place and a plan in mind, it was time to build something.

Adding necessary classes

I started with my first issue by planning out my HeadingItem class. I first looked at the Docusaurus code for inspiration on which properties it should have.

I noticed id, value, and children, which would be sufficient for my purpose. I also added a static function to generate hashed ids from a heading's text content and an HTML generation function, and ended up with this.

src\builder\toc_generator\heading_item.py

from typing import List, Optional
import hashlib

class HeadingItem:
    def __init__(self, value: str, children: Optional[List['HeadingItem']] = None):
        """Constructor

        :param value: inner html of the corresponding html element
        :param children: child nodes
        """
        # id of the corresponding html element, derived from the heading text
        self.id = HeadingItem.generate_heading_id(value)
        self.value = value
        # Avoid a shared mutable default: each item gets its own list
        self.children = children if children is not None else []

    @staticmethod
    def generate_heading_id(text_content: str):
        """Generate a unique hash id based on the heading text

        :param text_content: Text content of the heading
        :return: A unique `sha512` id for the heading
        """
        # Create a SHA-512 hash object
        sha512_hash = hashlib.sha512()

        # Update the hash object with the bytes of the string
        sha512_hash.update(text_content.encode())

        # Get the hexadecimal representation of the hash
        return sha512_hash.hexdigest()

    def get_html(self) -> str:
        return f"""
            <li>
                <a href='#{self.id}'>{self.value}</a>
            </li>
            """

Looks like a mini React component, right? I only realized that after finishing the code.
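A side note on list-valued constructor parameters such as `children`: Python evaluates a default value once, when the function is defined, so a literal `[]` default ends up shared by every instance that relies on it. The sketch below shows the difference using two hypothetical stand-in classes (`SharedDefault` and `FreshDefault` are illustrative names, not part of til-page-builder).

```python
from typing import List, Optional


class SharedDefault:
    """Hypothetical class using a literal list as its default argument."""

    def __init__(self, items: List[str] = []):
        self.items = items


class FreshDefault:
    """Hypothetical class using the None-sentinel idiom instead."""

    def __init__(self, items: Optional[List[str]] = None):
        # Build a new list per instance when the caller passes nothing
        self.items = items if items is not None else []


a, b = SharedDefault(), SharedDefault()
a.items.append("Intro")
print(b.items)  # b sees the item that was appended through a

c, d = FreshDefault(), FreshDefault()
c.items.append("Intro")
print(d.items)  # d keeps its own empty list
```

The sharing only becomes visible once something appends to the list, which is why it is easy to miss in a class that merely stores the value.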

And finally, I needed a TOC class to make use of our HeadingItem and generate the required TOC HTML.

src\builder\toc_generator\index.py

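from typing import List, Optional
from builder.toc_generator.heading_item import HeadingItem

class TOC:
    def __init__(self, items: Optional[List[HeadingItem]] = None):
        # Avoid a shared mutable default: each TOC gets its own list
        self.items = items if items is not None else []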

    def add_item(self, item: HeadingItem):
        self.items.append(item)

    def get_html(self):
        return f"""
            <h2>Table of Contents</h2>
            <ul>
                {''.join(item.get_html() for item in self.items)}
            </ul>
            """

    def clear(self):
        self.items = []
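To see how the two classes fit together, here is a minimal, self-contained sketch. It uses condensed copies of HeadingItem and TOC (the whitespace in the HTML templates is trimmed for brevity, and the heading texts are made up for the demo), so treat it as an illustration rather than the project's exact code.

```python
import hashlib
from typing import List, Optional


class HeadingItem:
    """Condensed copy of the class above, for a self-contained demo."""

    def __init__(self, value: str):
        self.id = hashlib.sha512(value.encode()).hexdigest()
        self.value = value

    def get_html(self) -> str:
        return f"<li><a href='#{self.id}'>{self.value}</a></li>"


class TOC:
    """Condensed copy of the TOC class, with trimmed HTML templates."""

    def __init__(self, items: Optional[List[HeadingItem]] = None):
        self.items = items if items is not None else []

    def add_item(self, item: HeadingItem) -> None:
        self.items.append(item)

    def get_html(self) -> str:
        links = "".join(item.get_html() for item in self.items)
        return f"<h2>Table of Contents</h2><ul>{links}</ul>"


toc = TOC()
toc.add_item(HeadingItem("Filing Issues"))
toc.add_item(HeadingItem("Working on Issues"))
html = toc.get_html()
print(html.count("<li>"))  # one list entry per heading added
```

Each HeadingItem renders its own list entry, and TOC only joins them, which is what gives the code its component-like feel.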

This fixes #13 😉

Adding a basic TOC implementation

Now it was time to implement an initial TOC prototype, which would also give me some visible progress.

For this, I had to make additions to my existing HtmlBuilder class.

src\builder\html_builder.py

from builder.toc_generator.index import TOC, HeadingItem

class HtmlBuilder:
    TOC_PLACEHOLDER = '<div id="toc-placeholder"></div>'

    def __init__(self):
        self._cl_args = CommandlineParser().get_args()

        self._output_path = self._cl_args.output # Output directory for files
        self._document_lang = self._cl_args.lang # Language used for the document

        # TOC Object
        self.toc = TOC()


    def generate_html_for_file(self, file_path):
        # Start with a clean slate
        self.reset()
        ... ...
    def reset(self):
        self.toc.clear()


    def generate_document(self, virtual_doc, page_title, lines, input_file_path):
        doc, tag, text = virtual_doc
        line_cursor_position = 0
        ... ...
        ### Adding a TOC placeholder to the body
        with tag('body'):
            # Add an h1 if the title is present
            if has_txt_extension(input_file_path) and self._is_title_present(lines):
                with tag('h1'):
                    text(self._neutralize_newline_character(page_title))
                    line_cursor_position += 3

            # Placeholder for TOC
            with tag('div', id='toc-placeholder'):
                doc.asis()

With the TOC placeholder in place, it was time to make changes to the _process_text_block_for_markdown function.

elif line_queries.is_h2(text_block):
    formatted_block = text_block.replace(line_queries.H2_TOKEN, "")

    with tag('h2'):
        doc.asis(formatted_block)
        self.toc.add_item(HeadingItem(formatted_block))
elif line_queries.is_h3(text_block):
    formatted_block = text_block.replace(line_queries.H3_TOKEN, "")

    with tag('h3'):
        doc.asis(formatted_block)
        self.toc.add_item(HeadingItem(formatted_block))

With this, we now have all the HeadingItems stored in our TOC object.

As the final step, I had to generate the TOC HTML and swap it in for the placeholder we added before.

file_content = doc.getvalue() # Get the generated html string
file_content = file_content.replace(HtmlBuilder.TOC_PLACEHOLDER, self.toc.get_html()) # Replace TOC placeholder with actual TOC
file_content = indentation.indent(file_content) # Indent the html

gen_file_path = f"{self._output_path}/{os.path.basename(file_path)}"

return HtmlFile(gen_file_path, file_content)

And now we could see our TOC at the top of every generated page.
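The placeholder is useful because the TOC has to sit above the headings, yet the headings are only known once the whole document has been parsed. Here is a rough, self-contained sketch of that two-step idea using plain strings. The `H2_TOKEN` value, `heading_id`, and `build_page` are assumptions made for this demo and do not mirror the real HtmlBuilder.

```python
import hashlib

# Assumptions for this sketch: headings are marked with "## " and the page
# is assembled with plain string joins instead of the project's HtmlBuilder.
H2_TOKEN = "## "  # hypothetical token value
TOC_PLACEHOLDER = '<div id="toc-placeholder"></div>'


def heading_id(text: str) -> str:
    """Hex SHA-512 digest of the heading text."""
    return hashlib.sha512(text.encode()).hexdigest()


def build_page(lines):
    body = [TOC_PLACEHOLDER]  # reserve the TOC slot before parsing
    toc_links = []
    for line in lines:
        if line.startswith(H2_TOKEN):
            title = line[len(H2_TOKEN):]
            toc_links.append(f"<li><a href='#{heading_id(title)}'>{title}</a></li>")
            body.append(f"<h2>{title}</h2>")
        else:
            body.append(f"<p>{line}</p>")
    toc_html = "<h2>Table of Contents</h2><ul>" + "".join(toc_links) + "</ul>"
    # Only now, with every heading collected, fill the reserved slot
    return "".join(body).replace(TOC_PLACEHOLDER, toc_html)


page = build_page(["## Setup", "Install the tool.", "## Usage", "Run it."])
print(page)
```

The replacement relies on the placeholder string appearing verbatim in the generated HTML, so the constant and the emitted tag have to stay in sync.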

This fixes #14.

But I was still missing one last thing: making the TOC links functional.

Adding unique ids to all headings while parsing

Even though the links we generated do have appropriate ids in their href, we need to add those ids to the actual headings as well.

That is why I added a static method to generate hashed ids for a piece of text. I chose a hashing algorithm because it was easy to plug in, generates ids that are unique in practice for distinct text, and gives the same result for the same input, which is perfect for our use case.

I only needed the following minor change to fix this.

elif line_queries.is_h2(text_block):
    formatted_block = text_block.replace(line_queries.H2_TOKEN, "")

    with tag('h2', id=HeadingItem.generate_heading_id(formatted_block)):
        doc.asis(formatted_block)
        self.toc.add_item(HeadingItem(formatted_block))
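The link and the heading line up because both sides call the same deterministic function on the same text. The sketch below checks that with a standalone copy of the hashing step (the heading texts and the string templates are made up for the demo). It also shows one caveat I would keep in mind: two headings with identical text get the same id.

```python
import hashlib


def generate_heading_id(text_content: str) -> str:
    """Same approach as HeadingItem.generate_heading_id: hex SHA-512 digest."""
    return hashlib.sha512(text_content.encode()).hexdigest()


heading_text = "Getting Started"

# Computed once for the <h2 id=...> and once for the TOC link's href
heading_tag = f"<h2 id='{generate_heading_id(heading_text)}'>{heading_text}</h2>"
toc_link = f"<a href='#{generate_heading_id(heading_text)}'>{heading_text}</a>"

anchor = toc_link.split("'")[1]     # "#<digest>"
target = heading_tag.split("'")[1]  # "<digest>"
print(anchor == "#" + target)  # True: the link points at the heading

# Caveat: two headings with identical text hash to the same id
print(generate_heading_id("Notes") == generate_heading_id("Notes"))  # True
```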

Alright, now that each heading has the correct id, matching the links in the TOC, it's time to see it in action.

And this fixes #15.

Time to merge to master!

Filing follow up issues

Since there was still a lot to do, I filed the following follow-up issues.

Implement nesting of links and children: The current implementation renders the links as a flat list. I need to implement it as a tree data structure so that proper indentation can be added.
ID Generation: I need to simplify id generation, as sha512 is clearly overkill.
Add a TOC token: I need to add support for a TOC token that the user can place anywhere in their markdown to generate the Table of Contents.
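For the nesting issue, one possible approach (only a sketch of how it could work, not what the project does) is to keep a stack of open headings while walking the flat list and attach each new heading to the nearest shallower one. `Node`, `build_tree`, and `render` are hypothetical names, and the links are left out to keep the example short.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical tree node; `level` is 2 for h2, 3 for h3, and so on."""
    value: str
    level: int
    children: List["Node"] = field(default_factory=list)


def build_tree(headings: List[Tuple[int, str]]) -> List[Node]:
    """Turn a flat, document-ordered list of (level, text) pairs into a tree."""
    roots: List[Node] = []
    stack: List[Node] = []  # chain of currently open ancestors
    for level, text in headings:
        node = Node(text, level)
        # Close every open heading that is not shallower than the new one
        while stack and stack[-1].level >= level:
            stack.pop()
        (stack[-1].children if stack else roots).append(node)
        stack.append(node)
    return roots


def render(nodes: List[Node]) -> str:
    """Render the tree as nested <ul> lists."""
    if not nodes:
        return ""
    items = "".join(f"<li>{n.value}{render(n.children)}</li>" for n in nodes)
    return f"<ul>{items}</ul>"


tree = build_tree([(2, "Setup"), (3, "Install"), (3, "Configure"), (2, "Usage")])
print(render(tree))
```

With the `children` list already present on HeadingItem, the same stack walk could fill it in during parsing instead of building a separate node type.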

Conclusion 🎇

This is how I built an initial prototype for my new TOC generation feature. I'll start working on the follow-up issues and keep filing more until I can call it complete.

Stay tuned for the next post!

DEV Community
