Roy J. Wignarajah

Posted on Nov 1, 2023 • Edited on May 6, 2024

Code Reading

#opensource #beginners #programming #learning

Lab Exercise - Code Reading Docusaurus

In my Open Source class, I practiced code reading: reading the code of a large open source project. Docusaurus, Facebook's open source, React-based static site generator, was the project our class used for practice.
Docusaurus helps projects document their software for developers and users without the effort of building and deploying a full website from scratch. Many projects use Docusaurus for their documentation, such as Jest and React Native.

Large Projects

The project is written mostly in TypeScript and is large, which might make it seem intimidating to read. However, the point of this exercise was to learn first-hand that a larger code base isn't necessarily harder to understand and work with, as long as you know what you're looking for and how to look for it. Docusaurus also has many cool features, and because its MIT license allows people to "use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the software", we can also take inspiration from Docusaurus when adding features to our own projects.

Feature Research - Syntax Highlighting

For this lab exercise, the feature I chose to research was syntax highlighting for code blocks. Syntax highlighting is a common feature in developer tools that makes code easier to read. People who have used Visual Studio Code, Visual Studio, or even a text editor such as Sublime Text have seen syntax highlighting in action. Outside of text editors and IDEs, syntax highlighting can be used to enhance the readability of fenced code blocks, which are often used in program documentation and educational websites like w3schools.com.

I want to implement Syntax Highlighting

This feature interests me because I've often wondered how it's implemented. My Markdown-to-HTML converter, ctil, supports fenced code block conversions, and I want to add this feature to my project. ctil is written in C++, which means the strategies used in Docusaurus may differ from what I'll have to implement. However, researching how syntax highlighting is done in Docusaurus should still give me some ideas on how to implement the feature in my own project.

The Code Reading Process

Task - Locating Feature Information

For this lab exercise, I've been tasked with using Code Reading to locate the following code related to my feature:

implementation
tests
build/configuration settings
docs

Strategies for Locating Code

In my Open Source class, I was taught a few strategies we can use to help us read code related to a feature of interest:

git grep - after cloning a repo, you can use git grep to search for keywords in the tracked files and folders.
GitHub Code Search - GitHub has many advanced search features built into the UI that can help you navigate files and folders.
AI - Large Language Models (LLMs) like ChatGPT can help us understand what existing code is doing. You can paste in the code you're reading and request a detailed explanation of unfamiliar syntax or how the code is implemented.

For this lab exercise I only used git grep, but the other strategies should also be considered if git grep is not providing much insight.
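
To make the idea behind keyword search concrete, here is a minimal TypeScript sketch of a case-insensitive, line-numbered search over a set of files, roughly what git grep -n -i <keyword> reports. The searchFiles helper, the Match type, and the sample file contents are all made up for illustration; the real git grep reads the files tracked by the repository and supports far more options.

```typescript
// Hypothetical toy stand-in for `git grep -n -i <keyword>`.
// It searches in-memory strings instead of tracked files.
interface Match {
  file: string;
  line: number; // 1-based line number, like `git grep -n`
  text: string;
}

function searchFiles(files: Record<string, string>, keyword: string): Match[] {
  const needle = keyword.toLowerCase();
  const matches: Match[] = [];
  for (const [file, content] of Object.entries(files)) {
    content.split("\n").forEach((text, index) => {
      // Case-insensitive substring match, similar in spirit to the -i flag.
      if (text.toLowerCase().includes(needle)) {
        matches.push({ file, line: index + 1, text: text.trim() });
      }
    });
  }
  return matches;
}

// Made-up file contents, for illustration only.
const sampleFiles: Record<string, string> = {
  "package.json": '{\n  "prism-react-renderer": "^1.3.5"\n}',
  "README.md": "# Example\nNothing about highlighting here.",
};

const results = searchFiles(sampleFiles, "Prism");
for (const m of results) {
  console.log(`${m.file}:${m.line}:${m.text}`);
}
```

Because the match ignores case, a single search covers both "Prism" and "prism". With the real tool, git grep -i prism does the same in one command.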

Don't Read Source Code like a Book!

A quick glance at the Docusaurus project on GitHub should tell you this is a large project. This is, no doubt, intimidating. The one thing we must understand is that source code is not organized like a book. To be good at reading source code, we must respect that fact. Rather than reading a code base starting at main(), we must instead have a goal and work from there.

What I did

Since Docusaurus is often used to prepare documentation, I figured the Docusaurus website/documentation would be a good resource to start with. It turns out the Docusaurus website itself is built using Docusaurus, which made it easy to navigate to this page on Code Blocks.

Skimming this page and the other documentation taught me a few important things:

Docusaurus uses Markdown as its main authoring format
Docusaurus uses the MDX compiler to transform Markdown files to React components, allowing the use of JSX in Markdown content.
Docusaurus uses the Prism React Renderer to provide syntax highlighting for code blocks, with Palenight as the default syntax highlighting theme.

What is MDX?

This was the first time I'd heard of MDX, which, as I understand it, is a superset of Markdown with JSX support. This means I would have to look out for .md and/or .mdx files to see examples of syntax highlighting in action. More importantly, these pages taught me that Docusaurus' syntax highlighting is done by a third-party library.

Don't Create what you can Re-Use

One lesson I've learned in my Open Source class is that we shouldn't create what we can re-use. One reason Open Source projects exist is so that people don't have to rewrite the same code over and over. Instead, it's often better to use and contribute to an external library as long as it's maintained and does what we need. Learning that Docusaurus uses an external library for syntax highlighting was a good reminder, as this means I can employ a similar strategy and find a third-party library to implement syntax highlighting in my own project. This would relieve me of the burden of writing my own syntax highlighter.

Code Reading Docusaurus

Although Docusaurus uses an external library for syntax highlighting, it is still beneficial to read the codebase and see how Prism is used in the program. After cloning the Docusaurus project from GitHub, I ran git grep Prism and git grep prism and found many uses of Prism in the Docusaurus project.

Just using git grep provided a wealth of information. The list of files where the keywords "Prism" and "prism" are used showed me their usage in files such as:

package.json (to add prism as a dependency):

  "dependencies": {
    "@docusaurus/core": "2.4.3",
    "@docusaurus/preset-classic": "2.4.3",
    "@mdx-js/react": "^1.6.22",
    "clsx": "^1.2.1",
    "prism-react-renderer": "^1.3.5",
    "react": "^17.0.2",
    "react-dom": "^17.0.2"
  },

yarn.lock (to ensure consistent package installations):

prism-react-renderer@^1.3.5:
  version "1.3.5"
  resolved "https://registry.yarnpkg.com/prism-react-renderer/-/prism-react-renderer-1.3.5.tgz#786bb69aa6f73c32ba1ee813fbe17a0115435085"
  integrity sha512-IJ+MSwBWKG+SM3b2SUfdrhC+gu01QkV2KmRQgREThBfSQRoufqRfxfHUxpG1WcaFjP+kojcFyO9Qqtpgt3qLCg==

docusaurus.config.js (to configure light/dark code themes):

const lightCodeTheme = require('prism-react-renderer/themes/github');
const darkCodeTheme = require('prism-react-renderer/themes/dracula');
/** @type {import('@docusaurus/types').Config} */
const config = {
    // ...other code in between
    prism: {
        theme: lightCodeTheme,
        darkTheme: darkCodeTheme,
    },
};
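
The theme and darkTheme settings suggest that one of the two themes is picked based on the site's current color mode. Here is a minimal TypeScript sketch of that selection logic, assuming the light theme is also the fallback when no dark theme is configured. The pickCodeTheme function and its types are hypothetical names for illustration, not Docusaurus' actual code.

```typescript
// Hypothetical types and names; a sketch of the idea, not Docusaurus' code.
type ColorMode = "light" | "dark";

interface CodeTheme {
  name: string;
}

interface PrismConfig {
  theme: CodeTheme;      // used in light mode
  darkTheme?: CodeTheme; // optional; assumed to fall back to `theme`
}

function pickCodeTheme(prism: PrismConfig, mode: ColorMode): CodeTheme {
  if (mode === "dark" && prism.darkTheme !== undefined) {
    return prism.darkTheme;
  }
  return prism.theme;
}

const prismConfig: PrismConfig = {
  theme: { name: "github" },
  darkTheme: { name: "dracula" },
};

console.log(pickCodeTheme(prismConfig, "dark").name);  // prints "dracula"
console.log(pickCodeTheme(prismConfig, "light").name); // prints "github"
```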

options.test.ts containing some unit tests involving prism:

    expect(testValidateThemeConfig(userConfig)).toEqual({
      ...DEFAULT_CONFIG,
      ...userConfig,
      prism: {
        ...userConfig.prism,
        // Modified/normalized values
        defaultLanguage: 'javascript',
        additionalLanguages: ['kotlin', 'java'],
      },
    });

String.tsx (an example of the <Highlight /> component being used):

        <Highlight
          theme={prismTheme}
          code={code}
          language={(language ?? 'text') as Language}>
          {({className, style, tokens, getLineProps, getTokenProps}) => (
            <pre
              /* eslint-disable-next-line jsx-a11y/no-noninteractive-tabindex */
              tabIndex={0}
              ref={wordWrap.codeBlockRef}
              className={clsx(className, styles.codeBlock, 'thin-scrollbar')}
              style={style}>
              <code
                className={clsx(
                  styles.codeBlockLines,
                  showLineNumbers && styles.codeBlockLinesWithNumbering,
                )}>
                {tokens.map((line, i) => (
                  <Line
                    key={i}
                    line={line}
                    getLineProps={getLineProps}
                    getTokenProps={getTokenProps}
                    classNames={lineClassNames[i]}
                    showLineNumbers={showLineNumbers}
                  />
                ))}
              </code>
            </pre>
          )}
        </Highlight>

Please note the snippets above do not include the full context of their respective files.
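
The <Highlight /> usage hints at the general shape of syntax highlighting: the code is split into lines, each line into tokens, and each token is rendered with its own styling. The following is a toy TypeScript sketch of that pipeline which emits an HTML string, the kind of output a Markdown-to-HTML converter would need. It is not how Prism works internally; the tiny keyword list, the tokenizer regular expression, and the CSS class names are all assumptions made for illustration.

```typescript
// Toy highlighter: split code into lines, lines into tokens, tokens into HTML.
type TokenType = "keyword" | "number" | "plain";

interface Token {
  type: TokenType;
  content: string;
}

// A deliberately tiny keyword list; real grammars are much larger.
const KEYWORDS = new Set(["const", "let", "return", "function"]);

function tokenizeLine(line: string): Token[] {
  // Identifiers, digit runs, whitespace runs, or any other single character.
  const parts: string[] = line.match(/[A-Za-z_]\w*|\d+|\s+|./g) ?? [];
  return parts.map((content): Token => {
    if (KEYWORDS.has(content)) return { type: "keyword", content };
    if (/^\d+$/.test(content)) return { type: "number", content };
    return { type: "plain", content };
  });
}

function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function renderToken(token: Token): string {
  const safe = escapeHtml(token.content);
  // Only non-plain tokens get a wrapper element that CSS can style.
  return token.type === "plain"
    ? safe
    : `<span class="token ${token.type}">${safe}</span>`;
}

function highlight(code: string): string {
  const lines = code.split("\n").map((line) => tokenizeLine(line));
  const body = lines
    .map((tokens) => tokens.map((token) => renderToken(token)).join(""))
    .join("\n");
  return `<pre><code>${body}</code></pre>`;
}

const html = highlight("const x = 42;");
console.log(html);
// <pre><code><span class="token keyword">const</span> x = <span class="token number">42</span>;</code></pre>
```

A theme then only has to map class names such as keyword and number to colors, which keeps tokenizing and styling separate.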

Deciphering all this code

By no means do I understand every line of code in the files above. However, locating and skimming them with git grep has given me a lot of insight into how the Prism React Renderer library is added as a dependency, pinned to a version in the lockfile, configured, used to highlight code, and even tested.

My project, ctil, is written in C++, so I will likely have to use a different strategy and a different third-party library to add syntax highlighting. However, even learning how Docusaurus adds syntax highlighting support has given me some ideas on how to add it to my project.

Evaluation of Research Methods

For this lab, git grep provided me with most of the information I needed to research my feature. However, it was reading the Docusaurus documentation that revealed the keywords I needed to use git grep effectively.

Based on my experience, I would advise others to read any documentation available for a project before code reading for a feature. This can reveal keywords that can then be searched using git grep. If git grep cannot reveal the information you need, I think GitHub code search and even AI-based solutions can be viable alternatives.

Next Steps

For this lab exercise, I didn't use GitHub Code Search or AI, but I would have considered them if I couldn't use git grep effectively. I want to use GitHub Code Search in the future, as it has many powerful features that can support the code reading process.
