DEV Community

Wojciech Rithaler
The power of Remark

1. Introduction

Markdown is an easy-to-learn, versatile markup language. It can be transformed into HTML, which makes it ideal for content-heavy sites that can be served statically. Many libraries can perform this transformation. This post describes how to use one of them, Remark, in cases where custom behavior is required.

This article will go over:

  • process of installing and setting up the library
  • simple transformation of Markdown into HTML
  • creating custom plugins

Note!

Basic knowledge of writing Node.js scripts and some familiarity with TypeScript are required!

2. Concepts and technologies

Before diving into the code, some concepts should be understood.

2.1. Markdown

Markdown is a markup language ideal for creating readable and editable text files. Since its creation, it has been widely used in blogs, documentation and even web pages.

2.2. Remark

Remark is the most popular JavaScript library (at least according to the authors) for transforming Markdown into HTML. It works by turning the content of a file into an abstract syntax tree (AST) and then changing that tree with the help of plugins.
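To make the idea of a syntax tree more concrete, here is a minimal sketch of what such a tree could look like for a document with one heading and one paragraph. The interfaces are simplified stand-ins written for this illustration (the real node types come from @types/mdast and carry more fields, such as position), so treat it as an approximation rather than the library's exact output.

```typescript
// Simplified stand-ins for the mdast node types (assumption: the real ones have more fields).
interface TextNode { type: 'text'; value: string }
interface HeadingNode { type: 'heading'; depth: 1 | 2 | 3 | 4 | 5 | 6; children: TextNode[] }
interface ParagraphNode { type: 'paragraph'; children: TextNode[] }
interface RootNode { type: 'root'; children: Array<HeadingNode | ParagraphNode> }

// Roughly the tree for "# Hello World" followed by a single paragraph.
const tree: RootNode = {
    type: 'root',
    children: [
        { type: 'heading', depth: 1, children: [{ type: 'text', value: 'Hello World' }] },
        { type: 'paragraph', children: [{ type: 'text', value: 'This is an example of Markdown content.' }] },
    ],
};

// A plugin changes the final output by mutating nodes like these before they are serialized.
tree.children[0].children[0].value = 'Hello Remark';
console.log(JSON.stringify(tree, null, 2));
```

The point of the sketch is only that the tree is plain data: changing a property of a node is enough to change what is rendered later.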

3. Setting up the project

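The demo application in this post uses TypeScript. The necessary dependencies are typescript, remark, remark-html and unist-util-visit, plus the type packages @types/mdast and @types/node as development dependencies.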

Additionally, to run and manually test the script, a single Markdown file named example.md is needed at the root of the project directory:

    # Hello World 

    This is an example of Markdown content. 

Note!

These were the commands that prepared the project and installed the necessary dependencies:

  1. yarn init -y
  2. yarn add typescript remark remark-html unist-util-visit
  3. yarn add -D @types/mdast @types/node
  4. yarn tsc --init

More information about the process of setting up a TypeScript project can be found in the official documentation. Remark is an ESM-only package; more information on how to import it in a TypeScript project can be found here.

4. Remark basics

This is the content of the main.ts file, where the main part of the script resides.

    import * as fs from 'fs'; 
    import remarkHtml from 'remark-html'; 
    import {remark} from 'remark'; 

    const mdToHtml = async (fileContent: string): Promise<string> => { 
        const result = await remark() 
            .use(remarkHtml, {sanitize: false}) 
            .process(fileContent); 

        console.log(result.toString()); 
        return result.toString(); 
    } 

    const content = fs.readFileSync('example.md', 'utf-8'); 
    mdToHtml(content); 

The above code is the bare minimum required to transform Markdown. Every plugin is applied the same way, with the "use" function. Some plugins, like the "remarkHtml" used above, accept additional parameters as configuration. You can check this list of the most popular Remark plugins for reference.

When run, the result should look like the following:

    <h1>Hello World</h1> 
    <p>This is an example of Markdown content.</p> 

5. About creating Remark plugins

Transforming Markdown to HTML is easy, and so is applying plugins, but community-made plugins may not solve every problem. In those situations, a custom plugin can be written quite easily.

Note!

Reading this section about the concept of a plugin before going straight into coding is suggested.

There is of course an official tutorial on creating a plugin, but there are some problems with it:

  • it shows how to create a plugin for Unified (Unified is the "parent project": a collection of compatible plugins and a way to connect all sub-projects together, which is mostly unnecessary when dealing only with Markdown)
  • it uses and focuses on Retext (another sub-project)
  • it shows a very basic example and uses many functions from packages that have to be installed separately but are not needed
  • it contains no example of actual content manipulation

Note!

Even the official documentation suggests using the Remark sub-project when dealing with Markdown.

6. Creating a Remark plugin and analyzing AST

Let's finally dive into the code and create a separate file "customPlugin.ts" for a new plugin at the root of the project. Its content will look as follows:

    import { visit } from 'unist-util-visit';
    import { Root } from 'mdast';

    export default function customPlugin() {
        return (tree: Root) => { 
            visit(tree, (node) => { 
                console.log(node); 
            }); 
        } 
    }

For now, the only purpose of this plugin will be to display tree nodes. Of course, it has to be imported into the script and used with the "use" function like so:

    import customPlugin from './customPlugin.js';

    const mdToHtml = async (fileContent: string): Promise<string> => { 
        const result = await remark() 
            .use(remarkHtml, {sanitize: false}) 
            .use(customPlugin) 
            .process(fileContent); 

        console.log(result.toString()); 
        return result.toString(); 
    }

The output should look like this:

    { 
        type: 'root', 
        children: [ 
            { 
                type: 'heading', 
                depth: 1, 
                children: [Array], 
                position: [Object] 
            }, 
            { type: 'paragraph', children: [Array], position: [Object] } 
        ], 
        position: { 
            start: { line: 1, column: 1, offset: 0 }, 
            end: { line: 2, column: 42, offset: 56 } 
        } 
    } 
    { 
        type: 'heading', 
        depth: 1, 
        children: [ { type: 'text', value: 'Hello World', position: [Object] } ], 
        position: { 
            start: { line: 1, column: 1, offset: 0 }, 
            end: { line: 1, column: 14, offset: 13 } 
        } 
    } 
    { 
        type: 'text', 
        value: 'Hello World', 
        position: { 
            start: { line: 1, column: 3, offset: 2 }, 
            end: { line: 1, column: 14, offset: 13 } 
        } 
    } 
    { 
        type: 'paragraph', 
        children: [ 
            { 
                type: 'text', 
                value: 'This is an example of Markdown content.', 
                position: [Object] 
            } 
        ], 
        position: { 
            start: { line: 2, column: 1, offset: 15 }, 
            end: { line: 2, column: 42, offset: 56 } 
        } 
    } 
    { 
        type: 'text',
        value: 'This is an example of Markdown content.', 
        position: { 
            start: { line: 2, column: 1, offset: 15 }, 
            end: { line: 2, column: 42, offset: 56 } 
        } 
    } 
    <h1>Hello World</h1> 
    <p>This is an example of Markdown content.</p> 

That is a lot of output for just a heading and a paragraph, but it is logical. The traversal always starts with the "root" node, which holds all of the elements in its "children" array. It then moves to the first child, in this case a heading. A heading actually consists of two objects: the parent node and the text inside it. That is why, after the heading, the third printed object is a "text" node containing the heading's text. The same happens with the fourth object, a paragraph, whose text is printed as the fifth object. At the end comes the actual HTML, printed to the console from the main.ts file.
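The visiting order described above (a node first, then its children, left to right) can be sketched with a few lines of plain TypeScript. The walk helper and the SketchNode type below are hypothetical stand-ins written for this illustration; they are not how unist-util-visit is implemented, only a way to reproduce the same order on a hand-built tree.

```typescript
// Minimal node shape (assumption: real mdast nodes carry more fields, e.g. position and depth).
interface SketchNode {
    type: string;
    value?: string;
    children?: SketchNode[];
}

// Hypothetical helper: calls the visitor on a node first, then on each child in order.
function walk(node: SketchNode, visitor: (node: SketchNode) => void): void {
    visitor(node);
    for (const child of node.children ?? []) {
        walk(child, visitor);
    }
}

const tree: SketchNode = {
    type: 'root',
    children: [
        { type: 'heading', children: [{ type: 'text', value: 'Hello World' }] },
        { type: 'paragraph', children: [{ type: 'text', value: 'This is an example of Markdown content.' }] },
    ],
};

// Record the type of every visited node to make the order visible.
const order: string[] = [];
walk(tree, (node) => order.push(node.type));
console.log(order.join(' > ')); // root > heading > text > paragraph > text
```

The five recorded types match the five objects printed by the plugin: root, heading, its text, paragraph, its text.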

7. Changing transformation of nodes - a simple example

The base for a plugin is ready, but for now it only prints the nodes. To change that and better explain the process, this example adds a hash sign ("#") to the text of every heading.
Changing the way Remark transforms a node is as simple as changing the node object's properties.

The visit function can be restricted to certain types of nodes by passing the type as its second parameter. With the "heading" string, only heading nodes are visited. Any node of the tree can in turn be passed to visit, so it can be called again inside the callback to reach the text node of each heading.

    import { visit } from 'unist-util-visit'; 
    import { Heading } from 'mdast'; 
    import { Transformer } from 'unified';

    export default function customPlugin(): Transformer {
        return (tree) => { 
            // Visit only the heading nodes of the tree.
            visit(tree, 'heading', (node: Heading) => { 
                // Inside each heading, visit its text nodes and prefix their value.
                visit(node, 'text', (textNode) => { 
                    textNode.value = `# ${textNode.value}`; 
                }); 
            }); 
        } 
    }

Now this code will yield:

    <h1># Hello World</h1> 
    <p>This is an example of Markdown content.</p> 

It is possible to visit many types of nodes. Instead of passing a string with type as a second parameter, a string array of types can be used.
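With unist-util-visit this would be a call such as visit(tree, ['heading', 'paragraph'], callback). The dependency-free sketch below imitates that filtering on a hand-built tree; visitTypes and SketchNode are hypothetical names introduced only for this illustration and are not part of any library.

```typescript
// Minimal node shape (assumption: real mdast nodes carry more fields).
interface SketchNode {
    type: string;
    value?: string;
    children?: SketchNode[];
}

// Simplified stand-in for a type-filtered visit: `test` is a single type or a list of types.
function visitTypes(node: SketchNode, test: string | string[], visitor: (node: SketchNode) => void): void {
    const types = Array.isArray(test) ? test : [test];
    if (types.includes(node.type)) visitor(node);
    for (const child of node.children ?? []) {
        visitTypes(child, types, visitor);
    }
}

const tree: SketchNode = {
    type: 'root',
    children: [
        { type: 'heading', children: [{ type: 'text', value: 'Hello World' }] },
        { type: 'paragraph', children: [{ type: 'text', value: 'This is an example of Markdown content.' }] },
    ],
};

// Only headings and paragraphs reach the callback; root and text nodes are skipped.
const visited: string[] = [];
visitTypes(tree, ['heading', 'paragraph'], (node) => visited.push(node.type));
console.log(visited.join(', ')); // heading, paragraph
```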

8. Using TypeScript

So far the examples have used TypeScript passively, for type checking, but there are situations where it can help with much more.

Markdown allows for adding HTML tags (example). Mixing syntax may seem unusual and is not recommended, but it may occur with user-generated content. Not everyone uses Markdown daily.

Remark does not transform HTML markup, but it can be handled with some custom logic.

# Hello World 
This is an example of Markdown content. 

<img src="image.png" alt="HTML image"> 

The script result will now look like this:

    <h1># Hello World</h1> 
    <p>This is an example of Markdown content.</p> 
    <img src="image.png" alt="HTML image"> 

And this is how the image tag will look in AST:

{ 
  type: 'html', 
  value: '<img src="image.png" alt="HTML image">', 
  position: { 
    start: { line: 4, column: 1, offset: 60 }, 
    end: { line: 4, column: 39, offset: 98 } 
  } 
} 

There is no distinction between tags. Any HTML syntax is left untransformed and gets type: 'html'.

This behaviour may cause trouble, for example when a domain has to be added to every image URL. The solution is a bit more custom logic.
First, distinguish HTML nodes that hold image tags from the rest by creating a TypeScript interface:

interface HtmlImage extends HTML { 
    type: 'html'; 
    url: string; 
    alt: string; 
} 

Then check whether the value starts with <img. A type guard is a good fit for this:

const isHtmlImage = (node: any): node is HtmlImage => {
    const HTML_IMG_SRC_REGEX = /src=["'](.+?)["']/i;
    const HTML_IMG_ALT_REGEX = /alt=["'](.+?)["']/i;

    if (node.type === "html" && node.value.startsWith("<img") && !node.value.includes(":")) {
        const url = HTML_IMG_SRC_REGEX.exec(node.value);
        const alt = HTML_IMG_ALT_REGEX.exec(node.value);
        node.url = url ? url[1] : "";
        node.alt = alt ? alt[1] : "";
        return true;
    } else return false;
} 

The additional !node.value.includes(":") check is meant to filter out images from external resources, whose URLs contain a scheme such as "https:".

Now, to change the image URL, a simple if is enough:

    if (isHtmlImage(node)) { 
        // node is of type HtmlImage and has keys "url" and "alt"  
        node.value = `<img src="-domain-${node.url}" alt="${node.alt}">` 
    } 
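The same idea can be tried in isolation, without Remark, on the raw HTML string. The sketch below is a variation rather than the code above: instead of rebuilding the tag from url and alt, it replaces only the src attribute so that other attributes survive. The helper name prefixImageSrc and the https://example.com/ base URL are hypothetical and exist only for this illustration.

```typescript
// Hypothetical helper: prefixes the src of an <img> tag with a base URL.
// Images whose src contains ":" (for example "https:") are treated as external and left untouched.
function prefixImageSrc(html: string, baseUrl: string): string {
    const match = /src=["'](.+?)["']/i.exec(html);
    if (!html.startsWith('<img') || !match) return html;

    const src = match[1];
    if (src.includes(':')) return html;

    // A replacer function avoids special "$" patterns being interpreted in the replacement.
    return html.replace(match[0], () => `src="${baseUrl}${src}"`);
}

console.log(prefixImageSrc('<img src="image.png" alt="HTML image">', 'https://example.com/'));
// <img src="https://example.com/image.png" alt="HTML image">
```

Inside a plugin, a function like this could be applied to node.value of every node that passes the type guard.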

9. Returning additional data

Sometimes users may forget to add an image or make a mistake in a path. Printing messages about all missing images may be useful:

    import * as fs from 'fs/promises'; 
    import remarkHtml from 'remark-html'; 
    import {remark} from 'remark'; 
    import customPlugin from './customPlugin.js';

    const mdToHtml = async (fileName: string): Promise<string> => { 
        // fs/promises exposes readFile, which returns a promise.
        const fileContent = await fs.readFile(fileName, 'utf-8'); 
        const result = await remark() 
            .use(remarkHtml, {sanitize: false}) 
            .use(customPlugin) 
            .process(fileContent);

        // The plugin stores the list of missing images in the data of the returned VFile.
        const missingPictures = result.data.missingPictures as string[];
        if (missingPictures.length > 0) {
            console.warn(`[WARNING] There are broken images in file ${fileName}. Image paths: ${missingPictures}`); 
        }

        return result.toString(); 
    } 

    const fileName = "example.md"; 
    mdToHtml(fileName); 

The custom plugin file:

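    import * as fs from 'fs';
    import { visit } from 'unist-util-visit';
    import { Image } from 'mdast';
    import { Transformer } from 'unified';
    import { VFile } from 'vfile';

    export default function customPlugin(): Transformer {
        return (tree, file: VFile) => {  
            const missingPicturesArray: string[] = []; 

            // Collect the path of every Markdown image that does not exist on disk.
            visit(tree, 'image', (node: Image) => { 
                if (!fs.existsSync(`./img/${node.url}`)) missingPicturesArray.push(`./img/${node.url}`);  
            });

            // Expose the result to the caller through the VFile.
            file.data['missingPictures'] = missingPicturesArray;
        } 
    }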

The Markdown file:

# Hello World
This is an example of Markdown content.

![image](image.png)

This functionality is possible because Remark passes a VFile to every transformer and returns it from process(). A VFile is an object containing information about the processed Markdown file. Plugins can extend it by adding entries to its data sub-object.
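The flow of data (a plugin writes to the file's data object, the caller reads it afterwards) can be sketched without Remark or the file system. Everything below is a hypothetical stand-in: SketchFile imitates only the data part of a VFile, and the exists callback replaces fs.existsSync so that the sketch does not touch the disk.

```typescript
// Stand-ins for the real types (assumption: a VFile exposes a free-form `data` object).
interface SketchFile { data: Record<string, unknown> }
interface SketchImage { type: 'image'; url: string }

// Hypothetical transformer body: records images whose file cannot be found.
// `exists` is injected so the sketch stays independent of the real file system.
function collectMissing(images: SketchImage[], file: SketchFile, exists: (path: string) => boolean): void {
    file.data['missingPictures'] = images
        .map((image) => `./img/${image.url}`)
        .filter((path) => !exists(path));
}

const file: SketchFile = { data: {} };
collectMissing(
    [{ type: 'image', url: 'image.png' }, { type: 'image', url: 'logo.png' }],
    file,
    (path) => path === './img/logo.png', // pretend that only logo.png exists
);

// After processing, the caller reads the extra data back from the same file object.
const missing = file.data['missingPictures'] as string[];
if (missing.length > 0) {
    console.warn(`Broken images: ${missing.join(', ')}`); // Broken images: ./img/image.png
}
```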

Note!

Instead of providing just the content as a string in remark().process(content) a VFile object can also be used. Doing so may simplify passing parameters to plugins when dealing with more complex problems.

10. Async plugin / transformer

There are no examples of an async plugin in the official documentation and almost none in articles.

A Remark plugin can be asynchronous (its Transformer may return a Promise), but the visit function is fully synchronous.

To overcome that, a recursive function can be used:

import { Transformer } from 'unified';
import { HTML } from 'mdast';
import { Node, Parent } from 'unist';

import axios from 'axios';

async function asyncFunc() {
    const todo = await axios.get('https://jsonplaceholder.typicode.com/todos/1').then(result => {
        return result.data['title'];
    }).catch(error => {
        console.log(error);
    });

    return `todo response: ${todo}`;
}

async function asyncFuncTimeout() {
    const result = await new Promise((resolve) => {
        setTimeout(() => {
            resolve("setTimeout(5000) result");
        }, 5000);
    });

    return `timeout result: ${result}`;
}

function isHTML(node: Node): node is HTML {
    return node.type === "html";
}

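async function mapNodes(node: Node | Parent<Node>, params: any): Promise<Node | Parent> {
    // Replace the content of matching HTML nodes with the result of an async call.
    if (isHTML(node) && node.value.includes('async example')) node.value = await asyncFunc();
    if (isHTML(node) && node.value.includes('timeout example')) node.value = await asyncFuncTimeout();

    // Leaf nodes have no children, so fall back to an empty array.
    const childrenOrEmpty = (node as Parent).children || [];
    // Process all children concurrently and wait for every one of them.
    const children = await Promise.all(childrenOrEmpty.map(childNode => mapNodes(childNode, params)));

    // Return a copy of the node with the processed children.
    return { ...node, children };
}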

export default function asyncPlugin(params: any): Transformer {
    return async (tree) => await mapNodes(tree, params);
}

In the above script, a custom mapNodes function was used instead of the visit function. Its obvious drawback is the required as Parent assertion.

After running the plugin on a Markdown file with the following content:
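A different approach, shown here only as a sketch and not as the method used in this article, is to keep the traversal synchronous, queue one promise per matching node and await them all at the end. The walk helper stands in for unist-util-visit, and fetchReplacement is a hypothetical async operation that simply upper-cases its input after a short delay.

```typescript
// Minimal node shape (assumption: real mdast nodes carry more fields).
interface SketchNode {
    type: string;
    value?: string;
    children?: SketchNode[];
}

// Synchronous depth-first walk, standing in for unist-util-visit.
function walk(node: SketchNode, visitor: (node: SketchNode) => void): void {
    visitor(node);
    for (const child of node.children ?? []) walk(child, visitor);
}

// Hypothetical async operation; a real plugin might call an HTTP API here.
function fetchReplacement(original: string): Promise<string> {
    return new Promise<string>((resolve) => setTimeout(() => resolve(original.toUpperCase()), 10));
}

// Walk synchronously, queue one promise per matching node, then await them together.
async function transform(tree: SketchNode): Promise<void> {
    const pending: Promise<void>[] = [];
    walk(tree, (node) => {
        if (node.type === 'html' && node.value !== undefined) {
            pending.push(fetchReplacement(node.value).then((value) => { node.value = value; }));
        }
    });
    await Promise.all(pending);
}

const tree: SketchNode = {
    type: 'root',
    children: [
        { type: 'text', value: 'unchanged' },
        { type: 'html', value: '<p>async example</p>' },
    ],
};

transform(tree).then(() => console.log(tree.children?.[1].value)); // <P>ASYNC EXAMPLE</P>
```

Because the nodes are mutated in place, no type assertion is needed, at the cost of every async call being started before any of them is awaited.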

# Hello World
This is an example of Markdown content.

<p>async example</p>

<p>timeout example</p>

The result was:

    <h1>Hello World</h1>
    <p>This is an example of Markdown content.</p>
    todo response: delectus aut autem
    timeout result: setTimeout(5000) result

11. Summary

Remark is a great ecosystem. It makes transforming Markdown a breeze and simplifies adding features that would otherwise require far more time and effort.

What has been presented in this article is of course just a small subset of features and solutions, but it should be more than enough for a seamless start with the technology.

I would like to thank my supervisor Tomasz Dyła for providing guidance on this article.

Thank you very much for reading and let me know in the comments if you have any suggestions.
