Ayobami Ogundiran

Posted on Mar 2, 2023

How to extract title, description or metadata from markdown

#frontmatter #tutorial #markdown #javascript

Do you need a piece of code to extract the title, description or frontmatter from markdown dynamically, or are you just curious to know how it is done?

This tutorial shows you how to do it efficiently and step by step.

Just give me the code:

    const extractMarkdownMetadata = (markdown) => {
        const charactersBetweenGroupedHyphens = /^---([\s\S]*?)---/;
        const metadataMatched = markdown.match(charactersBetweenGroupedHyphens);
        // match() returns null when there is no front matter, so guard before reading index 1
        const metadata = metadataMatched ? metadataMatched[1] : "";

        if (!metadata) {
          return {};
        }

        const metadataLines = metadata.split("\n");
        const metadataObject = metadataLines.reduce((accumulator, line) => {
          const [key, ...value] = line.split(":").map((part) => part.trim());

          if (key) {
            accumulator[key] = value[1] ? value.join(":") : value.join("");
          }
          return accumulator;
        }, {});

        return metadataObject;
    };

Now, let's explain everything step by step.

Step 1: Declare a function named "extractMarkdownMetadata"

const extractMarkdownMetadata = markdown => {
  // The rest of the code will go here  
};

extractMarkdownMetadata takes a markdown string as an argument. Let's assume the markdown we want to use is:


`---
title: how to get things done
description: this is greate
tags: money, income, coding
conver_image: ayobami.jpg
---

This is the main body of the article.`

Step 2: Write a regex that matches anything within `---` and `---`

const charactersBetweenGroupedHyphens = /^---([\s\S]*?)---/;

Clearly, you get the purpose of the regex above but do you understand what each of its units does? Let me explain:

 /: marks the start of the regex literal
 ^: anchors the match to the beginning of the string
 ---: matches the three opening hyphens
 \s: matches a whitespace character (space, tab, newline and more)
 \S: matches a non-whitespace character (letters, digits and symbols)
 [\s\S]: matches any character at all, whitespace or not, including newlines
 *: matches the preceding element zero or more times; here it applies to [\s\S]
 ?: on its own, matches the preceding element zero or one time; placed right after *, it makes the quantifier lazy, so "*?" matches as few characters as possible
 ([\s\S]*?): () is a capturing group that remembers the text matched inside the parentheses
 ---: matches the three closing hyphens
 /: marks the end of the regex literal
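
To see why the lazy "*?" matters, here is a small sketch that compares it with a greedy "*". The sample string is made up for illustration and assumes the article body contains a second --- (a horizontal rule); the names lazy, greedy, lazyCapture and greedyCapture are not part of the tutorial's function.

```javascript
// Hypothetical sample: front matter followed by a body that also contains "---".
const sample = "---\ntitle: demo\n---\n\nIntro\n\n---\n\nMore text";

const lazy = /^---([\s\S]*?)---/;  // stops at the first closing ---
const greedy = /^---([\s\S]*)---/; // runs on to the last --- in the string

const lazyCapture = sample.match(lazy)[1];
const greedyCapture = sample.match(greedy)[1];

console.log(JSON.stringify(lazyCapture));   // "\ntitle: demo\n"
console.log(JSON.stringify(greedyCapture)); // "\ntitle: demo\n---\n\nIntro\n\n"
```

The greedy version swallows part of the body, which is why the regex in this tutorial uses the lazy form.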

Step 3: Extract frontmatter or metadata from a string

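const metadataMatched = markdown.match(charactersBetweenGroupedHyphens);
// match() returns null when there is no front matter, so guard before reading index 1
const metadata = metadataMatched ? metadataMatched[1] : "";

Don't forget, markdown is the string of markdown passed to the function as an argument, and now we extract the metadata from it. match() returns null when the string does not start with a --- block, so we fall back to an empty string instead of reading index 1 of null. If we console.log metadata, we should see a string that looks like the one below (the real value also starts and ends with a newline, because the regex captures everything between the two --- markers):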

"title: how to get things done
description: this is greate
tags: money, income, coding
conver_image: ayobami.jpg"

You might want to ask: why do we access metadataMatched with index 1 and not 0 or any other number?

It is because the first element of the array (index 0) is the whole matched string, including the opening and closing ---. The capturing group picks out only the text between ( and ) and stores it as the second element of metadataMatched. So, we use 1 to access it.
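
As a quick illustration of the two indexes, the sketch below runs the same regex on a short made-up string. The variable names result, fullMatch, captured and noMatch are only for this example.

```javascript
const regex = /^---([\s\S]*?)---/;
const markdown = "---\ntitle: demo\n---\nBody text";

const result = markdown.match(regex);
const fullMatch = result[0]; // whole match, hyphens included
const captured = result[1];  // only the text inside the capturing group

console.log(JSON.stringify(fullMatch)); // "---\ntitle: demo\n---"
console.log(JSON.stringify(captured));  // "\ntitle: demo\n"

// When the string has no front matter, match() returns null instead of an array.
const noMatch = "Just a body".match(regex);
console.log(noMatch); // null
```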

Step 4: Split the string of metadata into lines of an array

 if (!metadata) {
   return {};
 }
 const metadataLines = metadata.split("\n");

We return an empty object if metadata is falsy and split the metadata string into an array of lines of strings.
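
Here is a small sketch of what split("\n") returns for a captured string, using a shortened, made-up metadata value. Because the captured text starts and ends with a newline, the array gets an empty string at each end; the filter step is an optional extra shown only for illustration, since the tutorial's function skips those lines later anyway.

```javascript
// Made-up captured text, with the leading and trailing newline the regex leaves in.
const metadata = "\ntitle: demo\ntags: a, b\n";

const metadataLines = metadata.split("\n");
console.log(metadataLines); // [ '', 'title: demo', 'tags: a, b', '' ]

// Optional: drop blank lines up front (nonEmptyLines is a name used only here).
const nonEmptyLines = metadataLines.filter((line) => line.trim() !== "");
console.log(nonEmptyLines); // [ 'title: demo', 'tags: a, b' ]
```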

Step 5: Convert the lines into an object

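After we split the metadata into an array of lines, metadataLines should look like below (plus an empty string at each end, which comes from the newlines next to the --- markers and is skipped later by the if (key) check):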

[
    "title: how to get things done",
    "description: this is greate",
    "tags: money, income, coding",
    "conver_image: ayobami.jpg"
]

Now, let's convert everything into an object.

 // Use reduce to accumulate the metadata into an object
 const metadataObject = metadataLines.reduce((accumulator, line) => {
   // Split the line into a key and the remaining parts of the value
   const [key, ...value] = line.split(":").map(part => part.trim());

   if (key) {
     accumulator[key] = value[1] ? value.join(":") : value.join("");
   }
   return accumulator;
 }, {});

reduce visits every line and builds up a single object from them. Let's look at it piece by piece.

const [key, ...value] = line.split(":").map(part => part.trim());

This part splits each line by a colon (:) and trims every piece. You should have realized that value is an array because of the rest syntax (...) in the destructuring. We do it that way in case a colon is also used as part of the value in the key-value pair, like "title: How to get things done: the best ways".

In this case, only the string before the first colon is considered to be the key, while the remainder is considered to be the value.
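
To make the destructuring concrete, this sketch runs that one line on the example title with an extra colon in it. The line variable is made up for the demonstration.

```javascript
const line = "title: How to get things done: the best ways";

// Split on every colon, trim each piece, then take the first piece as the key
// and collect everything else into the value array.
const [key, ...value] = line.split(":").map((part) => part.trim());

console.log(key);   // title
console.log(value); // [ 'How to get things done', 'the best ways' ]
```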

if(key) {
    accumulator[key] = value[1] ? value.join(":") : value.join("");
}    
return accumulator;

Then, we store the key and value as a key-value pair in accumulator. Remember, accumulator is the first parameter of the reduce callback function.

    value[1] ? value.join(":") : value.join("")

We check whether value has a second, non-empty element. If it does, the value itself contained a colon, so we join the elements back together with ":". If it has only one element, we turn it into a string with join("").
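
One detail worth seeing in action: because every piece is trimmed before it is joined, a space that followed a colon inside the value does not come back. The sketch below shows that, and then one possible alternative that splits only at the first colon. splitAtFirstColon is a hypothetical helper written for this example, not part of the tutorial's function.

```javascript
const line = "title: How to get things done: the best ways";

// The tutorial's approach: split on every colon, trim, then re-join.
const [key, ...value] = line.split(":").map((part) => part.trim());
const joined = value[1] ? value.join(":") : value.join("");
console.log(key, "=>", joined); // title => How to get things done:the best ways

// Alternative sketch: cut the line once, at the first colon only.
const splitAtFirstColon = (text) => {
  const index = text.indexOf(":");
  if (index === -1) return [text.trim(), ""]; // no colon: whole line is the key
  return [text.slice(0, index).trim(), text.slice(index + 1).trim()];
};

const [altKey, altValue] = splitAtFirstColon(line);
console.log(altKey, "=>", altValue); // title => How to get things done: the best ways
```

Which one to use depends on whether the exact spacing inside values matters for your content.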

If you look at the complete function, you should see something that looks like below:

return accumulator;
}, {});

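We pass an empty object as the second argument to reduce so that accumulator starts out as that object. Without an initial value, reduce uses the first element of the array (here, a string) as the starting accumulator.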
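
The effect of that second argument can be checked with a short sketch. The lines array and the toObject callback are simplified stand-ins made up for this example; seenTypes just records what kind of accumulator the callback receives when no initial value is given.

```javascript
const lines = ["title: demo", "tags: a, b"];

// Simplified version of the tutorial's callback.
const toObject = (accumulator, line) => {
  const [key, ...value] = line.split(":").map((part) => part.trim());
  if (key) accumulator[key] = value.join(":");
  return accumulator;
};

// With {} as the initial value, the accumulator is an object from the start.
const withInitial = lines.reduce(toObject, {});
console.log(withInitial); // { title: 'demo', tags: 'a, b' }

// Without an initial value, the first array element becomes the accumulator.
const seenTypes = [];
lines.reduce((accumulator, line) => {
  seenTypes.push(typeof accumulator);
  return accumulator;
});
console.log(seenTypes); // [ 'string' ]
```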

Now, the final result should look like:

{
    title: "how to get things done",
    description: "this is greate",
    tags: "money, income, coding",
    conver_image: "ayobami.jpg"
}

Finally, we can now extract frontmatter or metadata from any markdown with JavaScript. What are you waiting for? Go and use whatever you have learnt now.

See you soon!

One more thing

Do you want to solve any business problems with content or programming? Let's talk. Feel free to reach me on Twitter at Ayobami Ogundiran

DEV Community
