
Ayobami Ogundiran

How to extract title, description or metadata from markdown

Do you need a piece of code to extract the title, description or front matter from markdown dynamically, or are you just curious to know how it is done?

This tutorial shows you how to do it efficiently and step by step.

Just give me the code:

    const extractMarkdownMetadata = (markdown) => {
      const charactersBetweenGroupedHyphens = /^---([\s\S]*?)---/;
      const metadataMatched = markdown.match(charactersBetweenGroupedHyphens);
      // match() returns null when there is no front matter, so guard before indexing
      const metadata = metadataMatched ? metadataMatched[1] : "";

      if (!metadata) {
        return {};
      }

      const metadataLines = metadata.split("\n");
      const metadataObject = metadataLines.reduce((accumulator, line) => {
        // Split each line into a key and the remaining parts of the value
        const [key, ...value] = line.split(":").map((part) => part.trim());

        if (key)
          accumulator[key] = value[1] ? value.join(":") : value.join("");
        return accumulator;
      }, {});

      return metadataObject;
    };

Now, let's explain everything step by step.

Step 1: Declare a function named "extractMarkdownMetadata"

const extractMarkdownMetadata = markdown => {
  // The rest of the code will go here  
};

extractMarkdownMetadata takes a markdown string as an argument. Let's assume the markdown we want to use is:


`---
title: how to get things done
description: this is greate
tags: money, income, coding
conver_image: ayobami.jpg
---

This is the main body of the article.`

Step 2: Write a regex that matches anything within --- and ---

const charactersBetweenGroupedHyphens = /^---([\s\S]*?)---/;

Clearly, you get the purpose of the regex above but do you understand what each of its units does? Let me explain:

 /: marks the start of the regex literal
 ^: anchors the match to the beginning of the string
 ---: matches three hyphens
 \s: matches a whitespace character (space, tab, newline and more)
 \S: matches a non-whitespace character (letters, numbers and symbols)
 [\s\S]: matches any single character, whitespace or not, so it also matches newlines
 *: matches the preceding element zero or more times; in this case, it operates on [\s\S]
 ?: placed right after a quantifier such as *, it makes that quantifier lazy, so "*?" matches as few characters as possible
 ([\s\S]*?): () is a capturing group that remembers the string matched inside the brackets
 ---: matches the three closing hyphens
 /: marks the end of the regex literal

Step 3: Extract the front matter or metadata from the string

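const metadataMatched = markdown.match(charactersBetweenGroupedHyphens);
// match() returns null when there is no front matter, so guard before indexing
const metadata = metadataMatched ? metadataMatched[1] : "";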

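Don't forget, markdown is the string passed to the function as an argument, and now we extract the metadata from it. If we console.log metadata, we should have a string that looks like the one below (the real string also has a newline at the start and at the end, because of the line breaks next to the hyphens):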

"title: how to get things done
description: this is greate
tags: money, income, coding
conver_image: ayobami.jpg"  

You might want to ask, why do we access metadataMatched with index 1 and not 0 or any other number?

It is because the first element of the array (index 0) is the whole match, including both --- delimiters, while the capturing group stores the text between ( and ) as the second element of metadataMatched. So, we use 1 to access it.
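
A short sketch of what match returns may help here. The input strings and the names frontMatterPattern, matched and noMatch are assumptions made up for this example.

```javascript
const frontMatterPattern = /^---([\s\S]*?)---/;

const withFrontMatter = "---\ntitle: demo\n---\nBody";
const matched = withFrontMatter.match(frontMatterPattern);

// Index 0 holds the whole match, index 1 holds the first capturing group.
console.log(JSON.stringify(matched[0])); // "---\ntitle: demo\n---"
console.log(JSON.stringify(matched[1])); // "\ntitle: demo\n"

// match() returns null when nothing matches, so indexing it directly would throw.
const noMatch = "Just a body, no front matter".match(frontMatterPattern);
console.log(noMatch); // null
```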

Step 4: Split the metadata string into an array of lines

 if (!metadata) {
   return {};
 }
 const metadataLines = metadata.split("\n");

We return an empty object if metadata is falsy and split the metadata string into an array of lines of strings.
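
As a rough illustration of the split, the sketch below uses a shortened, hard-coded metadata string. The filter step and the name nonEmptyLines are hypothetical extras to show the empty entries; the tutorial's function skips them later instead.

```javascript
// The captured text starts and ends with a newline, as it sits between the --- lines.
const metadata = "\ntitle: how to get things done\ntags: money, income, coding\n";

const metadataLines = metadata.split("\n");
console.log(JSON.stringify(metadataLines));
// ["","title: how to get things done","tags: money, income, coding",""]

// Optional, hypothetical cleanup: drop the empty first and last entries.
const nonEmptyLines = metadataLines.filter((line) => line.trim() !== "");
console.log(nonEmptyLines.length); // 2
```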

Step 5: Convert the lines into an object

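After we split the metadata into an array of lines, metadataLines should look like the array below. The real array also has an empty string at each end, because the captured text starts and ends with a newline; the key check in the next snippet skips those.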

[
    "title: how to get things done",
    "description: this is greate",
    "tags: money, income, coding",
    "conver_image: ayobami.jpg"
]

Now, let's convert everything into an object.

// Use reduce to accumulate the metadata into an object
const metadataObject = metadataLines.reduce((accumulator, line) => {
  // Split the line into a key and the parts of its value
  const [key, ...value] = line.split(":").map(part => part.trim());

  if (key) {
    accumulator[key] = value[1] ? value.join(":") : value.join("");
  }
  return accumulator;
}, {});

Yeah, that is what the reduce function does.

const [key, ...value] = line.split(":").map(part => part.trim());

This part splits each line by a colon (:). You should have realized that value is an array because of the rest operator (...). We do it that way in case a colon is also used as part of the value in the key-value pairs, like " title: How to get things done: the best ways".

In this case, only the string before the first colon is considered to be the key, while the remainder is considered to be the value.
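
Here is a minimal sketch of that destructuring on the example line above, so you can see what ends up in key and value.

```javascript
const line = " title: How to get things done: the best ways";

// The first part becomes key; the rest operator collects everything else.
const [key, ...value] = line.split(":").map((part) => part.trim());

console.log(key);                   // title
console.log(JSON.stringify(value)); // ["How to get things done","the best ways"]
```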

if(key) {
    accumulator[key] = value[1] ? value.join(":") : value.join("");
}    
return accumulator;

Then, we add the key and its value to accumulator as a key-value pair. Remember, accumulator is the first parameter of the reduce callback function.

    value[1] ? value.join(":") : value.join("")

We check whether value has a second, non-empty element. If it does, the value contained colons of its own, so the array elements are joined back together with ":". If it has only one element, join("") simply turns it into a string.
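
One thing to be aware of: because every part is trimmed before joining, the space after an inner colon is not restored. The sketch below shows that, next to a possible variant that splits only at the first colon. The variant and the names separatorIndex, variantKey and variantValue are my own assumptions, not part of the function in this tutorial.

```javascript
const line = "summary: Part 1: setup";

// Approach from this tutorial: split on every colon, trim, then re-join.
const [key, ...value] = line.split(":").map((part) => part.trim());
const joined = value[1] ? value.join(":") : value.join("");
console.log(joined); // Part 1:setup

// Hypothetical variant: split only at the first colon, so the value stays intact.
const separatorIndex = line.indexOf(":");
const variantKey = line.slice(0, separatorIndex).trim();
const variantValue = line.slice(separatorIndex + 1).trim();
console.log(variantKey);   // summary
console.log(variantValue); // Part 1: setup
```

Whether the difference matters depends on your content; for values such as URLs, both approaches give the same result.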

If you look at the complete function, you should see something that looks like below:

return accumulator;
}, {});

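We pass an empty object as the second argument to the reduce method so that accumulator starts as an object. Without an initial value, reduce would use the first element of the array (a string here) as the accumulator.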
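
The effect of the initial value can be sketched as follows. The array lines and the names asObject and firstSeen are made up for this example.

```javascript
const lines = ["title: demo", "tags: js, markdown"];

// With {} as the initial value, the accumulator starts as an empty object.
const asObject = lines.reduce((accumulator, line) => {
  const [key, ...value] = line.split(":").map((part) => part.trim());
  accumulator[key] = value.join(":");
  return accumulator;
}, {});
console.log(JSON.stringify(asObject)); // {"title":"demo","tags":"js, markdown"}

// Without an initial value, the first element is the starting accumulator.
const firstSeen = lines.reduce((accumulator) => accumulator);
console.log(firstSeen); // title: demo
```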

Now, the final result should look like:

{
    title: "how to get things done",
    description: "this is greate",
    tags: "money, income, coding",
    conver_image: "ayobami.jpg"
}

Finally, we can now extract frontmatter or metadata from any markdown with JavaScript. What are you waiting for? Go and use whatever you have learnt now.
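
If you want to try it right away, here is one way the function might be used on the sample markdown from Step 1. It is a sketch, not the only way to do it: the function is repeated so the snippet runs on its own, it includes a guard for input without front matter, and splitting tags into an array is an extra step I assume you may want.

```javascript
const extractMarkdownMetadata = (markdown) => {
  const charactersBetweenGroupedHyphens = /^---([\s\S]*?)---/;
  const metadataMatched = markdown.match(charactersBetweenGroupedHyphens);
  // Guard against missing front matter: match() returns null in that case.
  const metadata = metadataMatched ? metadataMatched[1] : "";
  if (!metadata) return {};

  return metadata.split("\n").reduce((accumulator, line) => {
    const [key, ...value] = line.split(":").map((part) => part.trim());
    if (key) accumulator[key] = value[1] ? value.join(":") : value.join("");
    return accumulator;
  }, {});
};

const markdown = `---
title: how to get things done
description: this is greate
tags: money, income, coding
conver_image: ayobami.jpg
---

This is the main body of the article.`;

const result = extractMarkdownMetadata(markdown);
console.log(result.title); // how to get things done

// Assumed extra step: turn the comma-separated tags into an array.
const tags = result.tags.split(",").map((tag) => tag.trim());
console.log(JSON.stringify(tags)); // ["money","income","coding"]

// Input without front matter gives an empty object.
const empty = extractMarkdownMetadata("No front matter here");
console.log(JSON.stringify(empty)); // {}
```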

See you soon!

One more thing

Do you want to solve any business problems with content or programming? Let's talk. Feel free to reach me on Twitter at Ayobami Ogundiran
