RitzaCo for Ritza

Posted on Mar 26, 2021 • Edited on Jun 5, 2021 • Originally published at docs.replit.com

Create a static site generator with Python and Replit

#python #tutorial #replit

A static site generator (SSG) is a tool for building informational websites such as blogs and documentation repositories. SSGs allow technical users to build websites that are faster and more secure than ones running on dynamic platforms such as WordPress, without having to write each HTML page by hand.

There are many SSGs out there already, such as Jekyll and Hugo, but many people opt to write their own – either so that they fully understand it and can be more productive, or to meet custom needs.

After this tutorial, you'll:

Be able to build a simple but flexible SSG in Python in under 100 lines of code.
Understand advanced file and directory handling.
Know how to build a configurable tool for technical users.

At the end, you'll have a full SSG that you can use as is or extend for your own requirements.

Building a proof of concept

A basic SSG takes in a list of Markdown files, converts them to HTML, and inserts them into different predefined HTML templates. Beyond that, most SSGs have the concept of frontmatter to define metadata such as title and publish date for each Markdown file. SSGs also usually have global configuration files, containing general information about the site such as its name and domain.

Before we start dealing with files, we're going to implement our SSG using strings. This will serve as an initial proof of concept.

Setting up and defining the flow

We'll start by defining the main functions we'll use. Create a new Python repl and enter the following code in main.py.

def load_config():
    pass

def load_content_items():
    pass

def load_templates():
    pass

def render_site(config, content, templates):
    pass

def main():
    config = load_config()
    content = load_content_items()
    templates = load_templates()
    render_site(config, content, templates)

main()

This skeleton defines the program flow:

Load the global site configuration.
Load the content files containing Markdown and frontmatter.
Load the HTML templates.
Render the site using everything we've loaded above.

Throughout this tutorial, we will keep to this flow, even as we expand and refine its individual elements.

Parsing content and templates

Now we need to import some modules. At the top of main.py, enter the following line.

import markdown, jinja2, toml, re

All four of these modules are essentially parsers:

markdown: This module will render Markdown.
jinja2: The Jinja templating language, which we will use to create HTML templates that we can enhance with Python-esque code.
toml: We will use TOML (Tom's Obvious, Minimal Language) for post frontmatter and global configuration.
re: We'll use Python's regular expressions (regex) module for some additional, very light, parsing not provided by the three packages above.

Now that we have our parsers, let's add some content to parse. Add a TOML string for global site configuration at the top of the main function.

def main():
    config_string = """
    title = "My blog"
    """

For now, this just defines the title of our site. Change it to whatever you want. To load this config, we'll use toml.loads on its content. Go to the load_config function at the top of main.py and give it the following parameter and content.

def load_config(config_string):
    return toml.loads(config_string)

To use this function, go back to the main function and pass config_string to the load_config call.

    config = load_config(config_string)

Now let's create a couple of content strings below the config string. We're going to format these strings with a block of TOML metadata terminated by a row of five plus signs (+++++). The rest of the string will contain Markdown-formatted text. Add this block of code below the definition of config_string in the main function.

    content_strings = ["""
title = "My first entry"
date = 2021-02-14T11:47:00+02:00
+++++

Hello, welcome to my **blog**
""",
"""
title = "My second entry"
date = 2021-02-15T17:47:00+02:00
+++++

This is my second post.
"""]

We'll parse these strings in our load_content_items function. Give the function a content_strings parameter and add the following code.

def load_content_items(content_strings):
    items = []
    for item in content_strings:
        frontmatter, content = re.split(r"^\s*\+\+\+\+\+\s*$", item, maxsplit=1, flags=re.MULTILINE)
        item = toml.loads(frontmatter)
        item['content'] = markdown.markdown(content)

        items.append(item)

    # sort in reverse chronological order
    items.sort(key=lambda x: x["date"],reverse=True)

    return items

Here we use a for loop to construct a list of items from our item strings. For each one, we split up the frontmatter and content on a regular expression that will match a line of text containing five plus signs. We pass in 1 as re.split's maxsplit parameter to ensure that we only split on the first matched line, and re.MULTILINE so that our regex will work correctly in a multiline string.
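As a minimal, standalone sketch of the split described above (the sample string here is made up for illustration), the snippet below shows that maxsplit=1 leaves a later row of plus signs in the body untouched:

```python
import re

# Hypothetical sample: the second row of plus signs belongs to the body.
raw = """title = "Example"
+++++

Body text.

+++++
More body text.
"""

# maxsplit=1 stops after the first delimiter line; re.MULTILINE makes ^ and $
# match at the start and end of each line instead of the whole string.
frontmatter, body = re.split(
    r"^\s*\+\+\+\+\+\s*$", raw, maxsplit=1, flags=re.MULTILINE
)

print(repr(frontmatter))
print(repr(body))
```

The pattern is written as a raw string (the r prefix) so that Python does not try to interpret the backslashes itself.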

We then use toml.loads() to convert the frontmatter into a dictionary. Finally, we convert the Markdown in content into HTML and add it to the dictionary we just created. The result will be a dictionary that looks something like this:

{
    'title': 'My first entry',
    'date': datetime.datetime(2021, 2, 14, 11, 47, tzinfo=<toml.tz.TomlTz object at 0x7f4032da6eb0>),
    'content': '<p>Hello, welcome to my <strong>blog</strong></p>'
}

Finally, since this is a blog site, we're sorting our items list in reverse chronological order. We do this with the key parameter of Python's list.sort method, sorting by each entry's date value. The key parameter takes a function; sort calls it on each list entry and orders the list by the return values. For brevity, we've created an in-line anonymous function using a lambda expression.
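To make the key parameter concrete, here is a small sketch with hand-written items (plain datetime values stand in for the dates toml would parse). The named function by_date is a hypothetical equivalent of the lambda used in the tutorial code:

```python
from datetime import datetime

# Hand-written stand-ins for the dictionaries built in load_content_items.
items = [
    {"title": "My first entry", "date": datetime(2021, 2, 14, 11, 47)},
    {"title": "My second entry", "date": datetime(2021, 2, 15, 17, 47)},
]

def by_date(item):
    # Equivalent to: lambda x: x["date"]
    return item["date"]

# reverse=True puts the newest entry first.
items.sort(key=by_date, reverse=True)

titles = [item["title"] for item in items]
print(titles)  # ['My second entry', 'My first entry']
```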

Back in our main function, let's pass content_strings to the load_content_items function call.

    content = load_content_items(content_strings)

Now let's create a template string below the content strings. This is just some HTML with Jinja code in {{ }} and {% %} blocks. Add this code block beneath the definition of content_strings in the main function.

    template_string = """
<!DOCTYPE html>
<html>
    <body>
        <h1>{{ config.title }}</h1>
        {% for post in content %}
        <article>
            <h2>{{ post.title }}</h2>
            <p>Posted at {{ post.date }}</p>
            {{ post.content }}
        </article>
        {% endfor %}
    </body>
</html>
"""

Each of the values inside {{ }} blocks is something we've assembled in the preceding code: config.title from the config strings, content from the content strings, and the individual values inside the Jinja for loop from each item in the content list. Note that in Jinja, post.title is equivalent to post["title"].

To load this template, we will add the following parameter and code to the load_templates function.

def load_templates(template_string):
    return jinja2.Template(template_string)

We'll also change the load_templates function invocation in the main function.

    templates = load_templates(template_string)

Rendering the site

Now let's populate the template with our config and content data. We'll do this using the template's render() method. This method takes keyword arguments, which it will use to resolve the variable references in the template's {{ }} and {% %} blocks.

In the render_site function, add the following code:

def render_site(config, content, template):
    print(template.render(config=config, content=content))

As our render_site invocation in main already takes the correct arguments, we can run our code now. The result should look like this:

<!DOCTYPE html>
<html>
    <body>
        <h1>My blog</h1>

        <article>
            <h2>My second entry</h2>
            <p>Posted at 2021-02-15 17:47:00+02:00</p>
            <p>This is my second post.</p>
        </article>

        <article>
            <h2>My first entry</h2>
            <p>Posted at 2021-02-14 11:47:00+02:00</p>
            <p>Hello, welcome to my <strong>blog</strong></p>
        </article>

    </body>
</html>

We now have the core of our SSG. Modify the content of one of the content strings and the output will change. Add new variables to each content file's frontmatter and the template, and they will propagate through without any changes to the Python code.

Next, let's create and ingest some files.

Blog generator

First, we need to create a directory structure. In the file pane of your repl, create four directories: content, content/posts, layout and static. Your file pane should now show content (containing posts), layout and static alongside main.py.

We will put our Markdown files in content/posts, our Jinja files in layout and unprocessed files like CSS stylesheets and images in static. We're using content/posts so we can create different content types later on, such as undated pages like "About".

Creating input files

First, we'll create our config file config.toml. In addition to the title value, we'll give it a base URL based on our repl's URL.

config.toml

title = "My blog"
baseURL = "https://YOUR-REPL-NAME-HERE.YOUR-REPLIT-USERNAME.repl.co"

Replace the all-caps text with the relevant values.

Now let's put our content strings into post files. Create two files with the following content:

content/posts/first-post.md

title = "My first entry"
date = 2021-02-14T11:47:00+02:00
+++++

Hello, welcome to my **blog**.

content/posts/second-post.md

title = "My second entry"
date = 2021-02-15T17:47:00+02:00
+++++

This is my second post.

Make as many additional posts as you want. Just remember to give each one a title, correctly formatted datestamp and some Markdown content. File names should be lowercase with no spaces, ending in the .md file extension.

In contrast to our proof of concept, this will be a multi-page website, so we're going to create three HTML files in our layout directory: index.html, post.html and macros.html.

index.html will be the template for our homepage, showing a list of blog posts in reverse chronological order.
post.html will be the template for post pages, containing their rendered Markdown content.
macros.html will not be a template, but a container file for Jinja macros. These are reusable snippets of HTML that we can use in our templates.

Create three files and populate them as follows.

layout/index.html

<!DOCTYPE html>
<html>
    {% import "macros.html" as macros %}
    {{ macros.head(config.title) }}
    <body>
        <h1>Posts</h1>
        <ul>
        {% for post in content.posts %}
            <li><a href="{{ post.url }}">{{ post.title }}</a> (posted at {{ post.date }})</li>
        {% endfor %}
        </ul>
    </body>
</html>

layout/post.html

<!DOCTYPE html>
<html>
    {% import "macros.html" as macros %}
    {{ macros.head(this.title) }}
    <body>
        <h1>{{ this.title }}</h1>
        <p>Posted at {{ this.date }}</p>
        {{ this.content }}
        <p><a href="{{ config.baseURL }}">Return to the homepage &#10558;</a></p>
    </body>
</html>

(&#10558; is the HTML entity for "⤾".)

layout/macros.html

{% macro head(page_title) -%}
<head>
    <title>{{ page_title }}</title>
    <link rel="stylesheet" href="/css/style.css">
</head>
{% endmacro -%}

The only macro we've defined is head, which will generate an HTML <head> tag containing an appropriate title for the page as well as a link to our website's stylesheet. Let's create that now.

In the static directory, create a subdirectory called css. Then create a file called style.css in this subdirectory and add the following code.

static/css/style.css

h1 {
    font-family: sans-serif;
    margin-top: 2em;
}

body {
    font-family: serif;
    margin: 0 auto;
    max-width: 40em;
    line-height: 1.2em;
}

These are a couple of small style adjustments to improve readability and differentiate our site from an unstyled page. Feel free to add your own touches.

Ingesting input files

Now that we've created our input files, let's write some code in main.py to read them and create our website. To do this, we'll be iterating on our proof-of-concept code.

First, at the top of the file, let's import some new modules for dealing with reading and writing files and directories. Add the second line below the first in main.py.

import jinja2, markdown, toml, re
import os, glob, pathlib, shutil, distutils.dir_util

Then delete the config_string, content_strings and template_string definitions from the main function.

Ingesting site configuration

First, let's ingest the configuration file. Change the load_config function as follows.

def load_config(config_filename):
    with open(config_filename, 'r') as config_file:
        return toml.loads(config_file.read())

Now change this line in the main function:

    config = load_config(config_string)

To this:

    config = load_config("config.toml")

Ingesting posts

Next, we will ingest the content/posts directory. Change the content of the load_content_items function as follows.

def load_content_items(content_directory):
    items = []
    for fn in glob.glob(f"{content_directory}/*.md"):
        with open(fn, 'r') as file:
            frontmatter, content = re.split(r"^\+\+\+\+\+$", file.read(), maxsplit=1, flags=re.MULTILINE)
        item = toml.loads(frontmatter)
        item['content'] = markdown.markdown(content)

        items.append(item)

    # sort in reverse chronological order
    items.sort(key=lambda x: x["date"],reverse=True)

    return items

Instead of looping through a list of strings, we're now looping through all files ending in .md in the content/posts directory using the glob.glob function and parsing their contents.
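The following sketch shows what the glob pattern picks up. It uses a temporary directory and made-up file names rather than the tutorial's content/posts directory. Note that glob.glob returns matches in arbitrary order, which is one reason the tutorial sorts items explicitly afterwards.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as content_directory:
    # Two Markdown files and one file the pattern should ignore.
    for name in ("b-post.md", "a-post.md", "notes.txt"):
        with open(os.path.join(content_directory, name), "w") as f:
            f.write("placeholder")

    # Same pattern shape as in load_content_items; sorted here only to make
    # the printed result predictable.
    matches = sorted(
        os.path.basename(path)
        for path in glob.glob(f"{content_directory}/*.md")
    )

print(matches)  # ['a-post.md', 'b-post.md']
```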

Since we're now building a real site with multiple pages, we'll need to add a couple of additional attributes to our post dictionary, namely slug and url.

slug will be the name of the post's Markdown file without the .md extension.
url will be a partial URL including the post's date and slug. For the first post, it will look like this: /2021/02/14/first-post/

Let's create the slug by using os.path.basename to get our file's filename without its full path (i.e. first-post.md rather than content/posts/first-post.md). Then we'll use os.path.splitext on the result to split the filename and extension, and we'll discard the extension. Add the following line to the for loop, below where we define item['content'].

        item['slug'] = os.path.splitext(os.path.basename(file.name))[0]

We'll then use this slug along with our post's date to construct the full URL. We'll use Python's string formatting to ensure correct zero-padding of single-digit values for months and days. Add this line below the one we just added:

        item['url'] = f"/{item['date'].year}/{item['date'].month:0>2}/{item['date'].day:0>2}/{item['slug']}/"
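To check the slug and URL logic in isolation, here is a small sketch. The helper name build_url is hypothetical and not part of the tutorial code, and the strftime variant is shown only as an equivalent way to get the zero-padded date.

```python
import os
from datetime import datetime

def build_url(filename, date):
    """Hypothetical helper mirroring the two lines added to the for loop."""
    # "content/posts/first-post.md" -> "first-post.md" -> "first-post"
    slug = os.path.splitext(os.path.basename(filename))[0]
    # "0>2" right-aligns the number in a two-character field padded with zeros.
    return f"/{date.year}/{date.month:0>2}/{date.day:0>2}/{slug}/"

post_date = datetime(2021, 2, 14, 11, 47)
url = build_url("content/posts/first-post.md", post_date)
print(url)  # /2021/02/14/first-post/

# strftime produces the same zero-padded date portion.
alternative = post_date.strftime("/%Y/%m/%d/") + "first-post/"
```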

Now we can update our function invocation in main. Change this line:

    content = load_content_items(content_strings)

To this:

    content = { "posts": load_content_items("content/posts") }

Using a dictionary instead of a plain list will allow us to add additional content types in a later section of this tutorial.

Ingesting templates

Now that we have a list of posts, let's ingest our templates so we have somewhere to put them. Jinja works quite differently from the file system and from strings, so we're going to change our load_templates function to create a Jinja Environment with a FileSystemLoader that knows to look for templates in a particular directory. Change the function code as follows.

def load_templates(template_directory):
    file_system_loader = jinja2.FileSystemLoader(template_directory)
    return jinja2.Environment(loader=file_system_loader)

Then, in the main function, change this line:

    templates = load_templates(template_string)

To this:

    environment = load_templates("layout")

In the next section, we'll pass this environment to our render_site function where we'll load individual templates as we need them.

Writing output files

Now let's render the site by writing some output files. We'll be using a directory named public for this, but you don't need to create this in your file pane – we'll do so in code. Go to the render_site function and replace its code with the following (remember to change the function parameters).

def render_site(config, content, environment, output_directory):
    if os.path.exists(output_directory):
        shutil.rmtree(output_directory)
    os.mkdir(output_directory)

We do two things here: remove the output directory and all of its content if it exists, and create a fresh output directory. This will avoid errors when running our code multiple times.
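The reset behaviour can be tried out safely in a scratch location. This sketch wraps the same three lines in a hypothetical reset_directory function and runs it twice against a temporary directory, showing that files from a previous build do not survive:

```python
import os
import shutil
import tempfile

def reset_directory(output_directory):
    """Hypothetical wrapper around the first three lines of render_site."""
    if os.path.exists(output_directory):
        shutil.rmtree(output_directory)
    os.mkdir(output_directory)

workspace = tempfile.mkdtemp()
output_directory = os.path.join(workspace, "public")

# First run creates the directory; we then simulate a leftover file.
reset_directory(output_directory)
with open(os.path.join(output_directory, "stale.html"), "w") as f:
    f.write("old build")

# Second run removes the leftover file along with the old directory.
reset_directory(output_directory)
remaining = os.listdir(output_directory)
print(remaining)  # []

shutil.rmtree(workspace)  # clean up the scratch location
```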

Now let's write our home page by adding this code to the bottom of the function.

    # Homepage
    index_template = environment.get_template("index.html")
    with open(f"{output_directory}/index.html", 'w') as file:
        file.write(index_template.render(config=config,content=content))

Here we use our Jinja environment to load the template at layout/index.html. We then open the public/index.html file and write to it the results of rendering index_template with our config and content dictionaries passed in.

The code for writing individual post files is a bit more complex. Add the for loop below to the bottom of the function.

    # Post pages
    post_template = environment.get_template("post.html")
    for item in content["posts"]:
        path = f"{output_directory}/{item['url']}"
        pathlib.Path(path).mkdir(parents=True, exist_ok=True)
        with open(path+"index.html", 'w') as file:
            file.write(post_template.render(this=item, config=config, content=content))

First we create the directories necessary to show our post URLs. To display a URL such as 2021/02/14/first-post/, we need to create a directory named 2021 inside public, and then nested directories named 02, 14 and first-post. Inside the final directory, we create a file named index.html and write our rendered template to it.
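Here is a minimal sketch of the nested directory creation, run against a temporary directory instead of public. It joins paths with pathlib's / operator, which is an alternative to the f-string used above; in that style the leading slash must be stripped from the URL first, because joining an absolute path discards everything to its left.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as output_directory:
    url = "/2021/02/14/first-post/"

    # Strip the slashes so the result stays inside output_directory.
    post_dir = pathlib.Path(output_directory) / url.strip("/")

    # parents=True creates 2021, 02, 14 and first-post in one call;
    # exist_ok=True makes a repeated call harmless.
    post_dir.mkdir(parents=True, exist_ok=True)
    post_dir.mkdir(parents=True, exist_ok=True)

    (post_dir / "index.html").write_text("<p>placeholder</p>")

    created = [
        path.relative_to(output_directory).as_posix()
        for path in pathlib.Path(output_directory).rglob("index.html")
    ]

print(created)  # ['2021/02/14/first-post/index.html']
```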

Note the values we pass to render: variables for this post are contained in this and site-wide configuration variables are contained in config. We also pass in content to allow us to access other posts. Although we aren't using this in the post.html template right now, it's good to have the option for future template updates.

Now we need to load our static files. Add this code to the bottom of the render_site function:

    # Static files
    distutils.dir_util.copy_tree("static", output_directory)

All this code does is copy the file tree from our static directory into our public directory. This means that our CSS file at static/css/style.css can be accessed in our HTML templates as /css/style.css. Similarly, if we create a file at static/my-picture.jpg, we can reference that in our HTML or Markdown as /my-picture.jpg and it will be found and loaded.
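One caveat: the distutils package was deprecated in Python 3.10 and removed in Python 3.12, so the copy_tree call above will not import on newer interpreters. A possible replacement, sketched here against temporary directories rather than the tutorial's real static and public folders, is shutil.copytree with dirs_exist_ok=True (available from Python 3.8):

```python
import os
import shutil
import tempfile

workspace = tempfile.mkdtemp()
static_dir = os.path.join(workspace, "static")
public_dir = os.path.join(workspace, "public")

# Stand-ins for static/css/style.css and an already-rendered public/index.html.
os.makedirs(os.path.join(static_dir, "css"))
with open(os.path.join(static_dir, "css", "style.css"), "w") as f:
    f.write("body { margin: 0 auto; }")
os.makedirs(public_dir)
with open(os.path.join(public_dir, "index.html"), "w") as f:
    f.write("<h1>Posts</h1>")

# dirs_exist_ok=True merges into the existing destination instead of raising.
shutil.copytree(static_dir, public_dir, dirs_exist_ok=True)

copied_files = sorted(
    os.path.relpath(os.path.join(root, name), public_dir).replace(os.sep, "/")
    for root, _, names in os.walk(public_dir)
    for name in names
)
print(copied_files)  # ['css/style.css', 'index.html']

shutil.rmtree(workspace)  # clean up the scratch location
```

If you switch to this approach in main.py, shutil is already imported, so only the distutils.dir_util import and the copy_tree call need to change.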

Now we just need to update the function invocation in our main function. Change this line:

    render_site(config, content, templates)

To this:

    render_site(config, content, environment, "public")

Now run the code. You should see the public directory appear in your file pane. Look inside, and you'll see the directories and files we just created. To see your site in action, run the following commands in Replit's "Shell" tab.

cd public
python -m http.server

This should bring up the Replit web view with your home page. Click on each of the links to visit the post pages.

This server will need to be restarted periodically as you work on your site.

Generic site generator

In addition to chronological blog posts, our site could do with undated pages, such as an "About" or "Contact" page. Depending on the kind of site we want to build, we may also want photo pages, or pages including podcast episodes, or any number of other things. If we give this SSG to someone else to use, they may have their own ideas as well – for example, they may want to make a site organised as a book with numbered chapters rather than as a blog. Rather than trying to anticipate everyone's needs, let's make it so we can create multiple types of content pages, and allow the user to define those types and how they should be ordered.

This is simpler than it sounds, but will require some refactoring.

Expanding the config file

First, let's add some content to our config.toml file to give this customization a definite shape. Add these lines below the definition of baseURL.

config.toml

title = "My site"
baseURL = "https://YOUR-REPL-NAME-HERE.YOUR-REPLIT-USERNAME.repl.co"

types = ["post", "page"]

post.dateInURL = true
post.sortBy = "date"
post.sortReverse = true

page.dateInURL = false
page.sortBy = "title"
page.sortReverse = false

Here we've told our site generator we want two kinds of pages – a post type, which we will use for blog posts, and a page type, which we will use for evergreen content such as contact details and general site information. Below that, we've used TOML's dotted-key syntax to specify some characteristics of each type, which the parser loads as one dictionary per type.

Posts will have a date in their URLs and will be sorted in reverse date order when listed.
Pages will not have a date in their URLs and will be sorted alphabetically by their title.

By creating these settings, we'll make it possible to sort a content type by any attribute in its frontmatter.

Ingesting user-defined content

To implement this, let's first import a new module at the top of main.py. Add the third line to your file, below the first two.

import jinja2, markdown, toml, re
import glob, pathlib, os, shutil, distutils.dir_util
import inflect

The inflect module allows us to turn singular words into plurals and vice versa. This will be useful for working with the types list from our configuration file. Change the load_config function to resemble the following.

def load_config(config_filename):

    with open(config_filename, 'r') as config_file:
        config = toml.loads(config_file.read())

    ie = inflect.engine()
    for content_type in config["types"]:
        config[content_type]["plural"] = ie.plural(content_type)

    return config

This code will expand the dictionaries we load from our config file with a key containing the type's plural. If we were to print out our config dictionary at this point, it would look like this:

{
    "title": "My site",
    "baseURL": "https://YOUR-REPL-NAME-HERE.YOUR-REPLIT-USERNAME.repl.co",
    "types": ["post", "page"],
    "post": {
        "plural": "posts",
        "dateInURL": True,
        "sortBy": "date",
        "sortReverse": True
    },
    "page": {
        "plural": "pages",
        "dateInURL": False,
        "sortBy": "title",
        "sortReverse": False
    }
}

Now let's modify load_content_items to deal with multiple, user-defined content types. First, we need to change the function to take our config dictionary as an additional parameter. Second, we'll put all of our function's current content in an inner function named load_content_type. Your function should now look like this:

def load_content_items(config, content_directory):

    def load_content_type(content_type):
        items = []
        for fn in glob.glob(f"{content_directory}/*.md"):
            with open(fn, 'r') as file:
                frontmatter, content = re.split(r"^\+\+\+\+\+$", file.read(), maxsplit=1, flags=re.MULTILINE)

            item = toml.loads(frontmatter)
            item['content'] = markdown.markdown(content)
            item['slug'] = os.path.splitext(os.path.basename(file.name))[0]
            item['url'] = f"/{item['date'].year}/{item['date'].month:0>2}/{item['date'].day:0>2}/{item['slug']}/"

            items.append(item)

        # sort in reverse chronological order
        items.sort(key=lambda x: x["date"],reverse=True)

        return items

To load from the correct directory, we will need to change this line:

        for fn in glob.glob(f"{content_directory}/*.md"):

To this:

        for fn in glob.glob(f"{content_directory}/{config[content_type]['plural']}/*.md"):

Here we're using the plural of the content type we defined earlier. This will ensure that items of type "post" can be found in "content/posts" and items of type "page" can be found in "content/pages".

We now need to add code to respect our configuration settings. We'll do this by changing this line:

            item['url'] = f"/{item['date'].year}/{item['date'].month:0>2}/{item['date'].day:0>2}/{item['slug']}/"

To this:

            if config[content_type]["dateInURL"]:
                item['url'] = f"/{item['date'].year}/{item['date'].month:0>2}/{item['date'].day:0>2}/{item['slug']}/"
            else:
                item['url'] = f"/{item['slug']}/"

Now we'll sort according to the configuration file by changing this line:

        # sort in reverse chronological order
        items.sort(key=lambda x: x["date"],reverse=True)

To this:

        # sort according to config
        items.sort(key=lambda x: x[config[content_type]["sortBy"]],
                   reverse=config[content_type]["sortReverse"])
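To see the config-driven sort on its own, here is a small sketch with a hand-written config dictionary and items. The sort_items helper is hypothetical, and plain date values stand in for the parsed frontmatter dates.

```python
from datetime import date

# Hand-written subset of what load_config would return.
config = {
    "post": {"sortBy": "date", "sortReverse": True},
    "page": {"sortBy": "title", "sortReverse": False},
}

def sort_items(items, content_type):
    """Hypothetical helper: sort by whichever attribute the config names."""
    settings = config[content_type]
    return sorted(
        items,
        key=lambda item: item[settings["sortBy"]],
        reverse=settings["sortReverse"],
    )

posts = [
    {"title": "My first entry", "date": date(2021, 2, 14)},
    {"title": "My second entry", "date": date(2021, 2, 15)},
]
pages = [{"title": "Contact"}, {"title": "About"}]

post_titles = [item["title"] for item in sort_items(posts, "post")]
page_titles = [item["title"] for item in sort_items(pages, "page")]
print(post_titles)  # ['My second entry', 'My first entry']
print(page_titles)  # ['About', 'Contact']
```

An item whose frontmatter lacks the configured sortBy attribute would raise a KeyError here, just as it would in the tutorial code.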

We can complete this load_content_items function by writing some code to iterate through our site's configured content types, calling load_content_type for each one. Add the following code below the definition of load_content_type (ensure that it's de-indented so as to be part of load_content_items).

    content_types = {}
    for content_type in config["types"]:
        content_types[config[content_type]['plural']] = load_content_type(content_type)

    return content_types

Then in the main function, change this line:

    content = { "posts": load_content_items("content/posts") }

To this:

    content = load_content_items(config, "content")

Rendering user-defined content

Now we need to change our output code in render_site to render each content type with its own template. As we did with load_content_items, we'll start by moving the post-creating for loop into an inner function, this time named render_type. Alter your render_site function so that it resembles the following.

def render_site(config, content, environment, output_directory):

    def render_type(content_type): # <-- new inner function
        # Post pages
        post_template = environment.get_template("post.html")
        for item in content["posts"]:
            path = f"{output_directory}/{item['url']}"
            pathlib.Path(path).mkdir(parents=True, exist_ok=True)
            with open(path+"index.html", 'w') as file:
                file.write(post_template.render(this=item, config=config, content=content))

    if os.path.exists(output_directory):
        shutil.rmtree(output_directory)
    os.mkdir(output_directory)

    for content_type in config["types"]: # <-- new for loop
        render_type(content_type)

    # !!! post for loop moved to inner function above

    # Homepage
    index_template = environment.get_template("index.html")
    with open(f"{output_directory}/index.html", 'w') as file:
        file.write(index_template.render(config=config, content=content))


    # Static files
    distutils.dir_util.copy_tree("static", output_directory)

Then change this line in the render_type inner function that loads the post template:

        post_template = environment.get_template("post.html")

Into this line that loads a template for the provided content type:

        template = environment.get_template(f"{content_type}.html")

Alter the for loop below that line to use the content type's plural.

        for item in content[config[content_type]["plural"]]:

Finally, change post_template in the loop's final line to template.

                file.write(template.render(this=item, config=config, content=content))

Adding a new content type

Now that we've done all that work to generalize our code, all that's left is to create our pages. First, let's create a page template at layout/page.html. Use the following code.

<!DOCTYPE html>
<html>
    {% import "macros.html" as macros %}
    {{ macros.head(this.title) }}
    <body>
        <h1>{{ this.title }}</h1>
        {{ this.content }}
        <p><a href="{{ config.baseURL }}">Return to the homepage &#10558;</a></p>
    </body>
</html>

This is just our post.html template without the date.

Now create a new subdirectory in content called pages. Inside that subdirectory, create a file named about.md and put the following content in it.

title = "About"
+++++

This website is built with Python, Jinja, TOML and Markdown.

This is sufficient to create a new page at /about/, but it won't be linked anywhere. For that, we'll need to create a global navigation bar for our site. Create the following additional macro in layout/macros.html.

{% macro navigation(pages) -%}
<nav><ul>
    {% for page in pages %}
        <li><a href="{{ page.url }}">{{ page.title }}</a></li>
    {% endfor %}
</ul></nav>
{% endmacro -%}

Then include the macro in index.html, page.html and post.html by inserting the following code just underneath the {{ macros.head(...) }} line in each file (index.html passes config.title rather than this.title).

    {{ macros.navigation(content.pages) }}

Finally, add the CSS below to static/css/style.css to apply light styling to the navigation bar.

nav ul
{
    list-style-type: none;
    text-align: right;
}

Run your code and preview your site with cd public && python -m http.server in the repl shell, and you should see your home page with a navigation bar linking to the About page.

Where to next?

We've created a flexible SSG capable of generating many different types of HTML pages, which can be served from any web server. Apart from fleshing out the templates and adding new content types, you might want to expand the generator's functionality to allow things like:

Categories or tags for content items.
Ability to generate an RSS or Atom feed for people to subscribe to.
A way to mark items as drafts, so they won't be included when the site is compiled.
Navigation features like next and previous item links.
Useful error messages for malformed directory structures and configuration files.
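As a starting point for two of these ideas, here is a rough sketch of draft filtering and previous/next links. Everything in it is an assumption rather than part of the tutorial code: the draft frontmatter key, the previous_url and next_url attributes, and both helper names are made up for illustration.

```python
def publishable(items):
    """Drop items whose (hypothetical) frontmatter sets draft = true."""
    return [item for item in items if not item.get("draft", False)]

def link_neighbours(items):
    """Attach hypothetical previous_url / next_url keys to an ordered list."""
    for index, item in enumerate(items):
        item["previous_url"] = items[index - 1]["url"] if index > 0 else None
        item["next_url"] = (
            items[index + 1]["url"] if index + 1 < len(items) else None
        )
    return items

posts = [
    {"title": "A", "url": "/a/"},
    {"title": "B", "url": "/b/", "draft": True},
    {"title": "C", "url": "/c/"},
]

visible = link_neighbours(publishable(posts))
visible_titles = [item["title"] for item in visible]
print(visible_titles)  # ['A', 'C']
```

In the generator, calls like these would most likely sit at the end of load_content_type, after sorting, so that templates could read this.next_url and this.previous_url.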

You can find our SSG example repl here
