Matt Layman

Posted on Nov 16, 2020 • Originally published at mattlayman.com

Make A Hugo Static Blog Inside A Django App

#python #django #hugo #whitenoise

I have a side project and I'd like to do some content marketing to potential customers to show how my product is useful. To do this, I need a blog for my project.

Maybe you need a blog for your project too. Have you thought about where your blog will exist on the internet? For me, I considered two choices:

Use a subdomain like blog.mysite.com.
Use a route style like mysite.com/blog/. (I learned in my research that SEO experts call this a "subfolder" style.)

I chose the latter approach.

I'm also a big fan of statically generated websites. I like to keep my articles in version control and write my content in Markdown. This approach works well for my writing flow. In fact, if you're reading this on my website, mattlayman.com, you're reading content that was generated by Hugo, my static site generator of choice.

Since my Django application runs on my main domain, how could I add a route-based blog to that domain without tripping over the app?

I could see two strategies for making a route-based blog work.

Put some software between browsers and my application server that can intercept and route blog traffic to the static files generated by Hugo.
Make the application server serve the blog.

Putting Software In Front Of The App Server

This is a well-trodden path. This is also the path that I didn't pick (more on that later).

In many (most?) Django deployments, the Django application runs with a Python application server like Gunicorn or uWSGI. These application servers have the job of delegating the dynamic requests to Django views.

What do we do with requests that aren't dynamic, like requests for JavaScript or image files? Django has a process to manage these static files (the collectstatic management command) that collects all the files into a single directory that other software can serve.

What other software? Typically, that other software is a general purpose web server like Nginx or the Apache HTTP Server. When one of these web servers is in between and delegating to the Python application server, we call the web server a reverse proxy. Check out Cloudflare's reverse proxy article for a good explanation of why this kind of server is considered a reverse proxy.

A web server like Nginx can be configured to serve static files at a particular route. For Django's typical static file handling, you would configure Nginx to serve the static files directory that Django produces at a path like /static/. For any request for a static file that comes to your site, Nginx would detect that the path starts with /static/ and would send back the file directly rather than requesting it from the application server.

Knowing that, it's not much of a conceptual leap to see how to serve a static blog. In this scenario, you would generate the blog with your tool of choice and configure Nginx to route anything coming to /blog/ to the output of your static site generator.
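To make the routing idea concrete, here is a minimal Python sketch of the decision a reverse proxy makes for each request. This is not Nginx configuration; the directory paths, the upstream address, and the pick_backend name are all assumptions made up for illustration.

```python
# Hypothetical routing table: URL prefix -> where a reverse proxy would look.
# The directory names and the upstream address are illustrative assumptions.
ROUTES = [
    ("/static/", "files:/srv/app/staticfiles"),
    ("/blog/", "files:/srv/app/blog_out"),
]
APP_SERVER = "proxy:http://127.0.0.1:8000"


def pick_backend(path: str) -> str:
    """Return the backend that should answer a request for ``path``."""
    for prefix, backend in ROUTES:
        if path.startswith(prefix):
            # Static content: send the file back directly.
            return backend
    # Anything else is dynamic and goes to the Python application server.
    return APP_SERVER


print(pick_backend("/static/css/base.css"))
print(pick_backend("/blog/some-post/"))
print(pick_backend("/students/42/"))
```

The blog is just one more prefix in the table, which is why this approach is so easy when you control the web server.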

To be frank, if your infrastructure can handle this style well, this will probably be an easier approach. If you're feeling a bit more adventurous or like walking a different path, read on.

The Road Less Traveled

Another way of serving static files for your Django app is to let the application server do it. This approach is not as performant as the reverse proxy approach, but it has the advantage of being a simpler setup because you only have one kind of server running, not two.

When you want to use this style, you'd reach for WhiteNoise.

For Django projects, WhiteNoise is designed to work with Django's static files scheme. That means that the library will have no trouble serving your CSS, JS, images, or whatever else. This also means that you can expect all of these files to be served out of /static/.

If you're ok with serving your blog from /static/blog/, then your life would be pretty simple. When you deploy, you'd generate your blog content from your static site generator, include the output directory as a directory in the STATICFILES_DIRS Django setting, and you're done.
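As a rough sketch of that simpler option, Django's STATICFILES_DIRS accepts an optional (prefix, path) tuple, which is one way to get the blog/ segment under /static/. The BASE_DIR stand-in below is an assumption so the fragment runs on its own; a real settings file already defines it.

```python
# project/settings.py (fragment)
import os

# Stand-in so this fragment is self-contained; a real settings module
# already defines BASE_DIR as the repository root.
BASE_DIR = os.getcwd()

STATIC_URL = "/static/"

# The optional prefix form: files from blog_out are collected under "blog/",
# so they would be served at /static/blog/...
STATICFILES_DIRS = [
    ("blog", os.path.join(BASE_DIR, "blog_out")),
]
```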

That kind of URL path sounds gross to me. What casual non-tech reader would expect to read a blog post at /static/blog/? Yuck. That kind of reader is unlikely to know what "static" would mean.

My goal was to get WhiteNoise to serve my blog at /blog/. That's the setup. Let's see how it worked out.

The Details

Before seeing all the details, let me make sure I address why I did this.

My application is running on Heroku. Heroku makes deployment so simple for basic apps. With a small file called a Procfile that looks like

release: python manage.py migrate
web: gunicorn project.wsgi --log-file -

I can get an entirely operational application on their platform. The downside of using a Platform as a Service (PaaS) like Heroku is that I have less control of the environment.

In this circumstance, I don't have the ability to introduce a reverse proxy like Nginx. I could cobble together some scheme with a shell script that would let me run both Nginx and Gunicorn, but I'd have difficulty guaranteeing that both processes would stay running.

Unless I wanted the blog to run on a separate subdomain (which, as I already mentioned, I don't), I need to make the Gunicorn application server serve the blog.

First, let's get the blog itself going. I'm going to skip most of the details of working with Hugo, but I want you to have some names that you can consider as we work through this problem. From the root of my repository, I ran:

$ hugo new site blog

This created a new Hugo site with some empty directories. Because I wanted to keep all of the directories, I added some hidden files so that Git would track them. For example,

$ touch blog/data/.gitkeep

Later in my experimentation, I found that the config.toml needed to be in the root of the repository. Since I didn't want to fill the repository root with other Hugo directories, I had to adjust some variables in the config file to look in the blog directory.

# config.toml

archetypeDir = "blog/archetypes"
contentDir = "blog/content"
dataDir = "blog/data"
layoutDir = "blog/layouts"
staticDir = "blog/static"
themesDir = "blog/themes"

I also set the directory where I wanted the blog output. This name is important later.

# config.toml

publishDir = "blog_out"

To finish off the Hugo setup (aside from the actual blog content generation which I'm not going to describe), I added Hugo generated directories to my .gitignore.

# Blog
blog_out/
resources/

With this much configuration, I can generate my product's blog with a single command.

$ hugo

                   | EN
+------------------+----+
  Pages            | 13
  Paginator pages  |  0
  Non-page files   |  2
  Static files     | 10
  Processed images |  7
  Aliases          |  0
  Sitemaps         |  1
  Cleaned          |  0

Total in 14 ms

14ms! It's so fast!

My next job was to teach Heroku how to generate the blog with each deployment. Heroku tries to figure out how to build your application by checking for certain files. Because of the manage.py file, Heroku detected that I have a Python project.

When there are multiple types of things to build, you have to be a bit more explicit. Thus, I needed to add a buildpack. Buildpacks are responsible for assembling a project's code into a format that the Heroku platform can run. To make Hugo go, I added the roperzh/hugo community contributed buildpack.

$ heroku buildpacks:add --index 1 roperzh/hugo

I set my Hugo version in Heroku to the same version that I use locally on my Mac to ensure consistency.

$ heroku config:set HUGO_VERSION=0.46

From this configuration, my Heroku deployments now build my Hugo blog and store the output content in the blog_out directory of the built application artifact (which Heroku calls a "slug").

We can take another step to optimize the blog. WhiteNoise will serve a compressed version (either gzip or brotli) if there is a file on disk with the same name and ending with a .gz or .br extension (e.g., index.html and index.html.gz).
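Here is a small sketch of that selection logic, assuming the server prefers brotli over gzip when the browser accepts both. It is not WhiteNoise's actual code; the pick_variant function is a hypothetical name used only to show the idea.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def pick_variant(path: Path, accept_encoding: str) -> Tuple[Path, Optional[str]]:
    """Choose which file on disk to send for ``path``.

    Returns the file to read and the Content-Encoding to report (or None).
    """
    # Parse "gzip, br;q=0.9" into {"gzip", "br"}, ignoring quality values.
    accepted = {token.split(";")[0].strip() for token in accept_encoding.split(",")}
    # Assumed preference order: brotli, then gzip, then the original file.
    for encoding, suffix in (("br", ".br"), ("gzip", ".gz")):
        candidate = path.with_name(path.name + suffix)
        if encoding in accepted and candidate.exists():
            return candidate, encoding
    return path, None


with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "index.html"
    page.write_text("<h1>Hello</h1>")
    # Placeholder sibling standing in for a real compressed copy.
    (Path(tmp) / "index.html.gz").write_bytes(b"")

    print(pick_variant(page, "gzip, deflate"))
    print(pick_variant(page, "identity"))
```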

Once Hugo is done generating the blog output, we can instruct WhiteNoise to compress the files. To do this, I used a bin/post_compile script that runs as part of the Python buildpack. My script looks like:

#!/bin/bash

set -e

python -m whitenoise.compress blog_out

This generates the compressed versions so that my application server can serve fewer bytes when sending static blog files to a browser.
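If you're curious what a compression pass like this does conceptually, here is a stdlib-only sketch that writes a .gz sibling next to each text file. It is an illustration under my own assumptions (the suffix list and the "only keep it if it is smaller" rule), not the whitenoise.compress implementation.

```python
import gzip
import tempfile
from pathlib import Path
from typing import List

# Assumption: only text-like files are worth compressing in this sketch.
TEXT_SUFFIXES = {".html", ".css", ".js", ".xml", ".svg", ".txt"}


def compress_tree(root: Path) -> List[Path]:
    """Write a ``.gz`` sibling next to each text file under ``root``."""
    written = []
    # sorted() takes a snapshot first, so new .gz files are not revisited.
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in TEXT_SUFFIXES:
            continue
        data = path.read_bytes()
        compressed = gzip.compress(data)
        # Keep the compressed copy only when it actually saves bytes.
        if len(compressed) < len(data):
            target = path.with_name(path.name + ".gz")
            target.write_bytes(compressed)
            written.append(target)
    return written


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "blog_out"
    out.mkdir()
    (out / "index.html").write_text("<p>hello</p>" * 200)
    for gz_path in compress_tree(out):
        print(gz_path.name)
```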

Now that the static content side is ready to go, let's teach Django how to serve the blog.

In Django, WhiteNoise works by running a Django middleware. This middleware is designed to run very early in the stack of middleware to intercept requests to static files and return them before the application server wastes too much time processing the request.

The problem with the middleware (if I can even call it a problem) is that it is designed to work exclusively with Django's static files mechanism. This means that it serves content for /static/ URLs from directories that are either static directories inside each Django app or directories listed in STATICFILES_DIRS.

Unfortunately, there's no Django setting that will permit a developer to say "Hey, WhiteNoise, please serve these files at this other path too!"

My solution to this dilemma was to "use the source, Luke (uh, Matt)!" You can see in Using WhiteNoise with any WSGI application that the WhiteNoise class has a method named add_files. This method takes a directory and serves those files at some developer-defined prefix. While inspecting the source, I found that the WhiteNoiseMiddleware is a subclass of WhiteNoise.
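To show the shape of the idea without depending on WhiteNoise itself, here is a toy WSGI wrapper with an add_files method of its own. This is a stdlib-only sketch, not WhiteNoise's implementation: TinyStaticWrapper and fallback_app are hypothetical names, and the real library handles headers, caching, and many edge cases that this ignores.

```python
import mimetypes
import tempfile
from pathlib import Path


class TinyStaticWrapper:
    """Toy WSGI wrapper: serve files under registered prefixes, else delegate."""

    def __init__(self, application):
        self.application = application
        self.files = {}  # URL path -> file on disk

    def add_files(self, root, prefix=""):
        # Scan the directory once, up front, and remember every file's URL.
        root = Path(root)
        prefix = ("/" + prefix.strip("/")).rstrip("/") + "/"
        for path in root.rglob("*"):
            if path.is_file():
                self.files[prefix + path.relative_to(root).as_posix()] = path

    def __call__(self, environ, start_response):
        path = self.files.get(environ.get("PATH_INFO", ""))
        if path is None:
            # Not one of our files: hand the request to the wrapped app.
            return self.application(environ, start_response)
        body = path.read_bytes()
        content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        start_response("200 OK", [("Content-Type", content_type),
                                  ("Content-Length", str(len(body)))])
        return [body]


def fallback_app(environ, start_response):
    # Stand-in for the real application (Django, in my case).
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"handled by the wrapped app"]


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "index.html").write_text("<h1>My blog</h1>")
    app = TinyStaticWrapper(fallback_app)
    app.add_files(tmp, prefix="blog/")

    statuses = []
    print(app({"PATH_INFO": "/blog/index.html"}, lambda s, h: statuses.append(s)))
    print(app({"PATH_INFO": "/students/"}, lambda s, h: statuses.append(s)))
    print(statuses)
```

The key point is that a directory and a prefix are registered together, so the files can live at any URL and not only under /static/.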

Enter MoreWhiteNoiseMiddleware.

I decided to make a subclass of the WhiteNoiseMiddleware to take advantage of the add_files method for my Django project. (My side project is called "homeschool" so you'll see that in the code snippets below.)

# homeschool/middleware.py

from django.conf import settings
from whitenoise.middleware import WhiteNoiseMiddleware


class MoreWhiteNoiseMiddleware(WhiteNoiseMiddleware):
    """Serve extra directories at custom URL prefixes, beyond /static/."""

    def __init__(self, get_response=None, settings=settings):
        super().__init__(get_response, settings=settings)
        # Each MORE_WHITENOISE entry maps a directory on disk to a URL prefix.
        for more_noise in settings.MORE_WHITENOISE:
            self.add_files(
                more_noise["directory"], prefix=more_noise["prefix"])

Then I made the following changes to my settings file.

# project/settings.py

MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    "homeschool.middleware.MoreWhiteNoiseMiddleware",
    ...
]

...

MORE_WHITENOISE = [
    {"directory": os.path.join(BASE_DIR, "blog_out"), "prefix": "blog/"}
]

WHITENOISE_INDEX_FILE = True

That last setting makes WhiteNoise serve a directory's index.html file when the directory itself is requested. That means that the blog post content that Hugo writes to blog_out/some-post/index.html will be accessible at mysite.com/blog/some-post/. Other rules about this setting are in the documentation.
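As I read the documentation, the index file rules boil down to three cases: the directory URL serves the index file, the URL without a trailing slash redirects to the directory URL, and the explicit index.html URL redirects there too. Here is a sketch of those rules as a plain function; resolve and the files set are hypothetical names for illustration, not part of WhiteNoise.

```python
INDEX_FILE = "index.html"


def resolve(url: str, files: set) -> tuple:
    """Map a request URL to an action, given the set of known file URLs."""
    if url.endswith("/") and url + INDEX_FILE in files:
        # /blog/some-post/ serves the directory's index file.
        return ("serve", url + INDEX_FILE)
    if url + "/" + INDEX_FILE in files:
        # /blog/some-post redirects to the version with a trailing slash.
        return ("redirect", url + "/")
    if url.endswith("/" + INDEX_FILE) and url in files:
        # /blog/some-post/index.html redirects to the clean directory URL.
        return ("redirect", url[: -len(INDEX_FILE)])
    if url in files:
        return ("serve", url)
    return ("miss", None)


files = {"/blog/some-post/index.html", "/blog/css/site.css"}
for url in ("/blog/some-post/", "/blog/some-post", "/blog/some-post/index.html",
            "/blog/css/site.css", "/blog/nope/"):
    print(url, "->", resolve(url, files))
```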

With this tiny middleware subclass, I can serve up more static directories! My product blog is served up at the pretty /blog/ URLs that I wanted. Victory!

I'm only doing this for my product's blog now, but I will probably do something similar with the product documentation in the future.

Tradeoffs

What's the catch? It seems like there's always a catch.

The biggest catch is caching, or, the lack thereof. With the scheme I've described, Django isn't able to generate filenames for the blog files that include the hash of the file content.

In a normal static files setup, you can use ManifestStaticFilesStorage to generate those file names. With that storage engine, Django will generate a manifest file that stores a dictionary of original filenames to the versioned filename that includes the hash. Django uses the manifest during template rendering to serve up HTML content that includes the hashes (e.g., a file named base.css would be sent to the user as base.1234abcd.css).
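To picture what that manifest holds, here is a minimal sketch that builds a mapping from original names to hashed names. The choice of MD5 and a 12-character digest is an assumption for illustration, and hashed_name is a hypothetical helper, not Django's API.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(name: str, content: bytes) -> str:
    """Insert a short content hash before the file extension."""
    digest = hashlib.md5(content).hexdigest()[:12]
    path = PurePosixPath(name)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


# Pretend these are the collected static files and their contents.
sources = {
    "css/base.css": b"body { margin: 0; }",
    "js/app.js": b"console.log('hi');",
}

# The manifest maps each original name to its versioned name.
manifest = {name: hashed_name(name, content) for name, content in sources.items()}

for original, versioned in manifest.items():
    print(original, "->", versioned)
```

Because the hash comes from the content, changing a file changes its name, which is what makes long-lived caching safe.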

Because Django sends out the versioned filenames for browsers, when the browser comes back to the server to request a CSS file like base.1234abcd.css, WhiteNoise can detect that the file is "versioned." With a versioned file, WhiteNoise will set the Cache-Control HTTP cache header to tell the browser that the file can be safely cached for a very long time.

The content generated by Hugo doesn't go through the Django template engine and won't include those version hashes. Because of the absence of hashes, WhiteNoise can't tell that the files won't change. Since the code doesn't know if a file will change, it can't set Cache-Control far into the future. Instead, it sets a max-age of one minute (60 seconds), which is configurable via the WHITENOISE_MAX_AGE setting.
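The caching decision can be sketched as a small function. The regex check for a hash in the filename and the ten-year "forever" value are assumptions on my part to illustrate the tradeoff; WhiteNoise's real test for versioned files works differently.

```python
import re

# Assumed values for illustration: a short default and a ten-year "forever".
DEFAULT_MAX_AGE = 60
FOREVER = 10 * 365 * 24 * 60 * 60

# Hypothetical check: a hex hash of 8+ characters sits just before the extension.
VERSIONED = re.compile(r"\.[0-9a-f]{8,32}\.[^./]+$")


def cache_control(url: str) -> str:
    """Pick a Cache-Control header value for a static file URL."""
    if VERSIONED.search(url):
        # The name changes whenever the content does, so cache aggressively.
        return f"max-age={FOREVER}, public, immutable"
    # The file could change under the same name, so keep the lifetime short.
    return f"max-age={DEFAULT_MAX_AGE}, public"


print(cache_control("/static/css/base.1234abcd.css"))
print(cache_control("/blog/some-post/index.html"))
```

Everything Hugo generates falls into the second branch, which is the cost of this approach.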

For a small product like mine, this tradeoff is totally reasonable. If I had a product blog with massive amounts of traffic, I'd probably have a more complex infrastructure anyway and be in a position to use a reverse proxy instead.

The other minor tradeoff is the WHITENOISE_INDEX_FILE setting. By enabling it, I open up my Django server to serving directories that include index.html. This is good and desirable for the blog, but the side effect is that any other directory in my static files that happens to have an index.html file in it is also now available. That may not affect your app if you try this approach, but it's something to be cognizant of.

Summary

I started this adventure by looking for an alternative way to serve a blog for my Heroku project.

In the process, we learned about:

Heroku buildpacks and how to use multiple buildpacks to generate an app that requires multiple tools
WhiteNoise and how to customize the middleware to give it more files to serve
Caching and the tradeoffs associated with my approach

I hope you found this little adventure interesting. Next time you need some static content for your Django project outside of /static/, now you know of an option that doesn't include subdomains or a reverse proxy!

If you have questions or enjoyed this article, please feel free to message me on Twitter at @mblayman or share if you think others might be interested too.
