Contentful Blog

Posted on Jul 17, 2018 • Updated on Nov 14, 2018 • Originally published at contentful.com

The understated innovation of static site generators

#staticsite #stack #webdev #serverless

This article is a reflection on the bigger picture around the new generation of static sites generators that are currently gaining traction and adoption.

Static site generators have been around for a while and an early player in the field was Jekyll, which gained a lot of traction at launch, triggered the birth of GitHub Pages and hence created a developer-friendly solution to content. We’re now seeing a new wave of static site generators like GatsbyJS, that no longer are limited to publishing needs of the developer audience (as in, manage your content completely on GitHub), and instead have the ambition of becoming the new tool for content publishing.

The current wave of static site generators comes with an entire ecosystem that makes the movement wider in scope than site-building tools themselves or even just content. While there’s a lot of attention surrounding what this means for applications that directly touch the browser, there’s much less thought put into behind-the-scenes reasons on why these new stacks like Contentful + GatsbyJS + Netlify are getting attention and being chosen by development teams.

Runtime that reaches out for data and some adventures in game servers

Years ago, I worked at Wooga, building game servers for social games—one problematic performance aspect of games is that the state of the game has to be updated very frequently. This means game servers are write-heavy beasts that hammer your databases in a way where caching, the technique most often used in lifting database load to cope with high traffic, would not be an option.

It was back then when I started reflecting on the implication of choices regarding the position of the runtime. By runtime, I mean the process where the code is running—which is with respect to all databases and services that need to be reached in order for actions, that allow you to fulfill the current request, to be performed.

The solution to the pressure that our applications put on databases was a complete turn-around of all strategies surrounding interaction with the data layer; which meant transitioning from a model, where each and every request had a direct database interaction, to one where a user’s entire game-state would be loaded in the runtime (we chose Erlang) at the beginning of a session. You can still find the slides—from a scary number of years ago—from the beginning of the project and the considerations after running it in production.

In a nutshell, that project led to the decrease number of interactions with databases by two or three orders of magnitude and achieve sub-millisecond response time improvement by two orders of magnitude compared to previous approach. The idea must have also germinated in other brains within the gaming industry, as Microsoft launched Orleans a year later.

What we learned from the push for having data in the runtime

So, how is our work at Wooga on strategies around working with game-state now relevant in the scope of static site generators?

Working on that topic brought me to more general reflections on how stacks are built (data at last)—the runtime is the part that coordinates all actions, the request enters the runtime and from the runtime, you ensure that a number of other systems are touched or triggered.

After a request enters your runtime, you write your code which is a mix of two things:

Interacting with elements external to the process

a. Reading data
b. Writing data
c. Checking authorization
d. Triggering tracking
The actual business logic that must be performed in order to satisfy the request

Now if you think about the two places most of the pain (bugs, errors, outage) comes from; writing and operating application, I’m sure that the former is thought of as the more evil place to be.

I remember clearly what it meant to start programming in a paradigm where writing was mostly out of the picture, which enabled me to realize the idea of ninja state or "flow" that you can find in books like “Getting things done” or “Flow”.

It’s a superior state of mind, where you are able to just think and focus on what you want to achieve and are not stuck with internal dialogs like:

"Where’s the data I need?" - all data is just there; just access and manipulate it as needed
"In what order would these things happen?" - in the exact order you’re writing them, which should be arranged flow naturally according to how one would think
"Is there anything that could fail in this?" - the only problem you might have is with the data at hand; but there’s no other interference or clutter that could influence what you’re writing (no network errors, collisions, race condition).

These considerations have led me to think about the "upside down stack", one where the database or other services come first—not in between or just in time as the logic progresses—and where you enter the runtime with all data and authorization necessary in order to perform the request.

How are static site generators a step towards the "upside-down stack"?

Static site generators are essentially build processes, processes that are triggered because of change in either the code or data underlying the static artifact (this is a novelty of SSGs building both data & code versus building just code). Because these processes are triggered by changes in the data or the code, those changes are what enter computation rather than something that is fetched a runtime.

This is how static site generators are akin to the "upside-down stack"; data comes first and is what you enter the process with, versus being something that is sought after only during an ongoing process.

This vision is already present in a modern build system like Concourse CI with its concept of Pipelines being composed of Job and Resources. This goes in the direction of thinking "Okay, this job is triggered by a change on some resources and, in order to perform the job, other resources will also be needed." The dependency tree of the resources required in order to perform an action is specified beforehand and you can ensure that all resources are in place prior to performing a job.

This is where I see static site generators as early adopters of certain patterns that will gain more generalized adoption. If you have a domain where the data is read more than it is written, a build-driven approach has a number of advantages (among them excellent performance, low cost and a better security model).

The intersection with serverless

As mentioned in my previous post, where I predicted some bigger trends behind the surface of Serverless & GraphQL, serverless has deeper implications than what is usually known for. For example, it can be seen as Serverless is a very compelling technology to deal with workloads that are build driven like static site generators, you need to handle workloads that are event driven (with events generated by changes to the application, the data or the domain model) and then different in scale and distribution over time than workloads to serving constant traffic.

To support build cycles and pipelines, Serverless becomes relevant again as infrastructure used to host these new kinds of architectures. Step functions are something to look at in order to understand how this future might look like.

Conclusion

Before concluding this post, it’s important to look at what static site generators represent and which factors they rely on to achieve the popularity they currently enjoy. The first important realization is that static site generators can be more dynamic than the name suggests and that the concepts they apply might see much wider adoption if accompanied with a significant innovation of the toolchains currently in use. As an example static site generators might be a pointer to the fact that build processes are about to be re-evaluated as a delivery mechanism for more than application logic.

In short, here are trends to watch for, as they might serve as seeds for future looking-thoughts:

Data enters the runtime at the very beginning of the computation, instead of being grabbed during computation
The event that triggers the computation is a change in application logic, in the data of the domain that the application deals with, or changes of the data domain itself
A user request can be satisfied by serving a pre-compiled set of assets (that are built only when the underlying data is updated, and not at request time)
The data is "built", instead of being fetched and integrated at every request; requiring more modern build systems and innovations of the concept of build systems

Next step

It’s important to keep watching how the phenomenon of static sites continues to evolve and seeing the prospects of the new generation of web architectures that are build-based rather than the current request/response-focused patterns.

In order for that trend to grow, other trends need to fall into place; build systems and databases (or systems that front the database) are some areas we need to look at for innovations. This will be something I will elaborate in follow up posts of this series.