Matheus Cunha

Posted on Aug 7, 2021 • Edited on Aug 17, 2021 • Originally published at macunha.me

Terraform Design Best Practices

#terraform #infrastructureascode #cloud #devops

Intro
1. “Bad Idea” capitalized!
Design

Intro

As someone who believes in empowering people and distributing power to achieve better outcomes, I have always felt that the existing best-practices proposals, good as they are, leave out some key aspects (IMHO) of code evolution and business structures.

Therefore, this design document builds on those previous proposals, adding some self-service Ops and micro-services spice to the mix.

Terraform best practices provides great insights on how to write code inside a module, e.g. naming conventions and Terraform file naming.

We can’t leave Terragrunt’s epic blog post unmentioned:

5 Lessons Learned From Writing Over 300,000 Lines of Infrastructure Code;

The same goes for the Terragrunt documentation, which points out that “one of the most important lessons” is that:

large modules should be considered harmful. That is, it is a Bad Idea to define all of your environments (dev, stage, prod, etc), or even a large amount of infrastructure (servers, databases, load balancers, DNS, etc), in a single Terraform module. Large modules are slow, insecure, hard to update, hard to code review, hard to test, and brittle (i.e., you have all your eggs in one basket).

“Bad Idea” capitalized!

This is totally true: the “Bad Idea” usually comes from a lack of care for Terraform code design and tends to be harmful in the long run, pushing the implementation towards a big ball of mud.

A Big Ball of Mud is a haphazardly structured, sprawling, sloppy, duct-tape-and-baling-wire, spaghetti-code jungle. These systems show unmistakable signs of unregulated growth, and repeated, expedient repair. Information is shared promiscuously among distant elements of the system, often to the point where nearly all the important information becomes global or duplicated.

The overall structure of the system may never have been well defined.

Oftentimes, Terraform code gravitates towards mono-repositories (a.k.a. monorepos) containing the whole specification in a single place. To tame the chaos, the Terraform state needs, at the very least, to be subdivided into logical sections.

Design

Shallow “tree” of shared resources

Following the recommendations for structuring code, one of the proposals is to keep a shallow “tree” of resources and modules. This tree produces a small and clear distribution of Terraform code.

Why a shallow “tree” of resources? It keeps the number of resources and modules per composition small, which results in a small remote state file. With a small remote state we speed up the development process and reduce waste (Muda in the Toyota 3M model), as the shallow tree enables faster executions of Terraform (less data to sync and compare).

The level of granularity is defined for each specific case (there is no silver bullet), aiming for the smallest composition that is still feasible.

Product areas (a.k.a. Business capabilities) structure and ownership

Ideally, the composition level would be organized around Product Areas (either squads/crews or guilds) with a fallback to shared technologies (e.g. vpc, databases). Terraform compositions are therefore designed around what Martin Fowler calls “Business capabilities” in micro-services terminology. Ideally, the Terraform composition follows the organizational structure so that each team “owns” (in both senses: ownership and freedom) its own state.

The main goal here is to structure the Terraform code as a reflection of the organization so that it fosters self-service Ops. If the Infrastructure as Code is mature enough to have well-described Terraform modules, everyone should be empowered to instantiate these modules by setting the parameters according to their needs, without centralizing power in an Operations team.

The resource composition must gravitate towards the following (ordered by priority from higher to lower):

1. Product Areas (ownership), with the directory structure:
   1. squad/crew OR guild;
   2. product.
2. Shared resources, around technologies.

Looking at the structure from the bottom up, it starts from the product and then attributes the product to a crew through the directory tree.

e.g.:

# Squad or Crew
red-team
└── payment     # Product (i.e. micro-service) name
    └── main.tf # Any resource used by the payment product

# Guild (organized around technology)
back-end
└── monolith    # Shared application in terms of ownership
    └── main.tf # Cloud resources used by the monolith

In the example above, we can’t ignore that the monolith is a product with shared ownership among back-end developers, and therefore it is organized to follow the business structure.

The structure is inspired by Terragrunt’s best practices to some extent. However, it differs from Terragrunt’s proposal in the way resources are divided: by ownership first, rather than exclusively around technologies.
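To make the ownership idea more concrete, a composition such as red-team/payment/main.tf could be little more than a module call carrying the squad's own parameters. The following is a minimal sketch and not a definitive layout: the module path, its input names (name, message_retention_seconds, tags) and the tag values are hypothetical and only serve as illustration.

```hcl
# red-team/payment/main.tf
# Sketch of a product composition owned by the red-team squad.
# The module path and its inputs are assumptions made for this example.

locals {
  # Tags that tie every resource back to its owning team and product.
  tags = {
    team    = "red-team"
    product = "payment"
  }
}

module "payment_queue" {
  # Hypothetical shared module living in the same repository.
  source = "../../modules/queue"

  name                      = "payment-events"
  message_retention_seconds = 86400
  tags                      = local.tags
}
```

Because the directory holds only what the payment product needs, its state stays small, and the squad can change these parameters without touching anyone else's state.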

Shared resources, organized around technologies

Organizations will often have resources shared among products, e.g. a shared VPC or SQL database; there is no way around this reality.

However, these situations should be the exception and not the norm. They are dealt with similarly to the Terraform compositions organized around guilds/technologies.

platform # as in Platform Engineering
└── vpc
    └── main.tf

back-end
└── database
    └── main.tf
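One way for a product composition to consume such shared resources without merging states is the terraform_remote_state data source, which reads the root-level outputs of another state. This is a minimal sketch, assuming an S3 backend and assuming that the platform/vpc composition declares an output named vpc_id; the bucket name, state key and region are hypothetical.

```hcl
# red-team/payment/data.tf
# Reads the outputs of the shared VPC state (read-only access).
data "terraform_remote_state" "vpc" {
  backend = "s3"

  config = {
    bucket = "example-terraform-states"       # hypothetical bucket
    key    = "platform/vpc/terraform.tfstate" # hypothetical state key
    region = "eu-central-1"
  }
}

# red-team/payment/main.tf (excerpt)
resource "aws_security_group" "payment" {
  name = "payment"

  # Only works if platform/vpc declares: output "vpc_id" { ... }
  vpc_id = data.terraform_remote_state.vpc.outputs.vpc_id
}
```

The shared composition stays the single owner of the VPC, while product teams depend only on the outputs it chooses to publish.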

Files inside the composition?

Ideally, the files in the sub-directory (which specify the composition) partially follow this spec, with data.tf, terraform.tf and providers.tf added on top of it.

main.tf: contains locals, module and resource definitions;
variables.tf: contains declarations of variables (i.e. inputs/parameters) used in main.tf;
data.tf: contains data-resources for input data used in main.tf;
outputs.tf: contains outputs from the resources created in main.tf;
providers.tf: contains provider configurations and provider version constraints;
terraform.tf: contains the terraform back-end (e.g. remote state) definition.
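As an illustration of the two files that go beyond the usual set, here is a minimal sketch of terraform.tf and providers.tf for one composition. It assumes an S3 backend and the AWS provider; the bucket, state key, region and version constraints are hypothetical values, not recommendations.

```hcl
# terraform.tf
terraform {
  required_version = ">= 0.14"

  # Backend blocks cannot reference variables, so the values are literal.
  backend "s3" {
    bucket = "example-terraform-states"           # hypothetical bucket
    key    = "red-team/payment/terraform.tfstate" # one state per composition
    region = "eu-central-1"
  }
}

# providers.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.0"
    }
  }
}

provider "aws" {
  region = var.region
}

# variables.tf
variable "region" {
  description = "AWS region where the composition's resources are created."
  type        = string
  default     = "eu-central-1"
}
```

Splitting the terraform block across two files is valid as long as the backend and required_providers settings are each declared only once per composition. Giving every composition its own state key is what keeps the states subdivided.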

What about Terraform modules?

Terraform modules are containers for multiple resources that are used together to achieve a shared goal. Modules can be used to create lightweight abstractions, facilitating reusability and distribution of Terraform code.

Therefore, we assume that the following are anti-patterns that make Terraform modules difficult to reuse:

Configuration of Terraform providers inside a module;
Implementation of business logic and/or hard-coded parameters in a module. Default values should be specified in optional variables instead of being hard-coded.

Conversely, modules should follow these guidelines:

Modules should be self-contained and provide a clear contract. Dependencies (pre-existing resources) must be specified through required variables.
Modules must serve a single purpose. Multiple purposes must be achieved through the composition of modules and not by “monolithic” modules.
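The module guidelines above can be sketched as a small contract. This is an illustrative example only: the module name (postgres), its variables and the default value are hypothetical, and the database instance resource itself is left out for brevity.

```hcl
# modules/postgres/variables.tf (hypothetical module)

# Dependency on pre-existing resources: required, because it has no default.
variable "subnet_ids" {
  description = "IDs of existing subnets in which the database is placed."
  type        = list(string)
}

variable "name" {
  description = "Name used for the database and its related resources."
  type        = string
}

# Optional input: the default lives in the variable, not in the resource.
variable "instance_class" {
  description = "Instance class of the database."
  type        = string
  default     = "db.t3.micro"
}

# modules/postgres/main.tf (excerpt)
# No provider block here: the module uses the provider configured by the
# composition that calls it.
resource "aws_db_subnet_group" "this" {
  name       = var.name
  subnet_ids = var.subnet_ids
}
```

A caller that forgets subnet_ids gets an error at plan time, which makes the dependency explicit, while instance_class can be overridden per composition without editing the module.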

Modules are abstractions that should be used to reduce the amount of code duplication, implementing the DRY (don’t repeat yourself) principle.

On top of that, modules are an important factor in reducing the disparity among environments, which helps to better address Factor X (Dev/prod parity) of the Twelve-Factor App model.

Top comments (3)

Andrei Dascalu • Aug 7 '21

It would be really interesting to see an article about Terraform refactoring. I was never able to avoid a ball of mud.
Between version 0.9 and the current 0.14 there have been lots of changes to allow better organisation and state management.
But when you're starting out with 2 VMs and a DB and grow into a multizone system with lots of moving parts, it seems important to be able to refactor along the way without messing up your state.