DEV Community

Matheus Cunha

Posted on • Originally published at

Real-life Terraform Refactoring Guide

Table of Contents

  1. Intro
  2. How to break a big ball of mud? STRANGLE IT
  3. The mono-repository (monorepo) approach to Legacy
    1. Splitting the modules sub-path to its own repository
    2. Let’s start strangling the repository
      1. Import state? Remove state and code from what? Where?


As reality hits, you inevitably find yourself dealing with a hard-to-manage Terraform big ball of mud. There is no way around the natural growth and evolution of code bases and the design flaws that come with them. Our Agile mindset is to “move fast and break things”: implement something as simply as possible and leave the design decisions for the next iterations (if any).

Refactoring Terraform code is actually as natural as developing it. Time and time again you will face situations where a better structure or organization can be achieved: maybe you want to upgrade from a home-made module to an open-source/community alternative, or maybe you just want to segregate your resources into different states to speed up development. Regardless of the goal, once you get into it, you will realize that Terraform code refactoring is actually a basic missing step in the development process that no one told you about before.

As the Suffering-Oriented Programming mantra dictates:

“First make it possible. Then make it beautiful. Then make it fast.”

So, time to make the Terraform code beautiful!

How to break a big ball of mud? STRANGLE IT

<joke>Martin Fowler has already written everything there is to write about (early-2000s) DevOps, Agile, and Software Development. Therefore, we could reference Martin Fowler for virtually anything software-related</joke>, but really, the Refactoring book is THE reference on this subject.

Martin Fowler shared the Strangler (Fig) Pattern, which describes a strategy to refactor a legacy code base by re-implementing the same features (sometimes even the bugs) in another application.

[…] the huge strangler figs. They seed in the upper branches of a tree and gradually work their way down the tree until they root in the soil. Over many years they grow into fantastic and beautiful shapes, meanwhile strangling and killing the tree that was their host.

This metaphor struck me as a way of describing a way of doing a rewrite of an important system.

In this document we are going to follow the same idea:

  1. implement the same feature on a different Terraform composition;
  2. migrate the Terraform state;
  3. delete (kill) the previous implementation.

The mono-repository (monorepo) approach to Legacy

Let’s suppose that your Terraform code base is versioned in a single repository (a.k.a. monorepo), following the arbitrary structure displayed below (just to help illustrate):

├── modules/    # Definition of TF modules used by underlying compositions
├── global/     # Resources that aren't restricted to one environment
│   └── aws/
├── production/ # Production environment resources
│   └── aws/
└── staging/    # Staging environment resources
    └── aws/

In this example, each directory corresponds to a Terraform state. In order to apply changes, you have to change into a directory and execute terraform there.

The structure of this example repository was created a few hypothetical years ago, when the number of existing microservices and resources (DBs, message queues, etc.) was significantly smaller. At the time, it was feasible to keep the Terraform definitions together because it was easier to maintain: Cloud resources were managed in one shot!

As time went by, the number of Products and the size of the team grew, and engineers started facing concurrency issues: Terraform locking executions on the shared state storage whenever someone else is running terraform apply, as well as a general slowness on every execution, since the number of data sources to sync is frightening.

A mono-repository approach is not necessarily bad; versioning is actually simpler when performed in one single repository. Ideally, there won’t be many changes on the scale of GiB, meaning that it is safe to proceed with this one as long as the Terraform remote states are divided.
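To illustrate that division, each sub-path can declare its own remote state backend. A minimal sketch, assuming an S3 backend with hypothetical bucket and key names:

```hcl
# production/aws/backend.tf (hypothetical file name)
terraform {
  backend "s3" {
    bucket = "acme-terraform-states"            # hypothetical bucket
    key    = "production/aws/terraform.tfstate" # one key per sub-path/state
    region = "us-east-1"
  }
}
```

With a distinct key per directory, a lock taken during an apply in staging/aws no longer blocks work in production/aws.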

Splitting the modules sub-path to its own repository

One thing worth mentioning, though, is the modules sub-path: it could be stored in a different git repository to leverage its own versioning. Since Terraform modules and their implementations don’t always evolve at the same pace, keeping two distinct version trees is beneficial. Additionally, a separate repository for Terraform modules allows the specification of “pinned versions”, e.g.:

module "aws_main_vpc" {
    source = "git::"
    # Note the ref=${GIT_REVISION_DIGEST}
}

That reference to a module’s version should always be specified, regardless of whether the module comes from an internal/private repository or a public one. When you specify the version, you are ensuring reproducibility.
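Filled in, a pinned source could look like the sketch below; the repository URL, sub-path, and revision digest are all hypothetical:

```hcl
module "aws_main_vpc" {
  # Hypothetical repository URL; the ?ref= suffix pins the module
  # to a specific git revision (tag or commit digest)
  source = "git::https://example.com/acme/terraform-modules.git//aws/vpc?ref=a1b2c3d"

  # ...
}
```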

Therefore, let’s move the modules sub-path to another git repository, following the instructions from this StackOverflow answer so that the git commit history is preserved:


Change into the monorepo path and create a branch from the commits under the monorepo/modules path:

git subtree split -P modules -b refact-modules


Create the new repository

mkdir /path/to/the/terraform-modules && cd $_
git init
git pull "${MAIN_BIGGER_REPO}" refact-modules


Link the new repository to your remote Git (server) and push the commits

git remote add origin <remote-url>
git push -u origin master


Clean up the history related to modules from $MAIN_BIGGER_REPO [OPTIONAL]

git rm -r modules
git commit -m "Remove the modules sub-path"
# Rewrite history so past commits no longer carry modules/
git filter-branch --prune-empty \
    --tree-filter "rm -rf modules" -f HEAD
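The extraction steps above can be rehearsed end to end in a scratch directory before touching the real monorepo. A minimal sketch, where the repository names, identities, and file contents are all hypothetical:

```shell
set -e
WORK=$(mktemp -d) && cd "$WORK"

# Stand-in monorepo with a modules/ sub-path
git init -q monorepo && cd monorepo
git config user.email you@example.com && git config user.name you
mkdir modules
echo 'resource "null_resource" "noop" {}' > modules/main.tf
git add . && git commit -qm "add modules"

# Split the modules/ history into its own branch...
git subtree split -P modules -b refact-modules

# ...and pull that branch into a brand-new repository
git init -q ../terraform-modules && cd ../terraform-modules
git config user.email you@example.com && git config user.name you
git pull -q ../monorepo refact-modules

ls                # main.tf now sits at the new repository's root
git log --oneline # the "add modules" commit came along with it
```

Note that git subtree ships with stock git on most distributions, but it lives in git’s contrib area, so double-check it is available before relying on it.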

Let’s start strangling the repository

Now that a substantial piece of code has been moved somewhere else, it is time to put the Strangler (Fig) Pattern into practice.

Move all the existing content as-is into a legacy sub-path, keeping the same repository and change history (commits). This also allows applying the legacy code just as before from any of those paths:

└── legacy
    ├── global
    │   └── aws
    ├── production
    │   └── aws
    └── staging
        └── aws

Once the content is moved to legacy, the idea is to follow the Boy Scout rule in order to strangle the legacy content little by little (unless you are really committed to migrating it all at once, which is going to be exhausting).

The Boy Scout rule goes like this:

  1. every time a task that involves deprecated code appears, implement it on the new structure;
  2. import the Terraform state to keep the Cloud resources that the given code represents/describes;
  3. remove the state and the code from legacy.

Repeat until there is nothing left inside legacy (or only unused resources/left-behinds remain, which could be destroyed/garbage-collected either way).

Import state? Remove state and code from what? Where?

That will depend on the kind of resource we are migrating from the remote state. At the bottom of each resource page in Terraform’s provider documentation you can find a reference command to import existing resources into your Terraform code specification, e.g. the AWS RDS DB instance.

Suppose we want to replace the code of the AWS RDS Aurora cluster defined in production/aws and re-implement the same thing using the community module. After creating the corresponding sub-path in the monorepo according to your preference, provisioning the bucket, and initializing the Terraform backend:


Implement the definition of the community module with the closest parameters to the existing one; e.g.:

module "aws_aurora_main_cluster" {
  source  = "terraform-aws-modules/rds-aurora/aws"
  version = "~> 5.2"

  # ...
}
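Fleshed out, the definition might look something like the sketch below. The parameter names and every value here are illustrative only; check the module’s input reference for the exact version you pin before copying anything:

```hcl
module "aws_aurora_main_cluster" {
  source  = "terraform-aws-modules/rds-aurora/aws"
  version = "~> 5.2"

  # Hypothetical values: mirror the existing cluster's settings
  name           = "main-database-name"
  engine         = "aurora-postgresql"
  engine_version = "11.9"

  vpc_id  = module.aws_main_vpc.vpc_id
  subnets = module.aws_main_vpc.database_subnets

  replica_count = 2
  instance_type = "db.r5.large"
}
```

The closer these parameters are to the live cluster, the fewer plan iterations you will need after the import.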


Import the Terraform states from the previous (existing) cluster

terraform import 'module.aws_aurora_main_cluster.aws_rds_cluster.this[0]' main-database-name
terraform import 'module.aws_aurora_main_cluster.aws_rds_cluster_instance.this[0]' main-database-instance-name-01
terraform import 'module.aws_aurora_main_cluster.aws_rds_cluster_instance.this[1]' main-database-instance-name-02

# ...

Then, if you haven’t already, and you would like to “match reality” between the existing and the specified resource, run terraform plan a few times and adjust the parameters until Terraform reports:

No changes. Your infrastructure matches the configuration.


Last but not least, remove the corresponding resources from the legacy Terraform state, so that it no longer tracks their changes nor tries to destroy them once the resource definitions are gone from that code base:

# Hypothetical name of the resource inside production/aws/
terraform state rm aws_rds_cluster.default \
    'aws_rds_cluster_instance.default[0]' 'aws_rds_cluster_instance.default[1]'

# ...


Once that is done, feel free to remove the corresponding resource definitions from the legacy code:

resource "aws_rds_cluster" "default" {
    # ...
}

resource "aws_rds_cluster_instance" "default" {
    count = var.number_of_database_instances

    # ...
}
