Terraform allows you to set up infrastructure across a wide variety of vendors and platforms. I love it because it’s declarative: you just say what you want and Terraform figures out how to get there. There are many good introductions to Terraform out there, but they lack that real-world-project feel. What’s it like to use Terraform in anger? I want to explore that with a real project example built from the ground up.
To that end, one need I frequently come across in my freelance data science work is secure data transfer. Often clients need to send sensitive data, possibly medical or even top-secret. In this series, we’ll deliver a really slick experience for three different types of users on AWS, starting with secure upload to S3 via the AWS CLI. Later installments will look at giving the client’s business and IT users access via the AWS Console (browser) and SFTP, respectively.
Defining The Problem
Clearly we can’t just create a public bucket because this data is sensitive. Equally, we can’t give the client access to our entire platform. Though challenging, locking the client down to a secure bucket with no access to anything else is the only feasible option for a real project. And we’ll need to do this while delivering a pleasant user experience.
Although we could create our own browser portal for data transfer, that’s a lot of work. It would be fine for me to invest that time for my own business, but I’d think it negligent to charge a client for something so bespoke unless it were core to their business. I’m about delivering value for money, not building cathedrals.
Download The Complete Series Example Code
The Plan
- First, we’ll create a bucket that doesn’t expose any client information
- Then we’ll create a user for the client that has access to AWS via an access key
- After that, we need to define a policy that limits a user to only accessing S3 and only that bucket
- And finally, we’ll associate that policy with the user
Now, this is still a toy example because, on a real project, you would probably need to support several clients, each with many users. We’ll address that in Part II of this tutorial, but keep it simple for now. Later we can refactor using some more advanced features of Terraform to enable multiple users. This may seem forced, but I always recommend programming iteratively: start with something simple that works and build out the functionality through refactoring.
Installing Terraform
Terraform is written in Go, so it installs easily. Simply fetch it from the downloads page (for your system), unzip it, and move it to a directory included in your system’s `PATH`. Finally, check the version you’re using. Here’s mine for reference.

```
dataunbound$ terraform --version
Terraform v0.12.24
```
From here on, I’m assuming you have an AWS account, an access key, and `awscli` working on your machine. So if you haven’t done that, you should do it now.
Terraform Hello World
To kick things off, we’ll define AWS as a provider and nothing more.
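A minimal `aws-hello-world.tf` for this step looks something like this:

```hcl
# aws-hello-world.tf
# Declare AWS as our provider and set a default region.
provider "aws" {
  region = "eu-west-2"
}
```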
This block states that we’ll be using AWS and want `eu-west-2` to be our default region; `provider` is a keyword. Nothing could be simpler.
Now let’s “apply”. What could go wrong?!
```
dataunbound$ terraform apply

Error: Could not satisfy plugin requirements

Plugin reinitialization required. Please run "terraform init".

Plugins are external binaries that Terraform uses to access and manipulate
resources. The configuration provided requires plugins which can't be located,
don't satisfy the version constraints, or are otherwise incompatible.

Terraform automatically discovers provider requirements from your
configuration, including providers used in child modules. To see the
requirements and constraints from each module, run "terraform providers".

Error: provider.aws: no suitable version installed
  version requirements: "(any version)"
  versions installed: none
```
Fantastic! An asinine message that contains its own solution, i.e. initialize the project with `terraform init` first.
Luckily, I use thefuck and so should you. It corrects simple mistakes from the previous console command—unnecessary, but fun. Have a look:
```
dataunbound$ fuck
terraform init && terraform apply [enter/↑/↓/ctrl+c]

Initializing the backend...

Initializing provider plugins...
- Checking for available provider plugins...
- Downloading plugin for provider "aws" (hashicorp/aws) 2.60.0...

The following providers do not have any version constraints in configuration,
so the latest version was installed.

To prevent automatic upgrades to new major versions that may contain breaking
changes, it is recommended to add version = "..." constraints to the
corresponding provider blocks in configuration, with the constraint strings
suggested below.

* provider.aws: version = "~> 2.60"

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.
```
So what happened here? Well, first `thefuck` ran `terraform init`, which discovered we’re using AWS as a provider and installed the necessary plugin locally. Then `terraform apply` looked for what needed to be done and found nothing, hence:
**Resources: 0 added, 0 changed, 0 destroyed**
That’s because we haven’t declared any resources. Let’s change that by declaring a bucket for our client’s data.
Creating a Secure Bucket in S3
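The block itself is tiny. Something like this (the random suffix keeps the bucket name globally unique):

```hcl
# A private bucket for the client's data. The random suffix helps
# keep the name unique, since S3 bucket names are global.
resource "aws_s3_bucket" "test_client_bucket" {
  bucket = "test-client-bucket-x130099"
  acl    = "private"
}
```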
This is our first resource, `aws_s3_bucket`, and we named it `test_client_bucket`. When we apply, Terraform creates a private S3 bucket named “test-client-bucket-x130099”.
```
dataunbound$ terraform apply

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # aws_s3_bucket.test_client_bucket will be created
  + resource "aws_s3_bucket" "test_client_bucket" {
      + acceleration_status         = (known after apply)
      + acl                         = "private"
      + arn                         = (known after apply)
      + bucket                      = "test-client-bucket-x130099"
      + bucket_domain_name          = (known after apply)
      + bucket_regional_domain_name = (known after apply)
      + force_destroy               = false
      + hosted_zone_id              = (known after apply)
      + id                          = (known after apply)
      + region                      = "eu-west-2"
      + request_payer               = (known after apply)
      + website_domain              = (known after apply)
      + website_endpoint            = (known after apply)

      + versioning {
          + enabled    = (known after apply)
          + mfa_delete = (known after apply)
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.
```
This first bit above is the plan. Notice that there are several attributes marked as `(known after apply)`. We’ll discuss this in a minute.
But first, we should ensure everything in this bucket is encrypted server-side. We’ll use `AES256` like so:
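With the 2.x AWS provider, server-side encryption is configured inline on the bucket. A sketch of the updated resource (same bucket as before, plus the encryption block):

```hcl
resource "aws_s3_bucket" "test_client_bucket" {
  bucket = "test-client-bucket-x130099"
  acl    = "private"

  # Encrypt every object at rest with AES256 by default.
  server_side_encryption_configuration {
    rule {
      apply_server_side_encryption_by_default {
        sse_algorithm = "AES256"
      }
    }
  }
}
```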
If you `apply` now, you’ll see that Terraform only changes the existing bucket rather than destroying and recreating it.
**Plan:** 0 to add, 1 to change, 0 to destroy.
But this won’t always be the case. When unsure, use `terraform plan` to see a dry run.
As an admin, you can explore this new bucket. When you’re done, we’ll create a user whose sole ability is to view this bucket and administer its contents.
Creating a Restricted S3 Bucket User
We need to create a user and, moreover, restrict their knowledge and control to this bucket. There are many ways to do this; but for now, we’ll use an `aws_iam_user` and an `aws_iam_user_policy` with a separate `aws_iam_policy_document`.
The User
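A sketch of the user resource:

```hcl
# The client's IAM user; nothing but a name for now.
resource "aws_iam_user" "test_client" {
  name = "alice"
}
```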
Nothing complex here—just a name, Alice.
But to access AWS via the CLI, the user will need an access key.
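Again as a sketch, wiring the key to the user by reference:

```hcl
# An access key tied to the user declared above.
resource "aws_iam_access_key" "test_client" {
  user = aws_iam_user.test_client.name
}
```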
Now we finally encounter something interesting. By `aws_iam_user.test_client.name`, we are asking for the value of the `name` attribute on whatever gets created by the `resource "aws_iam_user" "test_client"` block. It’s invaluable to note that the Terraform documentation (which I’ve been linking to as we go along) lists the attributes for each resource.

```
dataunbound$ terraform apply
aws_s3_bucket.test_client_bucket: Refreshing state... [id=test-client-bucket-x130099]

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # aws_iam_access_key.test_client will be created
  + resource "aws_iam_access_key" "test_client" {
      + encrypted_secret     = (known after apply)
      + id                   = (known after apply)
      + key_fingerprint      = (known after apply)
      + secret               = (sensitive value)
      + ses_smtp_password    = (sensitive value)
      + ses_smtp_password_v4 = (sensitive value)
      + status               = (known after apply)
      + user                 = "alice"
    }

  # aws_iam_user.test_client will be created
  + resource "aws_iam_user" "test_client" {
      + arn           = (known after apply)
      + force_destroy = false
      + id            = (known after apply)
      + name          = "alice"
      + path          = "/"
      + unique_id     = (known after apply)
    }

Plan: 2 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

aws_iam_user.test_client: Creating...
aws_iam_user.test_client: Creation complete after 1s [id=alice]
aws_iam_access_key.test_client: Creating...
aws_iam_access_key.test_client: Creation complete after 1s [id=AKIA6MFJDG3VVFMKP4EF]
```
Even cooler, when Terraform sees this reference, it will flag it as a dependency. When we run `apply`, resources will be created in the required order, and in parallel where possible. Here Terraform could resolve the reference to the literal "alice" at plan time, which is why "alice" was included in the plan for the access key.
So we could have used the literal "alice" ourselves, but then we would be repeating ourselves, which is never good. But professional standards aside, you don’t always know the value of a property before it is created. Remember all the attributes in the output labeled `(known after apply)`? Imagine, for instance, we needed the ARN rather than the name.
What are the ARNs for the resources we’ve made anyway? Well, after `apply`, you can see all current values with `terraform show`.
```
dataunbound$ terraform show
# aws_iam_access_key.test_client:
resource "aws_iam_access_key" "test_client" {
    id                   = "AKIA6MFJDG3VVFMKP4EF"
    secret               = (sensitive value)
    ses_smtp_password    = (sensitive value)
    ses_smtp_password_v4 = (sensitive value)
    status               = "Active"
    user                 = "alice"
}

# aws_iam_user.test_client:
resource "aws_iam_user" "test_client" {
    arn           = "arn:aws:iam::988197107435:user/alice"
    force_destroy = false
    id            = "alice"
    name          = "alice"
    path          = "/"
    unique_id     = "AIDA6MFJDG3VRPQBSAB4R"
}

# aws_s3_bucket.test_client_bucket:
resource "aws_s3_bucket" "test_client_bucket" {
    acl                         = "private"
    arn                         = "arn:aws:s3:::test-client-bucket-x130099"
    bucket                      = "test-client-bucket-x130099"
    bucket_domain_name          = "test-client-bucket-x130099.s3.amazonaws.com"
    bucket_regional_domain_name = "test-client-bucket-x130099.s3.eu-west-2.amazonaws.com"
    force_destroy               = false
    hosted_zone_id              = "Z3GKZC51ZF0DB4"
    id                          = "test-client-bucket-x130099"
    region                      = "eu-west-2"
    request_payer               = "BucketOwner"
    tags                        = {}

    server_side_encryption_configuration {
        rule {
            apply_server_side_encryption_by_default {
                sse_algorithm = "AES256"
            }
        }
    }

    versioning {
        enabled    = false
        mfa_delete = false
    }
}
```
Neat! It’s worth noting here that Terraform does not output sensitive values to the screen; the `(sensitive value)` sections are not edits on my part.
But where did the access key go? And when you find it, how are you going to get it to the user? We’ll cover this in more detail in a later segment, but suffice it to say: look for a file in your working directory called `terraform.tfstate`. You can find it there.
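For example, a hypothetical one-liner (assuming the v4 state layout used by Terraform 0.12 and that you have `jq` installed) to pull the secret out:

```
# hypothetical: relies on the v4 state layout used by Terraform 0.12
dataunbound$ jq -r '.resources[]
    | select(.type == "aws_iam_access_key")
    | .instances[].attributes.secret' terraform.tfstate
```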
Now let’s give Alice some access.
The Policy
Policies in AWS are defined as JSON. Most tutorials inline the JSON to define such policies. However, this leads to hard-coded identifiers, which can change when a resource is recreated and cause problems. I don’t know how the other authors sleep at night, but I’ll break from the norm and provide a civilized example. I’ll use a separate `aws_iam_policy_document`.
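A sketch of the document (the data source name and the exact action list are my assumptions; trim them to taste):

```hcl
data "aws_iam_policy_document" "test_client_s3" {
  # Let the user see the bucket itself...
  # (assumed action list; adjust for your client's needs)
  statement {
    actions   = ["s3:ListBucket"]
    resources = [aws_s3_bucket.test_client_bucket.arn]
  }

  # ...and administer the objects inside it.
  statement {
    actions   = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"]
    resources = ["${aws_s3_bucket.test_client_bucket.arn}/*"]
  }
}
```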
This policy allows access to the contents of `aws_s3_bucket.test_client_bucket.arn`. The policy document is not a resource like our other blocks. Instead, it is a data source, and it provides reusability and some protection against fat-finger mistakes.
With it, we’ll create a user policy like so.
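A sketch of the attachment (the policy name is my choice):

```hcl
resource "aws_iam_user_policy" "test_client" {
  name   = "test-client-bucket-access" # assumed name
  user   = aws_iam_user.test_client.name
  policy = data.aws_iam_policy_document.test_client_s3.json
}
```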
The `.json` attribute converts our data source to literal JSON. Now let’s take it for a spin.
Testing
After `apply`, you can test your new user. I did this by adding the access key information from `terraform.tfstate` to my `~/.aws/credentials` file in a new section, `[alice]`. You can then call `aws --profile alice` followed by any of your usual commands.
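The new credentials section would look something like this (the secret is shown as a placeholder; pull the real one from the state file):

```ini
[alice]
aws_access_key_id     = AKIA6MFJDG3VVFMKP4EF
aws_secret_access_key = <secret from terraform.tfstate>
```

Then `aws --profile alice s3 ls s3://test-client-bucket-x130099` should list the bucket’s contents, while the same command against any other bucket should be denied.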
We may look at automated testing in a future post, but not here.
Next Steps with Terraform
Stay tuned by subscribing and you’ll get notified immediately when Part II is available. In Part II we will refactor `aws-hello-world.tf` into separate files and modules and add optional AWS Console access.
Download The Complete Series Example Code