- What is a Data Source in Terraform?
- How Data Sources Work
What we will cover in this project:
- Create Terraform configuration files:
- Configure Terraform AWS provider block:
- Create Data Block for Default VPC:
- Create Data Block for Default Subnet:
- Create a security group resource inside Default VPC:
- Create and store SSH key pair using terraform:
- Creating AWS key pair using our SSH public key:
- Create an EC2 Instance (Ubuntu) and Install Nginx inside Default VPC:
- Create Output Variables:
- Initialize your Terraform Configuration:
- Plan the Infrastructure Changes:
- Apply the Configuration:
- Check through AWS UI:
- Access Through Port 80 in Your Browser:
- Destroy the Infrastructure:
- Conclusion
Terraform, developed by HashiCorp, is a powerful Infrastructure as Code (IaC) tool that allows users to define, provision, and manage cloud infrastructure across multiple providers such as AWS, Azure, and Google Cloud. By using declarative configuration files, Terraform enables the automation of infrastructure, making it easily version-controlled and shared across teams. This ensures consistency, scalability, and reliability across different environments.
A key concept in Terraform is data sources, which play a vital role in enhancing the accuracy and flexibility of your infrastructure management. In this post, we'll explore what data sources are, how they work, and why they are important for building adaptive, maintainable, and efficient infrastructure.
What is a Data Source in Terraform?
A **data source** in Terraform is a way to query and retrieve information about existing resources that have already been created, either by Terraform itself or by other means, such as cloud providers. Instead of creating new resources, data sources allow you to reference and incorporate external data into your Terraform configuration.
Data sources provide a powerful mechanism to:
- Integrate existing infrastructure: Leverage resources managed outside Terraform (e.g., manually provisioned databases or networks) without recreating them.
- Enhance modularity: Data sources enable Terraform modules to dynamically adapt to different environments by fetching relevant information, such as an existing Virtual Private Cloud (VPC) ID or security group rules.
- Increase efficiency: Rather than hardcoding values like resource IDs or manually looking up resource properties, data sources allow Terraform to dynamically retrieve and use the most up-to-date data.
How Data Sources Work
In Terraform, a data source is defined using the `data` block. When executed, Terraform queries the specified provider for the requested data, retrieves it, and makes it available within the configuration.
Syntax of data block:
```hcl
data "<PROVIDER>_<RESOURCE_TYPE>" "<NAME>" {
  # Configuration arguments
}
```
Syntax for reference to this data block resource attribute:
```hcl
data.<provider>_<resource_type>.<name>.<attribute>
```
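For instance, instead of hardcoding an AMI ID, a data source can look up the most recent Ubuntu AMI at plan time. This is a sketch, not part of this project's configuration; the owner ID `099720109477` is Canonical's, and the name pattern assumes Ubuntu 22.04:

```hcl
# Look up the latest Ubuntu 22.04 AMI published by Canonical
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"]

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}
```

Elsewhere in the configuration, the result is referenced as `data.aws_ami.ubuntu.id`.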
This may look overwhelming at first, so let's understand data sources through a hands-on example project.
What we will cover in this project:
In this project, we’ll explore how to leverage Terraform Data Sources to retrieve and use existing AWS resources, specifically the default VPC and its associated subnets. We will create an Ubuntu EC2 instance, install Nginx, and associate the instance with the default VPC and a specific subnet using Terraform’s data sources. By doing this, you’ll gain a clear understanding of how data sources work and how they can make your Terraform configurations more dynamic and flexible.
AWS provides a default VPC and subnets in each region, which are ready to use. These resources are often sufficient for basic projects. If you don’t already have a default VPC in your AWS environment, follow these steps to create one:
- Navigate to the VPC section of the AWS Console.
- On the left-hand menu, click Your VPCs.
- Click on the Actions dropdown menu.
- Select Create Default VPC.
This will automatically create the default VPC and associated subnets, which you can then use in your Terraform configuration.
If you get stuck at any point, you can refer to the code examples and configurations in my GitHub repo for this blog: Terraform_Data_source.
Now create a directory and a file called main.tf inside it. Add the following code snippets to this main.tf file.
Create Terraform configuration files:
In the first step, we have to tell Terraform that we will be deploying infrastructure on AWS. We do this by configuring the AWS cloud provider plugin.
```hcl
terraform {
  required_version = ">= 1.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.56"
    }
  }
}
```
This configuration tells Terraform to use the AWS provider and ensures compatibility with Terraform version 1.0 or higher. The provider version is locked to maintain stability and prevent unexpected updates.
Configure Terraform AWS provider block:
The next step is to configure the AWS provider block, which accepts various configuration parameters. We will start by specifying the region to deploy the infrastructure in: us-east-1.
```hcl
provider "aws" {
  region = "us-east-1"
}
```
Create Data Block for Default VPC:
```hcl
data "aws_vpc" "default" {
  default = true
}
```
By setting `default = true`, Terraform automatically fetches the default VPC in the current region, making it easier to launch resources without manually specifying VPC details. This ensures flexibility and reduces the chance of errors when managing infrastructure across different environments.
Create Data Block for Default Subnet:
The following Terraform code retrieves details of an existing subnet in AWS:
```hcl
data "aws_subnet" "default" {
  vpc_id = data.aws_vpc.default.id
  filter {
    name   = "availability-zone"
    values = ["us-east-1a"]
  }
}
```
Explanation:
- Data Source: The code uses the `aws_subnet` data source to fetch details about an existing subnet in AWS.
- VPC Association: The `vpc_id` attribute ties the subnet to a specific VPC using `data.aws_vpc.default.id`. This ensures that the retrieved subnet is part of the correct VPC, maintaining the necessary network isolation and connectivity for your resources.
- Availability Zone Filter: The filter retrieves the subnet in the `us-east-1a` availability zone specifically, controlling resource placement.
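If you need all subnets in the VPC rather than a single one, the plural `aws_subnets` data source returns a list of IDs. This is a sketch built on the same default-VPC lookup, not something this project requires:

```hcl
# Fetch the IDs of every subnet belonging to the default VPC
data "aws_subnets" "all_default" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.default.id]
  }
}
```

The resulting list is then available as `data.aws_subnets.all_default.ids`.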
Create a security group resource inside Default VPC:
```hcl
resource "aws_security_group" "allow_ssh_http_https" {
  vpc_id = data.aws_vpc.default.id

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "allow-ssh-http"
  }
}
```
Explanation:
- Security Group Creation: The code defines an AWS security group named `allow_ssh_http_https`, attached to the default VPC via `vpc_id = data.aws_vpc.default.id`, ensuring it operates within the intended network.
- Ingress Rules: Two ingress rules allow SSH on port 22 and HTTP on port 80, both open to all IP addresses (`0.0.0.0/0`), enabling remote administration and web traffic.
- Egress Rule: The egress rule allows all outbound traffic, with the protocol set to `-1` and `0.0.0.0/0` as the CIDR block, permitting resources to communicate freely with external networks.
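Note that although the resource name mentions HTTPS, no port 443 rule is defined above. If you later serve traffic over TLS, you could add a matching ingress block inside the same security group (a sketch; it is not needed for this project, which only uses HTTP):

```hcl
ingress {
  from_port   = 443
  to_port     = 443
  protocol    = "tcp"
  cidr_blocks = ["0.0.0.0/0"]
}
```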
Create and store an SSH key pair using Terraform:
To enable secure access to our AWS resources, we'll generate an SSH key pair. This key pair will be used for accessing instances securely:
```hcl
resource "tls_private_key" "ssh_key" {
  algorithm = "RSA"
  rsa_bits  = 4096
}

resource "local_file" "private_key" {
  content  = tls_private_key.ssh_key.private_key_pem
  filename = "./.ssh/terraform_rsa"
}

resource "local_file" "public_key" {
  content  = tls_private_key.ssh_key.public_key_openssh
  filename = "./.ssh/terraform_rsa.pub"
}
```
This configuration generates an RSA key pair with a 4096-bit key length. The private and public keys are then saved to files in the .ssh directory, ready for use in connecting to our AWS instances.
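One practical detail: SSH clients refuse private keys that are group- or world-readable. The `local_file` resource supports a `file_permission` argument, so the `private_key` resource above can be extended to tighten the key's mode at creation time (a sketch; alternatively, run chmod 600 on the file yourself):

```hcl
resource "local_file" "private_key" {
  content         = tls_private_key.ssh_key.private_key_pem
  filename        = "./.ssh/terraform_rsa"
  file_permission = "0600" # private keys must not be group/world readable
}
```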
Creating AWS key pair using our SSH public key:
Next, we'll create an AWS key pair using the public SSH key we generated:
```hcl
resource "aws_key_pair" "deployer" {
  key_name   = "ubuntu_ssh_key"
  public_key = tls_private_key.ssh_key.public_key_openssh
}
```
This resource uploads the public key to AWS, allowing you to securely access your EC2 instances using the corresponding private key.
Create an EC2 Instance (Ubuntu) and Install Nginx inside Default VPC:
To provision an EC2 instance with the required configuration, we can define the following resource in Terraform:
```hcl
resource "aws_instance" "ubuntu_instance" {
  ami                         = "ami-0a0e5d9c7acc336f1"
  instance_type               = "t2.micro"
  subnet_id                   = data.aws_subnet.default.id
  vpc_security_group_ids      = [aws_security_group.allow_ssh_http_https.id]
  key_name                    = aws_key_pair.deployer.key_name
  associate_public_ip_address = true

  user_data = <<-EOF
    #!/bin/bash
    sudo apt update -y
    sudo apt install -y nginx
    echo "<h1>Hello From Ubuntu EC2 Instance!!!</h1>" | sudo tee /var/www/html/index.html
    sudo systemctl restart nginx
  EOF

  tags = {
    Name = "ubuntu-instance"
  }
}
```
Explanation:
- Instance Configuration: The instance uses a specified Amazon Machine Image (AMI) for Ubuntu (`ami-0a0e5d9c7acc336f1`) and is of type `t2.micro`, making it suitable for low-cost and lightweight applications.
- Networking Setup: The instance is launched in the subnet identified by `data.aws_subnet.default.id` and is associated with the previously defined security group (`aws_security_group.allow_ssh_http_https.id`), ensuring proper network access.
- Key Pair and Public IP: It uses the key pair `aws_key_pair.deployer.key_name` for SSH access and is configured to have a public IP address, allowing external connections.
- User Data Script: The `user_data` block contains a Bash script that runs on instance startup. It updates the package index, installs Nginx, writes a simple HTML file to the web server directory, and restarts Nginx to serve a greeting message.
- Resource Tagging: The instance is tagged with the name `ubuntu-instance`, which helps in identifying the resource in the AWS Management Console.
Create Output Variables:
The following Terraform code defines outputs to display important information after provisioning resources:
```hcl
# Output the Public IP
output "ubuntu_instance_public_ip" {
  value = aws_instance.ubuntu_instance.public_ip
}

# Output VPC CIDR Block
output "vpc_cidr_block" {
  value       = data.aws_vpc.default.cidr_block
  description = "The CIDR block of the default VPC"
}

# Output Subnet CIDR Block
output "subnet_cidr_block" {
  value       = data.aws_subnet.default.cidr_block
  description = "The CIDR block of the default subnet"
}
```
Key Points:
- Public IP of the EC2 Instance: The first output block, `ubuntu_instance_public_ip`, captures the public IP address of the newly created EC2 instance. This information is crucial for accessing the instance over the internet.
- VPC CIDR Block: The second output, `vpc_cidr_block`, retrieves and displays the CIDR block of the default VPC, which helps in understanding the IP address range used by the VPC.
- Subnet CIDR Block: The third output, `subnet_cidr_block`, provides the CIDR block of the default subnet, which is useful for managing subnets and ensuring that resources are correctly addressed within the network.
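As a convenience, an output can also assemble the full SSH command for you. This is an optional sketch, assuming the key path from this setup and the default `ubuntu` user on Ubuntu AMIs:

```hcl
# Print a ready-to-run SSH command after `terraform apply`
output "ssh_command" {
  value = "ssh -i .ssh/terraform_rsa ubuntu@${aws_instance.ubuntu_instance.public_ip}"
}
```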
Initialize your Terraform Configuration:
Once you have your main.tf, the next step is to initialize your Terraform environment. Run the following command in your project directory:
```shell
terraform init
```
This command initializes the working directory containing your Terraform configuration files. It downloads the necessary provider plugins, sets up the backend, and prepares the environment for future Terraform operations.
Plan the Infrastructure Changes:
To preview the actions Terraform will take without actually applying any changes, run:
```shell
terraform plan
```
This command generates an execution plan, which shows you what resources will be created, modified, or destroyed. It helps you ensure that everything is configured correctly before applying any changes.
Apply the Configuration:
Once we have verified the planned changes, we deploy them to AWS via the `terraform apply` command.
Confirm the changes by typing “yes”.
Awesome! You just created your Ubuntu EC2 instance via Terraform.
Check through AWS UI:
Navigate to the AWS Management Console to verify your instance and other resources. You can now view the public IP address and other details directly in the console.
Access Through Port 80 in Your Browser:
Open your web browser and enter http://your-public-ip:80 in the address bar, replacing your-public-ip with the EC2 instance's public IP. You should see your "Hello From Ubuntu EC2 Instance!!!" message.
Destroy the Infrastructure:
If you want to tear down the infrastructure you created, use the `terraform destroy` command.
Confirm the changes by typing “yes”.
This deletes all the resources defined in your configuration, ensuring a clean removal of everything Terraform created.
Conclusion
In this project, we used Terraform to provision an Ubuntu EC2 instance in AWS’s default VPC, demonstrating the value of data sources. By retrieving existing resources, we created a flexible infrastructure setup. We configured the instance, installed Nginx, and learned to output important resource information. This exercise underscores Terraform's efficiency in automating cloud infrastructure management.
Stay tuned for our upcoming blogs where we’ll dive deeper into advanced Terraform concepts, including modules, remote backends, state management, and more. We’ll explore how these features can further enhance your infrastructure management and automation practices.