Introduction:
Managing EC2 instances at scale across multiple AWS regions can become a daunting task. Whether you're keeping track of instance details, monitoring their status, or managing tags, the process can be time-consuming. In this article, we’ll explore how you can automate the collection of EC2 instance information and its tags across multiple AWS regions, and store this data in a PostgreSQL database for easy querying and reporting. We’ll leverage Python, the boto3
library for interacting with AWS, and psycopg2
for PostgreSQL integration.
Prerequisites
To follow along with this tutorial, you will need:
-
AWS credentials configured on your machine. You can set them up using
aws configure
. -
Python 3.x installed on your machine along with the following libraries:
-
boto3
: AWS SDK for Python -
psycopg2
: PostgreSQL adapter for Python
-
You can install the required libraries using:
pip install boto3 psycopg2
Additionally, you should have a PostgreSQL database running, either locally or in the cloud.
Step 1: Setting Up AWS and PostgreSQL Connections
We start by configuring the AWS EC2 client and PostgreSQL connection parameters. The following code sets up the database details as well as the function for fetching EC2 regions.
import boto3
import psycopg2
from psycopg2 import sql
# PostgreSQL connection details
DB_HOST = "..."
DB_PORT = "5432" # default port for PostgreSQL
DB_NAME = "..."
DB_USER = "..."
DB_PASSWORD = "..."
# Function to get all EC2 regions
def get_all_regions():
ec2_client = boto3.client('ec2')
response = ec2_client.describe_regions()
regions = [region['RegionName'] for region in response['Regions']]
return regions
Here, get_all_regions()
uses the boto3
EC2 client to fetch a list of all available regions. These will be used to gather EC2 data across all regions in the subsequent steps.
Step 2: Fetching EC2 Instances and Their Tags
Once we have the list of regions, we need to gather information about EC2 instances and their tags. For this, we define the get_ec2_instances_and_tags()
function:
# Function to get EC2 instances and tags for a given region
def get_ec2_instances_and_tags(region):
ec2_client = boto3.client('ec2', region_name=region)
# Describe EC2 instances and tags
instances = ec2_client.describe_instances()
tags = ec2_client.describe_tags()
ec2_data = {}
# Process EC2 Instances
for reservation in instances['Reservations']:
for instance in reservation['Instances']:
instance_id = instance['InstanceId']
instance_name = None
instance_type = instance['InstanceType']
instance_state = instance['State']['Name']
region = instance['Placement']['AvailabilityZone'][:-1] # Removing the AZ suffix to get the region
# Try to get instance name from the tags
for tag in instance.get('Tags', []):
if tag['Key'] == 'Name':
instance_name = tag['Value']
ec2_data[instance_id] = {
'instance_id': instance_id,
'instance_name': instance_name,
'instance_type': instance_type,
'status': instance_state,
'region': region,
'tags': {}
}
# Process EC2 Tags
for tag in tags['Tags']:
instance_id = tag['ResourceId']
key = tag['Key']
value = tag['Value']
if instance_id in ec2_data:
ec2_data[instance_id]['tags'][key] = value
return ec2_data
In this function:
- We fetch the EC2 instances in a specific region and their associated tags.
- The instance details (like ID, type, and status) are stored in a dictionary.
- Tags are also gathered and added to the corresponding EC2 instance data.
Step 3: Inserting EC2 Data into PostgreSQL
Next, we set up the insert_ec2_data_into_db()
function that connects to the PostgreSQL database and inserts the EC2 data:
# Function to connect to PostgreSQL and insert data
def insert_ec2_data_into_db(ec2_data):
# Connect to your PostgreSQL server
conn = psycopg2.connect(
dbname=DB_NAME,
user=DB_USER,
password=DB_PASSWORD,
host=DB_HOST,
port=DB_PORT
)
cur = conn.cursor()
# Create tables
cur.execute("""
CREATE TABLE IF NOT EXISTS ec2_instances (
instance_id VARCHAR(50) PRIMARY KEY,
instance_name VARCHAR(100),
instance_type VARCHAR(50),
status VARCHAR(50),
region VARCHAR(50)
);
""")
cur.execute("""
CREATE TABLE IF NOT EXISTS ec2_tags (
tag_id SERIAL PRIMARY KEY,
instance_id VARCHAR(50),
tag_key VARCHAR(100),
tag_value VARCHAR(255),
FOREIGN KEY (instance_id) REFERENCES ec2_instances (instance_id) ON DELETE CASCADE
);
""")
cur.execute("""
TRUNCATE TABLE ec2_tags;
TRUNCATE TABLE ec2_instances CASCADE;
""")
# Insert EC2 instances and tags
for instance_id, instance_data in ec2_data.items():
# Insert instance data
cur.execute("""
INSERT INTO ec2_instances (instance_id, instance_name, instance_type, status, region)
VALUES (%s, %s, %s, %s, %s)
ON CONFLICT (instance_id) DO NOTHING;
""", (
instance_id,
instance_data['instance_name'],
instance_data['instance_type'],
instance_data['status'],
instance_data['region']
))
# Insert tags
for tag_key, tag_value in instance_data['tags'].items():
cur.execute("""
INSERT INTO ec2_tags (instance_id, tag_key, tag_value)
VALUES (%s, %s, %s);
""", (instance_id, tag_key, tag_value))
# Commit and close connection
conn.commit()
cur.close()
conn.close()
Here, we:
- Create tables (
ec2_instances
andec2_tags
) to store instance details and tags. - Use
TRUNCATE
to clear the existing data before inserting fresh EC2 data. - Insert EC2 instance details and their tags into the database.
Step 4: Putting It All Together
Finally, we integrate all these components in the main()
function:
# Main function
if __name__ == '__main__':
regions = get_all_regions()
ec2_data = {}
# Iterate through each region and fetch EC2 instances and tags
for region in regions:
print(f"Fetching EC2 instances from region: {region}")
region_data = get_ec2_instances_and_tags(region)
ec2_data.update(region_data) # Combine the data from each region
# Insert the fetched EC2 data into the database
insert_ec2_data_into_db(ec2_data)
print("EC2 instances and tags have been inserted into the database.")
Step 5: Running Queries
Once your EC2 data is stored in PostgreSQL, you can run SQL queries to generate reports. Here’s a simple query that joins the EC2 instances and their tags:
SELECT t1.*, t2.tag_key, t2.tag_value
FROM ec2_instances t1
JOIN ec2_tags t2 ON t1.instance_id = t2.instance_id
ORDER BY t1.instance_id, t2.tag_key, t2.tag_value;
This query will give you a detailed report of all EC2 instances along with their associated tags.
Conclusion
In this article, we demonstrated how to automate the retrieval of EC2 instances and tags across all AWS regions using Python and the boto3
library. We then stored this data in a PostgreSQL database and outlined how to run SQL queries for reporting purposes. This solution can be a game changer for managing EC2 instances at scale, especially when you need to report on or monitor large sets of resources across multiple regions.
By automating this process, you save time and ensure that your EC2 data is always up to date and easily accessible for any analysis or reporting needs.
Top comments (0)