Infrastructure as Code (IaC) has transformed how you deploy and manage cloud infrastructure. Tools like Azure Resource Manager, AWS CloudFormation, Docker, Kubernetes, Ansible, and Terraform have made deploying infrastructure faster and more scalable. However, they’ve also introduced a new set of security challenges.
In recent years, there have been numerous security incidents caused by IaC misconfigurations. These include:
- Open Ports and Weak Security Groups: Exposing cloud resources to the public internet.
- Hardcoded Secrets: Credentials, API keys, or sensitive data embedded in code.
- Overly Permissive Policies: Granting unnecessary privileges to users or services.
- Missing Resource Limits: Allowing unrestricted use of cloud resources, leading to potential outages or abuse.
The consequences of these issues can be severe, from data breaches to financial loss and reputational damage. The risk is even greater when Code GenAI is used to produce those artifacts.
Code GenAI is a great help for bootstrapping code artifacts and producing boilerplate code, but its output needs to be reviewed to avoid introducing unexpected issues and vulnerabilities.
Fortunately, there are tools that can help identify critical vulnerabilities early in development.
From SonarQube Cloud telemetry, I've gathered the most frequently raised IaC issues, with more than 6 million hits in total across all projects analyzed.
In this article, I focus on Azure, CloudFormation, Docker, Kubernetes, Ansible and Terraform as examples of IaC issues. I highlight each critical issue, its risks, and how to fix it.
As a bonus chapter, you can see the result of an experiment with Code GenAI using different providers to generate Kubernetes artifacts and check if they are as clean and secure as we expect.
Let's start with a list of examples of all the IaC artifacts covered in this article (and supported by SonarQube).
Azure Resource Manager
Restrict Public Access to Resources
Problem: Allowing unrestricted public access to Azure resources (e.g., Blob Storage) exposes them to unauthorized users.
Solution: Use publicNetworkAccess to control access to resources.
{
  "type": "Microsoft.Web/sites",
  "apiVersion": "2020-12-01",
  "name": "example-site",
  "properties": {
    "siteConfig": {
      "publicNetworkAccess": "Disabled"
    }
  }
}
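As a rough illustration of what an automated check for this setting might look like, the Python sketch below walks the `resources` array of an ARM template and reports every resource whose `siteConfig.publicNetworkAccess` is not `Disabled`. The function name and the inline template (including the `open-site` resource) are hypothetical, and a real analyzer such as SonarQube covers many more resource types and property locations than this.

```python
import json


def find_public_resources(template: dict) -> list[str]:
    """Return names of resources whose siteConfig does not disable public access."""
    flagged = []
    for resource in template.get("resources", []):
        site_config = resource.get("properties", {}).get("siteConfig", {})
        # Assumption of this sketch: a missing value is treated as public.
        if site_config.get("publicNetworkAccess") != "Disabled":
            flagged.append(resource.get("name", "<unnamed>"))
    return flagged


# Hypothetical template: one locked-down site and one that omits the setting.
TEMPLATE = json.loads("""
{
  "resources": [
    {"type": "Microsoft.Web/sites", "name": "example-site",
     "properties": {"siteConfig": {"publicNetworkAccess": "Disabled"}}},
    {"type": "Microsoft.Web/sites", "name": "open-site",
     "properties": {"siteConfig": {}}}
  ]
}
""")

if __name__ == "__main__":
    print(find_public_resources(TEMPLATE))
```

Treating a missing value as a finding is a deliberately strict choice here: it forces the setting to be stated explicitly in the template rather than relying on a platform default.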
AWS CloudFormation
1. Ensure S3 Buckets Are Private
Problem: Publicly accessible S3 buckets can lead to data leaks.
Solution: Set the bucket's AccessControl to Private.
Resources:
  MyBucket:
    Type: AWS::S3::Bucket
    Properties:
      AccessControl: Private
2. Apply Least Privilege to IAM Roles
Problem: Granting broad permissions creates unnecessary security risks.
Solution: Limit actions and resources to only what’s required.
Resources:
  # Update Lambda code
  lambdaUpdatePolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      ManagedPolicyName: lambdaUpdatePolicy
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Action:
              - lambda:UpdateFunctionCode
            Resource: "arn:aws:lambda:us-east-2:123456789012:function:my-function:1"
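To make "least privilege" a bit more concrete, here is a minimal Python sketch that scans a policy document (already parsed into a dictionary) for `Allow` statements containing a wildcard in `Action` or `Resource`. The function name and the `BROAD` policy are made up for illustration. A wildcard is not always wrong, so this is only a hint for review, not a definitive rule.

```python
def find_wildcards(policy_document: dict) -> list[str]:
    """Describe every Allow statement that uses a wildcard in Action or Resource."""
    findings = []
    for index, statement in enumerate(policy_document.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        for key in ("Action", "Resource"):
            values = statement.get(key, [])
            # IAM accepts either a single string or a list of strings.
            if isinstance(values, str):
                values = [values]
            for value in values:
                if "*" in value:
                    findings.append(f"Statement {index}: {key} '{value}' contains a wildcard")
    return findings


# The narrow policy from the CloudFormation example above.
NARROW = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["lambda:UpdateFunctionCode"],
        "Resource": "arn:aws:lambda:us-east-2:123456789012:function:my-function:1",
    }],
}

# Hypothetical overly permissive policy, for comparison.
BROAD = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "lambda:*", "Resource": "*"}],
}

if __name__ == "__main__":
    for finding in find_wildcards(BROAD):
        print(finding)
```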
Docker
Avoid Running Containers as Root
Problem: Running containers as root increases the risk of privilege escalation.
Solution: Create and use a non-root user in your Dockerfile.
FROM alpine
RUN addgroup -S nonroot \
    && adduser -S nonroot -G nonroot
USER nonroot
ENTRYPOINT ["id"]
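A simple way to picture this check is to look at the last `USER` instruction of the final build stage. The Python sketch below does exactly that on the Dockerfile text. It is a simplification with hypothetical function names: it assumes every base image runs as root, which is not always true, and it ignores the user that may be set at runtime.

```python
def effective_user(dockerfile_text: str) -> str:
    """Return the user set by the last USER instruction of the final stage."""
    user = "root"  # assumption of this sketch: base images run as root
    for line in dockerfile_text.splitlines():
        parts = line.strip().split()
        if not parts:
            continue
        instruction = parts[0].upper()
        if instruction == "FROM":
            user = "root"  # a new build stage starts from its base image again
        elif instruction == "USER" and len(parts) >= 2:
            user = parts[1]
    return user


def runs_as_root(dockerfile_text: str) -> bool:
    """Return True if the final stage ends up running as root (by name or UID 0)."""
    # USER accepts "name", "uid", or "name:group"; only the part before ':' matters here.
    name = effective_user(dockerfile_text).split(":")[0]
    return name in ("root", "0")


HARDENED = "\n".join([
    "FROM alpine",
    "RUN adduser -S nonroot",
    "USER nonroot",
    'ENTRYPOINT ["id"]',
])

DEFAULT = "\n".join([
    "FROM alpine",
    'ENTRYPOINT ["id"]',
])

if __name__ == "__main__":
    print(runs_as_root(HARDENED), runs_as_root(DEFAULT))
```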
Kubernetes
1. Don’t Run Privileged Pods
Problem: Running containers in privileged mode can reduce the resilience of a cluster in the event of a security incident because it weakens the isolation between hosts and containers.
Solution: Disable privileged mode in your pod specification.
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
    - name: web
      image: nginx
      ports:
        - name: web
          containerPort: 80
          protocol: TCP
      securityContext:
        privileged: false
2. Define Resource Requests and Limits
Problem: Allowing pods to use unlimited resources can destabilize the cluster.
Solution: Specify resource requests and limits for containers.
apiVersion: v1
kind: Pod
metadata:
  name: resource-limited-pod
spec:
  containers:
    - name: app
      image: myapp:latest
      resources:
        requests:
          memory: "256Mi"
          cpu: "0.5"
        limits:
          memory: "512Mi"
          cpu: "1"
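The quantities in this manifest can be confusing at first: a CPU value of "0.5" means half a core, which Kubernetes also writes as "500m" (millicores), and "256Mi" means 256 × 2^20 bytes. The short Python sketch below converts the values used above so they can be compared. The helper names are hypothetical, and only the plain, `m`, and binary (`Ki`, `Mi`, `Gi`) notations are handled, not the full Kubernetes quantity syntax.

```python
from decimal import Decimal

BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '0.5' or '500m' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    # Decimal avoids floating-point rounding surprises for values like '0.1'.
    return int(Decimal(quantity) * 1000)


def memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '256Mi' to bytes (binary suffixes only)."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # a bare number is a count of bytes


if __name__ == "__main__":
    # Requests should never exceed limits for the same resource.
    print(cpu_millicores("0.5") <= cpu_millicores("1"))
    print(memory_bytes("256Mi") <= memory_bytes("512Mi"))
```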
3. Use a Specific Version Tag for Images
Problem: When a container image is referenced without a specific version tag, it defaults to latest. Every build, deployment, or run can then pull a different version of the image, which makes behavior unpredictable and problems hard to reproduce.
Solution: Use a specific version tag or a digest for container images.
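apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
    # Pinned to a version tag
    - name: nginx
      image: nginx:1.14.2
    # Pinned to a digest (container names must be unique within a pod)
    - name: nginx-digest
      image: nginx@sha256:b0ad43f7ee5edbc0effbc14645ae7055e21bc1973aee5150745632a24a752661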
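The rule behind this recommendation is easy to sketch: an image reference is considered pinned if it carries a digest, or a tag other than latest. The Python function below is a minimal, hypothetical version of such a check. It is not how SonarQube implements the rule, and it does not validate the reference format itself.

```python
def is_pinned(image: str) -> bool:
    """Return True if the image reference has a digest or a tag other than 'latest'."""
    if "@" in image:
        return True  # a digest such as nginx@sha256:... identifies exact content
    # Look for a tag only in the last path segment, so a registry port
    # (for example localhost:5000/app) is not mistaken for a tag.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # no tag: the reference defaults to 'latest'
    tag = last_segment.rsplit(":", 1)[1]
    return tag != "latest"


if __name__ == "__main__":
    for reference in ("nginx:1.14.2", "nginx", "myapp:latest", "localhost:5000/app"):
        print(reference, is_pinned(reference))
```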
Terraform
Allowing public network access to cloud resources is security-sensitive
Problem: Enabling public network access to cloud resources can affect an organization’s ability to protect its data or internal operations from data theft or disruption.
Solution: Use private networks and VPC peering or other secure communication tunnels to communicate with other cloud components.
resource "google_compute_instance" "example" {
  network_interface {
    network = google_compute_network.vpc_network_example.name
  }
}
Ansible
1. Server certificates should be verified
Problem: Disabling server certificate validation makes it possible for encrypted communication to be intercepted.
Solution: Ensure playbooks do not bypass certificate validation.
- name: Example playbook
  hosts: server
  tasks:
    - name: Retrieve a web page
      ansible.builtin.uri:
        url: https://www.example.com
        validate_certs: true
        return_content: true
2. Loose POSIX permissions
Problem: Files with overly permissive POSIX permissions (e.g., 777) grant unnecessary read, write, or execute access to unauthorized users.
Solution: Ensure playbooks explicitly set restrictive permissions on sensitive files.
- name: My deployment
  hosts: all
  tasks:
    - name: Create /etc/demo with permissions
      ansible.builtin.file:
        path: /etc/demo
        state: directory
        mode: '0770'
    - name: Copy demo.conf and set symbolic permissions
      ansible.builtin.copy:
        src: /files/demo.conf
        dest: /etc/demo/demo.conf
        mode: 'g=r,u+w,o='
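For octal modes such as '0770' or '777', the question "does this grant anything to other users?" comes down to the last three permission bits. The Python sketch below shows that arithmetic. The function name is hypothetical, and symbolic modes like 'g=r,u+w,o=' are out of scope for this example.

```python
def others_have_access(mode: str) -> bool:
    """Return True if an octal mode string such as '0770' grants any access to 'others'."""
    permission_bits = int(mode, 8)
    # The lowest three bits hold read (4), write (2), and execute (1) for 'others'.
    return (permission_bits & 0o007) != 0


if __name__ == "__main__":
    for mode in ("0770", "777", "0644"):
        print(mode, others_have_access(mode))
```

With this check, '0770' passes because owner and group have full access and everyone else has none, while '777' and '0644' are both flagged because they leave at least one bit set for other users.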
Secrets management
Keys and passwords being disclosed
Problem: Secret leaks often occur when sensitive authentication data is stored with an application's source code. This is one example (OVH keys, the most frequently raised issue), but the problem extends to many other platforms.
props.set("ovh-key", "EXEgWIz07P0HYwtQDs7cNIqCiQaWSuHF")
Solution:
props.set("ovh-key", System.getenv("OVH_KEY"))
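One detail worth adding to this pattern is failing fast when the variable is missing, so a misconfigured environment is caught at startup instead of surfacing later as an authentication error. Here is a small Python sketch of that idea. The `require_env` helper is hypothetical, and `OVH_KEY` is simply the variable name from the example above.

```python
import os


def require_env(name: str) -> str:
    """Read a secret from the environment and fail fast if it is missing or empty."""
    value = os.environ.get(name)
    if not value:
        # Only the variable name is reported; a secret value is never logged.
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


if __name__ == "__main__":
    try:
        ovh_key = require_env("OVH_KEY")
        print("OVH_KEY loaded")
    except RuntimeError as error:
        print(error)
```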
Experimenting with Code GenAI and IaC
To see how accurate Code GenAI is for IaC artifacts, I ran an experiment with an AI application written in Java using Quarkus. You can find the source code here.
From my IDE, I asked two different code assistants to produce the Kubernetes Deployment YAML file to deploy the application.
I used GitHub Copilot and Amazon Q as code assistants.
Let's see the results.
GitHub Copilot
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pingpong-ai-deployment
  labels:
    app: pingpong-ai
spec:
  replicas: 3
  selector:
    matchLabels:
      app: pingpong-ai
  template:
    metadata:
      labels:
        app: pingpong-ai
    spec:
      containers:
        - name: pingpong-ai-container
          image: your-docker-repo/pingpong-ai:latest
          ports:
            - containerPort: 8080
          env:
            - name: QUARKUS_PROFILE
              value: prod
            - name: JAVA_OPTS
              value: "-Dquarkus.http.host=0.0.0.0"
          volumeMounts:
            - name: config-volume
              mountPath: /deployments/config
      volumes:
        - name: config-volume
          configMap:
            name: pingpong-ai-config
---
apiVersion: v1
kind: Service
metadata:
  name: pingpong-ai-service
spec:
  selector:
    app: pingpong-ai
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
  type: LoadBalancer
It was smart enough to generate a Service and a volume, but not everything is perfect: SonarQube for IDE detected 6 issues.
Amazon Q
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-application
  labels:
    app: my-application
spec:
  replicas: 3 # Number of pod replicas you want to run
  selector:
    matchLabels:
      app: my-application
  template:
    metadata:
      labels:
        app: my-application
    spec:
      containers:
        - name: my-application
          image: your-registry/your-image:tag # Replace with your image
          ports:
            - containerPort: 8080 # Replace with your application port
          resources:
            requests:
              cpu: "200m"
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
          env:
            - name: ENVIRONMENT
              value: "production"
            # Add any environment variables your application needs
          livenessProbe:
            httpGet:
              path: /health # Replace with your health check endpoint
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health # Replace with your health check endpoint
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
This output is way cleaner and of higher quality than the previous one, and SonarQube for IDE has detected only 2 issues. However, it doesn't provide a Service or a volume for configuration files.
In both cases, generation was fast and free of typing errors (wrong attribute names, wrong number of spaces, tabs, etc.). But as with any code generated by AI, the output needed a review phase to detect issues that are not always obvious, so that good-quality code reaches the repository and Pull Request reviews go smoothly.
Conclusion
IaC enables teams to automate and scale infrastructure efficiently, but with great power comes great responsibility. Misconfigurations, hardcoded secrets, and overly permissive access controls are common mistakes that can lead to serious security vulnerabilities.
By following best practices and leveraging tools like SonarQube, developers can identify and resolve critical security issues early in the development process.
More than just security, maintaining code quality in IaC is essential. Well-structured, maintainable IaC ensures teams can quickly adapt to new requirements and maintain a robust, secure infrastructure.
Combining high-quality code with automated tooling is the key to avoiding costly security mishaps. SonarQube has rules to check all these issues in Azure Resource Manager, Docker, Kubernetes, CloudFormation, Terraform, Ansible and Secrets in general.