Jonathan Vila

Posted on Dec 20

Code Quality in the Cloud

#cloudcomputing #terraform #kubernetes #security

Infrastructure as Code (IaC) has transformed how you deploy and manage cloud infrastructure. Tools like Azure Resource Manager, AWS CloudFormation, Docker, Kubernetes, Ansible, and Terraform have made deploying infrastructure faster and more scalable. However, they’ve also introduced a new set of security challenges.

In recent years, there have been numerous security incidents caused by IaC misconfigurations. These include:

Open Ports and Weak Security Groups: Exposing cloud resources to the public internet.
Hardcoded Secrets: Credentials, API keys, or sensitive data embedded in code.
Overly Permissive Policies: Granting unnecessary privileges to users or services.
Missing Resource Limits: Allowing unrestricted use of cloud resources, leading to potential outages or abuse.

The consequences of these issues can be severe, from data breaches to financial loss and reputational damage. The risk grows when generative AI coding assistants (Code GenAI) are used to produce those artifacts.

Code GenAI is a great help for bootstrapping code artifacts and producing boilerplate, but its output still needs to be reviewed to avoid introducing unexpected issues and vulnerabilities.

Fortunately, there are tools that can help identify critical vulnerabilities early in development.

From the SonarQube Cloud telemetry, I've gathered the most frequently hit IaC issues, with more than 6 million hits in total across all analyzed projects.

In this article, I use Azure Resource Manager, CloudFormation, Docker, Kubernetes, Ansible, and Terraform to illustrate these IaC issues. For each critical issue, I highlight the risk and how to fix it.

As a bonus chapter, you can see the result of an experiment with Code GenAI using different providers to generate Kubernetes artifacts and check if they are as clean and secure as we expect.

Let's start with a list of examples of all the IaC artifacts covered in this article (and supported by SonarQube).

Azure Resource Manager

Restrict Public Access to Resources

Problem: Allowing unrestricted public access to Azure resources (e.g., App Service sites or Blob Storage) exposes them to unauthorized users.

Solution: Use the publicNetworkAccess property to disable public access to the resource.

{
  "type": "Microsoft.Web/sites",
  "apiVersion": "2020-12-01",
  "name": "example-site",
  "properties": {
     "siteConfig": {
        "publicNetworkAccess": "Disabled"
     }
  }
}

AWS CloudFormation

1. Ensure S3 Buckets Are Private

Problem: Publicly accessible S3 buckets can lead to data leaks.

Solution: Set the bucket's AccessControl to Private.

Resources:
  MyBucket:
    Type: AWS::S3::Bucket
    Properties:
      AccessControl: Private

2. Apply Least Privilege to IAM Roles

Problem: Granting broad permissions creates unnecessary security risks.

Solution: Limit actions and resources to only what’s required.

Resources:
  # Update Lambda code
  lambdaUpdatePolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      ManagedPolicyName: lambdaUpdatePolicy
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Action:
              - lambda:UpdateFunctionCode
            Resource: "arn:aws:lambda:us-east-2:123456789012:function:my-function:1"
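
To make the idea of "too broad" concrete, here is a minimal Java sketch of the kind of check a reviewer or a script could apply to a policy statement. It is only an illustration under simple assumptions: the class PolicyCheck and the method isOverlyPermissive are hypothetical names, and the check only looks for wildcards. It is not how SonarQube or AWS evaluate policies.

```java
import java.util.List;

class PolicyCheck {
    // Hypothetical helper: flags a statement whose actions or resource use a wildcard.
    static boolean isOverlyPermissive(List<String> actions, String resource) {
        boolean wildcardAction = actions.stream()
                .anyMatch(a -> a.equals("*") || a.endsWith(":*"));
        boolean wildcardResource = resource.equals("*");
        return wildcardAction || wildcardResource;
    }

    public static void main(String[] args) {
        String function = "arn:aws:lambda:us-east-2:123456789012:function:my-function:1";
        // Scoped statement, as in the template above
        System.out.println(isOverlyPermissive(List.of("lambda:UpdateFunctionCode"), function));
        // Every Lambda action on every resource
        System.out.println(isOverlyPermissive(List.of("lambda:*"), "*"));
    }
}
```

The first call prints false and the second prints true. A real review would also need to consider partial wildcards in ARNs and conditions, which this sketch ignores.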

Docker

Avoid Running Containers as Root

Problem: Running containers as root increases the risk of privilege escalation.

Solution: Create and use a non-root user in your Dockerfile.

FROM alpine

RUN addgroup -S nonroot \
    && adduser -S nonroot -G nonroot

USER nonroot

ENTRYPOINT ["id"]

Kubernetes

1. Don’t Run Privileged Pods

Problem: Running containers in privileged mode can reduce the resilience of a cluster in the event of a security incident because it weakens the isolation between hosts and containers.

Solution: Disable privileged mode in your pod specification.

apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
    - name: web
      image: nginx
      ports:
        - name: web
          containerPort: 80
          protocol: TCP
      securityContext:
        privileged: false

2. Define Resource Requests and Limits

Problem: Allowing pods to use unlimited resources can destabilize the cluster.

Solution: Specify resource requests and limits for containers.

apiVersion: v1
kind: Pod
metadata:
  name: resource-limited-pod
spec:
  containers:
    - name: app
      image: myapp:1.0.0  # placeholder version; pinned instead of latest
      resources:
        requests:
          memory: "256Mi"
          cpu: "0.5"
        limits:
          memory: "512Mi"
          cpu: "1"

3. Use a Specific Version Tag for Images

Problem: When a container image is not tagged with a specific version, it is referred to as latest. This means that every time the image is built, deployed, or run, it uses whatever the latest version of the image is at that moment. Deployments are then no longer reproducible, and an untested or vulnerable version can reach the cluster without any change in your code.

Solution: Use a specific version tag for container images, or pin the image to a digest.

apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
    # Option 1: pin to a specific version tag
    - name: nginx
      image: nginx:1.14.2
    # Option 2: pin to an immutable digest
    - name: nginx-by-digest
      image: nginx@sha256:b0ad43f7ee5edbc0effbc14645ae7055e21bc1973aee5150745632a24a752661
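
As a rough illustration of the rule, the following Java sketch decides whether an image reference is pinned. The class ImagePinCheck and the method isPinned are hypothetical names, and the parsing is deliberately simplified: it assumes digests use the sha256 form shown above and treats a missing tag the same as latest.

```java
class ImagePinCheck {
    // Hypothetical helper: true when the reference has a digest or a tag other than "latest".
    static boolean isPinned(String image) {
        if (image.contains("@sha256:")) {
            return true; // a digest always points to the same image content
        }
        // Only look after the last '/', so a registry port (localhost:5000/app) is not read as a tag
        String lastSegment = image.substring(image.lastIndexOf('/') + 1);
        int colon = lastSegment.indexOf(':');
        if (colon < 0) {
            return false; // no tag means "latest" is implied
        }
        String tag = lastSegment.substring(colon + 1);
        return !tag.isEmpty() && !tag.equals("latest");
    }

    public static void main(String[] args) {
        System.out.println(isPinned("nginx:1.14.2"));
        System.out.println(isPinned("nginx"));
        System.out.println(isPinned("localhost:5000/myapp"));
    }
}
```

This prints true, false, false. A check like this could run in a pre-commit hook, although a dedicated analyzer covers many more cases.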

Terraform

Allowing public network access to cloud resources is security-sensitive

Problem: Enabling public network access to cloud resources can affect an organization’s ability to protect its data or internal operations from data theft or disruption.

Solution: Use private networks and VPC peering or other secure communication tunnels to communicate with other cloud components.

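resource "google_compute_instance" "example" {
  # Other required arguments (name, machine_type, boot_disk, ...) omitted for brevity
  network_interface {
    network = google_compute_network.vpc_network_example.name
    # No access_config block, so the instance gets no external (public) IP
  }
}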

Ansible

1. Server certificates should be verified

Problem: Disabling TLS certificate validation makes it possible for encrypted communication to be intercepted by an attacker impersonating the server.

Solution: Ensure playbooks do not bypass certificate validation.

- name: Example playbook
  hosts: server
  tasks:
    - name: Retrieve a web page
      ansible.builtin.uri:
        url: https://www.example.com
        validate_certs: true
        return_content: true

2. Loose POSIX permissions

Problem: Files with overly permissive POSIX permissions (e.g., 777) grant unnecessary read, write, or execute access to unauthorized users.
Solution: Ensure playbooks explicitly set restrictive permissions on sensitive files.

- name: My deployment
  hosts: all
  tasks:
    - name: Create /etc/demo with permissions
      ansible.builtin.file:
        path: /etc/demo
        state: directory
        mode: '0770'

    - name: Copy demo.conf and set symbolic permissions
      ansible.builtin.copy:
        src: /files/demo.conf
        dest: /etc/demo/demo.conf
        mode: 'g=r,u+w,o='
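
To show what "loose" means in practice, here is a small Java sketch that tests whether a permission set grants anything to "others" (the last digit of the octal mode). The class PermissionCheck and the method grantsAccessToOthers are hypothetical names; the sketch works on in-memory permission sets from the standard java.nio.file.attribute package and does not touch the file system.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

class PermissionCheck {
    // Hypothetical helper: true when users outside the owner and group get any access.
    static boolean grantsAccessToOthers(Set<PosixFilePermission> permissions) {
        return permissions.contains(PosixFilePermission.OTHERS_READ)
                || permissions.contains(PosixFilePermission.OTHERS_WRITE)
                || permissions.contains(PosixFilePermission.OTHERS_EXECUTE);
    }

    public static void main(String[] args) {
        // rwxrwxrwx is the symbolic form of 0777, rwxrwx--- of 0770
        System.out.println(grantsAccessToOthers(PosixFilePermissions.fromString("rwxrwxrwx")));
        System.out.println(grantsAccessToOthers(PosixFilePermissions.fromString("rwxrwx---")));
    }
}
```

This prints true for 0777 and false for the 0770 mode used in the playbook above.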

Secrets management

Keys and passwords being disclosed

Problem: Secret leaks often occur when sensitive authentication data is stored with an application's source code. The following is one example (OVH keys, the most frequently hit issue), but the problem extends to many platforms.

props.set("ovh-key", "EXEgWIz07P0HYwtQDs7cNIqCiQaWSuHF")

Solution:

props.set("ovh-key", System.getenv("OVH_KEY"))
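
One caveat with the line above: System.getenv returns null when the variable is not set, and the application may only fail much later. A possible refinement, sketched here under my own assumptions, is to fail fast at startup. The class SecretLoader and the method requireSecret are hypothetical names, not part of any library.

```java
import java.util.Map;

class SecretLoader {
    // Hypothetical helper: returns the secret or fails immediately with a clear message.
    static String requireSecret(Map<String, String> environment, String name) {
        String value = environment.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing required secret: " + name);
        }
        return value;
    }

    public static void main(String[] args) {
        try {
            String key = requireSecret(System.getenv(), "OVH_KEY");
            // Never print the secret itself; its length is enough to confirm it was loaded
            System.out.println("OVH_KEY loaded (" + key.length() + " characters)");
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Passing the environment as a Map keeps the helper easy to test without changing real environment variables.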

Experimenting with Code GenAI and IaC

To see how accurate Code GenAI is for IaC artifacts, I ran an experiment with an AI application written in Java using Quarkus. You can find the source code here.

From my IDE, I asked two different code assistants to produce the Kubernetes Deployment YAML file to deploy the application.

I used GitHub Copilot and Amazon Q as code assistants.
Let's see the results.

GitHub Copilot

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pingpong-ai-deployment
  labels:
    app: pingpong-ai
spec:
  replicas: 3
  selector:
    matchLabels:
      app: pingpong-ai
  template:
    metadata:
      labels:
        app: pingpong-ai
    spec:
      containers:
        - name: pingpong-ai-container
          image: your-docker-repo/pingpong-ai:latest
          ports:
            - containerPort: 8080
          env:
            - name: QUARKUS_PROFILE
              value: prod
            - name: JAVA_OPTS
              value: "-Dquarkus.http.host=0.0.0.0"
          volumeMounts:
            - name: config-volume
              mountPath: /deployments/config
      volumes:
        - name: config-volume
          configMap:
            name: pingpong-ai-config
---
apiVersion: v1
kind: Service
metadata:
  name: pingpong-ai-service
spec:
  selector:
    app: pingpong-ai
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
  type: LoadBalancer

It was smart enough to generate a Service and a volume. But not everything is shiny: SonarQube for IDE detected 6 issues.

Amazon Q

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-application
  labels:
    app: my-application
spec:
  replicas: 3  # Number of pod replicas you want to run
  selector:
    matchLabels:
      app: my-application
  template:
    metadata:
      labels:
        app: my-application
    spec:
      containers:
        - name: my-application
          image: your-registry/your-image:tag  # Replace with your image
          ports:
            - containerPort: 8080  # Replace with your application port
          resources:
            requests:
              cpu: "200m"
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
          env:
            - name: ENVIRONMENT
              value: "production"
          # Add any environment variables your application needs
          livenessProbe:
            httpGet:
              path: /health  # Replace with your health check endpoint
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health  # Replace with your health check endpoint
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10

This output is way cleaner and of higher quality than the previous one, and SonarQube for IDE has detected only 2 issues. However, it doesn't provide a Service or a volume for configuration files.

In both cases, generation was fast and free of typing errors (wrong attribute names, wrong number of spaces, tabs, etc.). But as with any code generated by AI, the output needed a review phase to detect issues that are not always obvious, so that we submit good-quality code to our repository and get smooth Pull Request reviews.

Conclusion

IaC enables teams to automate and scale infrastructure efficiently, but with great power comes great responsibility. Misconfigurations, hardcoded secrets, and overly permissive access controls are common mistakes that can lead to serious security vulnerabilities.

By following best practices and leveraging tools like SonarQube, developers can identify and resolve critical security issues early in the development process.

More than just security, maintaining code quality in IaC is essential. Well-structured, maintainable IaC ensures teams can quickly adapt to new requirements and maintain a robust, secure infrastructure.

Combining high-quality code with automated tooling is the key to avoiding costly security mishaps. SonarQube has rules to check all these issues in Azure Resource Manager, Docker, Kubernetes, CloudFormation, Terraform, Ansible and Secrets in general.

DEV Community