Chabane R. for Onepoint x Stack Labs

Posted on Apr 4, 2021 • Edited on Jun 15, 2021

Automating Management of Google Compute Engine VM fleets at scale

#googlecloud #security #compliance #terraform

If you have hundreds of virtual machines deployed in your environment, you already know how difficult it is to manage operating system versions and patches, align software package versions between teams, harden images, and so on.

Google Cloud offers many tools and best practices to help you manage a fleet of virtual machines in your GCP organization, but how do you implement them? [1]

In this long post, we will see how to manage virtual machines hosted on Google's infrastructure at the organization level.

We will go through the following steps:
- Defining an image family.
- Baking a custom image using Packer and GitLab, and saving the images in a dedicated project.
- Enabling VM Manager in business projects to manage the virtual machines:
  - OS patch management service to apply on-demand and scheduled patches. We will also use this service for patch compliance reporting.
  - OS inventory management service to collect and review operating system information.
  - OS configuration management service to install, remove, and auto-update software packages.
- Creating a Compute Engine instance group manager using Terraform and GitLab. The instance template will use the custom image.

Defining the image family

Images provide the base operating environment for applications that run in Compute Engine, and they are critical to ensuring your application deploys and scales quickly and reliably. You can also use images to archive application versions for disaster recovery or rollback scenarios. [2]

An image family is a set of images that are preconfigured for a specific purpose or using a specific architecture in Compute Engine.

For example, if you run MongoDB on Compute Engine, you can create a custom image and assign it to the family mongodb. When you create a new GCE instance from the family, the latest non-deprecated image of that family is used.

When you build a new custom image, add it to the family only once it has been validated: a new image may introduce an incompatibility with your application and cause issues in a production environment. [3]
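The family mechanism can be sketched with two gcloud commands: one resolves the image a family currently points to, the other deprecates a faulty image so that the family falls back to the previous one. This is a minimal sketch, assuming gcloud is installed and authenticated; the helper names and the PRINT_ONLY switch are hypothetical and only print the commands by default.

```shell
#!/bin/sh
# Hypothetical helper: print the command unless PRINT_ONLY=0 is set.
run() {
  if [ "${PRINT_ONLY:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Resolve the image that a family currently points to.
latest_in_family() {
  run gcloud compute images describe-from-family "$1" --project "$2" --format "value(name)"
}

# Deprecate an image so the family resolves to the previous non-deprecated one.
rollback_image() {
  run gcloud compute images deprecate "$1" --project "$2" --state DEPRECATED
}

latest_in_family linux-debian-nodejs mycompany-secops
```

Run with PRINT_ONLY=0 to execute the commands for real instead of printing them.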

In our case, we will deploy a simple Web application named webapp and attach it to the family linux-debian-nodejs.

As you may have guessed, this image family corresponds to a JavaScript web application running on a Debian distribution.

Some organizations use custom images to provide a hardened base and leave it to users to add the OS packages their business applications need.

Custom images

While configuring an instance's startup script or using configuration management tools like Ansible, Chef, or Puppet is a viable way to provision your infrastructure, a more efficient method is to create a custom image with your configuration baked into the public image. You can customize images in several ways. In this part, we focus on the automated way.

Let's create our Packer template to bake our custom image.

Packer is an open source tool for making image creation more reproducible, auditable, configurable, and reliable.

Image builder

This part is inspired by the Image builder overview section of the GCP tutorial Automated image builds with Jenkins, Packer, and Kubernetes.

Several components interact to create a system that automatically builds VM images. In this case, we build immutable images.

An immutable image has all of its software included in the image. When an instance is launched from the image, there are no packages to download or software to install.

You define a pipeline in GitLab CI for each image you want to build. The pipeline is attached to a Git repository that contains configuration scripts and a Packer template describing how to build an image. When a change is pushed to the repository, GitLab assigns the job to a GitLab runner. The runner uses Packer to run the build, which outputs a VM image to Compute Engine.

Packer and configuration scripts

Each image should have its own repository with a Packer template and config scripts.

We use Ansible to customize Debian 9 by adding Node.js.

Image naming and packer variables

The Gitlab pipeline builds an image any time a change is made to the Gitlab repository containing the image’s Packer template and config scripts. It's a good idea to name or tag images with the Git branch and commit ID from which they were built. Packer templates allow you to define variables and provide values for them at runtime:

packer/packer.json

{
...
  "variables": {
      "git_commit": "<GIT_COMMIT>",
      "git_branch": "<GIT_BRANCH>",
      "nodejs_version": "<NODEJS_VERSION>",
      "nodejs_repo_version": "<NODEJS_REPO_VERSION>",
      "project_id": "<PROJECT_ID>",
      "source_image": "<SOURCE_IMAGE>",
      "zone": "<ZONE>"
  }
...
}
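Compute Engine resource names must be at most 63 characters long and contain only lowercase letters, digits, and hyphens, so a branch such as feature/Add_Login cannot be used as is in an image name. The following is a minimal sketch of a sanitizing step, assuming the commit ID is a short lowercase SHA; image_name is a hypothetical helper that is not part of the original pipeline.

```shell
#!/bin/sh
# Hypothetical helper: build a Compute Engine-compatible image name
# from the image family, the Git branch and the short commit ID.
image_name() (
  family=$1
  branch=$2
  commit=$3
  # Keep room for "-<commit>" within the 63-character limit.
  max_prefix=$(( 63 - ${#commit} - 1 ))
  # Lowercase, replace invalid characters with '-', squeeze repeated '-',
  # truncate, then drop any trailing '-'.
  prefix=$(printf '%s\n' "$family-$branch" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9-]/-/g' -e 's/--*/-/g' \
    | cut -c "1-$max_prefix" \
    | sed -e 's/-*$//')
  printf '%s-%s\n' "$prefix" "$commit"
)

image_name linux-debian-nodejs feature/Add_Login 0c85fbc
# prints linux-debian-nodejs-feature-add-login-0c85fbc
```

Truncating the prefix rather than the whole name keeps the commit ID intact, so two builds of a long-named branch still get distinct image names.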

Programmatic configuration with provisioners

A Packer template defines one or more provisioners that describe how to configure an instance, here with Ansible. This snippet defines an Ansible provisioner with the playbook to run and the extra variables to pass to it.

{
  ...
  "provisioners":[
    {
      "type": "ansible",
      "playbook_file": "./ansible/playbook.yml",
      "extra_arguments": [
        "--extra-vars",
        "nodejs_version={{user `nodejs_version`}} nodejs_repo_version={{user `nodejs_repo_version`}}"
      ]
    }
  ]
  ...
}

The Ansible playbook and roles are stored in the same GitLab repository as the Packer template.

packer/ansible/playbook.yml

- name : packer_ansible
  hosts: all
  roles:
    - nodejs

packer/ansible/roles/nodejs/tasks/main.yml

- name: "Install apt-transport-https"
  apt:
    name: apt-transport-https
    state: present
  become: true

- name: "Add nodejs apt key"
  apt_key:
    url: https://deb.nodesource.com/gpgkey/nodesource.gpg.key
    state: present
  become: true

- name: "Add nodejs ppa for apt repo"
  apt_repository:
    repo: deb https://deb.nodesource.com/node_{{nodejs_repo_version}}.x {{ ansible_distribution_release }} main
    update_cache: yes
  become: true

- name: "Install nodejs"
  apt:
    update_cache: yes
    name: nodejs={{nodejs_version}}*
    state: present
  become: true

Defining image outputs with builders

The builders section of the template defines where provisioners will run to create new images:

{
  "variables": {...},
  "provisioners": [...],
  "builders": [
    {
      "type": "googlecompute",
      "project_id": "{{user `project_id`}}",
      "source_image": "{{user `source_image`}}",
      "zone": "{{user `zone`}}",
      "ssh_username": "packer",
      "tags": ["packer"],
      "use_os_login": true,
      "image_description": "Linux Debian NodeJS",
      "image_name": "linux-debian-nodejs-{{user `git_branch`}}-{{user `git_commit`}}",
      "image_family": "linux-debian-nodejs"
    }
  ]
}

The googlecompute builder includes a project_id attribute that indicates where the resulting image will be stored. The image_name attribute concatenates variables to create a name that carries information about the image: the image family, the Git branch, and the Git commit ID used to build it.
We add a network tag so that a firewall rule can allow SSH access to the build instance only, and we use OS Login to connect to that instance.

Deploying on Gitlab

Now that we have our Packer template, we can build the custom image. The GitLab runner will need the following permissions:

roles/compute.instanceAdmin,
roles/compute.storageAdmin on the secops project.

$ gcloud iam service-accounts create packer-deployer \
  --project $PROJECT_ID \
  --description="Packer Service Account" \
  --display-name="Packer Service Account"

$ gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member=serviceAccount:packer-deployer@$PROJECT_ID.iam.gserviceaccount.com \
    --role=roles/compute.instanceAdmin

$ gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member=serviceAccount:packer-deployer@$PROJECT_ID.iam.gserviceaccount.com \
    --role=roles/compute.storageAdmin
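The two bindings above can also be expressed as a loop, which is easier to maintain when the list of roles grows. This is a minimal sketch, assuming gcloud is installed and authenticated; bind_roles, run, and the PRINT_ONLY switch are hypothetical helpers that only print the commands by default.

```shell
#!/bin/sh
# Hypothetical helper: print the command unless PRINT_ONLY=0 is set.
run() {
  if [ "${PRINT_ONLY:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Grant every role passed after the project and the member.
bind_roles() (
  project=$1
  member=$2
  shift 2
  for role in "$@"; do
    run gcloud projects add-iam-policy-binding "$project" \
      --member="$member" --role="$role"
  done
)

bind_roles mycompany-secops \
  "serviceAccount:packer-deployer@mycompany-secops.iam.gserviceaccount.com" \
  roles/compute.instanceAdmin roles/compute.storageAdmin
```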

Note: To assign permissions to a Gitlab runner, please check out my latest article on Securing Google Service Account from Gitlab CI.

Before running the GitLab pipeline, make sure the SSH port is open for the Cloud NAT IP used by the GKE cluster.

gcloud compute firewall-rules create allow-packer-ssh --allow tcp:22 --source-ranges "<NAT_IP>" --target-tags "packer"

Recommendation: If you have dedicated service accounts for GCE instances, use a service account as the firewall target instead of a network tag.

gcloud compute firewall-rules create allow-packer-ssh --allow tcp:22 --source-ranges "<NAT_IP>" --target-service-accounts "gsa_name@$PROJECT_ID.iam.gserviceaccount.com"

"builders": [
    {
      "service_account_email": "gsa_name@mycompany-secops.iam.gserviceaccount.com"
    }
]

To enable the OS Config agent in the custom image, enable the OS Config API on the secops project.

gcloud services enable osconfig.googleapis.com

Now we can run our Gitlab pipeline.

stages:
  - build

.install:
  before_script:
    - apt-get update
    - apt-get install -y zip unzip
    # Install Ansible
    - apt-get install -y ansible
    # Install Packer
    - curl -sS "https://releases.hashicorp.com/packer/1.7.1/packer_1.7.1_linux_amd64.zip" > packer.zip
    - unzip packer.zip -d /usr/bin

variables:
  ZONE: "europe-west1-b"
  NODEJS_VERSION: "15.13.0"
  NODEJS_REPO_VERSION: "15"
  PUBLIC_IMAGE_FAMILY_NAME: "debian-9"
  PUBLIC_IMAGE_FAMILY_PROJECT: "debian-cloud"
  PROJECT_ID: "mycompany-secops"

build custom image:
  extends: .install
  stage: build
  image: 
    name: google/cloud-sdk
  script: 
    - cd packer
    - gcloud config set project $PROJECT_ID
    - SOURCE_IMAGE=$(gcloud compute images describe-from-family $PUBLIC_IMAGE_FAMILY_NAME --project $PUBLIC_IMAGE_FAMILY_PROJECT --format "value(name)")
    - sed -i "s/<GIT_COMMIT>/${CI_COMMIT_SHORT_SHA}/g; s/<GIT_BRANCH>/${CI_COMMIT_BRANCH}/g; s/<NODEJS_VERSION>/$NODEJS_VERSION/g; s/<NODEJS_REPO_VERSION>/$NODEJS_REPO_VERSION/g; s/<SOURCE_IMAGE>/$SOURCE_IMAGE/g; s/<ZONE>/$ZONE/g; s/<PROJECT_ID>/$PROJECT_ID/g;" packer.json
    - packer build packer.json
  only:
    - develop 
  tags:
    - k8s-image-runner

The following output is printed by GitLab CI:

googlecompute: output will be in this color.
==> googlecompute: Checking image does not exist...
==> googlecompute: Creating temporary rsa SSH key for instance...
==> googlecompute: Importing SSH public key for OSLogin...
==> googlecompute: Obtaining SSH Username for OSLogin...
==> googlecompute: Using image: debian-9-stretch-v20210316
==> googlecompute: Creating instance...
    googlecompute: Loading zone: europe-west1-b
    googlecompute: Loading machine type: n1-standard-1
    googlecompute: Requesting instance creation...
    googlecompute: Waiting for creation operation to complete...
    googlecompute: Instance has been created!
==> googlecompute: Waiting for the instance to become running...
    googlecompute: IP: <CLOUD_NAT_IP>
==> googlecompute: Using ssh communicator to connect: <CLOUD_NAT_IP>
==> googlecompute: Waiting for SSH to become available...
==> googlecompute: Connected to SSH!
==> googlecompute: Provisioning with Ansible...
    googlecompute: Setting up proxy adapter for Ansible....
==> googlecompute: Executing Ansible: ansible-playbook -e packer_build_name="googlecompute" -e packer_builder_type=googlecompute --ssh-extra-args '-o IdentitiesOnly=yes' --extra-vars nodejs_version=15.11.0,nodejs_repo_version=15 -e ansible_ssh_private_key_file=/tmp/ansible-key/[..] -i /tmp/[..]/packer/ansible/playbook.yml
    googlecompute:
    googlecompute: PLAY [packer_ansible] **********************************************************
    googlecompute:
    googlecompute: TASK [Gathering Facts] *********************************************************
    googlecompute: ok: [default]
    googlecompute:
    googlecompute: TASK [nodejs : Install apt-transport-https] ************************************
    googlecompute: changed: [default]
    googlecompute:
    googlecompute: TASK [nodejs : Add nodejs apt key] *********************************************
    googlecompute: changed: [default]
    googlecompute:
    googlecompute: TASK [nodejs : Add nodejs ppa for apt repo] ************************************
    googlecompute: changed: [default]
    googlecompute:
    googlecompute: TASK [nodejs : Install nodejs] *************************************************
    googlecompute: changed: [default]
    googlecompute:
    googlecompute: PLAY RECAP *********************************************************************
    googlecompute: default                    : ok=5    changed=4    unreachable=0    failed=0
==> googlecompute: Deleting instance...
    googlecompute: Instance has been deleted!
==> googlecompute: Creating image...
==> googlecompute: Deleting disk...
    googlecompute: Disk has been deleted!
==> googlecompute: Deleting SSH public key for OSLogin...
    googlecompute: SSH public key for OSLogin has been deleted!
Build 'googlecompute' finished after 1 minute 25 seconds.
==> Wait completed after 1 minute 25 seconds
==> Builds finished. The artifacts of successful builds are:
--> googlecompute: A disk image was created: linux-debian-nodejs-master-g0c85fbc
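The pipeline relies on sed to fill the placeholders of packer.json, and a CI variable that is missing from the sed expression leaves its placeholder in the template without any error. The following is a minimal sketch of a guard that could run before packer build; check_placeholders is a hypothetical helper, and it assumes placeholders always look like <UPPER_CASE>.

```shell
#!/bin/sh
# Hypothetical helper: fail if the file still contains <PLACEHOLDER> tokens.
check_placeholders() {
  # grep prints the offending lines (with line numbers) to stderr.
  if grep -nE '<[A-Z_]+>' "$1" >&2; then
    echo "error: unresolved placeholders in $1" >&2
    return 1
  fi
  return 0
}

# Possible usage in the pipeline script:
#   check_placeholders packer.json && packer build packer.json
```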

Enabling VM Manager

To manage operating systems for large virtual machine (VM) fleets, we can use VM Manager, a new GCP feature.

VM Manager helps drive efficiency through automation and reduces the operational burden of maintaining these VM fleets. [5]

To set up VM Manager, enable the OS Config API on the business project.

gcloud services enable osconfig.googleapis.com --project mycompany-biz-webapp-dev

In the next section we will enable the OS Config agent by setting instance metadata.
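For VMs that are not created from a Terraform instance template, the same metadata keys can be set with gcloud, either for a whole project or for a single instance. This is a minimal sketch, assuming gcloud is installed and authenticated; the helper names, the instance name, and the PRINT_ONLY switch are hypothetical, and the commands are only printed by default.

```shell
#!/bin/sh
# Hypothetical helper: print the command unless PRINT_ONLY=0 is set.
run() {
  if [ "${PRINT_ONLY:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Enable the OS Config agent for every VM of a project.
enable_osconfig_project() {
  run gcloud compute project-info add-metadata --project "$1" \
    --metadata=enable-osconfig=TRUE,enable-guest-attributes=TRUE
}

# Enable the OS Config agent for a single VM.
enable_osconfig_instance() {
  run gcloud compute instances add-metadata "$1" --zone "$2" --project "$3" \
    --metadata=enable-osconfig=TRUE,enable-guest-attributes=TRUE
}

enable_osconfig_project mycompany-biz-webapp-dev
```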

Creating a GCE Instance Group Manager

In this section:

- We create a GCE instance template.
- We add instance metadata to enable the OS Config agent.
- We create a GCE instance group manager based on the instance template.
- We create an external HTTP load balancer.
- In the secops project, we authorize the business project to use our custom image.
- Finally, we run Terraform using GitLab CI.

plan/network.tf

resource "google_compute_network" "webapp" {
  name = "webapp-vpc"
  auto_create_subnetworks = "false" 
  routing_mode = "GLOBAL"
}

resource "google_compute_subnetwork" "private-webapp" {
  name = "webapp-subnet"
  ip_cidr_range = "10.10.1.0/24"
  network = google_compute_network.webapp.name
  region = var.region
}

resource "google_compute_address" "webapp" {
  name    = "webapp-nat-ip"
  project = var.project_id
  region  = var.region
}

resource "google_compute_router" "webapp" {
  name    = "webapp-nat-router"
  network = google_compute_network.webapp.name
}

resource "google_compute_router_nat" "webapp" {
  name                               = "webapp-nat-gateway"
  router                             = google_compute_router.webapp.name
  nat_ip_allocate_option             = "MANUAL_ONLY"
  nat_ips                            = [ google_compute_address.webapp.self_link ]
  source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES" 
  depends_on                         = [ google_compute_address.webapp ]
}

plan/template.tf

resource "google_compute_instance_template" "webapp" {
  name        = "webapp-template"
  description = "This template is used to create web app instances."

  tags = ["web"]

  labels = {
    app = "webapp"
    env = var.env
  }

  instance_description = "Web app based on custom image"
  machine_type         = "e2-medium"
  can_ip_forward       = false

  scheduling {
    automatic_restart   = true
    on_host_maintenance = "MIGRATE"
  }

  disk {
    source_image      = data.google_compute_image.webapp.self_link
    auto_delete       = true
    boot              = true

    resource_policies = [google_compute_resource_policy.daily_backup.id]
  }

  network_interface {
    network = google_compute_network.webapp.name
    subnetwork = google_compute_subnetwork.private-webapp.name
  }

  metadata = {
    app-location            = var.zone
    enable-guest-attributes = "TRUE"
    enable-osconfig         = "TRUE"
    startup-script-url      = "${google_storage_bucket.webapp.url}/startup-script.sh"
  }

  service_account {
    email  = google_service_account.service_account.email
    scopes = ["cloud-platform"]
  }

  lifecycle {
    create_before_destroy = true
  }

  depends_on = [
    google_storage_bucket.webapp,
    google_storage_bucket_object.app,
    google_storage_bucket_object.package,
    google_storage_bucket_object.startup
  ]
}

data "google_compute_image" "webapp" {
  name    = var.webapp_custom_image
  project = var.secops_project_id
}

resource "google_compute_resource_policy" "daily_backup" {
  name   = "every-day-4am"
  region = var.region
  snapshot_schedule_policy {
    schedule {
      daily_schedule {
        days_in_cycle = 1
        start_time    = "04:00"
      }
    }
  }
}

resource "google_storage_bucket" "webapp" {
  name          = "${var.project_id}-webapp"
  location      = "EU"
  force_destroy = true

  uniform_bucket_level_access = true
}

resource "google_storage_bucket_object" "app" {
  name   = "app.js"
  source = "script/app.js"
  bucket = google_storage_bucket.webapp.name
}

resource "google_storage_bucket_object" "package" {
  name   = "package.json"
  source = "script/package.json"
  bucket = google_storage_bucket.webapp.name
}

resource "google_storage_bucket_object" "startup" {
  name   = "startup-script.sh"
  source = "script/startup-script.sh"
  bucket = google_storage_bucket.webapp.name
}

Get app.js and package.json from the GoogleCloudPlatform/nodejs-getting-started GitHub repository and save them in plan/script.

plan/script/startup-script.sh

#! /bin/bash

# [START startup]
set -v

# Talk to the metadata server to get the project id
PROJECTID=$(curl -s "http://metadata.google.internal/computeMetadata/v1/project/project-id" -H "Metadata-Flavor: Google")
# [END startup]
echo ${PROJECTID}

# Install logging monitor. The monitor will automatically pick up logs sent to
# syslog.
curl -sSO https://dl.google.com/cloudagents/add-logging-agent-repo.sh && \
bash add-logging-agent-repo.sh

# Install dependencies from apt
apt-get update
apt-get install -yq ca-certificates build-essential supervisor

# Get the application source code from Cloud Storage.
# git requires $HOME and it's not set during the startup script.
export HOME=/root
mkdir -p /opt/app/webapp
gsutil cp -r gs://${PROJECTID}-webapp/* /opt/app/webapp 

# Install app dependencies
cd /opt/app/webapp
npm install

# Create a nodeapp user. The application will run as this user.
useradd -m -d /home/nodeapp nodeapp
chown -R nodeapp:nodeapp /opt/app

# Configure supervisor to run the node app.
cat >/etc/supervisor/conf.d/node-app.conf << EOF
[program:nodeapp]
directory=/opt/app/webapp
command=npm start
autostart=true
autorestart=true
user=nodeapp
environment=HOME="/home/nodeapp",USER="nodeapp",NODE_ENV="production"
stdout_logfile=syslog
stderr_logfile=syslog
EOF

supervisorctl reread
supervisorctl update

# Application should now be running under supervisor

plan/instances.tf

resource "google_compute_autoscaler" "webapp" {
  name   = "webapp-autoscaler"
  zone   = var.zone
  target = google_compute_instance_group_manager.webapp.id

  autoscaling_policy {
    max_replicas    = 5
    min_replicas    = 2
    cooldown_period = 90

    load_balancing_utilization {
      target = 0.6
    }
  }
}

resource "google_compute_instance_group_manager" "webapp" {
  name               = "webapp-igm"

  base_instance_name = "webapp"
  zone               = var.zone

  target_size        = 2

  version {
    instance_template = google_compute_instance_template.webapp.id
  }

  named_port {
    name = "http"
    port = "8080"
  }
}

resource "google_compute_health_check" "webapp" {
  name               = "webapp-healthcheck"
  timeout_sec        = 1
  check_interval_sec = 1
  http_health_check {
    port = "8080"
  }
}

plan/load-balancer.tf

# used to forward traffic to the correct load balancer for HTTP load balancing
resource "google_compute_global_forwarding_rule" "webapp" {
  name       = "webapp-global-forwarding-rule"
  project    = var.project_id
  target     = google_compute_target_http_proxy.webapp.self_link
  port_range = "80"
}

resource "google_compute_target_http_proxy" "webapp" {
  name    = "webapp-proxy"
  project = var.project_id
  url_map = google_compute_url_map.url_map.self_link
}

resource "google_compute_backend_service" "webapp" {
  provider = google-beta

  name          = "webapp-backend-service"
  project       = var.project_id
  port_name     = "http"
  protocol      = "HTTP"
  health_checks = [google_compute_health_check.webapp.self_link]
  backend {
    group                 = google_compute_instance_group_manager.webapp.instance_group
    balancing_mode        = "RATE"
    max_rate_per_instance = 100
  }
}

resource "google_compute_url_map" "url_map" {
  name            = "webapp-load-balancer"
  project         = var.project_id
  default_service = google_compute_backend_service.webapp.self_link
}
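The autoscaler targets 60% of the load balancing serving capacity, and the backend service declares max_rate_per_instance = 100, so the group is sized for roughly 60 requests per second per instance, between 2 and 5 replicas. The arithmetic can be sketched as follows. This is a simplified model that ignores the cooldown period and the autoscaler's stabilization behaviour, and replicas_for_rps is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: estimate the replica count for a given request rate.
# Arguments: total RPS, max rate per instance, target utilization in percent,
# minimum replicas, maximum replicas.
replicas_for_rps() (
  total_rps=$1
  max_rate=$2
  target_pct=$3
  min=$4
  max=$5
  # Requests per second that each instance is expected to serve.
  per_instance=$(( max_rate * target_pct / 100 ))
  # Ceiling division, then clamp to [min, max].
  n=$(( (total_rps + per_instance - 1) / per_instance ))
  if [ "$n" -lt "$min" ]; then n=$min; fi
  if [ "$n" -gt "$max" ]; then n=$max; fi
  echo "$n"
)

replicas_for_rps 150 100 60 2 5   # prints 3
```

With these settings, the group reaches its maximum of 5 replicas once traffic exceeds 240 requests per second.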

The instances must have access to Cloud Storage to retrieve the source code of the web application.

plan/service-account.tf

resource "google_service_account" "service_account" {
  account_id   = "webapp-user"
  display_name = "Service Account User for NodeJS Webapp"
}

resource "google_project_iam_binding" "storage-object-viewer" {
  project = var.project_id
  role    = "roles/storage.objectViewer"

  members = [
    "serviceAccount:${google_service_account.service_account.email}",
  ]
}

resource "google_project_iam_binding" "logging" {
  project = var.project_id
  role    = "roles/logging.logWriter"

  members = [
    "serviceAccount:${google_service_account.service_account.email}",
  ]
}

resource "google_project_iam_binding" "metric-writer" {
  project = var.project_id
  role    = "roles/monitoring.metricWriter"

  members = [
    "serviceAccount:${google_service_account.service_account.email}",
  ]
}

resource "google_project_iam_binding" "metric-viewer" {
  project = var.project_id
  role    = "roles/monitoring.viewer"

  members = [
    "serviceAccount:${google_service_account.service_account.email}",
  ]
}

plan/firewall.tf

resource "google_compute_firewall" "allow-http" {
  name        = "webapp-fw-allow-http"
  network     = google_compute_network.webapp.name

  allow {
    protocol = "tcp"
    ports    = ["8080"]
  }
  target_tags = ["web"]
}

resource "google_compute_firewall" "allow-https" {
  name        = "webapp-fw-allow-https"
  network     = google_compute_network.webapp.name

  allow {
    protocol = "tcp"
    ports    = ["443"]
  }
  target_tags = ["web"]
}

resource "google_compute_firewall" "allow-lb-health-checks" {
  name        = "webapp-fw-allow-lb-health-checks"
  network     = google_compute_network.webapp.name

  source_ranges = ["35.191.0.0/16", "130.211.0.0/22"]

  allow {
    protocol = "tcp"
  }
  target_tags = ["web"]
}

We use guest policies to maintain consistent software configurations on the virtual machines (VMs).

All instances targeted by the guest policy will be updated each time the agent checks in with the service. This check happens every 10 to 15 minutes.

In this example, we make sure that sshguard and fail2ban remain installed. If a user removes these packages, the OS Config agent automatically reinstalls them.

plan/os-config-guest.tf

resource "google_os_config_guest_policies" "webapp" {
  provider        = google-beta
  guest_policy_id = "webapp-guest-policy"

  assignment {
    os_types {
      os_short_name = "debian"
    }
  }

  packages {
    name          = "sshguard"
    desired_state = "INSTALLED"
  }

  packages {
    name          = "fail2ban"
    desired_state = "INSTALLED"
  }

  project = var.project_id
}

plan/variables.tf

variable "zone" {
  type    = string
} 

variable "region" {
  type    = string
} 

variable "webapp_custom_image" {
  type    = string
} 

variable "project_id" {
  type    = string
} 

variable "secops_project_id" {
  type    = string
} 

variable "env" {
  type    = string
}

plan/backend.tf

terraform {
  backend "gcs" {
  }
}

plan/provider.tf

provider "google" {
  project = var.project_id
  region  = var.region
  zone    = var.zone
}
provider "google-beta" {
  project = var.project_id
  region  = var.region
  zone    = var.zone
}

plan/versions.tf

terraform {
  required_version = ">= 0.12"
}

plan/output.tf

output "load-balancer-ip-address" {
  value = google_compute_global_forwarding_rule.webapp.ip_address
}

plan/dev/terraform.tfvars

env                 = "<ENV>"
zone                = "<ZONE>"
region              = "<REGION>"
project_id          = "<PROJECT_ID>"
secops_project_id   = "<SECOPS_PROJECT_ID>"
webapp_custom_image = "<SOURCE_IMAGE>"

The Gitlab runner will need the following permissions:

roles/storage.objectAdmin on the secops project to access the Terraform state bucket,
roles/compute.instanceAdmin,
roles/osconfig.guestPolicyAdmin,
roles/iam.serviceAccountAdmin,
roles/storage.admin on the business project.

$ gcloud iam service-accounts create webapp-deployer \
  --project $PROJECT_ID \
  --description="WebApp Service Account Deployer" \
  --display-name="WebApp Service Account"

$ gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member=serviceAccount:webapp-deployer@$PROJECT_ID.iam.gserviceaccount.com \
    --role=roles/osconfig.guestPolicyAdmin

$ gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member=serviceAccount:webapp-deployer@$PROJECT_ID.iam.gserviceaccount.com \
    --role=roles/compute.instanceAdmin

$ gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member=serviceAccount:webapp-deployer@$PROJECT_ID.iam.gserviceaccount.com \
    --role=roles/iam.serviceAccountAdmin

$ gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member=serviceAccount:webapp-deployer@$PROJECT_ID.iam.gserviceaccount.com \
    --role=roles/storage.admin

# In production, you should use "IAM roles for Cloud Storage"
$ gcloud projects add-iam-policy-binding $SECOPS_PROJECT_ID \
    --member=serviceAccount:webapp-deployer@$PROJECT_ID.iam.gserviceaccount.com \
    --role=roles/storage.objectAdmin

In the secops project, we share the image with the business project by granting the following role to the service account used by the GitLab runner:

SOURCE_IMAGE=$(gcloud compute images describe-from-family "linux-debian-nodejs" --project $SECOPS_PROJECT_ID --format "value(name)")
gcloud compute images add-iam-policy-binding $SOURCE_IMAGE \
    --member 'serviceAccount:webapp-deployer@mycompany-biz-webapp-dev.iam.gserviceaccount.com' \
    --role 'roles/compute.imageUser' \
    --project $SECOPS_PROJECT_ID

Now we can run the pipeline with $SOURCE_IMAGE as a CI variable.

stages:
  - init
  - deploy

# Install Terraform
.install:
  before_script:
      - apt-get update
      - apt-get install -y zip unzip
      - curl -sS "https://releases.hashicorp.com/terraform/0.14.7/terraform_0.14.7_linux_amd64.zip" > terraform.zip
      - unzip terraform.zip -d /usr/bin

variables:
  ZONE: "europe-west1-b"
  REGION: "europe-west1"
  PROJECT_ID: "mycompany-biz-webapp-dev"
  SECOPS_PROJECT_ID: "mycompany-secops"

init terraform:
  extends: .install
  stage: init
  image: 
    name: google/cloud-sdk
  script: 
    - cd plan
    - gcloud config set project $PROJECT_ID
    - terraform init -backend-config="bucket=bucket-$SECOPS_PROJECT_ID-terraform-backend" -backend-config="prefix=instances/terraform/state"
  artifacts:
    paths:
      - plan/.terraform
  only:
    - develop 
  tags:
    - k8s-biz-dev-runner

deploy terraform:
  extends: .install
  stage: deploy
  image: 
    name: google/cloud-sdk
  script: 
    - cd plan
    - gcloud config set project $PROJECT_ID
    - SOURCE_IMAGE=$(gcloud compute images describe-from-family "linux-debian-nodejs" --project $SECOPS_PROJECT_ID --format "value(name)")
    - sed -i "s/<SOURCE_IMAGE>/$SOURCE_IMAGE/g; s/<ZONE>/$ZONE/g; s/<REGION>/$REGION/g; s/<PROJECT_ID>/$PROJECT_ID/g; s/<SECOPS_PROJECT_ID>/$SECOPS_PROJECT_ID/g; s/<ENV>/dev/g;" dev/terraform.tfvars
    - terraform apply -auto-approve -var-file=dev/terraform.tfvars
  only:
    - develop 
  tags:
    - k8s-biz-dev-runner

Apr  5 11:28:02 webapp-z93d systemd[1]: Starting Fail2Ban Service...
Apr  5 11:28:02 webapp-z93d fail2ban-client[10371]: 2021-04-05 11:28:02,429 fail2ban.server         [10372]: INFO    Starting Fail2ban v0.9.6
Apr  5 11:28:02 webapp-z93d fail2ban-client[10371]: 2021-04-05 11:28:02,429 fail2ban.server         [10372]: INFO    Starting in daemon mode
Apr  5 11:28:02 webapp-z93d systemd[1]: Started Fail2Ban Service.
[..]
Apr  5 11:28:03 webapp-z93d systemd[1]: Starting SSHGuard...
[   60.904096] ip6_tables: (C) 2000-2006 Netfilter Core Team
Apr  5 11:28:03 webapp-z93d kernel: [   60.904096] ip6_tables: (C) 2000-2006 Netfilter Core Team
Apr  5 11:28:03 webapp-z93d systemd[1]: Started SSHGuard.
Apr  5 11:28:03 webapp-z93d sshguard-journalctl[10628]: Chain INPUT (policy ACCEPT)
Apr  5 11:28:03 webapp-z93d sshguard-journalctl[10628]: target     prot opt source               destination
Apr  5 11:28:03 webapp-z93d sshguard-journalctl[10628]: sshguard   all  --  0.0.0.0/0            0.0.0.0/0
Apr  5 11:28:03 webapp-z93d sshguard-journalctl[10628]: f2b-sshd   tcp  --  0.0.0.0/0            0.0.0.0/0            multiport dports 22
[..]

If any patch is needed, we can use OS patch management. This service has two main components:

Patch compliance reporting, which provides insights on the patch status of the VM instances. We can also view recommendations for the VM instances.
Patch deployment, which automates the operating system and software patch update process.

resource "google_os_config_patch_deployment" "webapp" {
  patch_deployment_id = "patch-deploy-webapp"

  instance_filter {
    group_labels {
      labels = {
        env = "dev",
        app = "webapp"
      }
    }
  }

  patch_config {
    apt {
      type = "DIST"
    }
  }

  recurring_schedule {
    time_zone {
      id = "Europe/Brussels"
    }

    time_of_day {
      hours = 0
      minutes = 30
      seconds = 30
      nanos = 20
    }

    monthly {
      month_day = 1
    }
  }
}
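The Terraform resource above covers scheduled patching. For an on-demand patch, a patch job can be started from gcloud with the same label filter. This is a minimal sketch, assuming gcloud is installed and authenticated; patch_now, run, and the PRINT_ONLY switch are hypothetical helpers, the command is only printed by default, and the available flags should be checked with gcloud compute os-config patch-jobs execute --help.

```shell
#!/bin/sh
# Hypothetical helper: print the command unless PRINT_ONLY=0 is set.
run() {
  if [ "${PRINT_ONLY:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Start an on-demand patch job on the VMs that carry the given labels.
# Arguments: project, labels (key=value,key=value), display name, extra flags.
patch_now() (
  project=$1
  labels=$2
  name=$3
  shift 3
  run gcloud compute os-config patch-jobs execute --project "$project" \
    --instance-filter-group-labels="$labels" --display-name="$name" "$@"
)

# --dry-run contacts the instances without applying any patch.
patch_now mycompany-biz-webapp-dev env=dev,app=webapp patch-webapp-now --dry-run
```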

In a future part, we will take a deep dive into the potential of OS patch management and OS inventory management.

Conclusion

The source code is available on Gitlab.

In this article, we saw how to create a custom image and use it to deploy a web application. We also enabled VM Manager to reduce the complexity of ensuring compliance, observability, and security across large VM fleets.

If you have any questions or feedback, please feel free to leave a comment.

Thanks for reading!