DEV Community

Anirudh Mehta

Posted on • Originally published at Medium

Solving the Image Promotion Challenge Across Multi-Environment with ArgoCD

When designing cloud environments, it is often recommended to set up multiple accounts. While this approach offers resource independence, isolation, better security, access, and billing boundaries, it also comes with its own set of issues. One such challenge is efficiently promoting and tracking applications between different environments.

The GitOps approach, along with tools like ArgoCD and Kustomize, simplifies tracking and promotion. However, image promotion is often overlooked. Many enterprises adopt a shared image registry, but it soon becomes bloated with many unused versions.

This article walks through how we approached the image promotion problem on a recent engagement and the solution we adopted, all while adhering to the principles of GitOps.

Challenge

Recently, we were presented with a scenario in which a company using a shared ECR registry was considering migrating to separate ECR registries for cost-effectiveness, better governance, and streamlined lifecycle management.

Here is a look at the existing state of infrastructure and pipelines:

Source: Image by the author.

  • Each environment has a dedicated AWS account with its own cluster and ArgoCD installation.

  • Kustomize is used for managing configuration differences across environments.

├── infra
│   └── charts/
└── overlays
    ├── dev
    │   └── patch-image.yaml
    └── production
        ├── patch-image.yaml
        └── patch-replicas.yaml
  • Jenkins is used to continuously build new images in the development environment.

However, none of these tools supports promoting images between ECR registries out of the box, so we explored alternative solutions with the following considerations in mind.

Considerations:

  • Selective Promotion: The company’s application landscape is composed of multiple modules and teams with different timelines. Therefore, it is necessary to support the promotion of images for only selected modules in each release.

  • Optimized Storage: Environments such as production only need to store promoted image versions, reducing clutter and optimizing resource usage.

  • Image Tag and Digest Replication: Replicating image tags and digests between ECR registries is critical for security and traceability.

Potential Solutions

At the outset, two potential solutions were proposed:

  1. ECR Cross Account Replication: AWS’s ECR natively supports replicating images between two accounts. However, as of this writing, replication rules can be filtered only by repository name prefix; there is no way to select individual image tags or versions. Alternatively, AWS recommends an event-based design that selectively replicates images based on tag naming conventions. However, since we do not know in advance which versions will be promoted, this requires an additional retagging step before promotion.

  2. Jenkins Promotion Pipeline: A Jenkins pipeline that parses Kustomize overlays for image tags and programmatically replicates them.

Both options are viable, but they introduce an additional layer of complexity to the promotion process. Additionally, you need to ensure that images are promoted before Kustomize overlays are updated.
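
As a rough illustration of the second option, a pipeline step could extract image references from an overlay patch with a small shell helper. This is a minimal sketch, not the pipeline the team evaluated: the function name `list_overlay_images` is hypothetical, and it assumes the patch file declares images on plain `image:` lines, since the contents of `patch-image.yaml` are not shown in this article.

```shell
#!/bin/sh
# Hypothetical helper: print each unique image reference found in a
# Kustomize patch file, one per line.
# Assumption: images appear on lines of the form "image: <ref>" or
# "- image: <ref>", optionally quoted, with no trailing comments.
list_overlay_images() {
  sed -n 's/^[[:space:]]*-\{0,1\}[[:space:]]*image:[[:space:]]*//p' "$1" \
    | tr -d "\"'" \
    | sort -u
}

# Usage (the path follows the repository layout shown earlier):
#   list_overlay_images overlays/production/patch-image.yaml
```

Each reference printed this way would still need to be copied to the target registry before the overlay change is synced, which is the ordering problem noted above.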

The Winning Strategy: ArgoCD PreSync Job

In this scenario, the client was already using ArgoCD for continuous deployment of the application changes. Therefore, we decided to also assign ArgoCD the responsibility of delivering images to the target environment cluster.

ArgoCD supports hooks that allow you to run custom scripts before or after a deployment or synchronization process.

Source: Image by the author.

1. ECR Repository Permission: Authorize cross-account pull access for Docker images

To enable ArgoCD to pull images from the source ECR, we need to add a resource-based policy to each source repository. In the policy below, replace {DESTINATION_ACCOUNT} with the ID of your destination account.

cross-account-ecr-read-policy.json:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowPull",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::{DESTINATION_ACCOUNT}:root"
      },
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:BatchGetImage",
        "ecr:GetDownloadUrlForLayer"
      ]
    }
  ]
}

Apply the policy to ECR repositories:

aws ecr set-repository-policy --repository-name example \
  --policy-text "file://cross-account-ecr-read-policy.json"

# For multiple repositories (text output yields one repository name per line):
aws ecr describe-repositories --query "repositories[].[repositoryName]" --output text \
  | xargs -I {} aws ecr set-repository-policy --repository-name {} --policy-text "file://cross-account-ecr-read-policy.json"

2. PreSync Hook Job: Copy image between accounts

  • We use Crane to copy images without changing their tag and digest.

  • The PreSync Hook job is stored in Git along with the other application manifests and monitored by ArgoCD. ArgoCD runs the job before synchronizing changes.

  • The source account is the Development or DevOps account from which the images will be pulled.

  • The destination account is the Production or target environment where the image needs to be copied.

# Helm template example
apiVersion: batch/v1
kind: Job
metadata:
  generateName: argo-presync-promote-image-
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      volumes:
        - name: creds
          emptyDir: {}
      initContainers:
        - name: aws-creds
          image: public.ecr.aws/aws-cli/aws-cli
          command:
            - sh
            - -c
            - |
              aws ecr get-login-password > /creds/ecr
          volumeMounts:
            - name: creds
              mountPath: /creds
      containers:
        # For brevity, I have assumed that all Helm values are available on the root.
        - name: promote-image
          image: gcr.io/go-containerregistry/crane:debug
          command:
            - sh
            - -c
            - |
              # Log in to both ECR registries
              cat /creds/ecr | crane auth login {{.Values.sourceAccount}}.dkr.ecr.us-east-1.amazonaws.com -u AWS --password-stdin
              cat /creds/ecr | crane auth login {{.Values.destinationAccount}}.dkr.ecr.us-east-1.amazonaws.com -u AWS --password-stdin
              # Copy image from source account to destination account
              crane copy {{.Values.image | replace .Values.destinationAccount .Values.sourceAccount}} {{.Values.image}}
          volumeMounts:
            - name: creds
              mountPath: /creds
      restartPolicy: Never
  backoffLimit: 2
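
To make the copy step easier to follow, here is a minimal shell sketch of what the job's script amounts to once the Helm values are rendered, with an added digest comparison after the copy. The function names, the `CRANE` override variable, and the account IDs in the usage comment are hypothetical and not part of the original job; `crane copy` and `crane digest` are existing Crane subcommands.

```shell
#!/bin/sh
# CRANE can be overridden, e.g. to point at a stub when testing offline.
CRANE="${CRANE:-crane}"

# Swap the destination account ID for the source account ID in an image
# reference, mirroring the Helm `replace` call in the Job template.
source_image_ref() {
  dest_account="$1"; src_account="$2"; image="$3"
  printf '%s\n' "$image" | sed "s/$dest_account/$src_account/g"
}

# Copy the image from the source registry to the destination registry,
# then confirm that both registries report the same digest.
promote_image() {
  dest_account="$1"; src_account="$2"; image="$3"
  src_image="$(source_image_ref "$dest_account" "$src_account" "$image")"
  "$CRANE" copy "$src_image" "$image" || return 1
  [ "$("$CRANE" digest "$src_image")" = "$("$CRANE" digest "$image")" ]
}

# Usage (placeholder account IDs; assumes crane is already logged in):
#   promote_image 222222222222 111111111111 \
#     222222222222.dkr.ecr.us-east-1.amazonaws.com/example:1.0.0
```

The digest comparison is optional, since Crane copies the manifest as-is, but it gives an explicit check for the tag and digest replication requirement listed under the considerations.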

Conclusion

In conclusion, the team was able to promote images on demand by using the pre-sync hook. This made production promotion a single step of updating the Kustomize overlays.

I would love to hear about other options that you have adopted. For instance, an alternative approach could be to use Kubernetes Dynamic Admission Control to intercept and pull missing images on demand.
