Michael Levan

Posted on Nov 29, 2023

Create and Understand Your Platform Engineering Environment

#kubernetes #devops #programming #cloud

Sometimes we don’t like the word “standard” and that’s primarily because there’s a feeling of putting yourself in a box and only being able to do “one thing for the thing”. However, if standards are implemented correctly, they help out a ton.

What’s the biggest issue for engineers right now? It’s not the tech and abilities. It’s the fact that there are SO many tools/vendors available right now and engineers don’t know what to choose.

That’s why a standard makes sense.

In this blog post, you’ll learn about a few different methods of thinking about application stacks and what tools you can use.

💡 There are going to be a lot of various pieces you can add to your Platform Engineering stack. Because of that, I’m going to make other pieces of content that define particular stacks. In this blog post, however, it’ll be more about helping you think about and define a few different tools.

The Standard

When thinking about implementing a standard, the most important thing you should remember is it doesn’t mean you can’t deviate. For example, think about the LAMP stack. The LAMP stack is a production-ready standard that engineers can use as a starting point. However, what you’ll see a lot of engineers do is something like replace MySQL with Postgres.

Replacing MySQL with Postgres isn’t a bad thing. Why? Because LAMP isn’t meant to be a hard rule on the tools/platforms you can use. It’s simply meant to give you a production-ready starting point.

The same will go for Platform Engineering tools. If you have a stack that contains ArgoCD and you want to replace it with Flux, since they’re both doing GitOps, that’s totally fine. It can still be “the stack” with Argo or Flux.

An overall standard, if done correctly, isn’t about telling you what tools you need to use and never allowing the ability to swap them out. It’s about putting together a particular set of tools in a particular category that work in production and giving you the ability to swap them out with tools in the same category.

The State Of Internal Developer Platforms/Portals

Internal Developer Platforms/Portals (IDPs) are about 70-80% of the way there, but there isn't ONE that does everything the Platform Interaction layer needs.

A few examples are Port and Backstage.

Port is great for deployments and gives you a button to deploy existing pipelines with, but you can't, for example, have a page to see your monitoring and observability data for the platform.

Backstage is great as it's fully open and you can make the plugins, the GUI, and the overall feel WHATEVER you want, but the problem is it's 100% dependent on TypeScript. It needs an extendable client for other languages.

The other open-source and closed-source tools out there that are focusing on the same thing have the same problem. The tools are focused on:

A really good GUI
A really good CLI with dependencies on a particular language
An open-source platform without the ability to extend capabilities

We need a tool that'll focus on all of it.

There are two out there right now that are close to this.

CNOE (https://buff.ly/3Q8Vsre)
The BACK stack (Backstage, Argo, Crossplane, Kyverno)

The good part about these open-source tools is you can swap them out for the tools you want to use, which is actually what we need (extensibility). For example, Argo can be swapped out for Flux.

Diving in deeper, what engineers would absolutely love is to incorporate a Platform Capability tool into a GUI or a CLI.

For example, let’s say I used Kratix or Crossplane as my backend to create and manage the logic.

If you wanted to, you could put a CLI in front of Crossplane or Kratix. That way, there’s a simple command to create the resources and the developer/engineer doesn’t have to apply the YAML or even know it exists.

If you wanted to do the same thing with a GUI, you could. For example, there’s a great Promise example from Kratix that shows a Jenkins installation. The logic is done with Kratix, it’s stored in Kubernetes, and then the Pipelines can be projected to the GUI for visibility.
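As a rough sketch of the CLI idea, the shell function below hides a Crossplane manifest behind two arguments. The function name `platform_vnet`, its arguments, and the fixed address space and location are all hypothetical choices for illustration. The manifest shape mirrors the Crossplane VirtualNetwork example later in this post. The function only prints YAML, so nothing touches a cluster until you pipe it to kubectl.

```shell
#!/bin/sh
# Hypothetical helper: renders a Crossplane VirtualNetwork manifest from two
# simple arguments so the developer never has to write the YAML by hand.
platform_vnet() {
  name="${1:-}"
  resource_group="${2:-}"
  if [ -z "$name" ] || [ -z "$resource_group" ]; then
    echo "usage: platform_vnet NAME RESOURCE_GROUP" >&2
    return 1
  fi
  # Address space and location are fixed here for brevity (an assumption).
  cat <<EOF
apiVersion: network.azure.upbound.io/v1beta1
kind: VirtualNetwork
metadata:
  name: ${name}
spec:
  forProvider:
    addressSpace:
      - 10.0.0.0/16
    location: "East US"
    resourceGroupName: "${resource_group}"
EOF
}
```

A developer could then run something like `platform_vnet vnettest my-resource-group | kubectl apply -f -` without ever opening the manifest.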

Underlying Platforms

Now it’s time to get into the technical bits. The first step is to think about the underlying platform. The underlying platform is the engine that gives the ability to use various capabilities (which we’ll get to in the next section).

The underlying platform can be anything from Kubernetes to AWS Elastic Container Service (ECS) to Virtual Machines to bare-metal. As it stands right now, it more or less seems like the underlying platform is going to be Kubernetes for the cloud-native space, but that’s not a hard-coded answer.

Once you figure out the underlying platform that you’d like to use (in this case, Kubernetes), you’ll also need to think about what that underlying platform needs. It’s something that’ll need:

Monitoring
Observability
Policy enforcement
Security

The biggest thing to remember is that the underlying platform is not only where the platform capabilities are running, but it’s an entry point to just about anything that can go wrong from security issues to authentication/authorization issues to the platform going down. Because of that, you have to treat it as you would with any other platform that’s running an application stack.

You must make sure it’s properly secure, monitored, and performing as expected.

Monitoring and Observability

For monitoring, you can think about:

Enterprise solutions
Homegrown solutions

A great homegrown/open-source stack is Grafana, Prometheus, Loki, Tempo.

A great enterprise tool is Datadog.

Policy Enforcement

Two top Policy Enforcement tools you’ll see in the Kubernetes space right now are:

Kyverno
Gatekeeper from OPA

There are also various policy tools available in particular clouds. For example, Azure has Azure Policy.
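To make policy enforcement a bit more concrete, here's a minimal sketch of a Kyverno ClusterPolicy wrapped in a shell function. The policy name, rule name, and the `team` label are assumptions made up for this example, and the policy is set to Audit so it reports violations rather than blocking Pods.

```shell
#!/bin/sh
# Prints a minimal Kyverno ClusterPolicy that audits Pods missing a 'team'
# label. The policy, rule, and label names are illustrative only.
render_team_label_policy() {
  cat <<'EOF'
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-team-label
spec:
  validationFailureAction: Audit
  rules:
    - name: check-team-label
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "The label 'team' is required."
        pattern:
          metadata:
            labels:
              team: "?*"
EOF
}
```

With Kyverno installed, `render_team_label_policy | kubectl apply -f -` would register the policy.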

Security

Security is a huge topic in itself, with various tools of its own. The best takeaway from a high-level perspective when it comes to security is to get one tool or multiple tools that:

Scan your cluster
Scan container images
Scan manifests

and if possible, see if those tools provide any type of automatic remediation.

Platform Capability

Capabilities are everything and anything that developers/engineers want to use on the platform. It could be anything from GitOps tools to CICD tools to container image repositories and analytics. It could be anything.

💡 I had a conversation recently with a Data Engineer who wants to create a platform for Data Engineers to use and access their tools in a more efficient manner. It’s not just for developers. It could be any type of capability.

These capabilities can be exposed to a CLI, API, or GUI depending on what type of Interface you create (more on Interfaces later).

Various Tools Available

Not to sound like a broken record as I’m sure everyone has heard this in some way, shape, or form at this point, but there are a lot of tools available. Not just on the CNCF landscape, but even the vendors that aren’t part of the CNCF.

As you’re building capabilities into your platform, you’ll need to ensure that they’re actually needed. When you’re building a platform, the developers/engineers using it are your customer/client. As we all know with customers, you need to ensure that you know exactly what they need and decipher their exact needs (because sometimes they might not even know).

It’s your responsibility as a Platform Engineer to figure out exactly what needs to go into the stack. Otherwise, you end up with a million tools and tech debt.

An Example Platform Capability Stack

Here’s an example of a Platform Capability stack.

Let’s say you need CICD along with automated/repeatable deployments of Kubernetes resources using ArgoCD.

The stack could be:

GitHub Actions
ArgoCD

Both have APIs, so you can, for example, create a CLI with something like the Cobra library in Go (golang).
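To keep things dependency-free, the sketch below uses plain shell instead of Cobra, but the idea is the same: one small wrapper in front of both tools. It builds the URL for the GitHub REST API endpoint that creates a workflow dispatch event and calls `argocd app sync` for the GitOps side. The function names are hypothetical, and it assumes a `GITHUB_TOKEN` environment variable plus a workflow that declares the `workflow_dispatch` trigger.

```shell
#!/bin/sh
# Builds the GitHub REST API URL that triggers a workflow_dispatch run.
# Arguments: OWNER REPO WORKFLOW_FILE (all placeholders you supply).
workflow_dispatch_url() {
  echo "https://api.github.com/repos/${1}/${2}/actions/workflows/${3}/dispatches"
}

# Triggers the workflow on a branch (defaults to main).
# Assumes GITHUB_TOKEN is set and allowed to run workflows.
trigger_pipeline() {
  url="$(workflow_dispatch_url "$1" "$2" "$3")"
  curl --fail --silent --show-error \
    --request POST \
    --header "Accept: application/vnd.github+json" \
    --header "Authorization: Bearer ${GITHUB_TOKEN}" \
    --data "{\"ref\":\"${4:-main}\"}" \
    "$url"
}

# Asks ArgoCD to sync an application by name (requires a logged-in argocd CLI).
sync_app() {
  argocd app sync "$1"
}
```

A call such as `trigger_pipeline my-org my-repo deploy.yml` followed by `sync_app my-app` gives the developer two short commands instead of two different tools to learn.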

You can also create a plugin or use a plugin that already exists in something like Backstage or use a plugin that’s available in Port.

💡 Both Port and Backstage are GUI-based IDP tools.

If you have a GUI, you can see the output for things like the pipelines that run via GitHub Actions. The actual CICD logic is still in GitHub Actions, but it’s exposed to the GUI.

The whole idea of the capabilities, regardless of what they are, is to ensure that they aren’t just what the developers/engineers need to get their work done, but that they’re exposed in a way that makes their lives more efficient and reduces cognitive load.

In the next section, you’ll learn about said interaction and exposure.

Platform Interface/Interaction

The overall interface, or rather, how developers/engineers interact with the platform you’ve created as the Platform Engineer is the most important part for them. The capabilities are important of course, but HOW they’re interacting with the capabilities is crucial.

Platform Interface and Capability Combined

Platform Interface and Capabilities can be separate, but they can also be combined. For example, a Platform Capability might be ArgoCD, but ArgoCD isn’t built into Backstage. You can use it on the Backstage GUI if you implement the ArgoCD Backstage Plugin, but it’s not part of Backstage.

However, you may see certain tools that do contain the Interface and the Capabilities. Some of the tools that give both may contain the CLI for the Interface and the Capabilities under the hood to create resources.

Let’s take a look at a few of the tools.

Radius

When using Radius, there’s a concept called Recipes. Recipes are how you define what you want the resource you’re creating to look like. They’re defined in Bicep and yes, you can use them to create AWS resources.

First, install the CLI. You can see a few different supported platforms here: https://docs.radapp.io/installation/

Create an empty directory for the Radius configuration to live in and change into it.

mkdir radapp && cd radapp

Next, initialize a new Radius project, which will pull down a demo app and a bit of a scaffold/template.

rad init

You’ll be prompted to set up the Radius app in the new directory that you just created. Say yes.

You should see an output similar to the one below.

Next, create a new Bicep configuration to deploy. In this example, you’ll deploy a Pod to Kubernetes.

Save it as test.bicep.

import radius as radius

@description('The ID of your Radius Environment. Automatically injected by the rad CLI.')
param environment string

@description('The ID of your Radius Application. Automatically injected by the rad CLI.')
param application string

resource frontend 'Applications.Core/containers@2023-10-01-preview' = {
  name: 'frontend'
  properties: {
    application: application
    container: {
      image: 'nginx:latest'
    }
  }
}

Deploy the test.bicep configuration.

rad deploy test.bicep

Once deployed, you should see an output similar to the below.

Checking in the default-radapp Namespace, you can see your Frontend resource deployed.

Kratix

When defining and creating resources with Kratix, there’s an object called a Promise, which is defined in YAML (Promises are the way you create resources, like Recipes in Radius).

For Kratix to work, there are two dependencies:

A Kubernetes cluster
Cert-manager

First, install cert-manager.

kubectl apply --filename https://github.com/cert-manager/cert-manager/releases/download/v1.12.0/cert-manager.yaml

Next, install Kratix using a Kubernetes Manifest.

kubectl apply --filename https://raw.githubusercontent.com/syntasso/kratix/main/distribution/single-cluster/install-all-in-one.yaml

💡 At this time, I don’t believe there is a Helm Chart for installing Kratix.

The last step is to register the server to specify where workloads should run. Think about it like a control plane/data plane concept. Where you install Kratix is the control plane. You can also run workloads on it as the data plane or register other Kubernetes clusters to run said workloads.

kubectl apply --filename https://raw.githubusercontent.com/syntasso/kratix/main/distribution/single-cluster/config-all-in-one.yaml

Once installed, you can start using Promises. You can build your own Providers, Promises, or use the marketplace.

For example, click on the Grafana/Prometheus Promise.

It’ll take you to a GitHub page.

Click on the promise.yaml to see the promise. Copy it into a YAML file or clone the repo.

Run the Promise by using kubectl apply -f and you should see that the promise was installed properly with kubectl get promise.
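The apply-and-verify step can be sketched as a small shell helper. The function name `install_promise` and the `KUBECTL` override are assumptions for this example. The override lets you swap in `echo` to preview the commands without a cluster.

```shell
#!/bin/sh
# KUBECTL can be overridden (for example with 'echo') to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"

# Applies a Promise file (defaults to promise.yaml in the current directory)
# and, if that succeeds, lists the Promises the cluster knows about.
install_promise() {
  promise_file="${1:-promise.yaml}"
  "$KUBECTL" apply --filename "$promise_file" &&
    "$KUBECTL" get promise
}
```

Running `install_promise` from the directory where you saved the Promise applies it and shows whether it was picked up.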

Crossplane

When creating resources with Crossplane, you use Kubernetes Manifests with objects from the Provider you implement (these Manifests are the way you create resources like Recipes in Radius and Promises in Kratix).

First, add the Helm Chart for Crossplane.

helm repo add crossplane-stable https://charts.crossplane.io/stable

Next, install the Crossplane Helm Chart.

helm install crossplane \
crossplane-stable/crossplane \
--namespace crossplane-system \
--create-namespace

Crossplane uses Providers, which are much like Plugins: they extend what Crossplane can do. In this case, the Provider is the Azure Provider.

cat <<EOF | kubectl apply -f -
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: upbound-provider-azure
spec:
  package: xpkg.upbound.io/upbound/provider-azure:v0.29.0
EOF

For Crossplane to properly manage and create Azure resources, it’ll need a way to authenticate to Azure. You can do this with Azure AD.

Create an Azure AD Service Principal (SP). Once you create the SP, you’ll see a JSON output on the terminal. Take that output and save it to a file called azure.json.

az ad sp create-for-rbac \
--sdk-auth \
--role Owner \
--scopes /subscriptions/your_sub_id
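Before turning the file into a secret, a quick sanity check can save some debugging time. The helper below is a sketch with a made-up name (`check_azure_creds`). It assumes the saved JSON contains the `clientId`, `clientSecret`, `subscriptionId`, and `tenantId` keys that the `--sdk-auth` output normally includes, and it only checks that those key names are present, not that the values are valid.

```shell
#!/bin/sh
# Checks that a credentials file is readable and mentions the keys expected
# from the --sdk-auth output. Reports the first missing key and returns 1.
check_azure_creds() {
  creds_file="${1:-azure.json}"
  if [ ! -r "$creds_file" ]; then
    echo "cannot read ${creds_file}" >&2
    return 1
  fi
  for key in clientId clientSecret subscriptionId tenantId; do
    if ! grep -q "\"${key}\"" "$creds_file"; then
      echo "missing key: ${key}" >&2
      return 1
    fi
  done
  return 0
}
```

Running `check_azure_creds ./azure.json` prints nothing and returns zero when all four keys are found.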

Next, create a new secret with the contents of the azure.json file.

kubectl create secret \
generic azure-secret \
-n crossplane-system \
--from-file=creds=./azure.json

Create the Provider Config below that updates Crossplane to use the new secret that you just created for authentication/authorization to Azure.


cat <<EOF | kubectl apply -f -
apiVersion: azure.upbound.io/v1beta1
kind: ProviderConfig
metadata:
  name: default
spec:
  credentials:
    source: Secret
    secretRef:
      namespace: crossplane-system
      name: azure-secret
      key: creds
EOF

Last but certainly not least, try creating a resource! The example below creates an Azure VNet.

cat <<EOF | kubectl create -f -
apiVersion: network.azure.upbound.io/v1beta1
kind: VirtualNetwork
metadata:
  name: vnettest
spec:
  forProvider:
    addressSpace:
      - 10.0.0.0/16
    location: "East US"
    # Set this to the name of your Resource Group.
    resourceGroupName: ""
EOF
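Creating the object only hands the request to Crossplane, so it can help to wait until the resource actually reports Ready. The helper below is a sketch: the function name `wait_for_vnet`, the default timeout, and the `KUBECTL` override are assumptions, and it relies on the managed resource exposing a `Ready` condition.

```shell
#!/bin/sh
# KUBECTL can be overridden (for example with 'echo') to preview the command.
KUBECTL="${KUBECTL:-kubectl}"

# Blocks until the VirtualNetwork reports Ready or the timeout expires.
# Arguments: NAME (default vnettest) and TIMEOUT (default 300s).
wait_for_vnet() {
  vnet_name="${1:-vnettest}"
  "$KUBECTL" wait --for=condition=Ready \
    "virtualnetwork.network.azure.upbound.io/${vnet_name}" \
    --timeout="${2:-300s}"
}
```

If the wait times out, `kubectl describe` on the same resource usually shows the error coming back from Azure.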

Difference Between Crossplane and Kratix

In short, Crossplane is what can be thought of as “Infrastructure-as-Code with Kubernetes”. It’s meant to give you the ability to create a resource with Kubernetes.

Kratix, on the other hand, is all about team enablement. It creates resources for you, but the overall goal is to give you a platform that’s as small as possible with high performance that all teams can depend on.

Pre-Made Stacks

As mentioned in the opening of this post, standardization in some way, shape, or form is crucial to avoid things like tech debt and overall confusion. There’s nothing worse than having several engineering teams all using different tools and platforms. It’s a management nightmare for everyone involved.

Because of the need for standardization, or at least a solid starting point (much like the LAMP stack), Platform Engineers need to start thinking about what a good starting point looks like. Something that isn’t hard-coded. For example, you don’t want to build a stack that relies on ArgoCD. You want it to rely on GitOps, but not a particular GitOps tool. Instead, you want the ability to swap out ArgoCD for Flux as an example.
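One way to picture "rely on GitOps, not a particular GitOps tool" is a thin wrapper where the tool is just a setting. In the sketch below, the `GITOPS_TOOL` variable and the `gitops_sync_command` function are hypothetical names. It only prints the tool-specific command, and it assumes the Flux side is modelled as a Kustomization with the same name as the app.

```shell
#!/bin/sh
# GITOPS_TOOL selects the implementation; callers never name the tool directly.
GITOPS_TOOL="${GITOPS_TOOL:-argocd}"

# Prints the command that syncs the named app with the selected GitOps tool.
gitops_sync_command() {
  app="$1"
  case "$GITOPS_TOOL" in
    argocd) echo "argocd app sync ${app}" ;;
    flux)   echo "flux reconcile kustomization ${app}" ;;
    *)
      echo "unsupported GitOps tool: ${GITOPS_TOOL}" >&2
      return 1
      ;;
  esac
}
```

Swapping ArgoCD for Flux then means changing one setting, while everything that calls `gitops_sync_command` stays the same.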

Currently, there are two stacks in the works:

CNOE
BACK Stack

CNOE

Cloud Native Operational Excellence, or CNOE for short (pronounced canoe), is a pre-defined stack which contains various tools. The GUI/frontend is Backstage. The GitOps Controller is ArgoCD. It allows you to have a pre-configured environment if you’re looking to use an Internal Developer Platform (IDP) for Interaction/Interface and certain capabilities out of the box.

https://cnoe.io/

CNOE is great, but it’s also very opinionated. As of right now, you must use the tools that are in the platform and cannot swap them out.

BACK Stack

The Backstage, ArgoCD, Crossplane, Kyverno (BACK) stack is less of a “product” like CNOE is and more of a standard that you can follow.

The BACK stack is very much like a LAMP stack in the sense that it’s a good production-level starting point, but you can swap out resources. For example, Argo can be swapped out for Flux or Kyverno can be swapped out for OPA Gatekeeper.

BACK is a really good idea in terms of engineers having a starting point because currently, there are a lot of tools in the landscape and it can cause a bit of a headache for engineers, both new and experienced, in the cloud-native ecosystem.

Closing Thoughts

The purpose of this post was to give you a solid understanding of how you should be thinking about Platform Engineering without the buzzwords and marketing hype. The goal is to ensure that you’re confident enough to begin your Platform Engineering journey truly understanding what a Platform Engineer needs to do for their job to be meaningful.

The biggest thing to remember here is that Platform Engineering is still emerging, standards will change, and the tools you use will differ. However, the idea of the Underlying Platform, Platform Capabilities, and Platform Interaction/Interface will remain.
