DEV Community

Aleksi Waldén for Polar Squad

Prometheus Observability Platform: Platform

What is a platform?

In modern IT, a platform is a set of shared resources used by multiple people, such as the teams in an organisation. A platform usually consists of a shared codebase that teams either adopt through self-service or that is automatically included in their workflow.

The phrase “you build it, you run it” first generated a lot of buzz and then became a widespread practice. It meant that each team in the organisation was capable of building its whole application stack, including the infrastructure components, and was also responsible for maintaining that infrastructure. This approach has benefits. The biggest one is fast iteration on new ideas and technologies, which is possible because all the responsibility sits inside the team and is not tied to a bigger technology stack.

One of the biggest downsides of this approach is that teams become technology and knowledge silos. Every team can run different technologies, and the knowledge often stays inside that team. Without proper knowledge-sharing practices, teams end up on different technology stacks and the wheel gets reinvented several times. Because the stacks can differ so much between teams, replacing departing team members can also be very difficult.

Nowadays the full stack of an application is very complex. We need specialists in full-stack application development, infrastructure specialists for cloud or data centre environments, network specialists who fully understand the networking stack and architecture, and security specialists to keep everything secure. Getting all this expertise into a single team can be very challenging. To partly remedy this, we can look to platform engineering, where we offload part of the complexity to units specialised in each area.

With platform engineering, we create a platform that each team can build upon. For example, we can initially have the networking, infrastructure, and security people collaborate on a shared codebase that embodies industry best practices and the special requirements of the company. This shared codebase can be, for example, a self-service platform for Kubernetes clusters (AKS, EKS, or GKE) with all the bells and whistles included, such as cert-manager, external-dns, and an ingress controller. It can be a single Kubernetes cluster where we separate teams with namespaces and give them ways of deploying their workloads into the cluster. It can also be a bigger platform codebase from which teams deploy modules of components that conform to security standards and best practices (e.g. network integration through private endpoints, or firewalls enabled with preset whitelists for VPN endpoints).
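To make the single-cluster option a bit more concrete, here is a minimal sketch of how a platform team might generate the per-team Kubernetes objects. It is only an illustration under assumptions: the `TeamRequest` shape, the `platform.example.com/team` label key, and the default limits are all hypothetical and not part of any real platform. Only the `Namespace` and `ResourceQuota` object structure comes from Kubernetes itself.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TeamRequest:
    """What a team asks the platform for (hypothetical shape)."""
    team: str
    cpu_limit: str = "4"        # assumed default, not a recommendation
    memory_limit: str = "8Gi"   # assumed default, not a recommendation


def namespace_manifests(req: TeamRequest) -> List[Dict]:
    """Build the Namespace and ResourceQuota objects for one team."""
    # Hypothetical label key used to mark which team owns the objects.
    labels = {"platform.example.com/team": req.team}
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": req.team, "labels": labels},
    }
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {
            "name": f"{req.team}-quota",
            "namespace": req.team,
            "labels": labels,
        },
        # Cap the total CPU and memory limits of the team's workloads.
        "spec": {
            "hard": {
                "limits.cpu": req.cpu_limit,
                "limits.memory": req.memory_limit,
            }
        },
    }
    return [namespace, quota]


if __name__ == "__main__":
    for manifest in namespace_manifests(TeamRequest(team="payments")):
        print(manifest["kind"], manifest["metadata"]["name"])
```

The point of the sketch is that a team only states its name and, optionally, its limits, while the conventions (labels, quota naming) live in the shared code. In practice this would more likely be a Helm chart or a Terraform module than a Python function.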

This drastically reduces the cognitive load for teams developing their applications. There are some downsides to this approach, because teams now rely more heavily on the shared codebase. Updates to the code are no longer done by just a single team, so you have to communicate each update and the steps required to apply it. For example, if you update the shared Kubernetes code to move clusters to a newer Kubernetes version, you need to announce the change and write instructions for performing the upgrade and for dealing with any deprecated resources. Depending on the size of the organisation, you will probably end up versioning your shared codebase, as requiring all teams to track the main branch at all times can cause them too much overhead.
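As a rough illustration of what versioning the shared codebase buys us, the sketch below compares the platform version a team has pinned against the latest release and tells the team what kind of upgrade to expect. It assumes the platform uses `MAJOR.MINOR.PATCH` version tags in the semantic versioning style, where only a major bump contains breaking changes. The function names and messages are hypothetical.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'v1.4.2' or '1.4.2' into a (major, minor, patch) tuple."""
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    return major, minor, patch


def upgrade_advice(pinned: str, latest: str) -> str:
    """Describe the upgrade from a team's pinned version to the latest one."""
    pinned_v, latest_v = parse_version(pinned), parse_version(latest)
    if pinned_v >= latest_v:
        return "up to date"
    if pinned_v[0] < latest_v[0]:
        # Assumption: breaking changes only ship in major releases.
        return "major upgrade: follow the migration instructions"
    return "minor or patch upgrade: no migration steps expected"


if __name__ == "__main__":
    print(upgrade_advice("v1.4.2", "v2.0.0"))
```

A check like this could run in each team's pipeline, so teams learn about a pending migration from their own builds instead of relying only on announcements.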

The cloud moves very fast and new technologies are constantly being developed. With the platform approach, infrastructure teams can focus on shared practices and keep evolving them. Security people can focus on creating shared security tooling, such as ways for teams to use SonarQube. The networking team can focus on creating a shared networking infrastructure where components can easily be used across teams, with DNS resolution that works across multiple clouds and on-premises data centres. The platform team can focus on listening to the needs of the development teams, creating shared code for the most commonly used components, and helping teams get past the hurdles that the cloud imposes (such as migrating from one PostgreSQL version to another).

Next part: Prometheus Observability Platform: Prometheus
