Jan Lepsky

Posted on Mar 26, 2024 • Edited on Jun 6, 2024 • Originally published at mogenius.com

Monitoring Service Health in Kubernetes: A Simplified Approach

#cloud #devops #kubernetes #docker

Determining the health of a service is a crucial aspect of maintaining reliability and performance in modern software development, especially when working with container orchestration tools like Kubernetes.

The health of a service is a multifaceted concept that hinges on its ability to meet availability, performance, and correctness standards throughout its lifecycle. This is not a trivial task, given the complex interplay of factors that influence a service's health, from the build pipeline and deployment processes to the ongoing monitoring and management of the service on Kubernetes.

Let’s have a look at some critical areas.

Understanding Service Health: A Comprehensive Overview

Effective service status monitoring is crucial for real-time health insights across build, deployment, and operations, enabling developers to promptly identify and address failures.

1. Continuous Integration and Building: The Foundation of Service Health

The journey toward a healthy service begins with a robust CI pipeline. Key health indicators here include the speed and success of builds and tests, crucial for minimizing "Time to Feedback" for developers. Real-time feedback mechanisms, such as CI/CD dashboards and direct integration of checks into the pipeline (e.g., code coverage, linting), are essential for early detection and resolution of potential issues.
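As a rough illustration, the two indicators named above (build success and time to feedback) can be computed from a list of recent pipeline runs. This is a minimal sketch: the `BuildRun` record and the `ci_health` function are hypothetical names, and the run data would have to come from whatever CI system is in use.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class BuildRun:
    """One CI pipeline run (hypothetical record, not tied to any CI product)."""
    succeeded: bool
    duration_seconds: float


def ci_health(runs):
    """Return the success rate and the median time to feedback for the given runs."""
    if not runs:
        raise ValueError("at least one build run is required")
    success_rate = sum(1 for run in runs if run.succeeded) / len(runs)
    median_feedback = median(run.duration_seconds for run in runs)
    return {
        "success_rate": success_rate,
        "median_feedback_seconds": median_feedback,
    }


recent_runs = [
    BuildRun(succeeded=True, duration_seconds=120),
    BuildRun(succeeded=False, duration_seconds=300),
    BuildRun(succeeded=True, duration_seconds=180),
    BuildRun(succeeded=True, duration_seconds=240),
]
print(ci_health(recent_runs))
```

The median is used instead of the mean so that a single unusually slow build does not distort the feedback time.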

2. Deployment: Ready for Traffic

Once the build is finished, it is key to observe the success of the deployment. Several Kubernetes building blocks can cause failures, e.g. deployment configurations, resources, or pod scheduling. Deployment practices must also ensure that a service is fully prepared to handle traffic before it is exposed to users. Kubernetes Readiness Probes are critical in this phase: they verify that a service is ready to receive traffic. In addition, automatic rollback mechanisms based on metrics like error rates and latency ensure that any deployment that could degrade service health is promptly reverted.
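A metric-based rollback decision could be sketched as follows. The thresholds, the `ReleaseMetrics` type, and the `should_roll_back` function are assumptions made for illustration; in practice the numbers would come from a monitoring system, and the rollback itself would be triggered by the deployment tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseMetrics:
    """Metrics observed for one release (hypothetical shape)."""
    error_rate: float        # fraction of failed requests, 0.0 to 1.0
    p95_latency_ms: float    # 95th percentile latency in milliseconds


def should_roll_back(baseline, candidate,
                     max_error_rate_increase=0.01,
                     max_latency_ratio=1.5):
    """Return True if the candidate release is clearly worse than the baseline.

    Both thresholds are illustrative defaults, not recommendations.
    """
    error_regressed = (
        candidate.error_rate - baseline.error_rate > max_error_rate_increase
    )
    latency_regressed = (
        candidate.p95_latency_ms > baseline.p95_latency_ms * max_latency_ratio
    )
    return error_regressed or latency_regressed


baseline = ReleaseMetrics(error_rate=0.01, p95_latency_ms=200)
candidate = ReleaseMetrics(error_rate=0.05, p95_latency_ms=210)
print(should_roll_back(baseline, candidate))
```

Comparing against the previous release rather than against fixed limits keeps the check meaningful for services with very different normal latencies.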

To keep your service healthy, rigorously monitor CI/CD progress, deploy readiness checks, and proactively manage performance through real-time dashboards for uninterrupted service excellence. (source: grafana-dashboards)

3. Ongoing Service Monitoring: Keeping the Pulse

Once deployed, a service's health must be monitored continuously. Kubernetes provides narrowly scoped features like Liveness Probes, which help determine when an instance needs to be restarted, while broader solutions like Prometheus and Grafana offer detailed monitoring and visualization of service health. Centralized log aggregation and distributed tracing tools enable deep analysis and troubleshooting.
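On the application side, HTTP probes need endpoints to call. The sketch below shows one possible shape using only the Python standard library. The paths `/healthz` and `/readyz` are a common convention rather than a requirement, and `AppState`, `probe_response`, and `serve` are hypothetical names. Kubernetes treats an HTTP probe as successful when the response status is at least 200 and below 400.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class AppState:
    """Shared flag the application flips once its dependencies are connected."""
    ready = False


def probe_response(path, ready):
    """Map a probe path to an HTTP status code and body."""
    if path == "/healthz":
        # Liveness: the process is up and able to answer requests.
        return 200, "alive"
    if path == "/readyz":
        # Readiness: only accept traffic once the application is ready.
        return (200, "ready") if ready else (503, "not ready")
    return 404, "not found"


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = probe_response(self.path, AppState.ready)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Start the probe server; this call blocks until the process is stopped."""
    HTTPServer(("", port), ProbeHandler).serve_forever()
```

Calling `serve()` starts the endpoint. Keeping liveness and readiness separate matters: a failing readiness probe only takes the pod out of load balancing, while a failing liveness probe causes the container to be restarted.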

Challenges of Bringing Service Status Monitoring to Life

Each of the presented steps offers valuable tactics for achieving continuous service health, better visibility during the development process, and faster recovery in case of an incident. However, rolling them out to a team and making use of the tools in the software development lifecycle is a different story. In Kubernetes environments, a failing pipeline or crashing pods can have multiple causes. Multiple tools are involved in the process, and at the Kubernetes level there are several workloads that can affect service health.

The challenge lies in aggregating service status data into an actionable source of truth. With self-service in mind, this should be designed for software developers, enabling them to independently check the status of their service and investigate each step in the pipeline. In an aggregated view, delivering near real-time data is key in order to display what is happening at any given moment with the build, deployment, pods, or health checks. Additionally, a certain level of abstraction is required to allow fast interpretation of service status and to reduce the onboarding effort for developers.
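To make the idea of an aggregated view concrete, here is a minimal sketch that reduces per-stage results to one overall status plus the first stage that needs attention. The stage names follow the ones mentioned above; the `Status` enum and the `aggregate` function are hypothetical and do not describe any particular product.

```python
from enum import Enum


class Status(Enum):
    SUCCESS = "success"
    IN_PROGRESS = "in_progress"
    FAILED = "failed"


# Stages in the order a change moves through them.
STAGES = ("build", "deployment", "pods", "health_check")


def aggregate(stage_status):
    """Return (overall status, first stage that is not successful or None).

    A stage without a reported status is treated as still in progress.
    """
    for stage in STAGES:
        status = stage_status.get(stage, Status.IN_PROGRESS)
        if status is not Status.SUCCESS:
            return status, stage
    return Status.SUCCESS, None


overall, stage = aggregate({
    "build": Status.SUCCESS,
    "deployment": Status.FAILED,
    "pods": Status.IN_PROGRESS,
})
print(overall.value, stage)
```

Reporting the first non-successful stage in pipeline order points the developer at the earliest place to look, since later stages usually fail as a consequence.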

The Goal: A Single Pane of Glass

The recently emerging trend of Internal Developer Platforms (IDP) offers a solution to introducing self-service capabilities in development teams. Especially for Kubernetes environments, mogenius offers a Kubernetes Operations Platform that embodies the principles of platform engineering by offering a self-service solution dedicated to simplifying cloud-native development. This platform enables software developers to efficiently deploy and manage applications in the cloud, focusing on innovation rather than infrastructure management.

Within its suite of features, mogenius includes actionable service status monitoring, providing users with a transparent view of their service's health across the build, deployment, and operational phases. The service status delivers real-time data in an aggregated way, allowing developers to instantly identify the cause of failures independently.

Each component of the service indicates success or failure, with associated logs for investigating the individual pipeline step.

Kubernetes Health Checks and Beyond

While competing solutions require specific Kubernetes onboarding and offer limited capabilities for adjusting and configuring resources, mogenius’s service status system allows for quick identification and resolution of issues. By offering detailed logs and metrics at each step of the service lifecycle, it reduces the time spent diagnosing problems and enhances the developer experience.

The recent build and deployment were successful and three pods are in running state.

Two of three pods are in an error state, which can be investigated in the pod logs.

The last deployment failed, resulting in all three pods being unable to reach a running state.
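The three situations described above can be told apart from the pod phases alone. The following sketch assumes the standard Kubernetes pod phases (`Pending`, `Running`, `Succeeded`, `Failed`, `Unknown`) as input; the labels `healthy`, `degraded`, and `down` and the function name are made up for illustration and are not taken from the mogenius platform.

```python
from collections import Counter


def summarize_pods(phases):
    """Reduce a list of pod phases to a single coarse label."""
    if not phases:
        return "unknown"
    running = Counter(phases)["Running"]
    if running == len(phases):
        return "healthy"    # every pod is running
    if running == 0:
        return "down"       # no pod reached the running state
    return "degraded"       # some pods run, others need a look at their logs


print(summarize_pods(["Running", "Running", "Running"]))
print(summarize_pods(["Running", "Failed", "Failed"]))
print(summarize_pods(["Pending", "Pending", "Pending"]))
```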

mogenius goes further by introducing a fourth level of checks: Kubernetes Health Checks, enabling a comprehensive health overview up to the application layer, including scenarios where containers are running, but the application is unreachable, e.g. due to external dependencies like a database failure. With pre-configured health checks toggleable by users, mogenius leverages Kubernetes' Startup Probes, Liveness Probes, and Readiness Probes in a user-friendly manner, adhering to best practices while simplifying the user experience.
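For readers who want to see what the three probe types look like underneath, the sketch below assembles them as a Python dictionary using the field names of the Kubernetes container spec (`startupProbe`, `livenessProbe`, `readinessProbe`, `httpGet`, `periodSeconds`, `failureThreshold`). The paths, port, and timing values are assumptions chosen for the example and are not the presets used by mogenius.

```python
import json


def http_probe(path, port, period_seconds, failure_threshold):
    """Build one HTTP probe definition with Kubernetes field names."""
    return {
        "httpGet": {"path": path, "port": port},
        "periodSeconds": period_seconds,
        "failureThreshold": failure_threshold,
    }


def build_probes(port=8080):
    """Return the three probe definitions for one container (example values)."""
    return {
        # Gives a slow-starting application time before other probes begin.
        "startupProbe": http_probe("/healthz", port, 10, 30),
        # Restarts the container if the application stops responding.
        "livenessProbe": http_probe("/healthz", port, 10, 3),
        # Removes the pod from load balancing while it cannot serve traffic.
        "readinessProbe": http_probe("/readyz", port, 5, 3),
    }


def startup_budget_seconds(probe):
    """Approximate time the application has to start before it is restarted."""
    return probe["periodSeconds"] * probe["failureThreshold"]


probes = build_probes()
print(json.dumps(probes, indent=2))
print(startup_budget_seconds(probes["startupProbe"]))
```

With the example values, the startup probe allows roughly 30 x 10 = 300 seconds for the application to come up. The resulting dictionary can be merged into a container definition and serialized to JSON or YAML.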

mogenius’s Advanced Kubernetes Health Monitoring

mogenius’s approach not only reduces the complexity involved in ensuring service health but also aligns with the shift-left ideology, empowering developers to handle more tasks earlier in the software development lifecycle. mogenius's new service status system exemplifies how technology can be leveraged to enhance visibility, reduce error diagnosis time, and improve overall service reliability and performance in a Kubernetes environment.

Conclusion

The health of a service is a critical aspect of software development, impacting not just the reliability and performance of the applications but also the efficiency of the development process itself. Navigating the complexities of maintaining service health in Kubernetes environments can be challenging. However, with the advent of internal developer platforms like mogenius, the landscape is changing. mogenius offers a compelling solution that simplifies cloud-native development, empowering developers and DevOps professionals to maintain the health of their services more effectively, allowing them to focus on what they do best: building great software.