OpenTelemetry (OTel) is an open-source standard for instrumenting distributed systems and for collecting and exporting the telemetry data they produce. Widely adopted by SRE and security teams, OTel is more than one nice-to-have tool among many; it is critical.
In this post, we’ll explore the role that OTel plays in system security. We’ll look at how telemetry data is used to secure systems along with how OTel securely handles telemetry data. Then, we’ll consider concrete practices—basic and advanced—that you can adopt as you use OTel in your organization.
Let’s begin by looking at the relationship between system security and telemetry data.
Robust system security relies on the application of many practices, including:
- Defense in depth
- Risk mitigation
- Granular access control
- Early threat detection and response
- Resiliency and business continuity
Designing and implementing solid security requires a deep understanding of your systems coupled with high visibility into both your business systems and your security mechanisms. That visibility comes from capturing and monitoring telemetry data.
Telemetry data (logs, metrics, and distributed traces) shows how a system behaves under normal conditions. By continuously collecting and analyzing this data, security teams can establish baselines and thresholds that help identify deviations from the norm.
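To make the idea of a baseline concrete, here is a minimal Python sketch that derives a mean and standard deviation from a hypothetical series of hourly failed-login counts and flags values whose z-score exceeds a threshold. The data, the 3.0 threshold, and the function names are illustrative assumptions, and the code works on plain numbers rather than any OTel API.

```python
import statistics


def build_baseline(history: list[float]) -> tuple[float, float]:
    """Return the mean and sample standard deviation of historical observations."""
    return statistics.mean(history), statistics.stdev(history)


def is_deviation(value: float, mean: float, stdev: float, z_threshold: float = 3.0) -> bool:
    """Flag a value whose distance from the mean exceeds z_threshold standard deviations."""
    if stdev == 0:
        # A perfectly flat history: any change at all is a deviation.
        return value != mean
    return abs(value - mean) / stdev > z_threshold


# Hypothetical data: failed logins per hour for one service on a normal day.
failed_logins_per_hour = [4, 6, 5, 3, 7, 5, 4, 6, 5, 4, 6, 5]
mean, stdev = build_baseline(failed_logins_per_hour)

for observed in (6, 40):
    status = "deviation" if is_deviation(observed, mean, stdev) else "normal"
    print(f"{observed} failed logins/hour: {status}")
```

A z-score is only one way to express a threshold; real baselines usually also account for daily and weekly seasonality.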
Telemetry data is pivotal to an organization’s security posture. It supports incident response and forensic investigations as well as proactive threat hunting and compliance auditing. It matters even more in complex systems, where it helps you identify the patterns, trends, and possible indicators of compromise (IOCs) that inform robust security measures and strategies.
In the search for tools that collect and normalize telemetry data, several options have come and gone. OTel, which grew out of the merger of the OpenTracing and OpenCensus projects, has emerged as the industry standard. It provides a standardized way to capture and transmit telemetry data across the components and services in your systems.
In addition to its role in instrumentation, OTel brings secure practices to its handling of telemetry data.
OTel’s focus on observability and telemetry data collection is backed by several security capabilities.
First, OTel can protect telemetry data as it moves across distributed systems. Its native protocol, OTLP, runs over gRPC or HTTP, and both transports can be secured with Transport Layer Security (TLS). With TLS enabled, telemetry data is encrypted in transit, protecting it from unauthorized access or tampering.
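OTel can also leverage the authentication strategies that already exist in your system. By integrating OTel with authentication mechanisms such as OAuth 2.0, JSON Web Tokens (JWTs), or API keys, you ensure that only authorized entities can transmit and access telemetry data. In practice, exporters attach credentials to outgoing requests (typically as headers), and the receiving OTel Collector validates them with an authenticator extension.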
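As one illustration of secure transmission combined with credential passing, the sketch below builds the environment variables that OTel SDKs read to configure an OTLP exporter. The variable names (`OTEL_EXPORTER_OTLP_ENDPOINT`, `OTEL_EXPORTER_OTLP_PROTOCOL`, `OTEL_EXPORTER_OTLP_CERTIFICATE`, `OTEL_EXPORTER_OTLP_HEADERS`) come from the OpenTelemetry specification; the helper function, endpoint, token, and certificate path are hypothetical. The receiving Collector still has to be configured to terminate TLS and validate the token.

```python
from urllib.parse import quote, urlparse


def otlp_exporter_env(endpoint: str, bearer_token: str, ca_cert_path: str) -> dict[str, str]:
    """Build OTLP exporter settings that require TLS and send a bearer credential.

    Hypothetical helper: the variable names follow the OTel spec, the values are examples.
    """
    if urlparse(endpoint).scheme != "https":
        raise ValueError(f"refusing non-TLS OTLP endpoint: {endpoint}")
    return {
        "OTEL_EXPORTER_OTLP_ENDPOINT": endpoint,
        "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
        # Verify the Collector's certificate against this CA bundle.
        "OTEL_EXPORTER_OTLP_CERTIFICATE": ca_cert_path,
        # Header values are percent-encoded, so the space in "Bearer <token>" becomes %20.
        "OTEL_EXPORTER_OTLP_HEADERS": "Authorization=" + quote(f"Bearer {bearer_token}", safe=""),
    }


# Hypothetical Collector address (4318 is the default OTLP/HTTP port) and credentials.
env = otlp_exporter_env("https://collector.example.internal:4318", "abc123", "/etc/otel/ca.pem")
for key, value in env.items():
    print(f"{key}={value}")
```

Refusing plain `http://` endpoints in code is an extra safeguard on top of the Collector’s own TLS settings; in production, the token would come from a secret store rather than a literal.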
Your compliance policies may also require role-based access control (RBAC). OTel does not define an RBAC model of its own, but it works within the boundaries of the RBAC policies you enforce around it. By defining fine-grained access control rules in your Collector deployment, gateways, and observability backend, you can specify which authenticated users or services may send telemetry data through an OTel Collector or access it afterward.
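Because OTel does not ship an RBAC model, the rules live in whatever enforces access around it. A minimal sketch of such a rule set, with hypothetical roles, principals, and permission strings, might map each role to the telemetry signals it may write or read:

```python
from dataclasses import dataclass

# Hypothetical role model: which telemetry signals each role may write or read.
ROLE_PERMISSIONS: dict[str, set[str]] = {
    "app-service": {"write:traces", "write:metrics", "write:logs"},
    "sre": {"read:traces", "read:metrics", "read:logs"},
    "security-analyst": {"read:traces", "read:logs", "read:audit-logs"},
}


@dataclass(frozen=True)
class Principal:
    name: str
    roles: frozenset[str]


def is_allowed(principal: Principal, action: str, signal: str) -> bool:
    """Return True if any of the principal's roles grants `action` on `signal`."""
    permission = f"{action}:{signal}"
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in principal.roles)


checkout = Principal("checkout-service", frozenset({"app-service"}))
analyst = Principal("alice", frozenset({"security-analyst"}))

print(is_allowed(checkout, "write", "traces"))    # True
print(is_allowed(checkout, "read", "logs"))       # False
print(is_allowed(analyst, "read", "audit-logs"))  # True
print(is_allowed(analyst, "read", "metrics"))     # False
```

Keeping write and read permissions separate means an instrumented service can emit telemetry without being able to read anyone else’s.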
OTel also contributes significantly to auditing and compliance efforts. By capturing logs as part of the telemetry data, it provides visibility into the actions and behaviors of a distributed system, which is valuable for detecting security incidents, investigating breaches, and complying with regulatory requirements. This is particularly true if your organization uses a highly federated service mesh architecture, for example, one designed to keep customer data, the payment portal, and application data separate.
As the industry standard, OTel integrates broadly on both ends of the pipeline. On the receiving end, many systems and components support OTel out of the box. On the exporting end, cloud providers and observability platforms support ingesting OTel telemetry data. A major advantage of using OTel is therefore the avoidance of vendor and technology lock-in. If you want to use multiple collection agents (for logs, metrics, security event data, and traces) or migrate away from a specific vendor, you can do so without losing the work you put into instrumenting your applications and the monitoring processes you built around them.
Finally, the OTel project actively involves its developer community in addressing security concerns. Community contributions, code reviews, and security audits help identify and mitigate potential vulnerabilities, and the project releases updates and patches to address security issues as they are discovered.
Now that we’ve looked at the what, let’s look at the how. How might you enhance the security of your systems with telemetry data and OTel?
Let’s consider concrete steps that you can take to begin securing your systems with OTel.
Before you can effectively instrument your system for security, you must first identify which parts would benefit most from instrumentation. Start with your applications and the security components of your system. Security-related components include, for example:
- Intrusion detection systems (IDS)
- Antivirus software
- Authentication mechanisms
Use the OTel SDKs and instrumentation libraries to instrument the components you’ve identified, so that you capture relevant telemetry data from them. Define which data you need from each application and security component, and collect it to monitor their health and performance.
Metrics such as CPU usage, memory utilization, network traffic, and event counts can provide insights into your system’s overall health and resource utilization. OTel allows you to define and capture custom metrics specific to your security infrastructure.
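In the OTel Python API, a custom metric such as a count of authentication failures is typically a counter created with `meter.create_counter(...)` and incremented with `counter.add(1, attributes)`. The dependency-free sketch below mimics those semantics (a monotonic sum kept separately for each distinct attribute set) to show how attributes split one metric into several time series. The class, metric name, and attribute keys are hypothetical stand-ins, not the OTel SDK.

```python
from collections import Counter


class SecurityEventCounter:
    """Stand-in for an OTel Counter: a monotonic sum kept per distinct attribute set."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._points: Counter = Counter()

    def add(self, amount: int, attributes: dict[str, str]) -> None:
        if amount < 0:
            raise ValueError("counter increments must be non-negative")
        # Each distinct attribute combination becomes its own time series.
        self._points[frozenset(attributes.items())] += amount

    def collect(self) -> dict[frozenset, int]:
        """Return the current total for every attribute set seen so far."""
        return dict(self._points)


# Hypothetical metric and attribute names.
auth_failures = SecurityEventCounter("auth.failures")
auth_failures.add(1, {"service": "login-api", "reason": "bad_password"})
auth_failures.add(1, {"service": "login-api", "reason": "bad_password"})
auth_failures.add(1, {"service": "login-api", "reason": "expired_token"})

for attrs, total in auth_failures.collect().items():
    print(dict(attrs), total)
```

Choose attribute values with bounded cardinality (a failure reason, not a raw username), because every new combination creates another series to store.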
Use the distributed tracing capabilities of OTel to trace security events across different components and services. By capturing traces, you can gain visibility into the flow of security-related activities, identify bottlenecks, and analyze the effectiveness of security controls. Traces help you understand the sequence of events during security incidents or breaches, aiding in incident response and forensic investigations.
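Cross-service tracing works because each service passes its trace context to the next one. OTel’s default propagator uses the W3C Trace Context `traceparent` header (`version-traceid-parentid-flags`). The sketch below hand-rolls that propagation with the standard library to show how two spans end up in the same trace; the `Span` class and span names are hypothetical, and in practice the OTel SDK and its propagators do this for you.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Optional

# Version 00: 32-hex trace ID, 16-hex parent span ID, 2-hex trace flags.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


@dataclass
class Span:
    name: str
    trace_id: str
    span_id: str
    parent_id: Optional[str]


def start_span(name: str, traceparent: Optional[str] = None) -> Span:
    """Start a span, continuing the caller's trace when a valid traceparent is given."""
    span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    match = TRACEPARENT_RE.match(traceparent) if traceparent else None
    # W3C Trace Context treats all-zero trace IDs and parent IDs as invalid.
    if match and int(match.group(1), 16) and int(match.group(2), 16):
        return Span(name, match.group(1), span_id, match.group(2))
    # No valid incoming context: this span is the root of a new trace.
    return Span(name, secrets.token_hex(16), span_id, None)


def to_traceparent(span: Span, sampled: bool = True) -> str:
    """Serialize the span's context as a W3C traceparent header value."""
    return f"00-{span.trace_id}-{span.span_id}-{'01' if sampled else '00'}"


# Hypothetical flow: a gateway authenticates a request, then calls a payment service.
gateway = start_span("gateway.authenticate")
header = to_traceparent(gateway)
payment = start_span("payment.authorize", traceparent=header)

print(header)
print(payment.trace_id == gateway.trace_id)  # True: both spans belong to one trace
print(payment.parent_id == gateway.span_id)  # True: the gateway span is the parent
```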
Configure OTel exporters to transmit the collected telemetry data to backend systems for storage, analysis, and visualization. Choose an appropriate observability platform. You can use open-source solutions such as Grafana, Prometheus, and Elasticsearch. Or, you can use a comprehensive, turnkey, integrated platform like Sumo Logic to receive and process all of the telemetry data (in the form of logs, metrics, traces, and events). These platforms provide dashboards and visualization tools to monitor and analyze the health, performance, and security of your system in real time.
Set up alerting mechanisms based on predefined thresholds or anomaly detection algorithms. By leveraging the collected telemetry data, you can configure alerts to notify security teams when certain metrics or events fall outside of normal or expected ranges. This enables proactive monitoring, rapid incident response, and the mitigation of potential security breaches.
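A common way to keep threshold alerts from firing on a single noisy data point is to require several consecutive breaches. The sketch below implements that rule for a hypothetical denied-requests metric; the threshold, window count, and names are assumptions, and alerting platforms provide equivalent settings in their own rule syntax.

```python
from dataclasses import dataclass, field


@dataclass
class ThresholdAlert:
    """Fire when a metric exceeds its threshold for N consecutive evaluation windows."""

    metric: str
    threshold: float
    windows_required: int = 3
    _breaches: int = field(default=0, init=False)

    def evaluate(self, value: float) -> bool:
        # Extend the streak on a breach; any in-range value resets it.
        self._breaches = self._breaches + 1 if value > self.threshold else 0
        return self._breaches >= self.windows_required


# Hypothetical rule: more than 50 denied requests per minute, three minutes in a row.
rule = ThresholdAlert("http.denied_requests_per_minute", threshold=50)
for minute, value in enumerate([12, 65, 70, 30, 80, 90, 95]):
    if rule.evaluate(value):
        print(f"minute {minute}: ALERT {rule.metric}={value}")
# prints: minute 6: ALERT http.denied_requests_per_minute=95
```

The dip to 30 in minute 3 resets the streak, so the alert fires only after the later run of three breaches.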
Combining distributed traces with application logs adds context that can help you reconstruct the sequence of events leading up to an incident. Use this combined information to perform a root cause analysis and better understand the incident’s impact.
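OTel’s log data model includes trace and span IDs on log records, which is what makes this kind of correlation possible. The sketch below groups hypothetical exported spans and log records by trace ID and orders each group by timestamp to produce an incident timeline; the record shapes are simplified assumptions, not the OTLP wire format.

```python
from collections import defaultdict

# Hypothetical exported records; in practice these come from your observability backend.
spans = [
    {"trace_id": "t1", "time": 3.0, "name": "db.query users"},
    {"trace_id": "t1", "time": 1.0, "name": "gateway.authenticate"},
    {"trace_id": "t2", "time": 2.0, "name": "gateway.authenticate"},
]
logs = [
    {"trace_id": "t1", "time": 2.0, "body": "token accepted for user 4711"},
    {"trace_id": "t1", "time": 3.5, "body": "query returned 10000 rows"},
    {"trace_id": None, "time": 2.5, "body": "cache warmed"},
]


def build_timelines(spans: list[dict], logs: list[dict]) -> dict[str, list[tuple]]:
    """Group spans and logs by trace ID and order each group by timestamp."""
    timelines: defaultdict[str, list[tuple]] = defaultdict(list)
    for span in spans:
        timelines[span["trace_id"]].append((span["time"], "span", span["name"]))
    for record in logs:
        # Logs without a trace ID cannot be placed on a trace timeline.
        if record["trace_id"] is not None:
            timelines[record["trace_id"]].append((record["time"], "log", record["body"]))
    return {trace_id: sorted(events) for trace_id, events in timelines.items()}


for time, kind, text in build_timelines(spans, logs)["t1"]:
    print(f"{time:>4} {kind:<4} {text}")
```

Read in order, the `t1` timeline shows authentication, an accepted token, and then an unusually large query, which is the kind of sequence a root cause analysis needs.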
Define log filtering and retention policies to ensure you keep the relevant data with enough historical context. Balance how much log data you keep, and for how long, against storage cost.
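A filtering and retention policy can be expressed as two checks: one applied at ingest and one applied as data ages. The sketch below assumes hypothetical categories, severity levels, and retention periods; choose real values from your compliance requirements and storage budget.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy values.
SEVERITY_ORDER = ["DEBUG", "INFO", "WARN", "ERROR"]
MIN_SEVERITY = "INFO"
RETENTION_DAYS = {"security": 365, "application": 30}
DEFAULT_RETENTION_DAYS = 14


def keep_at_ingest(record: dict) -> bool:
    """Filter: drop records below the minimum severity, but always keep security records."""
    if record["category"] == "security":
        return True
    return SEVERITY_ORDER.index(record["severity"]) >= SEVERITY_ORDER.index(MIN_SEVERITY)


def still_retained(record: dict, now: datetime) -> bool:
    """Retention: keep a record while it is younger than its category's limit."""
    days = RETENTION_DAYS.get(record["category"], DEFAULT_RETENTION_DAYS)
    return now - record["time"] < timedelta(days=days)


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
records = [
    {"category": "security", "severity": "DEBUG", "time": now - timedelta(days=200)},
    {"category": "application", "severity": "DEBUG", "time": now - timedelta(days=1)},
    {"category": "application", "severity": "ERROR", "time": now - timedelta(days=45)},
    {"category": "application", "severity": "WARN", "time": now - timedelta(days=5)},
]
# Both checks are applied together here for brevity; in practice they run at different times.
kept = [r for r in records if keep_at_ingest(r) and still_retained(r, now)]
print([(r["category"], r["severity"]) for r in kept])
# [('security', 'DEBUG'), ('application', 'WARN')]
```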
The practices above are baseline practices for organizations seeking to use OTel to enhance system security. If you’re looking to level up your security practices, the following advanced practices may interest you.
Advanced security analysis with OTel can provide valuable insights and capabilities to further enhance security monitoring and incident response. Let’s look at three key opportunities.
OTel allows you to attach custom metadata to telemetry data. You might attach user IDs, transaction IDs, or any other contextual information relevant to security analysis and auditing. By incorporating this metadata, you can enrich the telemetry data with additional details that might aid you in the following ways:
- Identifying the source of security events
- Tracing the actions of specific users or transactions
- Conducting forensic investigations during security incidents
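In OTel, this metadata usually takes the form of span or log attributes (for example, set with `span.set_attribute(key, value)` in the Python API). The sketch below models the idea without the SDK: it merges request context into each event’s attributes and then answers a forensic question, namely which actions a given user performed. The attribute keys such as `app.user_id` are hypothetical; check the OTel semantic conventions for standardized names before inventing your own.

```python
from typing import Any


def enrich(event: dict[str, Any], context: dict[str, str]) -> dict[str, Any]:
    """Return a copy of the event with request context merged into its attributes."""
    attributes = {**event.get("attributes", {}), **context}
    return {**event, "attributes": attributes}


def actions_by_user(events: list[dict[str, Any]], user_id: str) -> list[str]:
    """List event names attributed to one user, in the order they were recorded."""
    return [e["name"] for e in events if e["attributes"].get("app.user_id") == user_id]


# Hypothetical events paired with the request context they occurred in.
raw_events = [
    ({"name": "login"}, {"app.user_id": "u-42", "app.transaction_id": "tx-1"}),
    ({"name": "export_report"}, {"app.user_id": "u-42", "app.transaction_id": "tx-2"}),
    ({"name": "login"}, {"app.user_id": "u-7", "app.transaction_id": "tx-3"}),
]
events = [enrich(event, context) for event, context in raw_events]

print(actions_by_user(events, "u-42"))  # ['login', 'export_report']
```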
Telemetry data collected by OTel can play a vital role in identifying deviations from security policies and best practices within your systems. By defining policies and desired security configurations, you can compare the telemetry data against these benchmarks to identify any deviations or non-compliant behaviors.
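As a minimal sketch, assume each service reports a few security-relevant configuration values as telemetry. Comparing them against a desired-state policy then reduces to a set of checks; the policy values, field names, and services below are hypothetical.

```python
# Hypothetical desired security configuration for every service.
POLICY = {"min_tls_version": 1.2, "mfa_required": True, "max_session_minutes": 60}


def find_deviations(observed: dict[str, dict]) -> list[str]:
    """Compare each service's reported configuration against POLICY."""
    findings = []
    for service, cfg in observed.items():
        if cfg["tls_version"] < POLICY["min_tls_version"]:
            findings.append(f"{service}: TLS {cfg['tls_version']} is below {POLICY['min_tls_version']}")
        if POLICY["mfa_required"] and not cfg["mfa_enabled"]:
            findings.append(f"{service}: MFA is disabled")
        if cfg["session_minutes"] > POLICY["max_session_minutes"]:
            findings.append(f"{service}: sessions last {cfg['session_minutes']} minutes")
    return findings


# Hypothetical configuration values reported by two services.
observed = {
    "login-api": {"tls_version": 1.3, "mfa_enabled": True, "session_minutes": 30},
    "legacy-admin": {"tls_version": 1.0, "mfa_enabled": False, "session_minutes": 480},
}
for finding in find_deviations(observed):
    print(finding)
```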
When telemetry data collected and exported by OTel is used in conjunction with AIOps tools, you can implement proactive security measures such as automatic incident response. By using machine learning algorithms and anomaly detection techniques to analyze system telemetry data, you can identify patterns of unusual behavior and potential security threats. The result is early detection of incidents—or even prediction of potential incidents.
You can couple this early detection with an automatic response. Alternatively, you can collect and consolidate the relevant information into an action plan that your human security engineers approve and apply.
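The sketch below pairs a deliberately simple anomaly detector (an exponentially weighted moving average, standing in for the machine learning models an AIOps tool would use) with a response step that drafts an action plan and either approves it automatically or leaves it for a human. The metric, smoothing factor, tolerance, and response steps are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EwmaDetector:
    """Flag upward spikes relative to an exponentially weighted moving average."""

    alpha: float = 0.3  # weight given to the newest normal observation
    tolerance: float = 3.0  # flag values above tolerance * average
    average: Optional[float] = None

    def observe(self, value: float) -> bool:
        """Return True if value is anomalous; only normal values update the average."""
        if self.average is None:
            self.average = value
            return False
        anomalous = value > self.tolerance * self.average
        if not anomalous:
            self.average = self.alpha * value + (1 - self.alpha) * self.average
        return anomalous


@dataclass
class ActionPlan:
    metric: str
    value: float
    steps: list[str]
    approved: bool = False


def plan_response(metric: str, value: float, auto_approve: bool) -> ActionPlan:
    """Draft a response; approve it automatically or leave it for a human to approve."""
    plan = ActionPlan(metric, value, [
        f"capture traces and logs around the {metric} spike",
        "rate-limit the offending source",
        "open an incident for the security team",
    ])
    plan.approved = auto_approve
    return plan


detector = EwmaDetector()
pending = []
for value in [100, 110, 95, 105, 900, 102]:
    if detector.observe(value):
        pending.append(plan_response("outbound_bytes_mb", value, auto_approve=False))

print(len(pending), pending[0].value, pending[0].approved)  # 1 900 False
```

Keeping flagged values out of the running average is a design choice: it stops a sustained attack from gradually becoming the new normal, at the cost of flagging a legitimate step change until someone resets the baseline.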
Capturing and monitoring telemetry data is essential if you are to understand your systems and keep them secure. With it, you can detect anomalous system behavior, identify policy deviations, and enable rapid incident response.
OTel is a powerful framework, and the industry standard, for collecting and exporting telemetry data from your systems. It provides security capabilities such as secure data transmission, authentication integration, access control, and auditing. With OTel, you can instrument applications and security components to collect metrics and trace security events. As you export that telemetry data to observability platforms, you can visualize your data and set up alerts.
By combining telemetry data with AIOps tools, you unlock capabilities for early incident detection and automatic incident response.
Your ability to secure your systems ultimately depends on the telemetry data you can collect, and OTel is the standard way to collect it. Without it, you cut off access to an essential tool for gathering the data you need for effective security incident detection and response.