Observability refers to the extent to which you can understand the internal state or condition of a complex system based solely on knowledge of its external outputs. In simpler terms, it's about gaining insights into what's happening inside a system by examining the data it generates, such as logs, metrics, and traces. When a system is highly observable, you can quickly and accurately trace performance issues back to their root causes without additional testing or coding effort. This concept has become increasingly critical as cloud-native environments grow more complex and distributed.
With New Relic, you can:
- Monitor your data
- Instrument and alert on your apps with APM, browser monitoring, and synthetic monitoring
- Instrument your infrastructure
- Manage alert quality
- Leverage OpenTelemetry data
- Query and explore your data efficiently with NRQL
What's the difference between Observability and Monitoring?
Observability:
- Definition: Observability is the ability to understand a complex system's internal state from its external outputs. You gain insight into what's happening inside the system by analyzing the data it emits, such as logs, metrics, and traces.
Observability is:
- Proactive: It helps you anticipate problems before they occur.
- Root Cause Analysis: You can identify the root cause of performance issues without additional testing or coding.
- Three Pillars: Its three pillars are logs, metrics, and traces.
- Example: Think of a car's diagnostic system. It gives mechanics observability, letting them understand why a car won't start without disassembling it.
- Role: Observability provides actionable insights for addressing problems and lets teams detect issues proactively.
Monitoring:
- Definition: Monitoring assesses the health of a system by collecting and analyzing aggregate data based on predefined metrics and logs.
Monitoring is:
- Reactive: It alerts you when something is already wrong.
- Known Failures: It tracks known failure modes and long-term trends.
- Limitations: Because it relies on predefined metrics, monitoring can miss critical production failures that nobody anticipated.
- Role: Monitoring measures the health of an application and helps prevent downtime by alerting on specific conditions.
Observability and monitoring are related but not the same: monitoring tracks specific, predefined metrics and tells you when something is wrong, while observability helps you understand why.
Data observability is crucial in today's data-driven landscape, and New Relic makes it easy to achieve:
Understanding Data Health:
- Definition: Data observability involves monitoring, managing, and maintaining data to ensure its quality, availability, and reliability across processes, systems, and pipelines.
- Benefits: It provides a deep understanding of data health, allowing teams to identify, troubleshoot, and resolve issues in near-real time.
- Use Cases: Data observability is essential for modern data teams using data for insights, machine learning, and innovation.
Challenges with Bad Data:
- Statistics:
- Impact: Bad data can lead to significant consequences, such as stock price drops and lost revenue.
- Detection Difficulty: Unlike application failures, bad data often goes unnoticed until it's too late.
- Defense Mechanism: Data observability acts as a defense, ensuring complete, accurate, and timely data delivery.
Components of Data Observability:
- Automated Monitoring: Ensures data pipelines function correctly.
- Triage Alerting: Notifies teams of issues promptly.
- Root Cause Analysis: Helps pinpoint data problems.
- Data Lineage: Tracks data movement.
- SLA Tracking: Ensures data meets service level agreements.
Here are some New Relic Query Language (NRQL) techniques that can enhance your data analysis and provide valuable insights:
Standard Deviation with stddev():
- The stddev() function measures the variation, or dispersion, within a set of values. It helps you understand reported values beyond averages.
- Example: To compare the standard deviation of transaction response time (duration) over the last 24 hours with the 24 hours before:
SELECT stddev(duration) FROM Transaction SINCE 24 hours ago COMPARE WITH 24 hours ago TIMESERIES
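To see why the spread matters, here is a small Python sketch (not New Relic code) with two hypothetical windows of response times that share the same average but behave very differently. It uses the population standard deviation; whether NRQL's stddev() uses the population or the sample form is an assumption you should check against the NRQL reference.

```python
import statistics

# Hypothetical transaction durations (seconds) for two 24-hour windows.
today = [0.10, 0.35, 0.05, 0.40, 0.10]
yesterday = [0.20, 0.22, 0.18, 0.21, 0.19]

def spread(durations: list[float]) -> float:
    """Population standard deviation of the durations (an assumed variant)."""
    return statistics.pstdev(durations)

# Both windows average 0.2s, but only the standard deviation shows that
# today's response times are far less consistent.
for label, window in (("today", today), ("yesterday", yesterday)):
    print(f"{label}: mean={statistics.fmean(window):.3f}s stddev={spread(window):.3f}s")
```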
Bucketing with FACET buckets():
- Use FACET buckets() to group results into ranges of a numeric attribute. It works with any aggregation function. The arguments are the attribute, the upper limit (ceiling) of the range, and the number of buckets.
- Example: To find the average duration of transactions grouped by volume of database calls in ranges 40 calls wide (0-40, 40-80, 80-120 calls, and so on), split a ceiling of 400 into 10 buckets:
SELECT average(duration) FROM Transaction SINCE 12 hours ago FACET buckets(databaseCallCount, 400, 10)
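The bucket arithmetic is simple: a ceiling of 400 split into 10 buckets gives ranges 40 wide. This Python sketch mimics the grouping on hypothetical data; the helper `bucket_label` is made up for illustration, and how values at or above the ceiling are handled is an assumption here, not documented NRQL behaviour.

```python
from collections import defaultdict
from statistics import fmean

def bucket_label(value: float, ceiling: float, count: int) -> str:
    """Label the equal-width range (0 to ceiling, split into `count` buckets) holding value."""
    if value >= ceiling:
        # Assumption: lump everything at or above the ceiling into one overflow group.
        # Check the NRQL docs for how buckets() itself treats such values.
        return f"{ceiling:g}+"
    width = ceiling / count
    low = int(value // width) * width
    return f"{low:g}-{low + width:g}"

# Hypothetical (databaseCallCount, duration in seconds) pairs.
transactions = [(12, 0.30), (35, 0.42), (55, 0.90), (61, 1.10), (95, 1.80), (450, 6.0)]

# Group durations by bucket, then average each group, like average(duration) per facet.
groups: dict[str, list[float]] = defaultdict(list)
for calls, duration in transactions:
    groups[bucket_label(calls, 400, 10)].append(duration)

for label, durations in groups.items():
    print(f"{label}: {fmean(durations):.2f}s")
```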
Advanced Math Functions:
- NRQL offers math functions for smoothing, clamping, and manipulating data. Explore these functions to tailor your analysis.
Discover Event Types and Attributes:
SHOW EVENT TYPES
SHOW ATTRIBUTES
Regex Filtering with RLIKE:
- Use RLIKE to filter data with regular expressions. It's handy for complex pattern matching. RLIKE matches against the entire attribute value, so wrap the pattern in .* when you want to match a substring.
- Example: To filter page views where the user agent contains "Chrome" or "Firefox":
SELECT * FROM PageView WHERE userAgent RLIKE '.*(Chrome|Firefox).*'
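The difference between whole-value and substring matching is easy to trip over. This Python sketch uses `re.fullmatch` to imitate RLIKE's whole-value matching on some hypothetical user-agent strings. Python's `re` is not the engine New Relic uses, but for simple patterns like these it behaves the same way.

```python
import re

# Hypothetical user-agent strings.
agents = [
    "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
    "curl/8.4.0",
]

bare = re.compile(r"Chrome|Firefox")            # must equal "Chrome" or "Firefox" exactly
wrapped = re.compile(r".*(Chrome|Firefox).*")   # may appear anywhere in the value

for agent in agents:
    # fullmatch imitates whole-value matching; search shows plain substring matching.
    print(bool(bare.fullmatch(agent)), bool(wrapped.fullmatch(agent)), bool(bare.search(agent)))
```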
Nested Aggregation and Subqueries:
- Combine multiple aggregations or perform subqueries within NRQL to gain deeper insights.
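The idea behind nested aggregation is to run one aggregation over the results of another: first count per group, then take, say, the maximum or the average of those counts. This Python sketch shows that two-step shape on hypothetical data. It illustrates the concept only and makes no claim about exact NRQL subquery syntax.

```python
from collections import Counter

# Hypothetical Transaction events, reduced to the name of the app that served each one.
events = ["checkout", "checkout", "search", "checkout", "search", "login"]

# Inner aggregation: a count per app, similar to a faceted count(*).
per_app = Counter(events)

# Outer aggregations over the inner results.
busiest = max(per_app.values())                          # the busiest app's count
average_per_app = sum(per_app.values()) / len(per_app)   # mean count per app

print(per_app, busiest, average_per_app)
```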
Have fun with these:
Unique users
How many unique user sessions did you have in the last week?
SELECT uniqueCount(session) FROM PageView SINCE 1 week ago
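Conceptually, uniqueCount(session) counts distinct session IDs inside the time window, so repeat visits in the same session are counted once. A minimal Python sketch on hypothetical PageView data, with a fixed "now" so the result is reproducible:

```python
from datetime import datetime, timedelta

now = datetime(2024, 1, 8)  # hypothetical "now"

# Hypothetical PageView events: (timestamp, session id).
page_views = [
    (now - timedelta(days=1), "s1"),
    (now - timedelta(days=2), "s1"),  # same session again, counted once
    (now - timedelta(days=3), "s2"),
    (now - timedelta(days=9), "s3"),  # outside the 1-week window
]

def unique_sessions(events, start, end):
    """Number of distinct session ids with start <= timestamp < end."""
    return len({session for ts, session in events if start <= ts < end})

this_week = unique_sessions(page_views, now - timedelta(weeks=1), now)
print(this_week)
```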
Unique user trends
Were your unique user sessions up or down last week compared to the week before?
SELECT uniqueCount(session) FROM PageView SINCE 1 week ago COMPARE WITH 1 week ago
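A COMPARE WITH result is usually read as a percentage change from the earlier window to the current one. Here is the arithmetic as a small Python helper, using hypothetical counts; `percent_change` is an illustrative name, not a New Relic function.

```python
def percent_change(current: float, previous: float) -> float:
    """Relative change from the previous window to the current one, in percent."""
    if previous == 0:
        raise ValueError("previous window is empty; percentage change is undefined")
    return (current - previous) * 100 / previous

# Hypothetical unique-session counts for this week and the week before.
this_week, last_week = 1_150, 1_000
print(f"{percent_change(this_week, last_week):+.1f}%")
```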
Pageview trends
How can I graph the number of page views over the last day compared to the day before?
SELECT count(*) FROM PageView SINCE 1 day ago COMPARE WITH 1 day ago TIMESERIES AUTO
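TIMESERIES splits each window into time slices and counts events in each one, so the two days can be overlaid slice by slice. TIMESERIES AUTO chooses the slice size for you; this Python sketch assumes fixed one-hour slices and hypothetical page-view timestamps.

```python
from collections import Counter
from datetime import datetime, timedelta

def hourly_counts(timestamps, start, hours=24):
    """Count events per one-hour slice in [start, start + hours); 1-hour slices are an assumption."""
    counts = Counter()
    for ts in timestamps:
        offset = ts - start
        if timedelta(0) <= offset < timedelta(hours=hours):
            counts[offset // timedelta(hours=1)] += 1  # timedelta // timedelta gives an int
    return [counts[h] for h in range(hours)]

now = datetime(2024, 1, 8, 12, 0)  # hypothetical "now"
views = [now - timedelta(minutes=m) for m in (5, 30, 90, 1500, 1510)]

current = hourly_counts(views, now - timedelta(days=1))   # last 24 hours
previous = hourly_counts(views, now - timedelta(days=2))  # the 24 hours before that

for hour, (cur, prev) in enumerate(zip(current, previous)):
    if cur or prev:
        print(f"slice {hour:2d}: {cur} vs {prev}")
```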
OS version
How many of your mobile users are on each OS version, including the latest?
SELECT uniqueCount(uuid) FROM MobileSession FACET osVersion SINCE 7 days ago
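Faceting uniqueCount(uuid) by osVersion gives one distinct-user count per version; working out the share on the latest version is a small extra step. This Python sketch uses hypothetical session data, and `LATEST` is a value you would supply yourself.

```python
from collections import defaultdict

# Hypothetical MobileSession events: (uuid, osVersion).
sessions = [("u1", "17.2"), ("u1", "17.2"), ("u2", "17.1"), ("u3", "17.2"), ("u4", "16.7")]
LATEST = "17.2"  # hypothetical: the version you consider the latest

# Distinct users per OS version, like FACET osVersion with uniqueCount(uuid).
users_by_version = defaultdict(set)
for uuid, os_version in sessions:
    users_by_version[os_version].add(uuid)

counts = {version: len(users) for version, users in users_by_version.items()}

# A user who upgraded during the window appears under two versions, so the
# per-version counts can add up to more than the total number of users.
total_users = len({uuid for uuid, _ in sessions})
share_latest = counts.get(LATEST, 0) / total_users
print(counts, f"{share_latest:.0%} on {LATEST}")
```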
Key account Apdex
What is the Apdex score for a particularly important customer? If you have defined a custom attribute such as customerName, you can query it to see how that customer experiences your app from a performance standpoint:
SELECT apdex(duration, t: 0.4) FROM Transaction WHERE customerName='ReallyImportantCustomer' SINCE 1 day ago
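Apdex sorts responses by a threshold T (0.4 seconds in the query above): satisfied at or below T, tolerating up to 4T, and frustrated beyond that. The score is (satisfied + tolerating / 2) / total. This Python sketch computes it on hypothetical durations. New Relic also counts errored transactions as frustrated, which this sketch ignores.

```python
def apdex(durations: list[float], t: float) -> float:
    """Apdex score: (satisfied + tolerating / 2) / total.

    satisfied: duration <= t; tolerating: t < duration <= 4t; frustrated: duration > 4t.
    """
    if not durations:
        raise ValueError("no samples to score")
    satisfied = sum(1 for d in durations if d <= t)
    tolerating = sum(1 for d in durations if t < d <= 4 * t)
    return (satisfied + tolerating / 2) / len(durations)

# Hypothetical response times (seconds) for one customer's transactions.
samples = [0.2, 0.3, 0.5, 1.2, 2.0]
print(round(apdex(samples, t=0.4), 2))
```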
Next, I'll be writing about OpenTelemetry, with some exciting NRQL to go with it. Meanwhile, you can hire me to help you or your organization with your observability techniques and strategy.
Your Full Stack Observability Practitioner!