This post is the first in a series on real-time analytics. It is an excerpt from Real-Time Analytics: A Definitive Guide, which can be read in full here.
Real-time analytics is the process of analyzing data as it is generated. With real-time analytics, people and software can make immediate decisions based on up-to-date information.
Data is most valuable when it’s fresh, and real-time analytics maximizes data freshness when generating insight.
Real-time analytics demands a shift from traditional ways of approaching business analytics. Over the past decade, data warehouses have been brilliant for building long-running analytics that power business intelligence reports, and the data engineers who build and maintain them have become prized members of engineering teams across many industries.
In more recent times, however, the rise of real-time analytics has been influenced by the growing desire to embed the kinds of analytics pipelines that data engineers have created into user-facing applications. Now, data engineers and software engineers must come together to build real-time analytics infrastructure that not only generates insights but infuses them into user experiences.
It is in these use cases that data warehouses and batch processing, despite their predominance, have not kept pace due to their technical limitations.
What’s the difference between real-time analytics and batch processing?
In contrast to real-time analytics, batch processing and batch data analytics are functionally designed to answer queries made repeatedly and on a schedule.
Batch analytics was born out of the “Big Data” movement and is useful in long-range business decision-making, measuring performance against goals across time horizons like months, quarters, and years. Batch analytics looks at the past to make decisions about the long-term future. It answers questions like “Will we hit our quarterly revenue numbers?” or “What was the product we sold the most of in Mexico last month?”
Likewise, batch analytics has proven tremendously useful alongside the adoption of data science methodologies, as it can be used to train models that need to crunch and re-crunch large amounts of data over time.
Batch processing takes advantage of many technical approaches - most notably data warehousing - built around its functional requirement: informing long-term business decision-making, most often at the executive and management levels.
Real-time analytics, on the other hand, helps with the tangible, day-to-day, hour-to-hour, and minute-to-minute decisions that materially impact how a business operates. Where batch focuses on measuring the past to predict or inform the future, real-time analytics focuses on the present. It answers questions like “Do we need to order or redistribute stock today?” or “Which offer should we show this customer right now?”
Real-time analytics can inform decisions made by humans (via reports and dashboards), but increasingly it’s used to automate decision-making within applications and services, driving second-to-second course corrections that previously took weeks or months to make. New technical architectures and real-time analytics platforms have arisen to help developers and data teams meet this need.
Is real-time analytics the same as streaming analytics?
Real-time analytics is sometimes confused with streaming analytics. Several streaming analytics products are available today. They work well for some streaming use cases, but they fall short when handling the high-concurrency, low-latency demands of real-time applications.
That's because they don't leverage a full OLAP database - such as ClickHouse - that enables queries over arbitrary time spans (rather than fixed windows), advanced joins for complex use cases, managed materialized views for rollups, and many other real-time analytics requirements.
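To make the "managed materialized views for rollups" idea concrete, here is a toy Python sketch of the underlying technique: an aggregate that is updated incrementally on every insert, so reads never rescan the raw events. This is an illustration of the concept only, not ClickHouse's actual implementation; the class and key names are hypothetical.

```python
from collections import defaultdict

class RollupView:
    """Toy stand-in for a materialized view: per-key aggregates are
    maintained incrementally at insert time, so a read is a cheap
    dictionary lookup instead of a scan over raw events."""
    def __init__(self):
        self.count = defaultdict(int)    # events per key
        self.total = defaultdict(float)  # running sum per key

    def insert(self, key, value):
        # Update the rollup as the event arrives.
        self.count[key] += 1
        self.total[key] += value

    def avg(self, key):
        return self.total[key] / self.count[key]

view = RollupView()
for region, revenue in [("mx", 100.0), ("mx", 50.0), ("us", 30.0)]:
    view.insert(region, revenue)

print(view.avg("mx"))  # 75.0
```

In a real OLAP database the same pattern runs at scale: the view is kept up to date as rows are ingested, which is what makes low-latency reads possible under high concurrency.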
Streaming analytics answers questions about a particular thing at a particular moment. Questions like “Is this thing A or B?” or “Does this piece of data have A in it?” as data streams through. Streaming data analytics allows you to ask simple questions about a few things very close together in time. It can offer very low latency, but it comes with a catch: it has limited “memory.”
Real-time analytics, in contrast, has a long memory. It focuses on very quickly inserting data - and retaining all historical data - to answer questions about current and historical events.
By retaining historical data, and updating the full record very quickly, real-time analytics lets you ask questions about data that is happening right now compared to data that happened in the past.
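The "limited memory" versus "long memory" contrast can be sketched in a few lines of Python. This is a simplified illustration, not any particular product: the streaming side keeps only a fixed window of recent events, while the real-time side retains everything, so it can answer both the recent-window question and questions spanning all history.

```python
from collections import deque

class StreamingWindow:
    """Streaming-style analytics: only the last N events are kept.
    Older events fall off the end and can never be queried again."""
    def __init__(self, size):
        self.window = deque(maxlen=size)

    def add(self, value):
        self.window.append(value)

    def average(self):
        return sum(self.window) / len(self.window)

class RealTimeStore:
    """Real-time analytics: every event is retained, so queries can
    span the full history or any recent slice of it."""
    def __init__(self):
        self.events = []

    def add(self, value):
        self.events.append(value)

    def average(self, last_n=None):
        data = self.events if last_n is None else self.events[-last_n:]
        return sum(data) / len(data)

window = StreamingWindow(size=3)
store = RealTimeStore()
for v in [10, 20, 30, 40, 50]:
    window.add(v)
    store.add(v)

print(window.average())         # 40.0 - only the last 3 events remain
print(store.average())          # 30.0 - the full history is queryable
print(store.average(last_n=3))  # 40.0 - and so are recent windows
```

The streaming window can never answer the full-history question, because the data it would need is gone; the real-time store can answer both.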
For example, consider an online retailer. They want to show a visitor the best possible offer so that they’ll buy something. With real-time analytics, the retailer can compare that visitor’s current browsing behavior during the session with historical browsing behavior and conversion metrics by past visitors within the same cohort.
The result is a personalized offer based on real-time information that boosts conversion rates and increases average order value.
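The retailer scenario above can be sketched as a small decision function. All names, cohorts, and conversion numbers here are hypothetical; the point is only the shape of the logic: a live session signal gates the decision, and historical cohort data picks the offer.

```python
# Hypothetical historical conversion rates per offer, per visitor cohort.
historical_conversion = {
    "mobile-returning": {"free_shipping": 0.042, "10_percent_off": 0.055, "bundle": 0.031},
    "desktop-new":      {"free_shipping": 0.061, "10_percent_off": 0.038, "bundle": 0.029},
}

def best_offer(cohort, session_pages_viewed, min_engagement=3):
    """Combine the live session (current browsing behavior) with
    historical cohort data to choose which offer to show right now."""
    if session_pages_viewed < min_engagement:
        return None  # too early in the session to interrupt with an offer
    rates = historical_conversion[cohort]
    # Pick the offer that converted best for this cohort historically.
    return max(rates, key=rates.get)

print(best_offer("mobile-returning", session_pages_viewed=5))  # 10_percent_off
print(best_offer("desktop-new", session_pages_viewed=1))       # None
```

In production, both inputs would come from the real-time analytics database: the session signal from events ingested seconds ago, and the cohort rates from a continuously updated rollup over historical data.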
This is the power of real-time analytics: influencing things that are happening right now based on the deep insights available from analyzing historical and current data.