Cameron Archer for Tinybird

Posted on Mar 22, 2023

The essential components of real-time analytics

#realtime #streaming #database #analytics

This post is the fifth in a series on real-time analytics. It is an excerpt from "Real-time analytics, a definitive guide".

Real-time analytics architectures consist of 3 core components:

Streaming technology
OLAP databases
Publication layers

Real-time analytics captures data through streaming technology, stores it in an OLAP database, and exposes metrics through a low-latency publication layer.

Streaming technology for real-time analytics

Since real-time analytics requires high-frequency ingestion of event data, you’ll need a reliable way to capture streams of data generated by applications and other systems.

The most commonly used technology is Apache Kafka, an open-source distributed event streaming platform. The Kafka ecosystem includes many “flavors” of Kafka, offered as a managed service or with alternative client-side libraries. Notable options include:

Confluent
Redpanda
Upstash
Amazon MSK
Aiven
Self-hosted

While Kafka and its offshoots are broadly favored in this space, a few alternatives have been widely adopted, for example:

Google Pub/Sub
Amazon Kinesis
RabbitMQ
Tinybird Events API

Regardless of which streaming platform you choose, the ability to capture streaming data is fundamental to the real-time analytics stack.

Streaming technology is fundamental to real-time analytics, capturing and transporting data as soon as it's generated.
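
To make the capture step concrete, here is a minimal sketch that uses only the Python standard library. It assumes a `queue.Queue` as a stand-in for a real topic or stream (Kafka, Kinesis, and so on), and the event fields and the `make_event` and `drain_batches` helpers are hypothetical names chosen for illustration. It groups events into small newline-delimited JSON (NDJSON) payloads, a format commonly accepted by HTTP ingestion endpoints.

```python
import json
import queue
import time
from typing import Iterator


def make_event(user_id: str, action: str) -> dict:
    """Build a hypothetical event record stamped at creation time."""
    return {"timestamp": time.time(), "user_id": user_id, "action": action}


def drain_batches(topic: "queue.Queue[dict]", max_batch: int) -> Iterator[str]:
    """Yield NDJSON payloads of up to max_batch events until the queue is empty."""
    batch = []
    while True:
        try:
            batch.append(topic.get_nowait())
        except queue.Empty:
            break
        if len(batch) == max_batch:
            yield "\n".join(json.dumps(event) for event in batch)
            batch = []
    if batch:
        # Flush whatever is left so no event is dropped.
        yield "\n".join(json.dumps(event) for event in batch)


# Stand-in for a topic: an application pushes events as they happen.
topic: "queue.Queue[dict]" = queue.Queue()
for i in range(5):
    topic.put(make_event(f"user-{i}", "page_view"))

# A consumer drains the stream into small payloads ready for ingestion.
payloads = list(drain_batches(topic, max_batch=2))
```

With five events and `max_batch=2`, the consumer produces three payloads. In a real deployment the batch size and flush interval are a trade-off between ingestion overhead and data freshness.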

Real-time analytics databases

Real-time analytics architectures include an OLAP database that can store incoming and historical event data and make it available for low-latency querying.

Real-time databases should offer high throughput on inserts, columnar storage for compression and low-latency reads, and functional integrations with publication layers.

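Critically, most standard transactional and document-store databases are not suitable for real-time analytics, because they are optimized for reading and writing individual records rather than aggregating over many rows. A column-oriented OLAP database should be the database of choice.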

The following databases have emerged as the most popular open-source real-time analytics databases:

ClickHouse
Druid
Pinot

Real-time databases are built for high-frequency inserts, complex analytics over large amounts of data, and low-latency querying.
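
The benefit of columnar storage can be illustrated with a toy example. This is only a sketch in plain Python, not how ClickHouse, Druid, or Pinot are implemented, and the sample rows and the `to_columns` and `avg_by` helpers are hypothetical. It shows that once data is stored column by column, an aggregation only has to touch the columns it needs.

```python
from collections import defaultdict

# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"ts": 1, "country": "US", "latency_ms": 120},
    {"ts": 2, "country": "DE", "latency_ms": 80},
    {"ts": 3, "country": "US", "latency_ms": 100},
]


def to_columns(records):
    """Pivot a list of records into one list per field (column-oriented layout)."""
    columns = defaultdict(list)
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return dict(columns)


def avg_by(columns, key, value):
    """Average the `value` column grouped by `key`, reading only those two columns."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sum, count]
    for k, v in zip(columns[key], columns[value]):
        totals[k][0] += v
        totals[k][1] += 1
    return {k: total / count for k, (total, count) in totals.items()}


columns = to_columns(rows)
avg_latency = avg_by(columns, "country", "latency_ms")
print(avg_latency)  # {'US': 110.0, 'DE': 80.0}
```

The `ts` column is never read by the query. Values of the same type also sit next to each other in a columnar layout, which is what makes compression effective.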

Publication layers for real-time analytics

To make use of data that has been stored in real-time databases, developers need a publication layer to expose queries made on that database to external applications. This often takes the form of an ORM or an API framework.

One particular challenge in building real-time analytics architectures is that analytical databases tend to have less robust ecosystems than their OLTP counterparts. There are often fewer options to choose from, and those that exist tend to be less mature and have smaller communities.

As a result, publication layers for real-time analytics generally require you to build a custom backend that meets the needs of your application. This means building yet another HTTP API using tools like:

FastAPI (Python)
Express.js (JavaScript)
Hyper (Rust)
Gin (Go)

A real-time analytics publication layer turns database queries into low-latency APIs to be consumed by user-facing applications.
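
As a rough sketch of what such a custom backend does, the example below exposes one parameterized query as a JSON endpoint using only the Python standard library. Several things here are assumptions made so the example runs anywhere: an in-memory SQLite database stands in for the OLAP database (SQLite is not an OLAP engine), the `events` table and the `/action_counts` path are hypothetical, and a bare WSGI callable is used in place of a framework such as FastAPI.

```python
import json
import sqlite3
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults

# Stand-in for the analytics database, seeded with a few sample events.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (user_id TEXT, action TEXT)")
db.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("u1", "page_view"), ("u2", "page_view"), ("u1", "purchase")],
)


def respond(start_response, status, payload):
    """Serialize payload as JSON and start the WSGI response."""
    body = json.dumps(payload).encode()
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def app(environ, start_response):
    """WSGI app: GET /action_counts?limit=N returns event counts per action."""
    if environ.get("PATH_INFO") != "/action_counts":
        return respond(start_response, "404 Not Found", {"error": "not found"})
    params = parse_qs(environ.get("QUERY_STRING", ""))
    try:
        # Clamp the user-supplied limit so a request cannot ask for unbounded output.
        limit = max(1, min(int(params.get("limit", ["10"])[0]), 100))
    except ValueError:
        return respond(start_response, "400 Bad Request", {"error": "limit must be an integer"})
    rows = db.execute(
        "SELECT action, COUNT(*) AS n FROM events "
        "GROUP BY action ORDER BY n DESC, action LIMIT ?",
        (limit,),  # bound parameter, never string-formatted into the SQL
    ).fetchall()
    return respond(start_response, "200 OK",
                   {"data": [{"action": a, "count": n} for a, n in rows]})


def call(path, query=""):
    """Invoke the app in-process, without a network socket, and decode the reply."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    environ["QUERY_STRING"] = query
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], json.loads(body)


status, payload = call("/action_counts", "limit=1")
```

To serve the same app over HTTP for local experimentation, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` is enough. A production publication layer would also need authentication, connection pooling, and caching, which is part of why building one is a non-trivial effort.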

Each of the 3 core components (streaming technology, OLAP database, and publication layer) matters when building the ideal real-time analytics architecture. While such an architecture can be constructed piecemeal, beware of the technical handoffs between components, which inevitably introduce latency and complexity.
