Cameron Archer for Tinybird

The essential components of real-time analytics

This post is the fifth in a series on real-time analytics. It is an excerpt from Real-time analytics, a definitive guide which can be read in full here.

--

Real-time analytics architectures consist of 3 core components:

  • Streaming technology
  • OLAP databases
  • Publication layers

Real-time analytics captures data through streaming technology, stores it in an OLAP database, and exposes metrics through a low-latency publication layer.

Streaming technology for real-time analytics

Since real-time analytics requires high-frequency ingestion of event data, you’ll need a reliable way to capture streams of data generated by applications and other systems.

The most commonly used technology is Apache Kafka, an open-source distributed event streaming platform. Within the Kafka ecosystem exist many “flavors” of Kafka, offered as a service or with alternative client-side libraries. Notable options here include:

  • Confluent
  • Redpanda
  • Upstash
  • Amazon MSK
  • Aiven
  • Self-hosted

While Kafka and its offshoots are broadly favored in this space, a few alternatives have also been widely adopted.

Regardless of which streaming platform you choose, the ability to capture streaming data is fundamental to the real-time analytics stack.

Streaming technology is fundamental to real-time analytics, capturing and transporting data as soon as it's generated.
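
As a rough illustration of what "capturing a stream" involves, here is a minimal sketch of a producer and a micro-batching consumer. It is not Kafka code: an in-process `queue.Queue` stands in for a topic, and the names `publish`, `consume_batches`, `max_batch`, and `max_wait_s` are assumptions made up for this example. The point is only the shape of the work: events arrive one at a time and are grouped into small batches before being handed to the database.

```python
import json
import queue
import time
from typing import Dict, Iterator, List

# Stand-in for a Kafka topic: an in-process, thread-safe queue.
# A real deployment would use a streaming client library instead.
event_stream = queue.Queue()


def publish(event: Dict) -> None:
    """Producer side: serialize an event and append it to the stream."""
    event_stream.put(json.dumps(event).encode("utf-8"))


def consume_batches(max_batch: int, max_wait_s: float) -> Iterator[List[Dict]]:
    """Consumer side: yield micro-batches of decoded events.

    A batch is emitted when it holds max_batch events, or when no new
    event arrives within max_wait_s. In this sketch the generator stops
    once the stream goes idle; a real consumer would keep polling.
    """
    batch: List[Dict] = []
    while True:
        try:
            raw = event_stream.get(timeout=max_wait_s)
        except queue.Empty:
            if batch:
                yield batch  # flush the partial batch
            return
        batch.append(json.loads(raw))
        if len(batch) >= max_batch:
            yield batch
            batch = []


# Usage: five events, grouped into batches of at most two.
for i in range(5):
    publish({"user_id": i % 2, "event": "page_view", "ts": time.time()})

batches = list(consume_batches(max_batch=2, max_wait_s=0.05))
print([len(b) for b in batches])  # → [2, 2, 1]
```

Batching is a design choice rather than a requirement: it trades a small amount of added delay (bounded here by `max_wait_s`) for fewer, larger inserts downstream.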

Real-time analytics databases

Real-time analytics architectures include an OLAP database that can store incoming and historical event data and make it available for low-latency querying.

Real-time databases should offer high throughput on inserts, columnar storage for compression and low-latency reads, and functional integrations with publication layers.

Critically, most standard transactional and document-store databases are not suitable for real-time analytics, so a column-oriented OLAP database should be the database of choice.
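
To make the column-oriented idea concrete, here is a toy sketch of a column store in plain Python. It is an illustration only, not how any particular OLAP database is implemented; the `ColumnTable` class, its methods, and the example schema are all hypothetical. Each column lives in its own typed array, so an aggregation reads only the columns it needs and skips the rest.

```python
from array import array
from collections import defaultdict


class ColumnTable:
    """Toy column-oriented table: one contiguous typed array per column."""

    def __init__(self, schema):
        # schema maps column name -> array typecode ("q" = int64, "d" = float64)
        self.columns = {name: array(code) for name, code in schema.items()}

    def insert_batch(self, rows):
        """Append a batch of row dicts, splitting each row across the columns."""
        for row in rows:
            for name, col in self.columns.items():
                col.append(row[name])

    def avg_by(self, key, value, since_ts):
        """Average `value` grouped by `key` for rows with ts >= since_ts.

        Only three columns are scanned (key, value, ts); every other
        column in the table is never touched by this query.
        """
        keys, values, ts = self.columns[key], self.columns[value], self.columns["ts"]
        sums, counts = defaultdict(float), defaultdict(int)
        for k, v, t in zip(keys, values, ts):
            if t >= since_ts:
                sums[k] += v
                counts[k] += 1
        return {k: sums[k] / counts[k] for k in sums}


# Usage: the "status" column is stored but not read by the query below.
table = ColumnTable({"ts": "q", "user_id": "q", "status": "q", "latency_ms": "d"})
table.insert_batch([
    {"ts": 100, "user_id": 1, "status": 200, "latency_ms": 12.0},
    {"ts": 101, "user_id": 2, "status": 200, "latency_ms": 30.0},
    {"ts": 102, "user_id": 1, "status": 500, "latency_ms": 18.0},
    {"ts": 103, "user_id": 2, "status": 200, "latency_ms": 50.0},
])
print(table.avg_by("user_id", "latency_ms", since_ts=101))  # → {2: 40.0, 1: 18.0}
```

A row-oriented layout would have to read every field of every row to answer the same question, which is one reason transactional databases tend to struggle with this kind of query at scale.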

The following databases have emerged as the most popular open-source real-time analytics databases:

  • ClickHouse
  • Druid
  • Pinot

Real-time databases are built for high-frequency inserts, complex analytics over large amounts of data, and low-latency querying.

Publication layers for real-time analytics

To make use of data that has been stored in real-time databases, developers need a publication layer to expose queries made on that database to external applications. This often takes the form of an ORM or an API framework.

One particular challenge with building real-time analytics architectures is that analytical databases tend to have less robust ecosystems than their OLTP counterparts. There are often fewer options to choose from here, and those that exist tend to be less mature and have smaller communities.

So, publication layers for real-time analytics generally require that you build your own custom backend to meet the needs of your application. This means building yet another HTTP API using tools like:

  • FastAPI (Python)
  • Express.js (JavaScript)
  • Hyper (Rust)
  • Gin (Go)

A real-time analytics publication layer turns database queries into low-latency APIs to be consumed by user-facing applications.
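
As a sketch of what such a custom backend boils down to, the example below exposes one query as a JSON endpoint. To stay dependency-free it uses Python's standard-library `http.server` as a stand-in for a framework like FastAPI. The route `/v0/top_pages`, the `limit` parameter, and the hard-coded rows in `top_pages` are all assumptions for illustration; a real backend would run the query against the OLAP database instead.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def top_pages(limit):
    """Hypothetical query result; a real backend would query the database here."""
    rows = [("/home", 120), ("/pricing", 45), ("/docs", 80)]
    rows.sort(key=lambda r: r[1], reverse=True)
    return [{"path": p, "views": v} for p, v in rows[:limit]]


def handle(target):
    """Map a request target such as '/v0/top_pages?limit=2' to (status, body)."""
    url = urlparse(target)
    if url.path != "/v0/top_pages":
        return 404, {"error": "not found"}
    params = parse_qs(url.query)
    try:
        limit = int(params.get("limit", ["10"])[0])
    except ValueError:
        return 400, {"error": "limit must be an integer"}
    if limit < 1:
        return 400, {"error": "limit must be at least 1"}
    return 200, {"data": top_pages(limit)}


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Delegate routing to handle() and write the result as JSON.
        status, body = handle(self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8000):
    """Start the API on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()


# The routing logic can be exercised without starting a server.
print(handle("/v0/top_pages?limit=2"))
```

Calling `serve()` starts the endpoint locally. Keeping the routing in a plain function (`handle`) separate from the HTTP plumbing makes the query logic easy to test and to move to another framework later.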

Each of the 3 core components (streaming technology, OLAP database, and publication layer) matters when building the ideal real-time analytics architecture. Such an architecture can be constructed piecemeal, but beware of the technical handoffs between components, which inevitably introduce latency and complexity.
