This post is the fifth in a series on real-time analytics. It is an excerpt from Real-time analytics, a definitive guide, which can be read in full here.
--
Real-time analytics architectures consist of 3 core components:
- Streaming technology
- OLAP databases
- Publication layers
Streaming technology for real-time analytics
Since real-time analytics requires high-frequency ingestion of event data, you’ll need a reliable way to capture streams of data generated by applications and other systems.
The most commonly used technology is Apache Kafka, a widely adopted open-source distributed event streaming platform. The Kafka ecosystem includes many “flavors” of Kafka, offered as managed services or as Kafka API-compatible alternatives. Notable options include:
- Confluent
- Redpanda
- Upstash
- Amazon MSK
- Aiven
- Self-hosted
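Whichever flavor you pick, Kafka stores events in partitioned, append-only logs, and each consumer group tracks its own offset into each partition. The following is a toy, in-memory model of that idea, not Kafka's API: the `ToyLog` class and its methods are hypothetical, and hashing keys with crc32 is a stand-in for Kafka's own partitioner.

```python
from __future__ import annotations

import zlib
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyLog:
    """Toy in-memory model of a partitioned, append-only event log.

    Illustrative only: this is not Kafka's API, just the core idea of
    ordered partitions plus per-consumer-group offsets.
    """

    num_partitions: int = 3
    partitions: list[list[Any]] = field(init=False)
    # Committed read position for each (consumer group, partition) pair.
    offsets: dict[tuple[str, int], int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        self.partitions = [[] for _ in range(self.num_partitions)]

    def produce(self, key: str, event: Any) -> tuple[int, int]:
        """Append an event; return (partition, offset)."""
        # Same key -> same partition, so per-key ordering is preserved.
        # crc32 is a deterministic stand-in for Kafka's own partitioner.
        p = zlib.crc32(key.encode()) % self.num_partitions
        self.partitions[p].append(event)
        return p, len(self.partitions[p]) - 1

    def consume(self, group: str, partition: int, max_events: int = 100) -> list[Any]:
        """Read the next events for a consumer group and advance its offset."""
        start = self.offsets.get((group, partition), 0)
        batch = self.partitions[partition][start : start + max_events]
        # Reading does not delete events; each group keeps its own offset.
        self.offsets[(group, partition)] = start + len(batch)
        return batch


if __name__ == "__main__":
    log = ToyLog(num_partitions=2)
    p, _ = log.produce("user-42", {"action": "click"})
    log.produce("user-42", {"action": "purchase"})
    print(log.consume("analytics", p))  # [{'action': 'click'}, {'action': 'purchase'}]
    print(log.consume("analytics", p))  # []
    print(log.consume("billing", p))    # [{'action': 'click'}, {'action': 'purchase'}]
```

Because events are retained rather than deleted on read, a second consumer group (here `billing`) can replay the same partition independently, which is what lets several downstream systems share one stream.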
While Kafka and its offshoots are broadly favored in this space, a few alternatives have been widely adopted, for example:
- Google Pub/Sub
- Amazon Kinesis
- RabbitMQ
- Tinybird Events API
Regardless of which streaming platform you choose, the ability to capture and transport data as soon as it's generated is fundamental to the real-time analytics stack.
Real-time analytics databases
Real-time analytics architectures include an OLAP database that can store incoming and historical event data and make it available for low-latency querying.
Real-time databases should offer high throughput on inserts, columnar storage for compression and low-latency reads, and functional integrations with publication layers.
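To see why columnar storage helps with both compression and read latency, here is a toy illustration in plain Python (the sample rows and helper names are hypothetical). Storing each column contiguously puts repeated values next to each other, so even a simple run-length encoding shrinks them, and an aggregate only has to read the columns it needs. Production engines use far more elaborate encodings and codecs; this only shows the principle.

```python
from __future__ import annotations

from itertools import groupby

# Hypothetical event rows, row-oriented: one dict per event.
rows = [
    {"ts": 1, "country": "US", "amount": 10},
    {"ts": 2, "country": "US", "amount": 5},
    {"ts": 3, "country": "US", "amount": 7},
    {"ts": 4, "country": "DE", "amount": 3},
    {"ts": 5, "country": "DE", "amount": 8},
]


def to_columns(rows: list[dict]) -> dict[str, list]:
    """Pivot row-oriented records into one contiguous list per column."""
    return {name: [r[name] for r in rows] for name in rows[0]}


def rle_encode(values: list) -> list[tuple]:
    """Run-length encode a column as [(value, run_length), ...]."""
    return [(v, len(list(g))) for v, g in groupby(values)]


def count_by_value(runs: list[tuple]) -> dict:
    """Count occurrences straight from the encoded runs, without decoding."""
    counts: dict = {}
    for value, run_length in runs:
        counts[value] = counts.get(value, 0) + run_length
    return counts


columns = to_columns(rows)
# A SUM over one column reads only that column, not every field of every row.
print(sum(columns["amount"]))        # 33
country_runs = rle_encode(columns["country"])
print(country_runs)                  # [('US', 3), ('DE', 2)]
print(count_by_value(country_runs))  # {'US': 3, 'DE': 2}
```

The longer the runs, the better the compression, which is one reason columnar engines typically keep data sorted by a key.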
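Critically, most standard transactional and document-store databases are not suitable for real-time analytics: they are optimized for reading and writing individual records, not for scanning and aggregating large numbers of rows at once. A column-oriented OLAP database should be the database of choice.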
The following databases have emerged as the most popular open-source real-time analytics databases:
- ClickHouse
- Druid
- Pinot
Real-time databases are built for high-frequency inserts, complex analytics over large amounts of data, and low-latency querying.
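In practice, handling high-frequency inserts usually means not writing every event individually; ClickHouse, for example, recommends batching inserts rather than sending rows one at a time. Here is a minimal micro-batching sketch, assuming a hypothetical `insert_fn` that wraps your database client's bulk-insert call. The row and age thresholds are illustrative defaults, not recommended values.

```python
from __future__ import annotations

import time
from typing import Any, Callable, Optional


class InsertBatcher:
    """Buffer events and write them to the database in bulk.

    A batch is flushed when it reaches `max_rows` events or when its oldest
    event has waited `max_age_s` seconds. `insert_fn` is a hypothetical
    stand-in for a real client's bulk-insert call.
    """

    def __init__(
        self,
        insert_fn: Callable[[list[Any]], None],
        max_rows: int = 1000,
        max_age_s: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.insert_fn = insert_fn
        self.max_rows = max_rows
        self.max_age_s = max_age_s
        self.clock = clock  # injectable so tests can control time
        self.buffer: list[Any] = []
        self.first_ts: Optional[float] = None

    def add(self, event: Any) -> None:
        if not self.buffer:
            self.first_ts = self.clock()  # age is measured from the oldest event
        self.buffer.append(event)
        if len(self.buffer) >= self.max_rows:
            self.flush()
        else:
            self.poll()

    def poll(self) -> None:
        """Flush if the oldest buffered event is too old.

        Call this periodically (e.g. from a timer) so a quiet stream
        does not leave events stranded in the buffer.
        """
        if self.first_ts is not None and self.clock() - self.first_ts >= self.max_age_s:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.insert_fn(self.buffer)
            self.buffer = []  # new list; the flushed batch is left untouched
            self.first_ts = None


if __name__ == "__main__":
    batches: list[list[Any]] = []
    batcher = InsertBatcher(batches.append, max_rows=3, max_age_s=60.0)
    for i in range(7):
        batcher.add({"id": i})
    batcher.flush()  # drain the remainder, e.g. on shutdown
    print([len(b) for b in batches])  # [3, 3, 1]
```

The size threshold bounds how much work each insert does, while the age threshold bounds how stale data can get before it becomes queryable; tuning the two trades insert efficiency against freshness.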
Publication layers for real-time analytics
To make use of data that has been stored in real-time databases, developers need a publication layer to expose queries made on that database to external applications. This often takes the form of an ORM or an API framework.
One particular challenge with building real-time analytics architectures is that analytical databases tend to have less robust ecosystems than their OLTP counterparts. There are often fewer publication-layer options to choose from, and those that exist tend to be less mature and have smaller communities.
As a result, publication layers for real-time analytics generally require you to build your own custom backend to meet the needs of your application. This means building yet another HTTP API using tools like:
- FastAPI (Python)
- Express.js (JavaScript)
- Hyper (Rust)
- Gin (Go)
A real-time analytics publication layer turns database queries into low-latency APIs to be consumed by user-facing applications.
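Here is a minimal sketch of such a backend, using only Python's standard library so it runs without dependencies. The `/top_countries` endpoint, the `events` table, and the use of an in-memory sqlite3 database as a stand-in for an OLAP database are all illustrative assumptions. A framework such as FastAPI would take over the routing and JSON serialization, but the shape stays the same: validate request parameters, run a parameterized query, and return JSON.

```python
from __future__ import annotations

import json
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# sqlite3 is only a stand-in so the sketch runs anywhere; in a real stack this
# would be a client for the OLAP database (ClickHouse, Druid, Pinot, ...).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (ts INTEGER, country TEXT, amount REAL)")
db.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, "US", 10.0), (2, "US", 5.0), (3, "DE", 3.0)],
)


def top_countries(query_string: str) -> tuple[int, dict]:
    """Endpoint logic for GET /top_countries?limit=N, kept apart from HTTP plumbing."""
    params = parse_qs(query_string)
    try:
        limit = int(params.get("limit", ["10"])[0])
    except ValueError:
        return 400, {"error": "limit must be an integer"}
    if not 1 <= limit <= 100:
        return 400, {"error": "limit must be between 1 and 100"}
    # Bind user input as a query parameter instead of formatting it into SQL.
    rows = db.execute(
        "SELECT country, SUM(amount) AS total FROM events "
        "GROUP BY country ORDER BY total DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return 200, {"data": [{"country": c, "total": t} for c, t in rows]}


class Handler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        url = urlparse(self.path)
        if url.path == "/top_countries":
            status, body = top_countries(url.query)
        else:
            status, body = 404, {"error": "not found"}
        payload = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


if __name__ == "__main__":
    # Try: curl "http://127.0.0.1:8000/top_countries?limit=1"
    HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()
```

Keeping the query logic in a plain function separate from the HTTP handler makes it easy to test and to move into whichever framework you end up using.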
Each of the three core components (streaming technology, an OLAP database, and a publication layer) matters when building the ideal real-time analytics architecture. While such an architecture can be assembled piecemeal, beware of the technical handoffs between components, which inevitably introduce latency and complexity.