Learning about the Druid Architecture
This post distills the material presented in the paper “Druid: A Real-time Analytical Data Store”, published in 2014 by F. Yang and others.
The paper presents the architecture of Druid, explains the problem it solves in the world of analytical processing, and details how it supports fast aggregations, flexible filters, and low-latency data ingestion.
The paper starts by noting how much Hadoop has grown over time, and how painful its performance can be. Hadoop excels at storing and providing access to large amounts of data; however, it does not make any performance guarantees around how quickly that data can be accessed. While Hadoop works well for storing data, it is not optimised for ingesting data and making that data immediately readable.
Druid combines a column-oriented storage layout, a distributed, shared-nothing architecture, and an advanced indexing structure to allow for the arbitrary exploration of billion-row tables with sub-second latencies.
Each section of the paper describes one piece of the problem and how Druid solves it. The paper also describes what the authors learned from running Druid in production.
Druid solves the problem of ingesting and exploring large quantities of transactional events (log data). The need for Druid arose because existing open source Relational Database Management Systems (RDBMS) and NoSQL key/value stores were unable to provide a low-latency data ingestion and query platform for interactive applications. In addition to the query latency requirements, the system had to be multi-tenant and highly available. The problems of data exploration, ingestion, and availability that Druid addresses span multiple industries.
A Druid cluster consists of different types of nodes, and each node type is designed to perform a specific set of tasks. The node types are loosely coupled, giving the cluster a distributed, shared-nothing architecture in which intra-cluster communication failures have minimal impact on availability.
- Real-Time Nodes
Real-Time Nodes encapsulate the functionality to ingest and query event streams. Events indexed via these nodes are immediately available for querying. The nodes announce their online state and the data they serve in Zookeeper, which Druid uses for coordination. Incoming events are held in an in-memory index buffer, and Druid behaves as a row store for queries on events in this buffer. Periodically, or once a row limit is reached, the buffer is persisted to disk, and this persist step converts the data to a column-oriented storage format. Each persisted index is immutable, and nodes load the indexes into off-heap memory for querying.
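The buffer-then-persist behaviour can be illustrated with a small sketch. This is a toy model, not Druid's implementation: the class name `RealTimeBuffer`, the `max_rows` threshold, and the event fields are all hypothetical, and tuples merely stand in for immutable on-disk indexes.

```python
from collections import defaultdict


class RealTimeBuffer:
    """Toy model: rows are buffered in memory, then persisted as columns."""

    def __init__(self, max_rows):
        self.max_rows = max_rows  # hypothetical persist threshold
        self.rows = []            # row-oriented buffer, queryable immediately
        self.persisted = []       # immutable, column-oriented indexes

    def ingest(self, event):
        self.rows.append(event)
        if len(self.rows) >= self.max_rows:
            self.persist()

    def persist(self):
        # Pivot the buffered rows into one sequence of values per column.
        names = sorted({key for row in self.rows for key in row})
        columns = defaultdict(list)
        for row in self.rows:
            for name in names:
                columns[name].append(row.get(name))
        self.persisted.append({
            "num_rows": len(self.rows),
            # Tuples stand in for immutability of a persisted index.
            "columns": {name: tuple(vals) for name, vals in columns.items()},
        })
        self.rows = []

    def count(self):
        """A query sees both the in-memory rows and the persisted indexes."""
        return len(self.rows) + sum(idx["num_rows"] for idx in self.persisted)
```

With `max_rows=2`, ingesting three events leaves one persisted columnar index and one event still in the row buffer, while `count()` reports all three.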
A segment here is an immutable block that contains all the events ingested by a real-time node for some span of time. Real-time nodes merge their persisted indexes into a segment and hand it off to deep storage, which usually means S3 or HDFS. The ingest, persist, merge, and handoff steps are fluid; there is no data loss during any of the processes.
Real-Time Nodes are consumers of data and require a corresponding producer to provide the data stream. Kafka (or any other message bus) sits between the producer and the real-time node, and real-time nodes ingest data by reading events from the message bus. The time from event creation to event consumption is ordinarily on the order of hundreds of milliseconds. The message bus serves two purposes: it acts as a buffer for incoming events, and it acts as a single endpoint from which multiple real-time nodes can read events.
- Historical Nodes
Historical Nodes encapsulate the functionality to load and serve the immutable blocks of data (segments) created by real-time nodes. Most of the data in Druid is immutable, so Historical Nodes are typically the main workers in the cluster. They follow a shared-nothing architecture, i.e. the nodes have no knowledge of one another; they simply know how to load, drop, and serve immutable segments. They also use Zookeeper for coordination.
Historical Nodes can support read consistency because they only deal with immutable data. Immutable data blocks also enable a simple parallelization model: historical nodes can concurrently scan and aggregate immutable blocks without blocking.
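As a rough illustration of why immutability makes parallel scans simple, the sketch below sums a metric across several segments using a thread pool. No locks are needed because nothing mutates the segments. The function names and the `added` metric are hypothetical, and threads merely stand in for whatever worker model a real node uses.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment(segment):
    """Aggregate one immutable segment (a tuple of row dicts)."""
    return sum(row["added"] for row in segment)


def parallel_sum(segments, workers=4):
    """Scan segments concurrently, then combine the partial sums."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(scan_segment, segments))
```

Each worker only reads its own segment, so the partial results can be combined in any order.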
- Broker Nodes
Broker Nodes act as query routers to historical and real-time nodes. Broker nodes understand the metadata published in Zookeeper about what segments are queryable and where those segments are located. They route incoming queries so that the queries hit the right historical or real-time nodes, and they merge partial results from those nodes before returning a final consolidated result to the caller. Each broker node also contains a cache with an LRU invalidation strategy. The cache can use local heap memory or an external distributed key/value store such as Memcached. Real-time data is never cached. During a Zookeeper outage, broker nodes use their last known view of the cluster and continue to forward queries to real-time and historical nodes.
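The routing, caching, and merging roles of a broker can be sketched as follows. This is a simplified model under stated assumptions: results are cached per segment, nodes are plain callables that return a partial count, and `segment_locations` is a hypothetical stand-in for the metadata a broker would read from Zookeeper.

```python
from collections import OrderedDict


class LRUCache:
    """Least-recently-used cache built on an ordered dict."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used


class Broker:
    def __init__(self, segment_locations, cache):
        # segment id -> (node callable, is_realtime); hypothetical metadata
        self.segment_locations = segment_locations
        self.cache = cache

    def query(self, query_key, segment_ids):
        total = 0
        for seg_id in segment_ids:
            node, is_realtime = self.segment_locations[seg_id]
            # Real-time data keeps changing, so it is never cached.
            partial = None if is_realtime else self.cache.get((query_key, seg_id))
            if partial is None:
                partial = node(seg_id)  # route the query to the owning node
                if not is_realtime:
                    self.cache.put((query_key, seg_id), partial)
            total += partial  # merge the partial results
        return total
```

Repeating the same query hits the historical node only once, while the real-time node is asked every time.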
- Coordinator Nodes
Coordinator Nodes are in charge of data management and distribution on Historical Nodes. They tell the historical nodes to load new data, drop outdated data, replicate data, and move data to balance load. Coordinator nodes undergo a leader-election process that determines a single node that runs the coordinator functionality. The remaining coordinator nodes act as redundant backups.
4. Storage Format
Data tables in Druid (called data sources) are collections of timestamped events, partitioned into a set of segments. Segments are the fundamental storage unit in Druid, and replication and distribution are done at the segment level. Druid creates additional lookup indices for string columns so that only those rows that pertain to a particular query filter are ever scanned. The paper also discusses how storing data in columns, along with these indices, helps maximise compression. Druid's persistence components allow different storage engines to be plugged in, such as an in-memory structure on the JVM heap or memory-mapped structures (the default).
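A minimal sketch of the idea behind these lookup indices: each distinct string gets a small integer code (dictionary encoding), and an inverted index maps each value to the rows that contain it. Python sets stand in here for the compressed bitmaps a real system would use, and the function names are hypothetical.

```python
def build_string_column(values):
    """Return (dictionary, encoded column, inverted index) for a string column."""
    dictionary = {}  # value -> integer code
    encoded = []     # the column stored as integer codes
    inverted = {}    # value -> set of row numbers containing it
    for row, value in enumerate(values):
        code = dictionary.setdefault(value, len(dictionary))
        encoded.append(code)
        inverted.setdefault(value, set()).add(row)
    return dictionary, encoded, inverted


def rows_matching(inverted, *values):
    """Rows matching an OR filter over the given values."""
    rows = set()
    for value in values:
        rows |= inverted.get(value, set())
    return sorted(rows)
```

A filter then only needs to touch the rows listed in the index, and the integer-coded column compresses far better than repeated strings.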
5. Query API
Druid has its own query language and accepts queries as POST requests. Broker, historical, and real-time nodes all share the same query API. Most query types also support a filter set, a Boolean expression over dimension names and values. Druid supports many types of aggregations, including sums on floating-point and integer types, minimums, maximums, and complex aggregations (cardinality estimation and others). The one main drawback of using Druid is that it doesn’t support join queries. The authors had not implemented joins at the time of writing and said that research was ongoing.
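To give a feel for the query language, here is a sketch that builds the JSON body of a timeseries query counting rows per day under a filter. The field names are modeled on the sample query in the paper and should be checked against the Druid documentation for the version you run. The data source, interval, and dimension values are hypothetical, and the code only builds the body; it does not send the POST request.

```python
import json


def timeseries_query(data_source, interval, dimension, value):
    """Build a query body; field names are an assumption based on the paper."""
    return {
        "queryType": "timeseries",
        "dataSource": data_source,
        "intervals": interval,
        # Filter set: keep only rows where `dimension` equals `value`.
        "filter": {"type": "selector", "dimension": dimension, "value": value},
        "granularity": "day",
        "aggregations": [{"type": "count", "name": "rows"}],
    }


# Hypothetical data source and filter values, for illustration only.
body = json.dumps(
    timeseries_query("example_events", "2013-01-01/2013-01-08", "page", "Home")
)
```

The resulting string is what a client would POST to a broker, historical, or real-time node, since all three share the same query API.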
The paper shares many insights (graphs and tables) from running Druid in production. The headline results are:
Average query latency: 550 milliseconds, with 90% of queries returning in less than 1 second, 95% in less than 2 seconds, and 99% in less than 10 seconds.
For the most basic data set, the cluster ingested 800,000 events/second/core. The ingestion rate depends heavily on the complexity of the data source.
7. Druid in Production
Druid is often used to explore data and generate reports on data. Users tend to explore short time intervals of recent data.
Concurrent queries can be problematic; the authors address this with query prioritization.
The authors assume that many nodes failing at once is rare, so they leave enough spare capacity in the cluster to completely reassign the data from 2 historical nodes.
In the case of data center outages, Druid relies on Deep Storage to recover.
Each node also emits operational metrics, including system-level data (CPU usage, available memory, JVM statistics, and disk capacity). The metrics are used to understand the performance and stability of the cluster, and also to see which aspects of the data users interact with.
Druid is also paired with a stream processor (Apache Storm) for both real-time and historical data. Storm handles the streaming data processing work, and the columnar storage is used for responding to queries.
Segments are the essence of Druid, and they are distributed. They can be exactly replicated across multiple data centers. Such a setup may be desired if one data center is situated closer to users.
The authors provide all the essential information about Druid, so anyone who wants to get started with it could begin with this paper. It also references many more papers that help you understand OLAP databases, columnar versus row storage, distributed and real-time analytical stores, and much more. It’s an essential read for people getting started with Druid.