In the previous parts, we saw an introduction to ClickHouse and its features. Furthermore, we learned about its different table engine families and their most usable members. In this part, I will walk through the special keys and indexes in ClickHouse, which can help reduce query latency and database load significantly.
It should be said that these concepts are only applicable to the default table engine family: Merge-Trees.
ClickHouse indexes are based on sparse indexing, an alternative to the B-tree index utilized by traditional DBMSs. In a B-tree, every row is indexed, which is suitable for locating and updating a single row (so-called point queries, common in OLTP workloads). This comes at the cost of poor performance on high-volume inserts and high memory and storage consumption. In contrast, a sparse index splits data into multiple parts, each divided into fixed-size groups of rows called granules. ClickHouse keeps one index entry per granule (group of rows) instead of per row, and that's where the term sparse index comes from. Given a query filtered on the primary keys, ClickHouse finds the matching granules and loads them into memory in parallel. That brings notable performance on range queries, which are common in OLAP workloads. Additionally, as data is stored column by column in multiple files, it can be compressed well, resulting in much lower storage consumption.
The sparse index is built on the LSM-tree approach, allowing you to insert high volumes of data per second. All of this comes at the cost of being unsuitable for point queries, which is not the purpose of ClickHouse anyway.
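The granule size is controlled per table by the index_granularity setting, which defaults to 8192 rows. A minimal sketch of setting it explicitly (the table and column names are just examples):

```sql
-- Each granule holds up to 8192 rows by default; the sparse index
-- stores one entry per granule rather than one per row.
CREATE TABLE example
(
    `id` UInt32,
    `payload` String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS index_granularity = 8192;
```

Smaller granules mean a larger index but finer-grained pruning; the default is a reasonable trade-off for most workloads.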
In the figure below, we can see how ClickHouse stores data:
- Data is split into multiple parts (by the ClickHouse default or a user-defined partition key).
- Parts are split into granules. A granule is a logical concept: ClickHouse doesn't physically split the data into granules; instead, it locates them via marks.
- Granules' locations (start and end) are defined in the mark files with the .mrk extension.
- Index values are stored in the primary.idx file, which contains one row per granule.
- Columns are stored as compressed blocks in .bin files: one file per column in the Wide format and a single file for all columns in the Compact format. Whether a part is Wide or Compact is determined by ClickHouse based on the part's size.
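You can inspect these structures yourself: the system.parts table exposes, among other things, the number of marks (granules) in each part and whether it uses the Wide or Compact format. A query along these lines should work:

```sql
-- One row per active data part; 'marks' is the number of granules,
-- 'part_type' shows Wide or Compact storage.
SELECT name, part_type, rows, marks
FROM system.parts
WHERE table = 'projects' AND active;
```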
Now let's see how ClickHouse finds the matching rows using primary keys:
- ClickHouse finds the matching granule marks via binary search on the primary.idx file.
- It looks into the mark files to find the granules' locations in the .bin files.
- It loads the matching granules from the .bin files into memory in parallel and scans them for the matching rows.
To clarify the flow mentioned above, let's create a table and insert data into it:
```sql
CREATE TABLE default.projects
(
    `project_id` UInt32,
    `name` String,
    `created_date` Date
)
ENGINE = MergeTree
ORDER BY (project_id, created_date)
```
```sql
INSERT INTO projects
SELECT *
FROM generateRandom('project_id Int32, name String, created_date Date', 10, 10, 1)
LIMIT 10000000;
```
First, if you don't specify primary keys separately, ClickHouse will use the sort keys (in ORDER BY) as the primary keys. Hence, in this table, project_id and created_date are the primary keys. Every time you insert data into this table, ClickHouse will sort it first by project_id and then by created_date.
If we look into the data structure stored on the hard drive, we see the following:
We have five parts, and one of them is all_1_1_0. You can visit this link if you're curious about the naming convention. As you can see, columns are stored in .bin files, and we see mark files named after the columns, along with the primary.idx file.
Now let's filter on project_id, the first primary key, and explain its index usage:
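The analysis can be produced with a query along these lines (the filter value 700 is arbitrary):

```sql
EXPLAIN indexes = 1
SELECT *
FROM projects
WHERE project_id = 700
```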
As you can see, the database has detected project_id as a primary key and ruled out 1224 granules out of 1225 using it!
What if we filter on created_date, the second primary key?
```sql
EXPLAIN indexes = 1
SELECT *
FROM projects
WHERE created_date = today()
```
The database has detected created_date as a primary key, but it hasn't been able to rule out any granules. Why? Because ClickHouse uses binary search only for the first key and a generic exclusion search for the other keys, which is much less efficient than the former. So how can we make it more efficient?
If we put created_date first in the sort keys while defining the table, we will achieve better results when filtering on the non-first keys, since created_date has lower cardinality (fewer unique values) than project_id:
```sql
CREATE TABLE default.projects
(
    `project_id` UInt32,
    `name` String,
    `created_date` Date
)
ENGINE = MergeTree
ORDER BY (created_date, project_id)
```
```sql
EXPLAIN indexes = 1
SELECT *
FROM projects
WHERE project_id = 700
```
If we filter on project_id, now the second key, ClickHouse reads only 909 granules instead of the whole table.
So, to summarize: always try to order the primary keys from low to high cardinality.
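If you're unsure about the cardinality of candidate key columns, you can measure it with the uniq aggregate function before deciding on the order, for example:

```sql
-- Here, created_date spans far fewer distinct values than project_id,
-- so it belongs earlier in the sort key.
SELECT
    uniq(created_date) AS date_cardinality,
    uniq(project_id)   AS id_cardinality
FROM projects;
```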
I mentioned earlier that if you don't specify the PRIMARY KEY option, ClickHouse uses the sort keys as the primary keys. However, if you want to set the primary keys separately, they must be a prefix of the sort keys. As a result, the additional keys specified in the sort keys are only utilized for sorting purposes and don't play any role in indexing.
```sql
CREATE TABLE default.projects
(
    `project_id` UInt32,
    `name` String,
    `created_date` Date
)
ENGINE = MergeTree
PRIMARY KEY (created_date, project_id)
ORDER BY (created_date, project_id, name)
```
In this example, the created_date and project_id columns are utilized in the sparse index and for sorting, while the name column is only used as the last sorting criterion.
Use this option if you wish to use a column in the ORDER BY part of your queries, since it eliminates the database's sorting effort while running them.
A partition is a logical combination of parts in ClickHouse. By default, it considers all parts to be under one unnamed partition. To find out more, look into the system.parts table for the projects table defined in the previous section:
```sql
SELECT name, partition
FROM system.parts
WHERE table = 'projects';
```
You can see that the projects table has no particular partition. However, you can customize this using the PARTITION BY option:
```sql
CREATE TABLE default.projects_partitioned
(
    `project_id` UInt32,
    `name` String,
    `created_date` Date
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(created_date)
PRIMARY KEY (created_date, project_id)
ORDER BY (created_date, project_id, name)
```
In the above table, ClickHouse partitions data based on the month of the created_date column.
ClickHouse creates a min-max index for the partition key and uses it as the first filter layer when running queries. Let's see what happens when we filter data by a column present in the partition key:
```sql
EXPLAIN indexes = 1
SELECT *
FROM projects_partitioned
WHERE created_date = '2020-02-01'
```
You can see that the database has chosen one part out of 16 using the min-max index of the partition key.
Partitioning in ClickHouse aims to bring data manipulation capabilities to the table. For instance, you can delete or move parts belonging to partitions older than a year. This is far more efficient than on an unpartitioned table, since ClickHouse has physically split the data by month on storage. Consequently, such operations can be performed easily.
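As a sketch of such manipulations (the partition ID 202002 is just an example month produced by toYYYYMM):

```sql
-- Drop a whole month of data almost instantly:
ALTER TABLE projects_partitioned DROP PARTITION 202002;

-- Alternatively, detach a partition instead of dropping it,
-- e.g., to archive its part directories:
ALTER TABLE projects_partitioned DETACH PARTITION 202002;
```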
Although ClickHouse creates an additional index for the partition key, it should never be considered a query performance improvement method, because it loses the performance battle to defining the column in the sort keys. So if you wish to enhance query performance, put those columns in the sort keys, and use a column as the partition key only if you have particular plans for data manipulation based on that column.
Finally, don't confuse partitions in ClickHouse with the same term in distributed systems, where data is split across different nodes. You should use shards and distributed tables if you're inclined to achieve such purposes.
You may have noticed that defining a column among the last items of the sort key isn't very helpful, especially if you filter only on that column without the leading sort keys. What should you do in those cases?
Consider a dictionary you want to read. You can find words using the table of contents, sorted alphabetically. Those items are the sort keys in the table. You can easily find a word starting with W, but how can you find pages containing words related to wars?
You can put marks or sticky notes on those pages, reducing your effort the next time. That's how a skip index works. It helps the database rule out granules that don't contain the desired values of some columns by creating additional indexes.
Let's get back to the projects table defined in the Order By section, where created_date and project_id were defined as primary keys. Now, if we filter on the name column, we'll encounter this:
```sql
EXPLAIN indexes = 1
SELECT *
FROM projects
WHERE name = 'hamed'
```
The result was expected: the primary index can't help here, so all granules are selected. Now, what if we define a skip index on it?
```sql
ALTER TABLE projects ADD INDEX name_index name TYPE bloom_filter GRANULARITY 1;
```
The above command creates a skip index on the name column. I've used the bloom filter type because the column is a string. You can find more about the other kinds here. Note that this command only builds the index for newly inserted data. To build it for the data already inserted, you can use this:
```sql
ALTER TABLE projects MATERIALIZE INDEX name_index;
```
Let's see the query analysis this time:
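The analysis is produced by the same EXPLAIN query as before:

```sql
EXPLAIN indexes = 1
SELECT *
FROM projects
WHERE name = 'hamed'
```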
As you can see, the skip index greatly improved the granule rule-out and, with it, the performance.
While the skip index performed efficiently in this example, it can show poor performance in other cases. It depends on the correlation between your specified column and the sort keys, and on settings like the index granularity and the index type.
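As an illustration of another kind, a minmax skip index stores the minimum and maximum values of a column per block of granules; whether it prunes anything depends on how well the column's values correlate with the sort order. A sketch on the created_date column of the first table (the one ordered by project_id first, where the index name is arbitrary):

```sql
-- 'GRANULARITY 4' means one min-max entry per 4 granules.
ALTER TABLE projects ADD INDEX date_index created_date TYPE minmax GRANULARITY 4;

-- Build the index for the data that is already inserted:
ALTER TABLE projects MATERIALIZE INDEX date_index;
```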
In conclusion, understanding and utilizing ClickHouse's primary keys, order keys, partition keys, and skip index is crucial for optimizing query performance and scalability. Choosing appropriate primary keys, order keys, and partitioning strategies can enhance data distribution, improve query execution speed, and prevent overloading. Additionally, leveraging the skip index feature intelligently helps minimize disk I/O and reduce query execution time. By considering these factors in your ClickHouse schema design, you can unlock the full potential of ClickHouse for efficient and performant data solutions.