This is the first part of the ClickHouse Tutorial Series. In this series, I cover some practical and vital aspects of the ClickHouse database, a robust OLAP technology many enterprise companies utilize.
In this part, I'll talk about the main features, weaknesses, installation, and usage of ClickHouse. I'll also refer to some helpful links for those who want to dive into broader details.
ClickHouse is an open-source column-oriented database developed by Yandex. It is designed to provide high performance for analytical queries. ClickHouse uses a SQL-like query language for querying data and supports different data types, including integers, strings, dates, and floats. It offers various features such as clustering, distributed query processing, and fault tolerance. It also supports replication and data sharding. ClickHouse is used by companies such as Yandex, Facebook, and Uber for data analysis, machine learning, and more.
The main features of Clickhouse Database are:
Data in ClickHouse is stored in columns instead of rows, bringing at least two benefits:
- Every column can be sorted in a separate file; hence, stronger compression happens on each column and the whole table.
- In range queries common in analytical processing, the system can access and process data easier since data is sorted in some columns (i.e., columns defined as sort keys). Additionally, it can parallelize processes on multi-cores while loading massive columns.
Note: It should not get mistaken with Wide-Column databases like Cassandra as they store data in rows but enable you to denormalize intensive data in a table with many columns leading to a No-SQL structure.
ClickHouse scales well both vertically and horizontally. It can be scaled by adding extra replicas and extra shards to process queries in a distributed way. ClickHouse supports multi-master asynchronous replication and can be deployed across multiple data centers. All nodes are equal, which allows for avoiding having single points of failure.
To mention some:
- Lack of full-fledged UPDATE/DELETE implementation: ClickHouse is unsuited for modification and mutations. So you'll come across poor performance regarding those kinds of queries.
- OLTP queries like pointy ones would not make you happy since ClickHouse is easily outperformed by traditional RDBMSs like MySQL with those queries.
To name a few:
- Apache Druid
I only cover the Docker approach here. For other methods on different distros, please follow ClicHouse's official Installation.
The docker-compose file:
version: '2' services: clickhouse: container_name: myclickhouse image: clickhouse/clickhouse-server:latest ports: - "8123:8123" - "9000:9000" volumes: - ./clickhouse-data:/var/lib/clickhouse/ restart: unless-stopped
And then run it by:
docker compose up -d
As you can see, two ports have been exposed:
- 8123: HTTP API Port for HTTP requests, used by JDBC, ODBC, and web interfaces.
- 9000: Native Protocol port (ClickHouse TCP protocol). Used by ClickHouse apps and processes like clickhouse-server, clickhouse-client, and native ClickHouse tools. Used for inter-server communication for distributed queries.
To continue the tutorial, we use ClickHouse-Client available on the installed server:
docker exec -it myclickhouse clickhouse-client
Create database test:
CREATE DATABASE test;
CREATE TABLE test.orders (`OrderID` Int64, `CustomerID` Int64, `OrderDate` DateTime, `Comments` String, `Cancelled` Bool) ENGINE = MergeTree PRIMARY KEY (OrderID, OrderDate) ORDER BY (OrderID, OrderDate, CustomerID) SETTINGS index_granularity = 8192;
In the next parts, we'll talk about other configurations like
ORDER BY, etc.
To insert sample data:
INSERT INTO test.orders VALUES (334, 123, '2021-09-15 14:30:00', 'some comment', false);
Just like any other SQL query:
SELECT OrderID, OrderDate FROM test.orders;
In the first part of the ClickHouse Tutorial Series, we discussed the traits, features, and weaknesses of ClickHouse. Then we saw how to set up an instance with minimum configuration, create a database and table, insert data into it, and read from it.