Image downloaded from ClickHouse official site

ClickHouse (Part I)

Gaurav🇮🇳
2 min read · Sep 14, 2023

ClickHouse is a powerful OLAP (online analytical processing) database management system designed to run high-speed aggregate queries over hundreds of billions of rows. It was created by Yandex, a Russian search engine company, and has grown in popularity over the years thanks to its speed, scalability, and flexibility.

It comes with its own SQL dialect and compares favourably to PostgreSQL in terms of expressivity and simplicity.

ClickHouse Features

  1. Columnar storage: ClickHouse stores data in a column-oriented manner, which means that it stores all the values of a particular column together. This approach allows for faster data retrieval and processing, especially for large datasets.
  2. Distributed architecture: ClickHouse is built to scale horizontally, which means it can be deployed across multiple servers to handle large amounts of data. It also supports replication and sharding, which help provide high availability and fault tolerance.
  3. Performance: ClickHouse is built to provide quick query response times, even for large datasets. It accomplishes this through the use of a variety of techniques such as vectorized query execution, data compression, and efficient memory management.
  4. Flexibility: ClickHouse can handle both structured and semi-structured data (for example, JSON), making it suitable for a wide range of applications. It can also read and write common data formats such as CSV, JSON, and Parquet.
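
To make the columnar storage point concrete, here is a minimal Python sketch. It is not ClickHouse code: the toy table, the function names, and the run-length encoding are all assumptions chosen for illustration (ClickHouse itself uses general-purpose codecs such as LZ4 and ZSTD). It only shows why an aggregate over one column touches less data in a column layout, and why values stored together tend to compress well.

```python
from array import array

# Hypothetical toy table with three columns: date, event, count.
rows = [
    ("2023-09-01", "click", 3),
    ("2023-09-01", "view", 10),
    ("2023-09-02", "click", 5),
    ("2023-09-02", "view", 7),
]

# Column-oriented layout: all values of one column are stored together.
columns = {
    "date": [r[0] for r in rows],
    "event": [r[1] for r in rows],
    "count": array("q", (r[2] for r in rows)),  # contiguous 64-bit integers
}


def sum_row_store(rows):
    """Row layout: every full record is visited to read a single field."""
    return sum(r[2] for r in rows)


def sum_column_store(columns):
    """Column layout: only the 'count' column is read."""
    return sum(columns["count"])


def run_length_encode(values):
    """Simplified stand-in for compression: collapse runs of equal values."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1
        else:
            encoded.append([v, 1])
    return encoded


total = sum_column_store(columns)
compressed_dates = run_length_encode(columns["date"])
```

Both functions return the same total, but the column version never looks at `date` or `event`. On a table with hundreds of columns and billions of rows, that difference in data read is where much of the speed-up comes from.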

When should you use ClickHouse?

ClickHouse is an open-source, fast OLAP database. It will work best for you if:

  1. You deal with time-series data such as financial data, sensor data, and log data.
  2. You have tables with an enormous number of columns.
  3. You deal with massive amounts of data (measured in terabytes) that are constantly written and read.
  4. The vast majority of operations consist of reads combined with aggregations.

ClickHouse Limitations

ClickHouse excels at analytic processing, but there are a few areas where it falls short. Here are some examples:

  1. Transaction processing: ClickHouse does not have full ACID transactions, so you would not want to use it to process online orders. A transactional database such as MySQL handles this very well.
  2. Rapid retrieval of single rows by key: ClickHouse's sparse primary index makes point queries that fetch a single row by its key relatively inefficient.
  3. Rapid updates on single rows: Selecting all columns of a single row is inefficient in ClickHouse because each column is stored separately, so many files must be read. Updating a single row may require rewriting large amounts of data. You would not want to put eCommerce session data in ClickHouse; that is a standard use case for MySQL.
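
The sparse index limitation is easier to see with a small model. The sketch below is a simplified illustration in Python, not ClickHouse's actual implementation: the function names are hypothetical, keys are assumed to be unique and sorted, and the granule size is shrunk to 4 rows (ClickHouse's default `index_granularity` is 8192). The index stores only the first key of each granule, so a point lookup can find the right granule quickly but must then scan rows inside it.

```python
from bisect import bisect_right

GRANULE_SIZE = 4  # toy value for illustration; ClickHouse defaults to 8192


def build_sparse_index(sorted_keys, granule_size=GRANULE_SIZE):
    """Keep only the first key of each granule (block of consecutive rows)."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule_size)]


def point_lookup(sorted_keys, sparse_index, key, granule_size=GRANULE_SIZE):
    """Return (found, rows_scanned) for a single-key lookup."""
    # Find the last granule whose first key is <= the key we want.
    g = bisect_right(sparse_index, key) - 1
    if g < 0:
        return False, 0  # key is smaller than every stored key
    start = g * granule_size
    scanned = 0
    # The index cannot point at an exact row, so scan the granule.
    for k in sorted_keys[start:start + granule_size]:
        scanned += 1
        if k == key:
            return True, scanned
    return False, scanned


keys = list(range(0, 40, 2))      # 20 sorted keys: 0, 2, ..., 38
index = build_sparse_index(keys)  # one index entry per 4 rows
hit = point_lookup(keys, index, 14)
miss = point_lookup(keys, index, 15)
```

With the real granule size, a single-row lookup can mean reading and decompressing thousands of rows per column to return one. That trade-off keeps the index small enough to stay in memory, which suits large range scans far better than point queries.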

If you liked this article and want to read more about ClickHouse, don’t forget to follow me.

Written by Gaurav🇮🇳

Data Engineer | Making Data Easier To Use | AdTech | HealthTech