Sabiha Ali

Posted on Jan 28, 2023 • Edited on Jan 30, 2023

Columnar database and Row database — how are they different?

#aws #database #analytics #olap

We want to make our databases fast so that the queries which we run should return the results fast.

By changing how we store the data in our database we can make the database respond to the queries and optimize our database to either serve the analytical workload or the transactional workload which we call OLAP and OLTP.

Computers in general, even the databases, read the data from disks, hard disks. We are not talking about the RAM data which gets deleted once the computer is shut down but we are talking about the hard disks where the data is stored permanently.

On these storage disks data is stored in the form of blocks. it is evident that if the computer has to read fewer blocks, it is going to take less time and if the computer is going to read data from more blocks, it is going to take more time.

If the data which we query are on fewer blocks, then the response of the database would be faster.

Let us understand this with an example

This is a sample sales table

Let us see how this data will be stored in a row-based data store.

In the row-based format, the entire record is stored in one block meaning as you can see in the figure, the record of the first customer — — -the customer name, item ID, and sales amount is stored in the first block.

Therefore, if we are looking for the first customers details you can find all that in one single block. This type of data stores will be great for transactional databases where we need the details of a single entity.

If I have to take the average of all the sales then I would have scan through all the five blocks of data, resulting in more IO and more cost.

Let us see how this data will be stored in a row-based data store.

Data in the column stores are stored differently. Here entire columns are written down inside one block so here we have all the customer names in one block all the item IDS in the next block and all the sales amount in the third block.

So, instead of having each customer’s data together we have the whole column of data together.

If I want to access the sum or the average of all the sales amount this database would be fast because everything would be saved in one single block.

I will use this type of database for analytical queries like sum or avarages etc

Columnar database is also very suitable for compression because each block will have the same type of data either numbers or strings etc. so it would be very easy to compress the data in the block efficiently.

How would queries work with these different storage methods
Transactional style query
Eg:

SELECT Customer_name , Sales

FROM data

WHERE Item_ID=3000

This is a transactional style query where we need all details of one person.

In the row database, we need to scan the data from just one block whereas in the columnar database we would have to scan through a lot of blocks.

Analytical style query
Eg:

SELECT Item_ID, count(1)

FROM data

GROUP BY Item_ID

Here we want to count how many observations we have from each Item_ID, which could be done by reading a single block in the Columnar database, whereas in the row database we would have to scan through all the blocks

Happy Databasing !!!!

By, Sabiha Ali, Solutions Architect, ScaleCapacity Inc.

DEV Community

Columnar database and Row database — how are they different?

Top comments (0)

Read next

Calling IAM authenticated API Gateway with different HTTP clients

Is Agentic AI Dominating in 2025?

Full Stack Application Hosting in AWS

What is Amazon Rekognition?