What is an index?
In PostgreSQL, an index is a database object that provides a quick lookup mechanism for locating data based on the values in one or more columns. It acts as a separate data structure that allows efficient data retrieval and improves query performance by reducing the amount of disk I/O required to locate specific rows.
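Many database systems distinguish between clustered and non-clustered indexes. PostgreSQL does not have a clustered index as a separate index type: every index is stored apart from the table (the heap) and points into it. What PostgreSQL offers instead is the CLUSTER command, which physically reorders a table according to one of its existing indexes.
Clustered index:
When this article says "clustered index" in PostgreSQL, it means the index a table has been clustered on with CLUSTER. The command rewrites the table's rows to match the index order, which can noticeably speed up queries that read ranges of index values. The term "clustered" comes from the fact that data with similar index values are stored together, or "clustered," on disk.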
Here's a more detailed explanation:
1. Physical Ordering of Data: In PostgreSQL, a table can have numerous indexes, but it can be clustered on only one of them at a time. When a table is clustered, it is physically rewritten on disk in the order of that index. This can speed up queries that use the index, particularly range scans, because rows with neighboring index values sit on the same or adjacent pages and fewer pages have to be read.
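2. Clustering Factor: "Clustering factor" is a term from other database systems; the closest PostgreSQL equivalent is the correlation statistic in the pg_stats view, which measures how closely the physical order of the rows matches the order of a column's values. It ranges from -1 to 1. A value close to 1 or -1 means the table is well-ordered with respect to that column, which makes index scans on it cheaper; a value near 0 means the rows are scattered.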
3. Reordering Data: Clustering is a one-time operation. PostgreSQL does not keep the order up to date, so rows that are inserted or updated afterwards are placed wherever there is free space. Over time this erodes the benefit of clustering. To restore it, the table has to be re-clustered periodically by running the CLUSTER command again.
4. Choosing a Clustered Index: The choice of index to cluster on can have a significant impact on performance. In general, it should be based on the queries that are most frequently run against the table. For example, if a table is often queried by date range, then an index on a date column might be a good choice.
5. Limitations: Clustering has costs as well. A table can be clustered on only one index at a time. CLUSTER rewrites the whole table, which can take a long time on large tables, needs roughly enough free disk space for a second copy of the table and its indexes, and holds an ACCESS EXCLUSIVE lock that blocks all reads and writes on the table until it finishes.
Here's an example of how to cluster a table on an index in PostgreSQL:
CREATE TABLE example_table (
id serial PRIMARY KEY,
name varchar(100),
created_at timestamp
);
CREATE INDEX example_index ON example_table (created_at);
CLUSTER example_table USING example_index;
In this example, example_table is clustered on created_at using example_index. This means that the rows in example_table are physically ordered on disk according to the created_at timestamp as of the moment the CLUSTER command ran; rows added later are not kept in that order.
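To check how closely the physical row order follows created_at, one option is the correlation statistic that PostgreSQL exposes in the pg_stats view. The query below is only a sketch: it assumes the table lives in the public schema and already contains some rows, since an empty table has no statistics to report.

```sql
-- Refresh the planner statistics so the correlation value is current.
ANALYZE example_table;

-- correlation ranges from -1 to 1; values near 1 or -1 mean the rows
-- are stored in (or close to) created_at order.
-- Assumption: the table is in the "public" schema.
SELECT attname, correlation
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename = 'example_table'
  AND attname = 'created_at';
```

Running this before and after CLUSTER is a simple way to see whether the reordering had an effect.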
Now you might be wondering: since the id column is already the primary key, and in many systems the primary key serves as the clustered index, how can the table be clustered on an index over the created_at column instead?
In many database systems, the primary key is automatically used as the clustered index. However, PostgreSQL is different. When you create a table in PostgreSQL, the rows are stored in a heap, roughly in the order they are inserted, not according to the primary key.
When you create a primary key in PostgreSQL, it automatically creates a unique B-tree index on the column or group of columns listed in the primary key, but this index is not a clustered index by default.
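A quick way to see the index that the primary key created is to query the pg_indexes view. This is a sketch that assumes example_table exists in a schema on your search path; with default naming, the primary key index should appear as example_table_pkey.

```sql
-- List every index on the table together with its definition.
-- The primary key shows up as a UNIQUE btree index.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'example_table';
```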
PostgreSQL has no option to make the primary key index a clustered index when you create the table. You can, however, cluster the table on the primary key index after it has been created, or cluster it on an index over a different column, like created_at in the example I provided.
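If you want the primary key index to be the one the table is clustered on, create the table first and then mark the index with ALTER TABLE ... CLUSTER ON. PostgreSQL names the primary key index example_table_pkey by default:
CREATE TABLE example_table (
id serial PRIMARY KEY,
name varchar(100),
created_at timestamp
);
-- Mark the primary key index as the one to use for future CLUSTER runs
ALTER TABLE example_table CLUSTER ON example_table_pkey;
-- Reorder the table using the marked index
CLUSTER example_table;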
Or if you want to cluster the table on the primary key index in a single step, name the index (not the column) after USING:
CLUSTER example_table USING example_table_pkey;
Remember, a table can be clustered on only one index at a time. If you cluster the table on a different index, the rows are reordered to match it, and PostgreSQL remembers that index as the one to use for later CLUSTER runs.
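PostgreSQL records which index a table was last clustered on in the pg_index system catalog. The following is a minimal sketch of how you might look that up and then re-cluster; it assumes example_table has already been clustered at least once, otherwise the query returns no rows and the bare CLUSTER command reports an error.

```sql
-- Show the index currently marked as the clustering index, if any.
SELECT indexrelid::regclass AS clustered_index
FROM pg_index
WHERE indrelid = 'example_table'::regclass
  AND indisclustered;

-- Re-cluster using the remembered index (takes an ACCESS EXCLUSIVE lock).
CLUSTER example_table;

-- Refresh planner statistics after the rows have been reordered.
ANALYZE example_table;
```

Because the table is locked while it is rewritten, this is usually something to schedule for a quiet period.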