DEV Community

leo
leo

Posted on • Updated on

Database (openGauss) Statistics Introduction

What are statistics?

The CBO optimizer needs to select the way to query data according to the data in the table/index. Due to the large amount of data in the table, it is impossible to count the data amount and data distribution in the table in real time every time the query is executed, so regular analysis is required. Data, save the data distribution of tables and indexes to the data dictionary for the optimizer to use, this is statistical information.

openGauss is an optimal execution plan generated based on cost estimates. The optimizer needs to estimate the number of rows and the cost based on the statistics collected by analyze, so the statistics play a crucial role in the estimate of the optimizer's row number and cost.

Collect global statistics through analyze, including: relpages and reltuples in the pg_class table; stadistinct, stanullfrac, stanumbersN, stavaluesN, histogram_bounds, etc. in the pg_statistic table.

When to collect statistics

⚫ It is recommended to routinely execute the ANALYZE statement on a table or the entire database to update statistics after performing bulk insert/delete operations.

⚫ For intermediate tables generated in batch scripts or stored procedures, it is also necessary to explicitly call ANALYZE after data generation is complete.

⚫ For cases where multiple columns in the table are correlated, and the query has conditions or grouping operations based on these columns at the same time, you can try to collect statistics on multiple columns so that the query optimizer can more accurately estimate the number of rows and generate more Effective execution plan.

How to collect statistics

⚫ The ANALYZE statement can collect statistical information related to the table contents in the database, and the statistical results are stored in the system table PG_STATISTIC. The query optimizer uses these statistics to generate the most efficient execution plan.

⚫ Operation steps

Update statistics for a single table.
ANALYZE tablename;
Update database-wide statistics.
ANALYZE;
Collect multi-column statistics for the col_1 and col_2 columns of the tablename table.
ANALYZE tablename (col_1, col_2);
Add the multi-column statistics declaration of the col_1 and col_2 columns of the tablename table. ANALYZE tablename; ANALYZE; ANALYZE tablename (col_1, col_2); ALTER TABLE tablename ADD STATISTICS (col_1, col_2);
ALTER TABLE tablename ADD STATISTICS (col_1, col_2);

Top comments (0)