DEV Community

leo
leo

Posted on

Parallel query technology of openGauss

parallel query
The SMP parallel technology of openGauss is a technology that utilizes the multi-core CPU architecture of a computer to realize multi-thread parallel computing to make full use of CPU resources to improve query performance. In complex query scenarios, the execution of a single query is relatively long and the system concurrency is low. Using SMP parallel execution technology to achieve operator-level parallelism can effectively reduce query execution time and improve query performance and resource utilization. The overall implementation idea of ​​SMP parallel technology is to split the data into parallel query operators, start several worker threads to calculate separately, and finally summarize the results and return them to the front end. SMP parallel execution increases the data interaction operator (Stream) to realize data interaction between multiple worker threads, ensure the correctness of the query, and complete the overall query.

Applicable scenarios and restrictions
The SMP feature improves performance through operator parallelism, and at the same time occupies more system resources, including CPU, memory, I/O, and so on. In essence, SMP is a way of exchanging resources for time. In suitable scenarios and sufficient resources, it can achieve better performance improvement effects; but in inappropriate scenarios or insufficient resources, On the contrary, performance may be deteriorated. The SMP feature is suitable for analytical query scenarios, which are characterized by a long single query time and low business concurrency. The SMP parallel technology can reduce the query delay and improve the system throughput performance. However, in the transactional large concurrent business scenario, since the delay of a single query itself is very short, the use of multi-threaded parallel technology will increase the query delay and reduce the system throughput performance.

Applicable scene

Operators that support parallelism: The following operators support parallelism in the plan.

Scan: supports row-stored ordinary table and row-stored partitioned table sequential scan, column-stored ordinary table and column-stored partitioned table sequential scan.
Join: HashJoin, NestLoop
Agg: HashAgg, SortAgg, PlainAgg, WindowAgg (only partition by is supported, order by is not supported).
Stream: Local Redistribute, Local Broadcast
Others: Result, Subqueryscan, Unique, Material, Setop, Append, VectoRow
SMP-specific operators: In order to achieve parallelism, a new stream operator for data exchange between parallel threads is added for use by SMP features. These newly added operators can be regarded as subclasses of the Stream operator.

Local Gather: implements data aggregation of parallel threads within the instance.
Local Redistribute: Data redistribution is performed according to the distribution key among the threads within the instance.
Local Broadcast: Broadcast data to each thread inside the instance.
Local RoundRobin: Realize data round-robin distribution among threads within the instance.
As an example, take the parallel plan of TPCH Q1 as an example.

In this plan, the parallelization of Scan and HashAgg operators is realized, and the Local Gather data exchange operator is added. Among them, operator No. 3 is the Local Gather operator, and the "dop: 1/4" marked on it indicates that the parallelism degree of the sending end thread of this operator is 4, while the parallelism degree of the receiving end thread is 1, that is, the lower level 4 The No. HashAggregate operator is executed at 4 degrees of parallelism, while the No. 1 and No. 2 operators on the upper layer are executed serially, and the No. 3 operator realizes the data aggregation of the parallel threads in the instance.

The parallel status of each operator can be seen by planning the dop information displayed on the Stream operator.

Non-applicable scene

Index scans do not support parallel execution.
MergeJoin does not support parallel execution.
WindowAgg order by does not support parallel execution.
cursor does not support parallel execution.
Queries within stored procedures and functions do not support parallel execution.
The parallelism of subquery subplan and initplan, and the parallelism of operators containing subqueries are not supported.
Queries with median operations in query statements do not support parallel execution.
Queries with global temporary tables do not support parallel execution.
Updates to materialized views do not support parallel execution.
Resource Impact on SMP Performance
The SMP architecture is a scheme that uses surplus resources in exchange for time. After planning parallelism, resource consumption will inevitably increase, including resource consumption such as CPU, memory, and I/O. Larger, resource consumption also increases. When the above resources become a bottleneck, SMP cannot improve performance, but may cause the overall performance of the database instance to deteriorate. The impact of various resources on the performance of the SMP will be described separately below.

CPU resources

In general customer scenarios, when the system CPU utilization rate is not high, using the SMP parallel architecture can make full use of system CPU resources and improve system performance. However, when the number of CPU cores of the database server is small and the CPU utilization rate is already relatively high, if SMP parallelism is enabled, not only the performance improvement will not be obvious, but the performance may deteriorate due to resource competition among multiple threads.

memory resources

Parallel query will increase memory usage, but the upper limit of memory used by each operator is still limited by parameters such as work_mem. Assuming that work_mem is 4GB and the degree of parallelism is 2, the upper limit of memory allocated to each parallel thread is 2GB. When the work_mem is small or the system memory is not sufficient, after using SMP parallelism, the data may be downloaded to the disk, resulting in the problem of degraded query performance.

I/O resources

To achieve parallel scanning will definitely increase the resource consumption of I/O, so only in the case of sufficient I/O resources, parallel scanning can improve scanning performance.

Effects of Other Factors on SMP Performance
In addition to resource factors, there are other factors that also affect SMP parallel performance. For example, factors such as uneven partition data in the partition table, and system concurrency.

Impact of Data Skew on SMP Performance

Parallelism is less effective when there is severe data skew in the data. For example, the data volume of a certain value in the join column of a table is much larger than that of other values. After parallelism is enabled, hash redistribution is performed on the table data according to the value of the join column, so that the data volume of a certain parallel thread is much larger than that of other threads, resulting in The long tail problem leads to poor results after parallelism.

Influence of system concurrency on SMP performance

The SMP feature will increase the use of resources, and there are fewer resources left in high-concurrency scenarios. Therefore, if SMP parallelism is enabled in a high-concurrency scenario, it will cause serious resource competition among queries. Once resource competition occurs, whether it is CPU, I/O, or memory, it will lead to a decline in overall performance. Therefore, in high-concurrency scenarios, enabling SMP often cannot achieve the effect of performance improvement, and may even cause performance degradation.

configuration steps
Observe the current system load, if the system resources are sufficient (resource utilization is less than 50%), execute 2; otherwise, exit.

Set query_dop=1 (default value), use explain to display the execution plan, and observe whether the plan meets the applicable scenarios in the applicable scenarios and restrictions. If so, go to 3.

Set query_dop=value, regardless of resource conditions and plan features, force dop to be 1 or value.

Set the appropriate query_dop value before executing the qualified query statement, and close query_dop after the statement execution ends. Examples are as follows.

openGauss=# SET query_dop = 4;
openGauss=# SELECT COUNT(*) FROM t1 GROUP BY a;
......
openGauss=# SET query_dop = 1;
illustrate:

In the case of resource permitting, the higher the degree of parallelism, the better the performance improvement effect.
SMP parallelism supports session-level settings. It is recommended that customers enable smp before executing queries that meet the requirements, and turn off smp after execution. In order to avoid impact on the business during business peaks.

Top comments (0)