Cong Li

Posted on Sep 25, 2024

GBase 8a MPP Cluster Performance Tuning: GNode Parameter Optimization

#database

In the previous article, we introduced GCluster parameter optimization in the GBase database. Today, we will discuss GNode parameter optimization in the GBase 8a MPP Cluster. You can refer to the previous article here: GBase 8a MPP Cluster Performance Tuning: GCluster Parameter Optimization.

1 Memory Parameters of GNode

1.1. Heap Parameters (global)

gbase_heap_data: Mainly designed for caching data (DC) and should be allocated the most memory.
gbase_heap_large: Used to manage memory that is infrequently requested or released.
gbase_heap_temp: Allocates temporary, small chunks of memory that are used less frequently.
gbase_memory_pct_target: Sets the available memory proportion, with a default value of 0.8.

Minimum Parameter Values:

gbase_heap_data >= 512MB
gbase_heap_large >= 256MB
gbase_heap_temp >= 256MB

Maximum Parameter Values:

(gbase_heap_data + gbase_heap_large + gbase_heap_temp) <= total memory * gbase_memory_pct_target

Note:

Default: total memory = physical memory
When _gbase_memory_use_swap is set to 1: total memory = physical memory + swap

1.2. Operator Buffers (session level)

Operator buffers are session-level settings. For example, if you set gbase_buffer_result=1G and the concurrency is 30, the total memory usage for gbase_buffer_result during execution will be 1G * 30 = 30G, not accounting for other operator buffers.

Commonly used buffers:

gbase_buffer_distgrby: Stores intermediate results of DISTINCT operations.
gbase_buffer_hgrby: Stores intermediate results of HASH GROUP BY operations.
gbase_buffer_hj: Stores intermediate results of HASH JOIN operations.
gbase_buffer_insert: Stores intermediate results of INSERT VALUES operations.
gbase_buffer_result: Stores materialized intermediate results.
gbase_buffer_rowset: Stores intermediate result sets from JOIN computations.
gbase_buffer_sj: Stores intermediate results of SORT MERGE JOIN. This is used when join conditions involve a >= b or a <= b.
gbase_buffer_sort: Stores intermediate results of SORT operations.

Buffer Setting Guidelines:

Under normal circumstances (non-high concurrency scenarios), operator buffers can be set according to the system memory size:

gbase_buffer_hgrby and gbase_buffer_hj should not exceed 4G.
gbase_buffer_result should not exceed 2G.
gbase_buffer_rowset should not exceed 1G.
Other buffers can be set using system estimates.

In high-concurrency scenarios, operator buffers should not be set too large. System auto-evaluation is usually sufficient, but in cases of extreme concurrency, manual adjustment to smaller buffer sizes may be necessary. The total buffer size multiplied by the concurrency should not exceed gbase_heap_large, and in no case should it exceed the total system memory.

Other Situations to Adjust Operator Buffers:

If a particular SQL operation is slow due to a specific operator (as identified by single-node tracing), you can increase the corresponding buffer. For example, if tracing shows slow performance in a join operation, you can increase the value of gbase_buffer_hj. However, care must be taken not to affect the execution of other SQL queries.

2. GNode Concurrency Control Parameters

2.1. `gbase_parallel_execution`

This parameter controls whether to enable parallel execution:

0: Disabled (default).
1: Enabled.

When to use:

If you find low CPU utilization, enabling parallel execution may help.
Parallel execution works by splitting data into chunks and processing them simultaneously, merging the results at the end. This is ideal for complex SQL operations with low concurrency.

2.2. `gbase_parallel_max_thread_in_pool`

Configures the maximum number of threads in the parallel execution thread pool. By default, this is set to twice the number of CPU cores.

Value range: 0~4096. When deploying a single GNode on a server, it's recommended to set this to 4–8 times the number of CPU cores.

Explanation:

Threads in the pool are created when the database service starts. Threads are borrowed from the pool during SQL execution and returned afterward, which reduces the overhead of frequent thread creation and destruction.

2.3. `gbase_parallel_degree`

Controls the maximum degree of parallelism for each SQL query.

Value range: 0~gbase_parallel_max_thread_in_pool. The maximum value cannot exceed the available threads in the thread pool.

1: No parallelism, i.e., single-thread execution.
0: The default degree of parallelism is half of the thread pool (gbase_parallel_max_thread_in_pool / 2). If the thread pool size is an odd number, the integer part is used.

That's all for today's content. Thank you for reading!

How I Cut 22.3 Seconds Off an API Call with Sentry 🕒

Struggling with slow API calls? Dan Mindru walks through how he used Sentry's new Trace View feature to shave off 22.3 seconds from an API call.

Get a practical walkthrough of how to identify bottlenecks, split tasks into multiple parallel tasks, identify slow AI model calls, and more.

DEV Community

GBase 8a MPP Cluster Performance Tuning: GNode Parameter Optimization

1 Memory Parameters of GNode

1.1. Heap Parameters (global)

1.2. Operator Buffers (session level)

2. GNode Concurrency Control Parameters

2.1. `gbase_parallel_execution`

2.2. `gbase_parallel_max_thread_in_pool`

2.3. `gbase_parallel_degree`

How I Cut 22.3 Seconds Off an API Call with Sentry 🕒

Top comments (0)

The Next Generation Developer Platform

Read next

Transaction Safety in Rails: Identifying and Addressing Non-Atomic Interactions

Comparing AWS RDS and Amazon Aurora: Which Managed Database Service is Right for You?

SSIS for Modern Data Integration

02. DBMS என்றால் என்ன? What is a DBMS?

Okay

1 Memory Parameters of GNode

1.1. Heap Parameters (global)

1.2. Operator Buffers (session level)

2. GNode Concurrency Control Parameters

2.1. gbase_parallel_execution

2.2. gbase_parallel_max_thread_in_pool

2.3. gbase_parallel_degree

How I Cut 22.3 Seconds Off an API Call with Sentry 🕒

The Next Generation Developer Platform

Read next

Transaction Safety in Rails: Identifying and Addressing Non-Atomic Interactions

Comparing AWS RDS and Amazon Aurora: Which Managed Database Service is Right for You?

SSIS for Modern Data Integration

02. DBMS என்றால் என்ன? What is a DBMS?

Okay

2.1. `gbase_parallel_execution`

2.2. `gbase_parallel_max_thread_in_pool`

2.3. `gbase_parallel_degree`