DEV Community

Cong Li
Cong Li

Posted on

GBase 8a MPP Cluster Performance Tuning: GNode Parameter Optimization

In the previous article, we introduced GCluster parameter optimization in the GBase database. Today, we will discuss GNode parameter optimization in the GBase 8a MPP Cluster. You can refer to the previous article here: GBase 8a MPP Cluster Performance Tuning: GCluster Parameter Optimization.

1 Memory Parameters of GNode

1.1. Heap Parameters (global)

  • gbase_heap_data: Mainly designed for caching data (DC) and should be allocated the most memory.
  • gbase_heap_large: Used to manage memory that is infrequently requested or released.
  • gbase_heap_temp: Allocates temporary, small chunks of memory that are used less frequently.
  • gbase_memory_pct_target: Sets the available memory proportion, with a default value of 0.8.

Minimum Parameter Values:

  • gbase_heap_data >= 512MB
  • gbase_heap_large >= 256MB
  • gbase_heap_temp >= 256MB

Maximum Parameter Values:

(gbase_heap_data + gbase_heap_large + gbase_heap_temp) <= total memory * gbase_memory_pct_target
Enter fullscreen mode Exit fullscreen mode

Note:

  • Default: total memory = physical memory
  • When _gbase_memory_use_swap is set to 1: total memory = physical memory + swap

1.2. Operator Buffers (session level)

Operator buffers are session-level settings. For example, if you set gbase_buffer_result=1G and the concurrency is 30, the total memory usage for gbase_buffer_result during execution will be 1G * 30 = 30G, not accounting for other operator buffers.

Commonly used buffers:

  • gbase_buffer_distgrby: Stores intermediate results of DISTINCT operations.
  • gbase_buffer_hgrby: Stores intermediate results of HASH GROUP BY operations.
  • gbase_buffer_hj: Stores intermediate results of HASH JOIN operations.
  • gbase_buffer_insert: Stores intermediate results of INSERT VALUES operations.
  • gbase_buffer_result: Stores materialized intermediate results.
  • gbase_buffer_rowset: Stores intermediate result sets from JOIN computations.
  • gbase_buffer_sj: Stores intermediate results of SORT MERGE JOIN. This is used when join conditions involve a >= b or a <= b.
  • gbase_buffer_sort: Stores intermediate results of SORT operations.

Buffer Setting Guidelines:

Under normal circumstances (non-high concurrency scenarios), operator buffers can be set according to the system memory size:

  • gbase_buffer_hgrby and gbase_buffer_hj should not exceed 4G.
  • gbase_buffer_result should not exceed 2G.
  • gbase_buffer_rowset should not exceed 1G.
  • Other buffers can be set using system estimates.

In high-concurrency scenarios, operator buffers should not be set too large. System auto-evaluation is usually sufficient, but in cases of extreme concurrency, manual adjustment to smaller buffer sizes may be necessary. The total buffer size multiplied by the concurrency should not exceed gbase_heap_large, and in no case should it exceed the total system memory.

Other Situations to Adjust Operator Buffers:

If a particular SQL operation is slow due to a specific operator (as identified by single-node tracing), you can increase the corresponding buffer. For example, if tracing shows slow performance in a join operation, you can increase the value of gbase_buffer_hj. However, care must be taken not to affect the execution of other SQL queries.

2. GNode Concurrency Control Parameters

2.1. gbase_parallel_execution

This parameter controls whether to enable parallel execution:

  • 0: Disabled (default).
  • 1: Enabled.

When to use:

  • If you find low CPU utilization, enabling parallel execution may help.
  • Parallel execution works by splitting data into chunks and processing them simultaneously, merging the results at the end. This is ideal for complex SQL operations with low concurrency.

2.2. gbase_parallel_max_thread_in_pool

Configures the maximum number of threads in the parallel execution thread pool. By default, this is set to twice the number of CPU cores.

Value range: 0~4096. When deploying a single GNode on a server, it's recommended to set this to 4–8 times the number of CPU cores.

Explanation:

  • Threads in the pool are created when the database service starts. Threads are borrowed from the pool during SQL execution and returned afterward, which reduces the overhead of frequent thread creation and destruction.

2.3. gbase_parallel_degree

Controls the maximum degree of parallelism for each SQL query.

Value range: 0~gbase_parallel_max_thread_in_pool. The maximum value cannot exceed the available threads in the thread pool.

  • 1: No parallelism, i.e., single-thread execution.
  • 0: The default degree of parallelism is half of the thread pool (gbase_parallel_max_thread_in_pool / 2). If the thread pool size is an odd number, the integer part is used.

That's all for today's content. Thank you for reading!

API Trace View

How I Cut 22.3 Seconds Off an API Call with Sentry 🕒

Struggling with slow API calls? Dan Mindru walks through how he used Sentry's new Trace View feature to shave off 22.3 seconds from an API call.

Get a practical walkthrough of how to identify bottlenecks, split tasks into multiple parallel tasks, identify slow AI model calls, and more.

Read more →

Top comments (0)

Billboard image

The Next Generation Developer Platform

Coherence is the first Platform-as-a-Service you can control. Unlike "black-box" platforms that are opinionated about the infra you can deploy, Coherence is powered by CNC, the open-source IaC framework, which offers limitless customization.

Learn more

👋 Kindness is contagious

Discover a treasure trove of wisdom within this insightful piece, highly respected in the nurturing DEV Community enviroment. Developers, whether novice or expert, are encouraged to participate and add to our shared knowledge basin.

A simple "thank you" can illuminate someone's day. Express your appreciation in the comments section!

On DEV, sharing ideas smoothens our journey and strengthens our community ties. Learn something useful? Offering a quick thanks to the author is deeply appreciated.

Okay