Relational databases may encounter performance issues during operation, causing the system to become unstable. Common troubleshooting steps include identifying the issue, gathering information, pinpointing the cause, developing a solution, implementing the fix, testing the results, and documenting the process. This article focuses on analyzing potential performance issues in GBase 8c primary/standby scenarios, with emphasis on the following cases:
1. Analyzing SQL Statement Execution Status
Issue
Some SQL statements take too long to execute, requiring analysis of the query’s execution state.
Troubleshooting
1) Log into the database with an admin account (gbase):
[gbase@gbase01 ~]$ gsql -d postgres -p 15432
2) Check the currently executing SQL statements:
postgres=# SELECT datname, usename, state, query FROM pg_stat_activity;
If state is idle, the connection is idle and waiting for a new command. To view only non-idle queries, use:
postgres=# SELECT datname, usename, state, query FROM pg_stat_activity WHERE state != 'idle';
To check if a query is blocked, use:
postgres=# SELECT datname, usename, state, query FROM pg_stat_activity WHERE waiting = true;
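The two WHERE clauses above simply filter the session list; a minimal Python sketch of the same filtering, over fabricated rows shaped like pg_stat_activity output (no live connection involved):

```python
# Hypothetical sample rows in the shape of pg_stat_activity
# (datname, usename, state, waiting, query) - not real session data.
rows = [
    {"datname": "postgres", "usename": "gbase", "state": "idle",
     "waiting": False, "query": "SELECT 1;"},
    {"datname": "postgres", "usename": "gbase", "state": "active",
     "waiting": False, "query": "SELECT count(*) FROM t1;"},
    {"datname": "postgres", "usename": "gbase", "state": "active",
     "waiting": True, "query": "UPDATE t1 SET c1 = 0;"},
]

# Equivalent of: WHERE state != 'idle'
non_idle = [r for r in rows if r["state"] != "idle"]

# Equivalent of: WHERE waiting = true
blocked = [r for r in rows if r["waiting"]]

print(len(non_idle), len(blocked))
```

Here the UPDATE session appears in both lists: it is active, but blocked waiting for a lock.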
2. Terminating a Specific Session
Issue
In some cases, you may need to forcibly terminate a session to allow the system to continue providing services.
Troubleshooting
1) Log into the database with an admin account (gbase):
[gbase@gbase01 ~]$ gsql -d postgres -p 15432
2) Find the session ID of the problematic session:
postgres=# SELECT datid, pid, state, query FROM pg_stat_activity;
The pid value is the thread ID of the session.
3) Terminate the session using the thread ID:
postgres=# SELECT pg_terminate_backend(139834762094352);
3. Finding Long-Running SQL Statements
Issue
SQL statements are running for an extended period or are unable to finish execution.
Cause
This may be due to the complexity of the SQL statement or because the query is being blocked.
Troubleshooting
1) Log into the database with an admin account (gbase):
[gbase@gbase01 ~]$ gsql -d postgres -p 15432
2) Find long-running queries:
SELECT EXTRACT(DAY FROM (current_timestamp - query_start)) * 24 * 60
     + EXTRACT(HOUR FROM (current_timestamp - query_start)) * 60
     + EXTRACT(MINUTE FROM (current_timestamp - query_start)) AS runtime,
       datname, usename, query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY 1 DESC;
This query will return a list of queries sorted by execution time, with the longest-running query at the top.
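The runtime expression is just an interval converted to whole minutes (days x 1440 + hours x 60 + minutes). The same arithmetic in Python, with fabricated timestamps standing in for current_timestamp and query_start:

```python
from datetime import datetime

# Hypothetical values standing in for current_timestamp and query_start.
current_timestamp = datetime(2024, 1, 1, 12, 0, 0)
query_start = datetime(2024, 1, 1, 9, 30, 0)

elapsed = current_timestamp - query_start
# DAY * 24 * 60 + HOUR * 60 + MINUTE, mirroring the SQL expression.
runtime_minutes = (elapsed.days * 24 * 60
                   + (elapsed.seconds // 3600) * 60
                   + (elapsed.seconds % 3600) // 60)
print(runtime_minutes)  # 150
```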
4. SQL Optimization and Analysis
Optimize SQL by creating appropriate indexes or rewriting the queries. Analyze whether SQL statements are blocked.
Issue
In certain business scenarios, queries may be blocked, causing the execution time to be excessively long.
Cause
When SQL statements attempt to lock resources, they may be blocked if the resource is already locked by another session. The SQL statement must wait for the resource to be released.
Troubleshooting
1) Log into the database with an admin account (gbase):
[gbase@gbase01 ~]$ gsql -d postgres -p 15432
2) Find the thread ID of the blocking session:
SELECT w.query AS waiting_query, w.pid AS w_pid, w.usename AS w_user,
       l.query AS locking_query, l.pid AS l_pid, l.usename AS l_user,
       t.schemaname || '.' || t.relname AS tablename
FROM pg_stat_activity w
JOIN pg_locks l1 ON w.pid = l1.pid AND NOT l1.granted
JOIN pg_locks l2 ON l1.relation = l2.relation AND l2.granted
JOIN pg_stat_activity l ON l2.pid = l.pid
JOIN pg_stat_user_tables t ON l1.relation = t.relid
WHERE w.waiting = true;
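The core of this query pairs each ungranted lock request with a granted lock on the same relation, which identifies who is waiting on whom. A toy Python version of that matching, over fabricated pg_locks-style rows:

```python
# Fabricated lock entries in the shape of pg_locks (pid, relation, granted).
locks = [
    {"pid": 101, "relation": 16409, "granted": True},   # lock holder
    {"pid": 202, "relation": 16409, "granted": False},  # waiter
    {"pid": 303, "relation": 16500, "granted": True},   # unrelated session
]

# Mirror of: JOIN ungranted locks (l1) to granted locks (l2)
# ON l1.relation = l2.relation
pairs = [(w["pid"], h["pid"])
         for w in locks if not w["granted"]
         for h in locks if h["granted"] and h["relation"] == w["relation"]]

print(pairs)  # [(202, 101)]: pid 202 is waiting on pid 101
```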
3) Analyze and terminate the problematic session:
SELECT pg_terminate_backend(139834762094352);
5. Lock Wait Timeout Errors
Issue
During business operations, lock wait timeout errors may occur, such as:
ERROR: Lock wait timeout: thread 140533638080272 waiting for ShareLock on relation 16409 of database 13218 after 1200000.122 ms
ERROR: Lock wait timeout: thread 140533638080272 waiting for AccessExclusiveLock on relation 16409 of database 13218 after 1200000.193 ms
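Error lines in this shape can be picked apart mechanically to identify the waiting thread and the contested relation. A Python sketch (the regular expression is an assumption derived from the two examples above, not a documented log format):

```python
import re

line = ("ERROR: Lock wait timeout: thread 140533638080272 waiting for "
        "ShareLock on relation 16409 of database 13218 after 1200000.122 ms")

# Assumed fields: thread ID, lock mode, relation OID, database OID, elapsed ms.
pattern = re.compile(
    r"thread (\d+) waiting for (\w+) on relation (\d+) "
    r"of database (\d+) after ([\d.]+) ms")

thread_id, lock_mode, relation, database, ms = pattern.search(line).groups()
print(lock_mode, float(ms) / 60000)  # ShareLock, waited about 20 minutes
```

Note that 1200000 ms is 20 minutes, which matches the default lock wait timeout discussed below.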
Cause
When an SQL statement attempts to lock a resource that is already locked by another session, it must wait for the lock to be released. If the wait exceeds the lock wait timeout (20 minutes by default, matching the 1200000 ms shown in the errors above), a lock wait timeout error is reported.
Troubleshooting
This issue can be resolved by checking for long-running SQL queries or transactions and terminating problematic sessions, as described in the previous sections.
6. Log Overview
The operating state of the database is influenced by many factors. After an anomaly occurs, the logs are used to determine the cause. Below are the typical locations of log files in GBase 8c primary/standby setups.
(1) GBase Architecture and Composition
A single-node or primary/standby GBase 8c setup stores business data on a single physical node, with high availability achieved through synchronous or asynchronous replication between the primary and standby nodes.
- HA Center: Responsible for high availability and managing failover between nodes.
- GBase 8c Primary/Standby: Manages data storage, query execution, and returning results to clients.
- Storage: Responsible for persisting data.
(2) Viewing Key Service Logs
System logs for the database service, such as:
/var/log/messages
To view active services:
[gbase@gbase01 ~]$ sudo systemctl list-units | grep gbase
To view HA Center logs:
[gbase@gbase01 ~]$ sudo systemctl status server_gbase_gha_server1
[gbase@gbase01 ~]$ sudo journalctl -u server_gbase_gha_server1
To view data node logs:
[gbase@gbase01 ~]$ sudo systemctl status datanode_gbase_dn1_1
[gbase@gbase01 ~]$ sudo journalctl -u datanode_gbase_dn1_1
To view data node operation logs:
[gbase@gbase01 dn1_1]$ cd $GAUSSLOG/pg_log
[gbase@gbase01 dn1_1]$ cd dn1_1
Here you will find logs prefixed with postgresql, which record instance operation, login errors, permission issues, and SQL anomalies.
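A quick first pass over those files is usually a scan for error-level entries. A minimal Python sketch over sample log lines (the line format here is illustrative, not the exact GBase 8c layout):

```python
# Fabricated operation-log lines for illustration only.
log_lines = [
    "2024-01-01 12:00:01 gbase postgres LOG:  connection received",
    "2024-01-01 12:00:05 gbase postgres ERROR:  permission denied for table t1",
    "2024-01-01 12:00:09 gbase postgres FATAL:  password authentication failed",
]

# Keep only lines reporting errors or worse.
problems = [l for l in log_lines if " ERROR:" in l or " FATAL:" in l]

for p in problems:
    print(p)
```

In practice the same filter can be applied with grep over $GAUSSLOG/pg_log.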
In addition to performance issues, GBase databases may encounter other challenges such as configuration problems, deployment issues, and runtime errors. Thorough examination of error messages, system configuration, and logs is essential for resolving these issues.