DEV Community

tongxi
tongxi

Posted on

openGauss Routine Maintenance

Routine Maintenance Check Items
Checking openGauss Status
openGauss provides tools to check database and instance status, ensuring that databases and instances are running properly to provide data services.

Check instance status.

""
gs_check -U omm -i CheckClusterState
Check parameters.

""
openGauss=# SHOW parameter_name;
In the above command, parameter_name needs to be replaced with a specific parameter name.

Modify parameters.

""
gs_guc reload -D /gaussdb/data/dbnode -c "paraname=value"
Checking Lock Information
The lock mechanism is an important method to ensure data consistency. Information check helps learn database transactions and database running status.

Query lock information in the database.

""
openGauss=# SELECT * FROM pg_locks;
Query the status of threads waiting to acquire locks.

""
openGauss=# SELECT * FROM pg_thread_wait_status WHERE wait_status = 'acquire lock';
Kill a system process.

Search for a system process that is running and run the following command to end the process:

""
ps ux
kill -9 pid
Collecting Event Statistics
Long-time running of SQL statements will occupy a lot of system resources. You can check event occurrence time and occupied memory to learn about database running status.

Query the time points about an event.

Run the following command to query the thread start time, transaction start time, SQL start time, and status change time of the event:

""
openGauss=# SELECT backend_start,xact_start,query_start,state_change FROM pg_stat_activity;
Query the number of sessions on the current server.

""
openGauss=# SELECT count(*) FROM pg_stat_activity;
Query system-level statistics.

Run the following command to query information about the session that uses the maximum memory:

""
openGauss=# SELECT * FROM pv_session_memory_detail() ORDER BY usedsize desc limit 10;
Checking Objects
Tables, indexes, partitions, and constraints are key storage objects of a database. A database administrator needs to routinely maintain key information and these objects.

View table details.

""
openGauss=# \d+ table_name
Query table statistics.

""
openGauss=# SELECT * FROM pg_statistic;
View index details.

""
openGauss=# \d+ index_name
Query partitioned table information.

""
openGauss=# SELECT * FROM pg_partition;
Collect statistics.

Run the ANALYZE statement to collect related statistics on the database.

Run the VACUUM statement to reclaim space and update statistics.

Query constraint information.

""
openGauss=# SELECT * FROM pg_constraint;
Checking an SQL Report
Run the EXPLAIN statement to view execution plans.

Backing Up Data
Never forget to back up data. During the routine work, the backup execution and backup data validity need to be checked to ensure data security and encryption security.

Export a specified user.

""
gs_dump dbname -p port -f out.sql -U user_name -W password
Export a schema.

""
gs_dump dbname -p port -n schema_name -f out.sql
Export a table.

""
gs_dump dbname -p port -t table_name -f out.sql
Checking Basic Information
Basic information includes versions, components, and patches. Periodic database information checks and records are important for database life cycle management.

Check version information.

""
openGauss=# SELECT version();
Check table size and database size.

""
openGauss=# SELECT pg_table_size('table_name');
openGauss=# SELECT pg_database_size('database_name');
Checking OS Parameters
Check Method
Use the gs_checkos tool provided by openGauss to check the OS status.

Prerequisites

The hardware and network are working properly.
The trust relationship of user root among the hosts is normal.
Only user root is authorized to run the gs_checkos command.
Procedure

Log in to a server as user root.

Run the following command to check OS parameters of servers where the openGauss nodes are deployed:

""
gs_checkos -i A
The purpose of checking the OS parameters is to ensure that openGauss is preinstalled properly and can žffic²žnìÃy operate after it is installed. For details about the check items, see “Server Tools > gs_checkos” in the Tool Reference.

Examples

Before running the gs_checkos command, execute pre-processing scripts by running gs_preinstall to prepare the environment. The following uses parameter A as an example:

""
gs_checkos -i A
Checking items:
A1. [ OS version status ] : Normal
A2. [ Kernel version status ] : Normal
A3. [ Unicode status ] : Normal
A4. [ Time zone status ] : Normal
A5. [ Swap memory status ] : Normal
A6. [ System control parameters status ] : Normal
A7. [ File system configuration status ] : Normal
A8. [ Disk configuration status ] : Normal
A9. [ Pre-read block size status ] : Normal
A10.[ IO scheduler status ] : Normal
A11.[ Network card configuration status ] : Normal
A12.[ Time consistency status ] : Warning
A13.[ Firewall service status ] : Normal
A14.[ THP service status ] : Normal
Total numbers:14. Abnormal numbers:0. Warning number:1.
The following uses parameter B as an example:

""
gs_checkos -i B
Setting items:
B1. [ Set system control parameters ] : Normal
B2. [ Set file system configuration value ] : Normal
B3. [ Set pre-read block size value ] : Normal
B4. [ Set IO scheduler value ] : Normal
B5. [ Set network card configuration value ] : Normal
B6. [ Set THP service ] : Normal
B7. [ Set RemoveIPC value ] : Normal
B8. [ Set Session Process ] : Normal
Total numbers:6. Abnormal numbers:0. Warning number:0.
Exception Handling
If you use the gs_checkos tool to check the OS and the command output shows Abnormal, run the following command to view detailed error information:

""
gs_checkos -i A --detail
The Abnormal state cannot be ignored because the OS in this state affects cluster installation. The Warning state does not affect cluster installation and thereby can be ignored.

If the check result for OS version status (A1) is Abnormal, replace OSs out of the mixed programming scope with those within the scope.

If the check result for kernel version status (A2) is Warning, the platform kernel versions in the cluster are inconsistent.

If the check result for Unicode status (A3) is Abnormal, set the same character set for all the hosts. You can add export LANG=unicode to the /etc/profile file.

""
vim /etc/profile
If the check result for time zone status (A4) is Abnormal, set the same time zone for all the hosts. You can copy the time zone file in the /usr/share/zoneinfo/ directory as the /etc/localtime file.

""
cp /usr/share/zoneinfo/$primary time zone/$secondary time zone /etc/localtime
If the check result for swap memory status (A5) is Abnormal, a possible cause is that the swap memory is larger than the physical memory. You can troubleshoot this issue by reducing the swap memory or increasing the physical memory.

If the check result for system control parameter status (A6) is Abnormal, troubleshoot this issue in either of the following two ways:

Run the following command:

""
gs_checkos -i B1
Modify the /etc/sysctl.conf file based on the error message and run sysctl -p to make it take effect.

""
vim /etc/sysctl.conf
If the check result for file system configuration status (A7) is Abnormal, run the following command to troubleshoot this issue:

""
gs_checkos -i B2
If the check result for disk configuration status (A8) is Abnormal, set the disk mounting format to rw,noatime,inode64,allocsize=16m.

Run the man mount command to mount the XFS parameter:

""
rw,noatime,inode64,allocsize=16m
You can also set the XFS parameter in the /etc/fstab file. For example:

""
/dev/data /data xfs rw,noatime,inode64,allocsize=16m 0 0
If the check result for pre-read block size status (A9) is Abnormal, run the following command to troubleshoot this issue:

""
gs_checkos -i B3
If the check result for I/O scheduling status (A10) is Abnormal, run the following command to troubleshoot this issue:

""
gs_checkos -i B4
If the check result for NIC configuration status (A11) is Warning, run the following command to troubleshoot this issue:

""
gs_checkos -i B5
If the check result for time consistency status (A12) is Abnormal, verify that the NTP service has been installed and started and has synchronized time from the NTP clock.

If the check result for firewall status (A13) is Abnormal, disable the firewall. Run the following commands:

SUSE:

""
SuSEfirewall2 stop
RedHat7:

""
systemctl disable firewalld
RedHat6:

""
service iptables stop
If the check result for THP service status (A14) is Abnormal, run the following command to troubleshoot this issue:

""
gs_checkos -i B6
Checking openGauss Health Status
Check Method
Use the gs_check tool provided by openGauss to check the openGauss health status.

Precautions

Only user root is authorized to check new nodes added during cluster scale-out. In other cases, the check can be performed only by user omm.
Parameter -i or -e must be set. -i specifies a single item to be checked, and -e specifies an inspection scenario where multiple items will be checked.
If -i is not set to a root item or no such items are contained in the check item list of the scenario specified by -e, you do not need to enter the name or password of user root.
You can run –skip-root-items to skip root items.
Check the consistency between the new node and existing nodes. Run the gs_check command on an existing node and specify the –hosts parameter. The IP address of the new node needs to be written into the hosts file.
Procedure

Method 1:

Log in as the OS user omm to the primary node of the database.

Run the following command to check the openGauss database status:

""
gs_check -i CheckClusterState
In the command, -i indicates the check item and is case-sensitive. The format is -i CheckClusterState, -i CheckCPU or -i CheckClusterState,CheckCPU.

Checkable items are listed in “Server Tools > gs_check > openGauss status checks” in the openGauss Tool Reference. You can create a check item as needed.

Method 2:

Log in as the OS user omm to the primary node of the database.

Run the following command to check the openGauss database health status:

""
gs_check -e inspect
In the command, -e indicates the inspection scenario and is case-sensitive. The format is -e inspect or -e upgrade.

The inspection scenarios include inspect (routine inspection), upgrade (inspection before upgrade), Install (install inspection ), binary_upgrade (inspection before in-place upgrade), slow_node (node inspection), longtime (time-consuming inspection) and health (health inspection). You can create an inspection scenario as needed.

The openGauss inspection is performed to check openGauss status during openGauss running or to check the environment and conditions before critical operations, such as upgrade or scale-out. For details about the inspection items and scenarios, see “Server Tools > gs_check > openGauss status checks” in the openGauss Tool Reference.

Examples

Check result of a single item:

""
perfadm@lfgp000700749:/opt/huawei/perfadm/tool/script> gs_check -i CheckCPU
Parsing the check items config file successfully
Distribute the context file to remote hosts successfully
Start to health check for the cluster. Total Items:1 Nodes:3

Checking... [=========================] 1/1
Start to analysis the check result
CheckCPU....................................OK
The item run on 3 nodes. success: 3

Analysis the check result successfully
Success. All check items run completed. Total:1 Success:1 Failed:0
For more information please refer to /opt/huawei/wisequery/script/gspylib/inspection/output/CheckReport_201902193704661604.tar.gz
Local execution result:

""
perfadm@lfgp000700749:/opt/huawei/perfadm/tool/script> gs_check -i CheckCPU -L

2017-12-29 17:09:29 [NAM] CheckCPU
2017-12-29 17:09:29 [STD] Check the CPU usage of the host. If the value of idle is greater than 30% and the value of iowait is less than 30%, this item passes the check. Otherwise, this item fails the check.
2017-12-29 17:09:29 [RST] OK

2017-12-29 17:09:29 [RAW]
Linux 4.4.21-69-default (lfgp000700749) 12/29/17 x86_64

17:09:24 CPU %user %nice %system %iowait %steal %idle
17:09:25 all 0.25 0.00 0.25 0.00 0.00 99.50
17:09:26 all 0.25 0.00 0.13 0.00 0.00 99.62
17:09:27 all 0.25 0.00 0.25 0.13 0.00 99.37
17:09:28 all 0.38 0.00 0.25 0.00 0.13 99.25
17:09:29 all 1.00 0.00 0.88 0.00 0.00 98.12
Average: all 0.43 0.00 0.35 0.03 0.03 99.17
Check result of a scenario:

""
[perfadm@SIA1000131072 Check]$ gs_check -e inspect
Parsing the check items config file successfully
The below items require root privileges to execute:[CheckBlockdev CheckIOrequestqueue CheckIOConfigure CheckCheckMultiQueue CheckFirewall CheckSshdService CheckSshdConfig CheckCrondService CheckBootItems CheckFilehandle CheckNICModel CheckDropCache]
Please enter root privileges user[root]:root
Please enter password for user[root]:
Please enter password for user[root] on the node[10.244.57.240]:
Check root password connection successfully
Distribute the context file to remote hosts successfully
Start to health check for the cluster. Total Items:59 Nodes:2

Checking... [ ] 21/57
Checking... [=========================] 59/57
Start to analysis the check result
CheckClusterState...........................OK
The item run on 2 nodes. success: 2

CheckDBParams...............................OK
The item run on 1 nodes. success: 1

CheckDebugSwitch............................OK
The item run on 2 nodes. success: 2

CheckDirPermissions.........................OK
The item run on 2 nodes. success: 2

CheckReadonlyMode...........................OK
The item run on 1 nodes. success: 1

CheckEnvProfile.............................OK
The item run on 2 nodes. success: 2 (consistent)
The success on all nodes value:
GAUSSHOME /usr1/gaussdb/app
LD_LIBRARY_PATH /usr1/gaussdb/app/lib
PATH /usr1/gaussdb/app/bin

CheckBlockdev...............................OK
The item run on 2 nodes. success: 2

CheckCurConnCount...........................OK
The item run on 1 nodes. success: 1

CheckCursorNum..............................OK
The item run on 1 nodes. success: 1

CheckPgxcgroup..............................OK
The item run on 1 nodes. success: 1

CheckDiskFormat.............................OK
The item run on 2 nodes. success: 2

CheckSpaceUsage.............................OK
The item run on 2 nodes. success: 2

CheckInodeUsage.............................OK
The item run on 2 nodes. success: 2

CheckSwapMemory.............................OK
The item run on 2 nodes. success: 2

CheckLogicalBlock...........................OK
The item run on 2 nodes. success: 2

CheckIOrequestqueue.....................WARNING
The item run on 2 nodes. warning: 2
The warning[host240,host157] value:
On device (vdb) 'IO Request' RealValue '256' ExpectedValue '32768'
On device (vda) 'IO Request' RealValue '256' ExpectedValue '32768'

CheckMaxAsyIOrequests.......................OK
The item run on 2 nodes. success: 2

CheckIOConfigure............................OK
The item run on 2 nodes. success: 2

CheckMTU....................................OK
The item run on 2 nodes. success: 2 (consistent)
The success on all nodes value:
1500

CheckPing...................................OK
The item run on 2 nodes. success: 2

CheckRXTX...................................NG
The item run on 2 nodes. ng: 2
The ng[host240,host157] value:
NetWork[eth0]
RX: 256
TX: 256

CheckNetWorkDrop............................OK
The item run on 2 nodes. success: 2

CheckMultiQueue.............................OK
The item run on 2 nodes. success: 2

CheckEncoding...............................OK
The item run on 2 nodes. success: 2 (consistent)
The success on all nodes value:
LANG=en_US.UTF-8

CheckFirewall...............................OK
The item run on 2 nodes. success: 2

CheckKernelVer..............................OK
The item run on 2 nodes. success: 2 (consistent)
The success on all nodes value:
3.10.0-957.el7.x86_64

CheckMaxHandle..............................OK
The item run on 2 nodes. success: 2

CheckNTPD...................................OK
host240: NTPD service is running, 2020-06-02 17:00:28
host157: NTPD service is running, 2020-06-02 17:00:06

CheckOSVer..................................OK
host240: The current OS is centos 7.6 64bit.
host157: The current OS is centos 7.6 64bit.

CheckSysParams..........................WARNING
The item run on 2 nodes. warning: 2
The warning[host240,host157] value:
Warning reason: variable 'net.ipv4.tcp_retries1' RealValue '3' ExpectedValue '5'.
Warning reason: variable 'net.ipv4.tcp_syn_retries' RealValue '6' ExpectedValue '5'.

CheckTHP....................................OK
The item run on 2 nodes. success: 2

CheckTimeZone...............................OK
The item run on 2 nodes. success: 2 (consistent)
The success on all nodes value:
+0800

CheckCPU....................................OK
The item run on 2 nodes. success: 2

CheckSshdService............................OK
The item run on 2 nodes. success: 2

CheckSshdConfig.........................WARNING
The item run on 2 nodes. warning: 2
The warning[host240,host157] value:

Warning reason: UseDNS parameter is not set; expected: no

CheckCrondService...........................OK
The item run on 2 nodes. success: 2

CheckStack..................................OK
The item run on 2 nodes. success: 2 (consistent)
The success on all nodes value:
8192

CheckSysPortRange...........................OK
The item run on 2 nodes. success: 2

CheckMemInfo................................OK
The item run on 2 nodes. success: 2 (consistent)
The success on all nodes value:
totalMem: 31.260929107666016G

CheckHyperThread............................OK
The item run on 2 nodes. success: 2

CheckTableSpace.............................OK
The item run on 1 nodes. success: 1

CheckSysadminUser...........................OK
The item run on 1 nodes. success: 1

CheckGUCConsistent..........................OK
All DN instance guc value is consistent.

CheckMaxProcMemory..........................OK
The item run on 1 nodes. success: 1

CheckBootItems..............................OK
The item run on 2 nodes. success: 2

CheckHashIndex..............................OK
The item run on 1 nodes. success: 1

CheckPgxcRedistb............................OK
The item run on 1 nodes. success: 1

CheckNodeGroupName..........................OK
The item run on 1 nodes. success: 1

CheckTDDate.................................OK
The item run on 1 nodes. success: 1

CheckDilateSysTab...........................OK
The item run on 1 nodes. success: 1

CheckKeyProAdj..............................OK
The item run on 2 nodes. success: 2

CheckProStartTime.......................WARNING
host157:
STARTED COMMAND
Tue Jun 2 16:57:18 2020 /usr1/dmuser/dmserver/metricdb1/server/bin/gaussdb --single_node -D /usr1/dmuser/dmb1/data -p 22204
Mon Jun 1 16:15:15 2020 /usr1/gaussdb/app/bin/gaussdb -D /usr1/gaussdb/data/dn1 -M standby

CheckFilehandle.............................OK
The item run on 2 nodes. success: 2

CheckRouting................................OK
The item run on 2 nodes. success: 2

CheckNICModel...............................OK
The item run on 2 nodes. success: 2 (consistent)
The success on all nodes value:
version: 1.0.0
model: Red Hat, Inc. Virtio network device

CheckDropCache..........................WARNING
The item run on 2 nodes. warning: 2
The warning[host240,host157] value:
No DropCache process is running

CheckMpprcFile..............................NG
The item run on 2 nodes. ng: 2
The ng[host240,host157] value:
There is no mpprc file

Analysis the check result successfully
Failed. All check items run completed. Total:57 Success:50 Warning:5 NG:2
For more information please refer to /usr1/gaussdb/tool/script/gspylib/inspection/output/CheckReport_inspect611.tar.gz

Top comments (0)