Cong Li

Deployment and Load Status Monitoring of GBFS Dedicated File Server for GBase 8a MPP Cluster Data Loading

Introduction

This article introduces the configuration of the GBFS dedicated file server for GBase 8a MPP Cluster data loading and the methods for monitoring load status.

1. Introduction to GBFS Dedicated File Server

The GBFS dedicated file server is a binary executable designed specifically for loading data into GBase 8a MPP Cluster databases. It is typically distributed as a package, such as gbfs-9.5.3.22-redhat7.3.tar.bz2. Extract the package and the program can be run directly, with no further installation.

Using gbfs-9.5.3.22-redhat7.3.tar.bz2 as an example:

# tar xvf gbfs-9.5.3.22-redhat7.3.tar.bz2

After extraction, a gbfs folder is created in the current directory, containing the main gbfs program and a BUILDINFO file (build information). Run ./gbfs -? from that folder to view the program's help information.

[root@rhel73-1 gbfs]# ./gbfs -?
./gbfs ver 9.5.3.22.126635 for unknown-linux-gnu on x86_64
Copyright 2004-2021 General Data Technology Co.Ltd.
GBase File Server
Usage: ./gbfs [OPTIONS]
-V, --version    Get version info.
-?, --help       Get help info.
-P, --port       Port number to use for connection or 6666 for default,
                 valid range: [1025,65535] order of preference.
-H, --home-dir   The GBase file server home dir, default: current user home dir.
-L, --log-dir    The GBase file server logs dir, default: /tmp/.

The help information includes the version and usage of gbfs. The parameters are introduced as follows:

  • -P & --port: The port number the GBFS server listens on. The default is 6666, and the help output gives the valid range as [1025,65535].
  • -H & --home-dir: The GBFS home directory, similar to an FTP home directory. The default is the home directory of the user who starts the server. This parameter mainly exists so that gbfs URLs can use paths relative to it.

For example:

If the gbase user is running the server, the default GBFS home directory would be /home/gbase/. If the user data is stored under /home/gbase/data/, the user can use the following URL to load the file:

gbfs://192.168.146.20/data/test.tbl 

The equivalent absolute path URL would be:

gbfs://192.168.146.20//home/gbase/data/test.tbl

Users can configure this parameter according to their actual scenarios.
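
The relative and absolute URL forms above can be told apart mechanically. The following is a minimal sketch of that rule, not part of GBFS itself: gbfs_server_path is a hypothetical helper name, and it assumes URLs of exactly the two shapes shown above (no port or query part).

```shell
# Resolve a gbfs:// URL to the file path the server would read.
# Hypothetical helper, not part of GBFS: $1 = URL, $2 = GBFS home dir.
gbfs_server_path() {
    local url="$1"
    local home="${2%/}"          # drop a trailing slash from the home dir
    local rest="${url#gbfs://}"  # -> host/path or host//absolute/path
    local path="${rest#*/}"      # drop the host and the slash after it
    case "$path" in
        /*) printf '%s\n' "$path" ;;        # slash left over: absolute path
        *)  printf '%s\n' "$home/$path" ;;  # otherwise relative to home dir
    esac
}
```

With a home directory of /home/gbase/, both example URLs above resolve to /home/gbase/data/test.tbl.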

  • -L & --log-dir: The directory for storing GBFS log files. After the GBFS server starts, it will create a gbfs_port.log file in this directory. The default directory is /tmp/.

It is recommended to run the GBFS dedicated file server in the background:

[gbase@rhel73-1 gbfs]$ ./gbfs &
[1] 23302
[gbase@rhel73-1 gbfs]$ IPv6 is available.
gbfs is ready for connections. home dir:/home/gbase/, log dir:/tmp/, port:6666.
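
When the defaults do not fit, the three options from the help output can be passed explicitly. The sketch below builds such an option string and checks the port against the range printed by the help text. gbfs_args is a hypothetical helper name, and the sketch assumes that gbfs accepts each option value as a separate argument, which the help output does not state.

```shell
# Print gbfs command-line options after checking the port range.
# Hypothetical helper: $1 = port, $2 = home dir, $3 = log dir.
gbfs_args() {
    local port="${1:-6666}"      # default port from the help text
    local home="${2:-$HOME}"     # default: current user's home dir
    local logdir="${3:-/tmp/}"   # default log dir from the help text
    case "$port" in
        *[!0-9]*) echo "port must be a number" >&2; return 1 ;;
    esac
    if [ "$port" -lt 1025 ] || [ "$port" -gt 65535 ]; then
        echo "port must be in [1025,65535]" >&2
        return 1
    fi
    printf '%s\n' "-P $port -H $home -L $logdir"
}
```

A possible use, with illustrative paths, is `nohup ./gbfs $(gbfs_args 7777 /data/load /var/log/gbfs) > gbfs.out 2>&1 &`, which also keeps the server running after the terminal closes. The unquoted expansion splits on spaces, so this form does not suit directory names that contain spaces.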

Example:

To load the part.tbl file located on the GBFS server, using the default line delimiter and | as the column delimiter:

gbase> load data infile 'gbfs://192.168.146.20//opt/ssbm/part.tbl' into table part data_format 3 FIELDS TERMINATED BY '|';
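
For scripted loads, the same statement can be generated from its variable parts. This is only a sketch: gbfs_load_sql is a hypothetical helper name, the | default simply mirrors the example above, and the values are inserted without any quoting or escaping, so it is only suitable for trusted input.

```shell
# Print a LOAD DATA statement for a file served by GBFS.
# Hypothetical helper: $1 = gbfs URL, $2 = target table, $3 = column delimiter.
gbfs_load_sql() {
    local url="$1" table="$2" delim="${3:-|}"
    # Same clauses as the example statement; only the three values vary.
    printf "load data infile '%s' into table %s data_format 3 FIELDS TERMINATED BY '%s';\n" \
        "$url" "$table" "$delim"
}
```

Calling `gbfs_load_sql gbfs://192.168.146.20//opt/ssbm/part.tbl part` prints the statement shown above, which can then be passed to whichever SQL client you use for the cluster.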

2. Load Status Monitoring

Function Description

After the load task starts, you can check the status information of the current load task through SQL.

Syntax Format

SELECT * FROM information_schema.load_status;

The status information table records the status information of all running load tasks.

Each field in the status information table is defined as follows:

Field Name         Description
SCN                SCN number
DB_NAME            Database name
TB_NAME            Table name
IP                 Load machine IP address
STATE              Load state
START_TIME         Load start time
ELAPSED_TIME       Elapsed load time
AVG_SPEED          Average load speed
PROGRESS           Load progress
TOTAL_SIZE         Total file size
LOADED_SIZE        Amount of data loaded
LOADED_RECORDS     Number of records loaded
SKIPPED_RECORDS    Number of records skipped
DATA_SOURCE        Data source
SQL_CMD            Load task SQL
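
When watching a long-running load, it is often enough to select a few of these fields instead of using SELECT *. The sketch below prints such a query, optionally restricted to one table. load_status_sql is a hypothetical helper name, the column choice is only a suggestion, and the table name is inserted without escaping.

```shell
# Print a query showing progress columns for running loads.
# Hypothetical helper: $1 = optional table name to filter on.
load_status_sql() {
    local table="${1:-}"
    local cols="TB_NAME, STATE, PROGRESS, AVG_SPEED, LOADED_RECORDS, SKIPPED_RECORDS"
    if [ -n "$table" ]; then
        printf "SELECT %s FROM information_schema.load_status WHERE TB_NAME = '%s';\n" "$cols" "$table"
    else
        printf "SELECT %s FROM information_schema.load_status;\n" "$cols"
    fi
}
```

For repeated checks, the output can be fed to your SQL client in a loop with a sleep between iterations. The client invocation depends on your installation and is left out here.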

That's all for today. Thank you for reading!
