DEV Community

Cover image for How to import bulk data to Amazon RDS for MySQL with best approaches
Sidra Saleem for SUDO Consultants

Posted on • Updated on • Originally published at sudoconsultants.com

How to import bulk data to Amazon RDS for MySQL with best approaches

Optimized loading or importing of Data

If your database does not have automated backups enabled, you can still take regular snapshots of your entire database. For this purpose, Set the backup retention period to a positive nonzero value. Once you do so, you can create an automatic backup by taking your database offline. A backup is immediately created whenever the backup retention period changes from zero to a nonzero value or vice versa. A manual backup is performed using the AWS CLI, or the backup is scheduled using a cron job. Manual and cron-scheduled backups are not subject to the backup retention period. If you want to see more details, check out our latest article on backup restoration in Amazon RDS for MySQL.

Storage Optimization 

Use Amazon Cloudwatch for monitoring :

You can use the FreeStorageSpace Aws CloudWatch metric to monitor the amount of space available to an RDS DB instance. Still, that metric doesn't describe how the DB instance consumes storage.

Reduce/disable general log and slow query log storage

Enabling the general log and slow query log parameters can cause your DB instance to start storing these logs and the backups of these logs. To control disk space usage, set rds.log_rotate_on_compress.To rotate these logs and manage disk space, visit MySQL database directories of general log rotation and slow log rotation to see which log files can be rotated.

For detailed storage optimization tips, you can visit here.

Modifications of Database Parameters

You can adjust the database parameters to improve performance while loading data into an RDS for MySQL instance. Several of these parameters can be changed via parameter groups, and several can be modified at the session level.

After the data import activity is complete, undo the changes you made to the following parameters. The suggested values might not be appropriate when no ongoing data import activity exists. 

You can only modify RDS DB instance parameters if you have access to the parameter group in which they are stored. To change any parameter value, you must first look at the parameter group that stores those parameters and then change the parameter value in the appropriate field.

Step 1:

A DB parameter group is created by using the ParameterGroup DB object. You use this object to create a parameter group in your database. 

Step 2 :

You then view the values of the parameters in a parameter group to ensure that the Is Modifiable property is set to True. 

Step 3 :

Finally, you modify the parameter values in the parameter group.

You have to keep in mind the following configurations as well :

Batch Commits :

All user activity in InnoDB takes place within a transaction. This means that each SQL statement executed is treated as a separate transaction. In MySQL, all transactions are automatically committed if they do not generate any errors, which may cause a delay during the import operation. To speed up the import operation, it is recommended to set autocommit=1 and use binary log switching. This will ensure that the database server does not wait for the end of the current transaction before processing the next one. You can study batch execute statements here.

For inserting more than one row in a table, use INSERT statements. This way, you can insert several rows of data at once. You need only supply the information to insert in the appropriate columns of the rows being inserted. Here is an example syntax :

INSERT INTO <tablename> VALUES (value1, value2), (value3, value4),…..;

Batching flat files with LOAD DATA

LOAD DATA allows you to load flat files into the database. Batch loading of data from files is possible. Local files are helpful when working on a particular table in the database. These flat files are created per table, which directly helps in parallel data loading to the database. For further details on load data files, you can visit here.

Disable autocommit 

Whenever you change the value for auto-commit by making changes in the custom parameter group that associates it with the RDS instance, these changes occur at a global level. You can instead change this value at the session level. You can surround the SQL import statements with the following code:

SET autocommit=0; 

... SQL import statements ... 

SET autocommit=1;

In all these scenarios, the number of commits remains unaffected. Even though this can be beneficial, avoiding making transactions too big is essential. If the transaction size exceeds 100 MB, you must split it into 250 MB chunks and load them sequentially. This will allow you to minimize the rollback time if there is any issue in any chunks.

Increase the size of innodb_log_file_size

InnoDB log_file_size defines the checkpointing age. Large redo logs cause a lot of disk writes. Therefore, increasing the size of the log file leads to less disk I/O.

The default value of innodb_log_file_size for Amazon RDS for MySQL is 128 MiB.

The maximum log file size should be equal to about 524,288 bytes for your server. This value is computed from a constant value expressed in GB, multiplied by the product of innodb_log_file_size and innodb_log_files_in_group. More than that, the combined size of log files can't exceed 512 GB. If it reaches that value, you have run out of space on the redo log. Consequently, it can't smooth out peaks and troughs in workload activity, which often means there is not enough redo log space to handle more than an hour of write activity. Therefore, starting with a relatively large log file is advisable to ensure a smooth and successful server operation. 

For parameter modification, you can visit the detailed guide on Parameter configuration.

Increase the size of innodb_log_buffer_size

While importing data to the database, the transaction inserts many rows. If you set the log buffer size to a higher value, then it decreases the number of writes to the logs. This, in turn, decreases the log I/O and helps improve the data import performance.

Innodb_log_buffer_size is a static parameter, so a reboot is required to modify the value. You can configure the value of this parameter using the custom parameter group associated with the RDS instance. Doing this allows you to change the parameter's value without requiring a reboot. Doing this helps keep the write IOPS from becoming a bottleneck and improves the data import performance.

You can find the elaborated article on this site for best practices to import data on amazon RDS for MySQL.

Top comments (0)