DEV Community

H-Philosophy


3 Tips to Dramatically Improve S3 Transfer Speeds

Many workloads, such as Machine Learning, Data Analytics, HPC (High-Performance Computing), and Backup/Restore, require downloading a large amount of data for local processing. Time is money. If you have a 450 GB file to transfer from S3 to EC2, you want it to take as little time as possible, right? I recently moved large files from S3 to EC2, so I am recording what I learned and sharing it here.

Transfers from S3 to EC2 can be done with the CLI or an SDK. My case used the CLI, so this article focuses on it, but the core settings are the same for both.

Before we start, note that the purpose of this article is not to squeeze out the most extreme performance but to follow the 80/20 rule: adjust a few settings and get a noticeable performance improvement.

Tip 1. Use the latest version of CLI and enable CRT

You may not like updating the CLI, but the release notes [1] show that its performance keeps improving. In addition, the Common Runtime (CRT) libraries were added in 2.2.0 to improve overall transfer performance. If you want more details, read the documentation [2], or go directly to the setting below.

After updating to the latest version of the CLI, enable the CRT [3] with the following command to improve transfer performance.

aws configure set default.s3.preferred_transfer_client crt

[1] https://github.com/aws/aws-cli/blob/2.8.8/CHANGELOG.rst

[2] https://docs.aws.amazon.com/sdkref/latest/guide/common-runtime.html

[3] https://awscli.amazonaws.com/v2/documentation/api/latest/topic/s3-config.html#preferred-transfer-client
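
As a rough way to check whether an installed CLI is new enough for the CRT setting, its version can be compared against 2.2.0. The sketch below is in Python and uses hypothetical helper names; it assumes the output of `aws --version` starts with `aws-cli/<major>.<minor>.<patch>`, and the sample strings are illustrative rather than captured output.

```python
import re

# Assumption: CRT support arrived in 2.2.0, as stated in the article.
MIN_CRT_VERSION = (2, 2, 0)


def parse_cli_version(version_output):
    """Extract (major, minor, patch) from a string like 'aws-cli/2.7.32 ...'."""
    match = re.match(r"aws-cli/(\d+)\.(\d+)\.(\d+)", version_output.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version_output!r}")
    return tuple(int(part) for part in match.groups())


def supports_crt(version_output):
    """Return True if the reported CLI version is at least MIN_CRT_VERSION."""
    return parse_cli_version(version_output) >= MIN_CRT_VERSION


if __name__ == "__main__":
    # Illustrative strings using the versions from the test in this article.
    for sample in ("aws-cli/2.0.52", "aws-cli/2.7.32"):
        print(sample, "supports CRT:", supports_crt(sample))
```

In practice the input string would come from running `aws --version` on the machine that performs the transfer.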

Tip 2. Increase the number of threads and the multipart chunk size

Increasing the number of threads [4] (the default is 10 concurrent requests) can squeeze more out of the network, but note that it also increases the load on the EC2 instance. If the instance is running other applications, test to find the upper limit so that the transfer does not crowd out their resources.

aws configure set default.s3.max_concurrent_requests 100

When the AWS CLI transfers data with S3 commands, it splits large objects into multiple parts [5] by default (the default part size is 8MB). Increasing the size of each part means fewer parts, which reduces the overhead of issuing requests and reassembling the parts.

aws configure set default.s3.multipart_chunksize 16MB

[4] https://awscli.amazonaws.com/v2/documentation/api/latest/topic/s3-config.html#max-concurrent-requests

[5] https://awscli.amazonaws.com/v2/documentation/api/latest/topic/s3-config.html#multipart-chunksize
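
To make the effect of the chunk size concrete, here is a small Python sketch of the arithmetic. It assumes the article's 450 GB file is 450 GiB and that the CLI's `MB` unit means MiB; the function name is hypothetical and the code only counts parts, it does not call S3.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def part_count(object_size_bytes, chunk_size_bytes):
    """Number of fixed-size parts needed to cover an object (ceiling division)."""
    if object_size_bytes <= 0 or chunk_size_bytes <= 0:
        raise ValueError("sizes must be positive")
    return -(-object_size_bytes // chunk_size_bytes)


if __name__ == "__main__":
    size = 450 * GIB  # assumption: "450 GB" treated as 450 GiB
    for chunk_mib in (8, 16):
        parts = part_count(size, chunk_mib * MIB)
        print(f"{chunk_mib} MiB chunks -> {parts} parts")
```

Under these assumptions, going from 8 MiB to 16 MiB halves the number of parts from 57,600 to 28,800, which is the reduction in per-part overhead described above.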

Tip 3. Check the EC2 instance and EBS volume types you use

Network bandwidth differs between EC2 instance types [6]. If you need more bandwidth, switch to an instance type that provides it.

Beyond network bandwidth, consider the EBS volume [7] the data is written to. If its write throughput cannot keep up, it limits the download speed. If you have no baseline to compare against, start by testing with gp3 and adjust from there.

The same idea applies even if you are using your own VM instead of EC2: check whether your external network bandwidth or block storage is the bottleneck.

[6] https://aws.amazon.com/ec2/instance-types/

[7] https://aws.amazon.com/ebs/volume-types/
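
The bottleneck reasoning can be sketched in Python as follows. This is a simplified model that assumes the slower of network and disk sets the transfer rate and ignores every other overhead; the function name and the 10 Gbit/s network figure are hypothetical, while 125 MiB/s is the gp3 baseline throughput.

```python
MIB = 1024 ** 2


def estimated_seconds(size_bytes, network_bytes_per_s, disk_bytes_per_s):
    """Estimate transfer time when the slower of network and disk is the limit."""
    bottleneck = min(network_bytes_per_s, disk_bytes_per_s)
    if bottleneck <= 0:
        raise ValueError("throughput values must be positive")
    return size_bytes / bottleneck


if __name__ == "__main__":
    size = 451 * 10 ** 9           # the 451 GB file from the test
    network = 10 * 10 ** 9 / 8     # hypothetical 10 Gbit/s instance, in bytes/s
    disk = 125 * MIB               # gp3 baseline throughput, in bytes/s
    minutes = estimated_seconds(size, network, disk) / 60
    print(f"estimated transfer time: {minutes:.1f} minutes")
```

With these example numbers the disk is the limiting factor, so a faster instance type alone would not shorten the transfer.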

Summary and test

Having covered these tips, it would be unreasonable not to back them up with data.

For the experiment, I downloaded a 451 GB file to an EC2 instance running Windows Server 2022 in the Tokyo region. The only variable was the CLI version: 2.0.52, 2.7.32, and 2.7.32 with CRT enabled (2.7.32 [crt]).

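[Figure: transfer result with CLI 2.0.52]

[Figure: transfer result with CLI 2.7.32]

[Figure: transfer result with CLI 2.7.32 and CRT enabled]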

Comparing 2.7.32 [crt] with 2.0.52, the transfer speed differs by 33 MiB/s and the transfer time is nearly 12 minutes shorter. Note that this test did not adjust max_concurrent_requests or multipart_chunksize, so there is still room for improvement.

I hope the tips and tests above are helpful to those with similar needs.

Chinese version: 3 個小技巧,大幅改善 S3 傳輸速度
