Jason Andrews for AWS Community Builders

Improve data compression performance on AWS Graviton processors

Applications in many programming languages perform data compression. They commonly rely on zlib for easy handling of gzip files. This article explains how to improve performance of applications using zlib on AWS Graviton processors.

Most Linux distributions ship zlib without architecture-specific optimizations. On the Arm architecture, this means the CRC (cyclic redundancy check) instructions are not used. Installing an optimized zlib may improve performance for applications that compress data. Let's see how to do it with an example Python application.

Cloudflare zlib is one version that includes these optimizations. Other optimized zlib versions exist, and the process to use them should be similar.

This can be done on any Graviton-based instance. I did it with Ubuntu 22.04.

Confirm crc32 is included in the processor flags

All AWS Graviton processors and most Armv8.0-A and above processors have support for CRC instructions.

To check if a Linux system has support, use the lscpu command and look for crc32 in the listed flags.

lscpu | grep crc32

If the machine is confirmed to include crc32 it may benefit from zlib-cloudflare.
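The same check can be scripted. Below is a minimal Python sketch that looks for crc32 in the text of /proc/cpuinfo, assuming the AArch64 Linux layout where CPU flags appear on lines starting with Features. The function name has_cpu_feature and the sample text are illustrative and not taken from a real machine.

```python
def has_cpu_feature(cpuinfo_text, feature="crc32"):
    """Return True if any 'Features' line in the given text lists the feature."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Assumption: flags are listed on lines whose key is "Features"
        if sep and key.strip() == "Features" and feature in value.split():
            return True
    return False


# Illustrative sample text, not copied from a real system
sample = (
    "processor\t: 0\n"
    "Features\t: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics\n"
)
print(has_cpu_feature(sample))  # True
```

On a live system, the input could come from open("/proc/cpuinfo").read().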

Check if the default zlib includes crc32 instructions

Some Linux systems may already make use of crc32 in the default library. If the default zlib is already optimized, then using zlib-cloudflare may not have any impact on performance.

Ubuntu and Debian Linux distributions put zlib in /usr/lib/aarch64-linux-gnu.

A compiler and related tools such as make and objdump are needed to build and inspect zlib, so install them now.

sudo apt install -y build-essential

To check if there are any CRC instructions in a library, use objdump to disassemble and look for crc32 instructions.

objdump -d /usr/lib/aarch64-linux-gnu/libz.so.1 | awk -F" " '{print $3}' | grep crc32 | wc -l

If the result is 0 then there are no crc32 instructions used in the library.
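For scripted checks, the pipeline above can be sketched in Python. The count_crc32 function mirrors the awk and grep steps by looking at the third whitespace-separated field of each disassembly line. The helper names are hypothetical, and the sample disassembly (including its hex encodings) is a placeholder, not real objdump output.

```python
import subprocess


def count_crc32(disassembly):
    """Count lines whose third field (the mnemonic) contains 'crc32'."""
    count = 0
    for line in disassembly.splitlines():
        fields = line.split()
        if len(fields) >= 3 and "crc32" in fields[2]:
            count += 1
    return count


def disassemble(path):
    """Run objdump -d on a library and return its text output."""
    result = subprocess.run(
        ["objdump", "-d", path], capture_output=True, text=True, check=True
    )
    return result.stdout


# Placeholder disassembly text for illustration only
sample = (
    "0000000000002b00 <crc32>:\n"
    "    2b00:\t1ac04820 \tcrc32w\tw0, w1, w0\n"
    "    2b04:\t1ac04c20 \tcrc32x\tw0, w1, x0\n"
    "    2b08:\t94000010 \tbl\t2b48 <crc32_combine>\n"
    "    2b0c:\td65f03c0 \tret\n"
)
print(count_crc32(sample))  # 2
```

A real check would be count_crc32(disassemble("/usr/lib/aarch64-linux-gnu/libz.so.1")). Function labels and branch targets named crc32 are not counted, because only the mnemonic field is examined.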

Install Cloudflare zlib

If there are no crc32 instructions in zlib then zlib-cloudflare may help application performance.

To build and install zlib-cloudflare, navigate to an empty directory and use these commands.

mkdir tmp ; pushd tmp
git clone https://github.com/cloudflare/zlib.git
cd zlib && ./configure 
make && sudo make install
popd
rm -rf tmp

If successful, zlib-cloudflare is installed in /usr/local/lib.

Confirm the new zlib has crc32 instructions. The objdump command should return a non-zero number now.

objdump -d /usr/local/lib/libz.so  | awk -F" " '{print $3}' | grep crc32 | wc -l

To install zlib somewhere else, pass the --prefix option to configure.

./configure --prefix=$HOME/zlib

This installs zlib under $HOME/zlib instead, with the shared library in $HOME/zlib/lib.

Configuring zlib

Below is a simple C program to demonstrate zlib usage.

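#include <stdio.h>
#include <stdlib.h>
#include "zlib.h"

int main(void)
{
    gzFile myfile;

    /* Print the version of the zlib library loaded at run time */
    printf("%s\n", zlibVersion());

    /* Open a gzip file for writing and stop if it cannot be created */
    myfile = gzopen("testfile.gz", "wb");
    if (myfile == NULL) {
        fprintf(stderr, "gzopen failed\n");
        return EXIT_FAILURE;
    }

    gzprintf(myfile, "Hello gzipped file!\n");

    gzclose(myfile);

    return EXIT_SUCCESS;
}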

Save the text above as a file named test.c and compile the example.

gcc test.c -o test -lz

Run the program and see the version.

./test

The printed version will be a number such as:

1.2.11

Use ldd to see the location of the shared library.

ldd ./test

The output shows the shared libraries used by test.

linux-vdso.so.1 (0x0000ffff91026000)
libz.so.1 => /lib/aarch64-linux-gnu/libz.so.1 (0x0000ffff90fa0000)
libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000ffff90df0000)
/lib/ld-linux-aarch64.so.1 (0x0000ffff90fed000)

Set LD_PRELOAD to use zlib-cloudflare

To run test with zlib-cloudflare instead of the default, set LD_PRELOAD on the command line.

LD_PRELOAD=/usr/local/lib/libz.so ./test

The LD_PRELOAD variable tells the dynamic linker to load the listed libraries before the default libraries.

The version of zlib-cloudflare will be printed. It may be older than the default, but we are interested in crc32 and not using the latest.
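The C program prints zlibVersion(). Python's standard zlib module exposes comparable information through two constants: ZLIB_VERSION (the version the module was built against) and ZLIB_RUNTIME_VERSION (the version of the library actually loaded). The sketch below reports both. It assumes the Python build links zlib dynamically, which is what allows LD_PRELOAD to change the runtime value. The function name zlib_versions is illustrative.

```python
import zlib


def zlib_versions():
    """Return the build-time and run-time zlib versions seen by Python."""
    return {
        "built_against": zlib.ZLIB_VERSION,
        "loaded_at_runtime": zlib.ZLIB_RUNTIME_VERSION,
    }


print(zlib_versions())
```

If the two values differ when the script runs under LD_PRELOAD, that is a sign the preloaded library is the one in use.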

Next, let's see how to use zlib-cloudflare in an application doing data compression. We can use a Python example and measure the performance difference with zlib-cloudflare.

Copy and save the file below as zip.py.

import gzip

# Read and compress the input in 16 KB chunks
size = 16384

# Both files are closed automatically when the with blocks exit
with open('largefile', 'rb') as f_in:
    with gzip.open('largefile.gz', 'wb') as f_out:
        while (data := f_in.read(size)):
            f_out.write(data)

On Ubuntu 22.04, make the python command run python3.

sudo apt install python-is-python3 -y

Create a large file to compress

The above Python code will read a file named largefile and write a compressed version as largefile.gz.

To create the input file, use the dd command.

dd if=/dev/zero of=largefile count=1M bs=1024
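The dd command writes 1M (1,048,576) blocks of 1024 bytes each, which is 1 GiB of zeros. Where dd is not convenient, a rough Python equivalent might look like the sketch below. The function name write_zero_file and its chunk_size parameter are illustrative, and the demo writes only a small temporary file.

```python
import os
import tempfile


def write_zero_file(path, total_bytes, chunk_size=1024 * 1024):
    """Write total_bytes of zeros to path, one chunk at a time."""
    chunk = bytes(chunk_size)  # a reusable buffer of zero bytes
    remaining = total_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(chunk[:n])
            remaining -= n


# Small demonstration in a temporary directory
with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "smallfile")
    write_zero_file(p, 3000, chunk_size=1024)
    print(os.path.getsize(p))  # 3000
```

Matching the dd command would be write_zero_file("largefile", 1024 * 1024 * 1024). A file of zeros compresses extremely well, so timings on real data may differ.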

Run the example using the default zlib

Run with the default zlib and time the execution.

time python ./zip.py

Make a note of the runtime.
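The time command measures the whole process, including interpreter startup and file I/O. To look at compression alone, one option is to time it inside Python on an in-memory buffer. This is a minimal sketch, and time_gzip is a hypothetical helper that is not part of the original example.

```python
import gzip
import io
import time


def time_gzip(data, chunk_size=16384):
    """Compress data in chunks; return (elapsed_seconds, compressed_bytes)."""
    buf = io.BytesIO()
    start = time.perf_counter()
    with gzip.GzipFile(fileobj=buf, mode="wb") as f_out:
        for i in range(0, len(data), chunk_size):
            f_out.write(data[i:i + chunk_size])
    elapsed = time.perf_counter() - start
    return elapsed, buf.getvalue()


data = bytes(1024 * 1024)  # 1 MiB of zeros, similar in content to largefile
elapsed, compressed = time_gzip(data)
print(f"{len(data)} bytes -> {len(compressed)} bytes in {elapsed:.4f} s")
```

Running the same script with and without LD_PRELOAD gives a comparison that excludes disk reads and writes.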

Run the example again with zlib-cloudflare

This time, use LD_PRELOAD to change to zlib-cloudflare and check the performance difference.

Adjust the path to libz.so as needed.

time LD_PRELOAD=/usr/local/lib/libz.so python ./zip.py

Notice the shorter runtime when zlib-cloudflare is used.
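To confirm which libz file a process has loaded, one approach on Linux is to inspect /proc/self/maps from inside the process. The sketch below parses text in that format and returns the paths whose file name starts with libz.so. The function name mapped_libz_paths and the sample lines are illustrative.

```python
import os
import zlib  # imported so that libz is loaded into this process


def mapped_libz_paths(maps_text):
    """Return sorted unique libz paths found in /proc/<pid>/maps text."""
    paths = set()
    for line in maps_text.splitlines():
        # Columns: address, perms, offset, dev, inode, pathname
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            if os.path.basename(path).startswith("libz.so"):
                paths.add(path)
    return sorted(paths)


# Illustrative sample, not captured from a real process
sample = (
    "ffff90fa0000-ffff90fb9000 r-xp 00000000 103:01 1234 /usr/local/lib/libz.so.1\n"
    "ffff90fb9000-ffff90fc8000 ---p 00019000 103:01 1234 /usr/local/lib/libz.so.1\n"
    "ffff90df0000-ffff90f79000 r-xp 00000000 103:01 5678 /lib/aarch64-linux-gnu/libc.so.6\n"
    "ffff91000000-ffff91021000 rw-p 00000000 00:00 0\n"
)
print(mapped_libz_paths(sample))  # ['/usr/local/lib/libz.so.1']
```

Inside zip.py, the call would be mapped_libz_paths(open("/proc/self/maps").read()). With LD_PRELOAD set, the preloaded path should appear in the result.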

Using a c6g.large EC2 instance, the time with the original zlib is about 7.25 seconds and with zlib-cloudflare the time is about 2.66 seconds.
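Those two timings can be turned into a speedup figure with a little arithmetic. The snippet below is a small sketch using the numbers quoted above, and the speedup function name is illustrative.

```python
def speedup(baseline_seconds, optimized_seconds):
    """Return how many times faster the optimized run is."""
    return baseline_seconds / optimized_seconds


default_time = 7.25      # seconds with the default zlib
cloudflare_time = 2.66   # seconds with zlib-cloudflare

print(f"{speedup(default_time, cloudflare_time):.2f}x faster")              # 2.73x faster
print(f"{(1 - cloudflare_time / default_time) * 100:.0f}% less time")       # 63% less time
```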

Summary

If you have applications using zlib, check whether an optimized version of the library improves their performance. Cloudflare zlib is a good one, and there may be others available. Watch the AWS Graviton Getting Started guide for the latest information.
