DEV Community

Man yin Mandy Wong for Tencent Cloud

ARM-Based Server Review

1. Background

Take a look at the ARM-based server SR1 recently launched by Tencent Cloud. Is it worth it? How does it stack up against other models? Let's check it out.

We reviewed two typical models, the ARM-based SR1 and the x86-based S5, to show you how to measure CPU performance (mainly computing power) so that you quickly know what to look for.

2. ARM-based server environment and evaluation preparations

Tencent Cloud SR1 is the first ARM-based server to use the latest Ampere Altra, a CPU built on ARM Neoverse N1 cores with a clock rate of up to 2.8 GHz and 64 KiB of L1 cache. The Neoverse N1 CPU has the following architecture:

[Figure: Neoverse N1 CPU architecture]

The other test object is the mainstream x86-based Standard S5, which uses an Intel Xeon Platinum processor based on the latest Cooper Lake microarchitecture and runs at 2.5 GHz. It is popular for general-purpose use cases. Both test instances are configured with 4 cores and 8 GiB of memory.

From a cost perspective, SR1 is approximately 20% cheaper than S5 according to the official website. Although its price is not as competitive as that of Lighthouse, it is still well worth it.

1.1 ARM-based server activation

S5 and SR1 price comparison

SR1 is comparable to S5 in overall performance and more economical, which can mean substantial cost savings for both individuals and enterprises.
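To make the price argument concrete, here is a minimal arithmetic sketch. It assumes exactly equal performance (a ratio of 1.0) and a price exactly 20% lower (a ratio of 0.8); both inputs are simplifications of the "comparable" and "approximately 20%" figures above, and `value_ratio` is a hypothetical helper name.

```python
def value_ratio(perf_ratio: float, price_ratio: float) -> float:
    """Performance per unit cost of instance A relative to instance B.

    perf_ratio  = performance of A / performance of B
    price_ratio = price of A / price of B
    """
    if price_ratio <= 0:
        raise ValueError("price_ratio must be positive")
    return perf_ratio / price_ratio


# Assumed inputs: equal performance (1.0) and a 20% lower price (0.8).
sr1_vs_s5 = value_ratio(perf_ratio=1.0, price_ratio=0.8)
print(f"SR1 delivers {sr1_vs_s5:.2f}x the performance per unit cost of S5")
```

Under these assumptions, a 20% discount translates into 25% more performance per unit cost, and any real performance lead for SR1 would raise that figure further.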

Tips: Screen splitting

Use tmux to split the screen (press Ctrl+b followed by % or "), log in to one server in each pane, then press Ctrl+b, type :setw synchronize-panes, and press Enter so that the commands you type are sent to both terminals at the same time.
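The same setup can also be scripted. The sketch below only builds the tmux command lines for a two-pane, synchronized session; the session name `bench` and the host names are placeholders, and `tmux_sync_commands` is a hypothetical helper, not something used in the original review.

```python
import shlex


def tmux_sync_commands(session, host_a, host_b):
    """Return tmux commands that open two SSH panes and synchronize input."""
    return [
        # Start a detached session whose first pane logs in to the first server.
        ["tmux", "new-session", "-d", "-s", session, f"ssh {shlex.quote(host_a)}"],
        # Split the window side by side; the new pane logs in to the second server.
        ["tmux", "split-window", "-h", "-t", session, f"ssh {shlex.quote(host_b)}"],
        # Send every keystroke to all panes of the window.
        ["tmux", "set-window-option", "-t", session, "synchronize-panes", "on"],
        # Attach the terminal to the prepared session.
        ["tmux", "attach-session", "-t", session],
    ]


# Placeholder hosts: replace with the real login targets of the two servers.
for cmd in tmux_sync_commands("bench", "user@sr1-host", "user@s5-host"):
    print(shlex.join(cmd))
```

Printing the commands rather than executing them keeps the sketch side-effect free; each list can be passed to `subprocess.run` once the host names are real.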

2.1 System preparations and CPU viewing

Enter commands in different windows of Tmux.
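The exact commands used for this step are not recoverable from the original post. As a stand-in, the following sketch uses only the Python standard library to confirm which architecture and how many logical CPUs each server reports; `cpu_summary` is a hypothetical helper name.

```python
import os
import platform


def cpu_summary():
    """Collect basic CPU and OS facts for the machine running this script."""
    return {
        # Typically "aarch64" on 64-bit ARM Linux and "x86_64" on Intel/AMD.
        "architecture": platform.machine(),
        # Number of logical CPUs visible to the OS (may be None if unknown).
        "logical_cpus": os.cpu_count(),
        "system": platform.system(),
        "kernel": platform.release(),
    }


for key, value in cpu_summary().items():
    print(f"{key}: {value}")
```

Running it in both synchronized panes gives a quick side-by-side check that the two instances really are the ARM and x86 machines with the expected core count.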

With the preparations done, let's start the evaluation.

3. 7-Zip compression evaluation

7-Zip has a built-in benchmark based on the LZMA compression algorithm that can be used to quickly evaluate the CPU computing performance of servers.

Run the following command to evaluate the performance:

2.2 LZMA compression evaluation (ARM-based SR1/x86-based S5)

7-Zip evaluation

The 7-Zip benchmark command reports the compression and decompression performance of a server in million instructions per second (MIPS). The higher the value, the stronger the performance. You can also cross-check the result against metrics such as compression ratio and execution time. The 7-Zip benchmark rarely uses 64-bit instructions, let alone advanced instruction set extensions, so it mostly reflects the performance of CPU "fundamentals". LZMA compression performance depends on the memory access latency, data cache (D-Cache) capacity, TLB performance, and out-of-order execution efficiency of a CPU, while decompression performance reveals more about branch prediction and the instruction latency of the multi-stage pipeline.
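The command shown in the original post is not recoverable, so the following is a minimal sketch of how the benchmark could be launched. It assumes the binary is installed under the name `7z` (some distributions ship `7za` or `7zz` instead) and uses 7-Zip's `b` command with the `-mmt` thread switch; the helper names are hypothetical.

```python
import shutil
import subprocess


def sevenzip_benchmark_cmd(threads=None):
    """Build the 7-Zip benchmark command, optionally pinning the thread count."""
    cmd = ["7z", "b"]  # "b" is the built-in benchmark command
    if threads is not None:
        cmd.append(f"-mmt{threads}")  # -mmtN sets the number of threads
    return cmd


def run_benchmark(cmd):
    """Run a benchmark command and return its standard output as text."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} is not installed on this machine")
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


# Show the command line that would be executed on a 4-core instance.
print(" ".join(sevenzip_benchmark_cmd(threads=4)))
```

On each server, `print(run_benchmark(sevenzip_benchmark_cmd(threads=4)))` would print the raw 7-Zip report; a thread count of 4 matches the 4-core instances compared here.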

Evaluation results:

2.2 LZMA compression evaluation

7-Zip evaluation of S5 and SR1

As you can see, ARM-based SR1 delivers 60% higher performance than x86-based S5 in LZMA compression and decompression scenarios.
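A relative figure such as "60% higher" is derived from two absolute ratings. The sketch below shows the calculation; the MIPS values are hypothetical round numbers chosen for illustration, not the measured results of this review, and `percent_gain` is a hypothetical helper name.

```python
def percent_gain(candidate: float, baseline: float) -> float:
    """Return how much higher `candidate` is than `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (candidate - baseline) / baseline * 100.0


# Hypothetical MIPS ratings, NOT the values measured in this review.
sr1_mips = 16000
s5_mips = 10000
print(f"SR1 is {percent_gain(sr1_mips, s5_mips):.0f}% faster than S5")
```

Note that the baseline matters: the same two numbers mean S5 is 37.5% slower than SR1, so always state which server the percentage is relative to.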

4. LUKS block device encryption and decryption evaluation

LUKS is a specification for block device encryption supported by the Linux kernel. Simply put, it encrypts disks.

Like file compression and decompression, block device encryption and decryption are typical compute-intensive applications. Unlike in generic computing scenarios, encryption and decryption instructions are usually implemented in dedicated hardware and exposed as CPU instruction set extensions. x86 provides the AES-NI extension, while ARM provides separate extensions for different encryption and decryption algorithms.

There is no need to install any other software. Just use the benchmark command of the cryptsetup tool that comes with Linux to evaluate the CPU performance through encryption and decryption algorithms.

By default, the command evaluates tasks of ciphers and key derivation functions (KDFs).
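As a rough sketch of how this benchmark can be invoked, the helper below builds the `cryptsetup benchmark` command line, either with no options (the default set of ciphers and KDFs) or narrowed with the `--cipher`, `--key-size`, and `--hash` options. The specific cipher and hash values in the example are illustrative assumptions, and `cryptsetup_benchmark_cmd` is a hypothetical helper name.

```python
import shlex


def cryptsetup_benchmark_cmd(cipher=None, key_size=None, kdf_hash=None):
    """Build a `cryptsetup benchmark` command line.

    With no arguments the tool runs its default cipher and KDF tests.
    """
    cmd = ["cryptsetup", "benchmark"]
    if cipher is not None:
        cmd += ["--cipher", cipher]            # e.g. "aes-xts-plain64"
    if key_size is not None:
        cmd += ["--key-size", str(key_size)]   # key size in bits
    if kdf_hash is not None:
        cmd += ["--hash", kdf_hash]            # hash used for the KDF test
    return cmd


# Default run, then two narrower runs (example values are assumptions).
print(shlex.join(cryptsetup_benchmark_cmd()))
print(shlex.join(cryptsetup_benchmark_cmd(cipher="aes-xts-plain64", key_size=512)))
print(shlex.join(cryptsetup_benchmark_cmd(kdf_hash="sha256")))
```

Narrowing the run to one cipher or hash is handy when you want to repeat a single measurement several times and compare the spread between the two servers.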

Run the following command to evaluate the performance:

2.3 LUKS encryption evaluation (ARM-based SR1/x86-based S5)

LUKS evaluation process

Evaluation results (KDFs):

2.3 LUKS encryption evaluation

LUKS evaluation of S5 and SR1 in terms of KDFs

Evaluation results (ciphers):

2.3 LUKS encryption evaluation (ARM-based SR1/x86-based S5)

LUKS evaluation of S5 and SR1 in terms of encryption algorithms

As you can see, the ARM-based server outperforms its x86-based counterpart in the common SHA algorithms (SHA-256 and SHA-512) and in AES-CBC encryption, while the x86-based server, with its AES-NI extension, does a better job in decryption and in XTS encryption, the highest-security mode.

5. OpenSSL network encryption and decryption evaluation

Block device encryption protects data at rest, while network encryption protects data in transit. As OpenSSL is one of the most popular network encryption libraries, an OpenSSL performance evaluation is a natural next step.

OpenSSL's speed subcommand can evaluate every supported algorithm, but a full run takes a long time, so you would normally pass parameters to select specific algorithms. Commonly evaluated ones are Hash-based Message Authentication Code (HMAC), used for message integrity and authentication; SHA-256, a secure hash used for message digests and digital signatures; and AES-256, a standard encryption algorithm widely adopted by cloud service providers.
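A minimal sketch of how such a selection could look: the helper below builds one `openssl speed` command per algorithm. The choice of CBC mode for AES-256 and the use of `-evp` (which goes through OpenSSL's EVP interface) are assumptions, since the original post does not state the exact parameters; `openssl_speed_cmds` is a hypothetical helper name.

```python
import shlex


def openssl_speed_cmds(threads=1):
    """Build `openssl speed` commands for HMAC, SHA-256, and AES-256."""
    base = ["openssl", "speed"]
    if threads > 1:
        base += ["-multi", str(threads)]  # run N benchmarks in parallel
    return [
        base + ["hmac"],                  # HMAC test (uses MD5 in openssl speed)
        base + ["sha256"],                # SHA-256 digest
        base + ["-evp", "aes-256-cbc"],   # AES-256; CBC mode is an assumption
    ]


# Print the commands for a 4-core instance.
for cmd in openssl_speed_cmds(threads=4):
    print(shlex.join(cmd))
```

Running the three commands separately keeps each report short and avoids the long wait of a full `openssl speed` run.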

Run the following command to evaluate the performance:

2.4 OpenSSL encryption evaluation (ARM-based SR1/x86-based S5)

OpenSSL encryption process through speed

Evaluation results:

2.4 OpenSSL encryption evaluation

OpenSSL encryption results of S5 and SR1

As you can see, the ARM-based server lags slightly behind the x86-based server in MD5 HMAC, but it outperforms the latter in SHA-256 and AES-256, with the larger lead in SHA-256.

6. Redis database throughput rate evaluation

Now let's move on to the Redis performance evaluation. As one of the most popular in-memory databases, Redis is often used in high-throughput scenarios such as key-value storage, data caching, and message queues. Redis also ships with a built-in benchmark utility, redis-benchmark, which measures the number of requests per second.

The redis-benchmark program measures the throughput of a single server across GET, SET, LPUSH, and other common Redis commands, which exercises the CPU and its memory access capabilities (such as memory access bandwidth and performance).
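The sketch below shows one way to build a `redis-benchmark` invocation and to pull the requests-per-second figures out of its quiet (`-q`) output. The request and client counts are the tool's usual defaults rather than values taken from this review, the sample output is illustrative, and the exact line format can differ between Redis versions, so treat the parser as an assumption to verify against your own output.

```python
import re


def redis_benchmark_cmd(tests=("get", "set", "lpush"), requests=100000, clients=50):
    """Build a quiet redis-benchmark command for the selected tests."""
    return [
        "redis-benchmark", "-q",
        "-t", ",".join(tests),   # comma-separated list of tests to run
        "-n", str(requests),     # total number of requests
        "-c", str(clients),      # number of parallel connections
    ]


# Matches summary lines such as "SET: 101010.10 requests per second".
_RPS_LINE = re.compile(r"^(\S[^:\r\n]*):\s+([0-9.]+) requests per second", re.MULTILINE)


def parse_requests_per_second(output):
    """Map each test name to its requests-per-second figure."""
    text = output.replace("\r", "\n")  # progress updates are separated by \r
    return {name: float(rps) for name, rps in _RPS_LINE.findall(text)}


# Illustrative sample output, NOT a result measured in this review.
sample = "SET: 101010.10 requests per second\nGET: 99009.90 requests per second, p50=0.255 msec\n"
print(" ".join(redis_benchmark_cmd()))
print(parse_requests_per_second(sample))
```

Collecting the figures into a dictionary on each server makes it straightforward to compare the two machines test by test.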

Run the following command to evaluate the performance:

2.6 Throughput evaluation (ARM-based SR1/x86-based S5)

Redis evaluation command execution

Evaluation results:

2.6 Throughput evaluation

Redis throughput rate evaluation of S5 and SR1

According to the Redis evaluation results, ARM-based SR1 has 30% to 40% higher performance on average than x86-based S5.

7. Conclusion

Now it's time you get some hands-on experience and see what your cloud server performance test would reveal.

ARM-based servers offer more than cost-effectiveness. As ARM-based virtualization technologies become more widespread in the cloud, ARM-based servers are bound to gain momentum in IoT, cloud phone and cloud gaming, the Android ecosystem, and many other use cases.

Let's look forward to more diversified experiences available at our fingertips.
