Welcome to the guide on optimizing Java performance on ARM-based processors like Ampere Altra!
In this post, I will cover the benefits of using the latest JDKs and how they can improve the performance of your Java applications on ARM systems.
I will also discuss the specific tools and dependency considerations needed when developing Java web applications in an AArch64 environment.
You will be equipped with the information and tools necessary to efficiently develop and deploy Java applications on ARM-based platforms by the end of this post.
It's advised to start all new Java projects using the latest JDK version. If you're planning to deploy on ARM-based processors, it becomes even more important to do so. This is because the earlier Java versions and native Java code were written keeping x86 architecture in mind.
Although Java applications are easily portable among platforms, the latest JDKs include changes made specifically to make the application more performant on ARM systems.
A notable example is Java 11, which included AArch64-specific changes in the JDK to improve the performance of array, string, and mathematical functions.
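Before relying on these improvements, it can help to confirm which JDK and CPU architecture an application actually runs on. The snippet below is only a minimal sketch: the class name PlatformCheck and its helper methods are hypothetical, and accepting the "arm64" spelling is an assumption, since HotSpot builds typically report "aarch64".

```java
import java.util.Locale;

// Hypothetical helper: reports the JDK version and CPU architecture of the running JVM.
final class PlatformCheck {

    // Returns true when the os.arch value denotes 64-bit ARM.
    // "aarch64" is the usual spelling; "arm64" is accepted as an assumption.
    static boolean isArm64(String osArch) {
        String arch = osArch.toLowerCase(Locale.ROOT);
        return arch.equals("aarch64") || arch.equals("arm64");
    }

    // Converts java.specification.version ("1.8", "11", "17") to a feature number (8, 11, 17).
    static int featureVersion(String specVersion) {
        String v = specVersion.startsWith("1.") ? specVersion.substring(2) : specVersion;
        return Integer.parseInt(v);
    }

    public static void main(String[] args) {
        String arch = System.getProperty("os.arch");
        int feature = featureVersion(System.getProperty("java.specification.version"));

        System.out.println("JVM: " + System.getProperty("java.vm.name")
                + " " + System.getProperty("java.version"));
        System.out.println("Architecture: " + arch);

        if (isArm64(arch) && feature < 11) {
            System.out.println("Running on ARM64 with a JDK older than 11; consider upgrading.");
        }
    }
}
```

Running this once on each target environment (for example, as part of a CI job) is a cheap way to catch an outdated JDK before deployment.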
Web development with Java requires a large number of technology decisions. Dozens of tool choices may be needed to create a working application, and there is a wide variety of options and open-source tools that support Java applications.
The AArch64 architecture may be new territory for some of the tools in this ever-growing set of choices. In this regard, it becomes important to understand our tools and dependency choices.
Here are a few reasons to analyze tools from an ARM perspective:
Application may not run on ARM - Some applications could be completely unusable on ARM architectures, and their alternatives will need to be used. For example, (at the time of writing) VirtualBox is not supported on ARM, but an alternative, VMware's ESXi-Arm Fling, is.
Application version may not support ARM - This can be a point of concern for older applications. Such applications will require a version upgrade and corresponding changes in dependent services (if any). For example, Redis was not supported on ARM architecture before version 4.0.0 (2017).
Applications optimized for ARM - Widely used tools, like Java itself, are continuously acknowledging ARM as the future and making improvements to make better use of the benefits of ARM. A tool that wasn't the best on x86 may outperform its competitors when run on Ampere Altra instances in terms of features and performance.
Containerization is a popular way to deploy Java applications, and Docker is the de facto way to do that. Two approaches can be adopted to deliver Java containers on Ampere Altra.
The straightforward way is to build the images on an ARM machine - for example, a CI/CD server deployed on an Ampere Altra instance or an Apple M1 machine being used by a developer. The steps to build the image remain the same as for any other Java container.
If we are migrating an existing application to ARM, the important point to remember is that the base images for all our containers should be available for ARM64. If not, the images need to be rebuilt with a compatible base.
This is not a problem when using popular Linux base images, but in one-off cases, the base may not be compatible. For example, at the time of writing, no official archlinux images exist for ARM64.
Similarly, older versions of base images may not have ARM support and will need a version upgrade.
While we could keep our complete stack on ARM-based systems, it may not be completely feasible to do so in certain scenarios:
Developer machines may run on x86, and using ARM images on them will not be possible.
Not every cloud provider or type of cloud service may be able to run ARM-based containers.
It may be required to run different architecture instances on different environments.
In this case, it becomes necessary to build Docker images for multiple architectures. This brings us to Docker's buildx command, which was introduced in Docker 19.03. With buildx, it is possible to create multi-architecture images on any machine. Let's have a look at how it differs from the normal build process.
First, we need to install buildx on the instance running Docker. Once it is installed, we can run it with the syntax below:
docker buildx build --platform linux/amd64,linux/arm64 -t [repository]/[image-name] [dockerfile path]
This command can replace the normal docker build command. This makes it easy to integrate multi-architecture image creation in CI/CD pipelines.
The JVM stores bytecode that has been compiled to native code in an area called the code cache. This cache is instrumental in improving the performance of the just-in-time (JIT) compiler.
Let's look at some JVM options which can vary the way your Java application utilizes code cache.
Keep in mind that none of these options is mandatory, nor will they be suitable for every possible application. If using these options, it is advisable to test out and benchmark your application with and without the options.
By default, the code cache starts small and grows as needed, up to a reserved maximum (240MB by default in recent HotSpot JDKs with tiered compilation). The default initial cache size is 160KB (which could vary among JDK implementations).
For applications that need a large amount of code to run and support multiple features, it may be better to start with a larger cache size. This is important because growing the cache incurs overhead, which should be avoided as much as possible.
-XX:InitialCodeCacheSize can be used to define the initial cache size.
E.g., to increase the size to 64MB, we can use the option -XX:InitialCodeCacheSize=64m.
If the goal is just to avoid the overhead of growing the cache, we could also use the option -XX:CodeCacheExpansionSize to increase the amount by which the code cache is expanded every time it fills up.
Restricting code cache size helps predict the cache size occupied and restricts unnecessary RAM usage by temporary dynamic code generation.
To restrict the cache size, we can use the option -XX:ReservedCodeCacheSize. For example, to restrict the cache size to 64MB, we can use the option -XX:ReservedCodeCacheSize=64m.
Additionally, if there is a chance of the cache getting filled, it is important to set the -XX:+UseCodeCacheFlushing option as well.
With this setup, when the cache is about to fill up, it will evict code that is not being used frequently. If the cache crosses the reserved capacity, it will be flushed entirely.
This eviction could be useful for the following:
Applications that run a lot of code at startup that is not needed later.
Applications that generate dynamic code blocks for certain scenarios but do not use them frequently.
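When experimenting with these options, it is easy to mistype a flag or to set it on the wrong process. One way to confirm what the JVM is actually using is to read the values back at runtime. The sketch below assumes a HotSpot-based JDK, where the com.sun.management.HotSpotDiagnosticMXBean interface is available; the class name CodeCacheFlags and its methods are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reads the effective code cache options from a HotSpot JVM.
final class CodeCacheFlags {

    // The options discussed in this section, without the -XX: prefix.
    static final String[] FLAGS = {
        "InitialCodeCacheSize",
        "ReservedCodeCacheSize",
        "CodeCacheExpansionSize",
        "UseCodeCacheFlushing"
    };

    // Returns the current value of one option, or "unavailable" if the JVM does not know it.
    static String read(HotSpotDiagnosticMXBean bean, String flag) {
        if (bean == null) {
            return "unavailable";
        }
        try {
            return bean.getVMOption(flag).getValue();
        } catch (IllegalArgumentException e) {
            return "unavailable";
        }
    }

    // Collects all options in a stable order for printing or logging.
    static Map<String, String> effectiveFlags() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        Map<String, String> values = new LinkedHashMap<>();
        for (String flag : FLAGS) {
            values.put(flag, read(bean, flag));
        }
        return values;
    }

    public static void main(String[] args) {
        effectiveFlags().forEach((flag, value) -> System.out.println(flag + " = " + value));
    }
}
```

Starting the program with, say, -XX:ReservedCodeCacheSize=64m should show that value (in bytes) next to ReservedCodeCacheSize, which makes it easy to compare runs with and without the options.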
To determine the optimal cache size, we can measure the code cache usage of our application under its usual load. To do this, we can use the -XX:+PrintCodeCache option when running the application. It prints the values below when the application exits:
CodeCache: size=32768Kb used=542Kb max_used=542Kb free=32226Kb
 bounds [0xb414a000, 0xb41d2000, 0xb614a000]
 total_blobs=131 nmethods=5 adapters=63
 compilation: enabled
The most important detail here is the max_used value. An optimal cache size is likely to be around this value. Another option is -XX:+PrintCodeCacheOnCompilation, which prints the same details every time a new method is compiled and added to the cache.
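Code cache usage can also be sampled from inside a running application through the standard java.lang.management API, which is handy when the process rarely exits. The following is a minimal sketch, not a definitive tool: the class name CodeCacheUsage is hypothetical, and matching pools whose names start with "Code" is an assumption based on the names HotSpot uses ("Code Cache", "CodeCache", or "CodeHeap '...'" when the cache is segmented).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports current and peak code cache usage of the running JVM.
final class CodeCacheUsage {

    // Assumption: HotSpot names its code cache pools "Code Cache", "CodeCache" or "CodeHeap '...'".
    static boolean isCodeCachePool(String poolName) {
        return poolName.startsWith("Code");
    }

    // Builds one line per code cache pool, similar in spirit to the PrintCodeCache output.
    static List<String> report() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (!isCodeCachePool(pool.getName())) {
                continue;
            }
            MemoryUsage now = pool.getUsage();
            MemoryUsage peak = pool.getPeakUsage();
            if (now == null || peak == null) {
                continue; // the pool is no longer valid
            }
            // getMax() returns -1 when the maximum is undefined.
            String max = now.getMax() < 0 ? "undefined" : (now.getMax() / 1024) + "Kb";
            lines.add(String.format("%s: used=%dKb max_used=%dKb max=%s",
                    pool.getName(), now.getUsed() / 1024, peak.getUsed() / 1024, max));
        }
        return lines;
    }

    public static void main(String[] args) {
        report().forEach(System.out::println);
    }
}
```

Calling report() periodically under the usual load (for example, from a scheduled task or a diagnostics endpoint) gives a picture of peak usage over time rather than a single value at exit.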
Benchmarking is an essential process to carry out when making critical hardware, software, or algorithm choices. With Altra, benchmarking can be used in a few ways.
When migrating from one architecture to another or choosing between them, it will be helpful to compare performance on both types of machines.
Below are a few examples of performance comparisons (or individual tests) that can be made:
Throughput - making a high number of requests to both systems and comparing the time required to process all of them.
Resource Utilization - When both systems are put under the same amount of load, what are the hardware usage statistics - for example, CPU and memory utilization?
Stress Test - Up to what limit of concurrent requests can both systems handle?
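The throughput comparison above can be sketched with a small harness that fires a fixed number of requests from a thread pool and reports requests per second. Everything here is illustrative: the class name ThroughputProbe is hypothetical, and simulatedRequest is a CPU-bound stand-in that would be replaced by a real call to the system under test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical harness: measures how many requests per second a workload sustains.
final class ThroughputProbe {

    // Runs `requests` invocations of `work` on `threads` workers and returns requests per second.
    static double requestsPerSecond(Callable<?> work, int requests, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Object>> tasks = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                tasks.add(() -> work.call());
            }
            long start = System.nanoTime();
            for (Future<Object> result : pool.invokeAll(tasks)) {
                result.get(); // surfaces any failure from the workload
            }
            long elapsedNanos = Math.max(1, System.nanoTime() - start);
            return requests / (elapsedNanos / 1e9);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("benchmark interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("request failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    // Hypothetical stand-in for one request; replace with a real call to the system under test.
    static long simulatedRequest() {
        long sum = 0;
        for (int i = 1; i <= 50_000; i++) {
            sum += i % 7;
        }
        return sum;
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        double rps = requestsPerSecond(ThroughputProbe::simulatedRequest, 10_000, threads);
        System.out.printf("%.0f requests/second on %s with %d threads%n",
                rps, System.getProperty("os.arch"), threads);
    }
}
```

Running the same harness against both systems, with the same request count and thread count, gives numbers that can be compared directly. For the stress test, the thread count can be raised step by step until throughput stops improving or errors appear.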
Apart from overall system benchmarking, we can also measure the performance of small pieces of the application code. This is called micro benchmarking.
This is more useful in making low-level choices.
For example, a piece of code can use two approaches to execute the same task. We can run a certain number of tests for both approaches and compare results.
Micro benchmarking can help us decide which algorithms, data structures, or external libraries to choose for specific use cases.
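As an illustration of comparing two approaches to the same task, the sketch below times string building with the + operator against StringBuilder. It is a deliberately naive harness with hypothetical names (NaiveMicroBench, averageNanos); for results that can be trusted, a dedicated tool such as JMH (the OpenJDK Java Microbenchmark Harness) is usually a better choice, because it handles warm-up, dead-code elimination, and forking more rigorously.

```java
import java.util.function.Supplier;

// Hypothetical naive harness: compares two ways of building the same string.
final class NaiveMicroBench {

    // Results are written here so the JIT cannot discard the measured work.
    static volatile int sink;

    // Approach 1: repeated concatenation with the + operator.
    static String concatWithPlus(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i;
        }
        return s;
    }

    // Approach 2: a single StringBuilder.
    static String concatWithBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    // Returns the average time per call in nanoseconds, measured after a warm-up phase.
    static long averageNanos(Supplier<String> task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            sink += task.get().length(); // lets the JIT compile the hot path first
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            sink += task.get().length();
        }
        return (System.nanoTime() - start) / runs;
    }

    public static void main(String[] args) {
        int n = 2_000;
        long plus = averageNanos(() -> concatWithPlus(n), 200, 200);
        long builder = averageNanos(() -> concatWithBuilder(n), 200, 200);
        System.out.println("operator +    : " + plus + " ns/op");
        System.out.println("StringBuilder : " + builder + " ns/op");
    }
}
```

Running the same benchmark on an x86 machine and on an Altra instance shows whether the faster approach is the same on both architectures.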
Ampere Altra is a powerful machine that can be used to run a wide variety of workloads. It is important to understand the hardware and software choices that can be made to get the best performance out of it.
This article covered some of the important points to keep in mind when using Altra for Java applications and how to make the most of it.
Thank you for reading this. If you found this blog post useful, please forward it to your friends and colleagues who may find it useful as well.
Keep writing, and keep sharing knowledge on the internet🤩