We run several Java services (Corretto JDK 21) on AWS Elastic Container Service (ECS) Fargate. Each service runs in its own container, and we want each process to use all the resources we are paying for. The steps below can also be applied to EC2 and other clouds.
The services run batch jobs where latency is not important, so we use the Parallel GC (-XX:+UseParallelGC). G1 might be a better fit even for our workloads, but that is a topic for separate research and another post.
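To confirm that the collector flag actually took effect inside the container, a minimal sketch like the following lists the collectors the JVM reports through the standard java.lang.management API. The class and method names are hypothetical, and the check assumes HotSpot's naming, where Parallel GC registers its collectors as "PS Scavenge" and "PS MarkSweep"; those names are an implementation detail rather than a documented contract.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcCheck {
    // Names of all garbage collectors registered by the running JVM.
    public static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    // Assumption: HotSpot's Parallel GC reports "PS Scavenge" (young) and "PS MarkSweep" (old).
    public static boolean isParallelGc(List<String> names) {
        return names.contains("PS Scavenge") && names.contains("PS MarkSweep");
    }

    public static void main(String[] args) {
        List<String> names = collectorNames();
        System.out.println("Collectors: " + names);
        System.out.println("Parallel GC: " + isParallelGc(names));
    }
}
```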
To use all the available memory, we set MaxHeapSize a bit lower than the container memory size. But after some time we noticed two problems: sometimes our containers were killed for using too much memory, and sometimes we got OutOfMemoryError exceptions. To fix the first, we increased the gap between the container memory size and MaxHeapSize; for the second, we increased the containers' memory as a quick fix and started looking at heap dumps.
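A quick way to see how large that gap really is from inside a running container is to compare the memory the JVM sees with its heap limit. This is a sketch assuming JDK 14 or later, where com.sun.management.OperatingSystemMXBean.getTotalMemorySize() exists and, with container support enabled, reports the container limit; the class name is hypothetical, and Runtime.maxMemory() only approximates MaxHeapSize.

```java
import com.sun.management.OperatingSystemMXBean;
import java.lang.management.ManagementFactory;

public class HeapGap {
    private static final long MB = 1024L * 1024L;

    // Total memory visible to the JVM; with container support this is the container limit.
    public static long visibleMemoryMb() {
        OperatingSystemMXBean os = ManagementFactory.getPlatformMXBean(OperatingSystemMXBean.class);
        return os.getTotalMemorySize() / MB;
    }

    // Maximum heap the JVM will use; close to, but not always exactly, MaxHeapSize.
    public static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    public static void main(String[] args) {
        long memory = visibleMemoryMb();
        long heap = maxHeapMb();
        // Whatever is left has to cover metaspace, thread stacks, code cache and other non-heap memory.
        System.out.printf("memory=%d MB, max heap=%d MB, left for non-heap=%d MB%n",
                memory, heap, memory - heap);
    }
}
```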
The heap dumps showed an interesting detail: the actual heap size was lower than MaxHeapSize, and the Young Generation was tiny compared to the Old Generation.
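The same picture the heap dumps gave can be checked on a live process by reading the heap memory pools. A minimal sketch, assuming HotSpot, where Parallel GC exposes pools such as "PS Eden Space", "PS Survivor Space" and "PS Old Gen"; the class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

public class HeapPools {
    private static final long MB = 1024L * 1024L;

    // Current usage of every valid heap pool, keyed by pool name.
    public static Map<String, MemoryUsage> heapPools() {
        Map<String, MemoryUsage> pools = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP && pool.isValid()) {
                pools.put(pool.getName(), pool.getUsage());
            }
        }
        return pools;
    }

    public static void main(String[] args) {
        // "committed" is what the pool currently holds; a tiny committed young pool matches what we saw.
        heapPools().forEach((name, u) -> System.out.printf(
                "%-20s used=%d MB, committed=%d MB, max=%s%n",
                name, u.getUsed() / MB, u.getCommitted() / MB,
                u.getMax() < 0 ? "undefined" : (u.getMax() / MB) + " MB"));
    }
}
```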
Searching the internet didn't turn up a good guide on tuning JVM parameters for a case like ours; I only found high-level overviews of the heap and descriptions of individual parameters. So I decided to write this post describing the steps I took.
The first steps were:
- print information about parameters and their default values (-XX:+PrintFlagsFinal);
- set InitialHeapSize to the same value as MaxHeapSize (-XX:InitialRAMPercentage=100, or just set -XX:InitialHeapSize to the same value as MaxHeapSize). We are paying for all the container memory anyway, so why not allocate it from the start?
- log GC and heap information (-Xlog:gc*).
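Besides -XX:+PrintFlagsFinal, the effective values can also be read from inside the running JVM, which is handy when a container's startup output is hard to get at. A sketch using the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean; the helper name is hypothetical, and flags that a given JVM build does not know are reported as null.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class FlagCheck {
    // Current value of a HotSpot flag as a string, or null if this JVM does not know the flag.
    public static String value(String flag) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            return hotspot.getVMOption(flag).getValue();
        } catch (IllegalArgumentException unknownFlag) {
            return null;
        }
    }

    public static void main(String[] args) {
        String[] flags = {"UseParallelGC", "InitialHeapSize", "MaxHeapSize",
                "UseAdaptiveSizePolicy", "AlwaysPreTouch", "NewRatio"};
        for (String flag : flags) {
            System.out.println(flag + " = " + value(flag));
        }
    }
}
```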
The default Young:Old generation ratio is 1:2 (-XX:NewRatio=2), and only part of the Young Generation is in use at any given time (one survivor space always stays empty so live objects can be copied into it during GC). Right after startup the JVM allocated all memory as expected, but over time it shrank the Young Generation to just a few megabytes. As a result, after a while we were using only about ⅔ of the available memory.
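The ⅔ figure follows from the generation sizing arithmetic. This sketch assumes that the Old Generation's maximum stays fixed at its NewRatio share, so when the Young Generation shrinks, the freed space is not usable by the Old Generation; it also ignores alignment and uses a hypothetical class name.

```java
public class GenerationSplit {
    // -XX:NewRatio=N makes the Old Generation N times the Young Generation,
    // so young = heap / (N + 1) and old = heap - young (alignment ignored).
    public static long youngMb(long heapMb, int newRatio) {
        return heapMb / (newRatio + 1);
    }

    public static long oldMb(long heapMb, int newRatio) {
        return heapMb - youngMb(heapMb, newRatio);
    }

    public static void main(String[] args) {
        long heapMb = 7168;  // example heap for an 8 GB container
        int newRatio = 2;    // HotSpot default
        long old = oldMb(heapMb, newRatio);
        // If the Young Generation shrinks to a few MB, roughly only the Old Generation share is usable.
        System.out.printf("young=%d MB, old=%d MB, old share=%.0f%%%n",
                youngMb(heapMb, newRatio), old, 100.0 * old / heapMb);
    }
}
```

For a 7168 MB heap this prints young=2389 MB, old=4779 MB, old share=67%.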
After some digging I found a parameter that disables the adaptive size policy (-XX:-UseAdaptiveSizePolicy), and it helped: the heap stopped shrinking, and the intervals between garbage collections grew by an order of magnitude or more. The total time spent in GC grew as well, but not by much.
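To quantify the effect of a flag change like this one, the collector MXBeans expose the number of collections and the accumulated GC time since startup. This sketch (hypothetical class name) derives the average interval between collections and the share of uptime spent in GC; it is a coarse whole-lifetime average, while the -Xlog:gc* output gives per-collection detail.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcOverhead {
    // Fraction of uptime spent in GC; an undefined (negative) GC time is treated as zero.
    public static double gcTimeFraction(long gcTimeMs, long uptimeMs) {
        return uptimeMs <= 0 ? 0.0 : (double) Math.max(0, gcTimeMs) / uptimeMs;
    }

    // Average time between collections, or Long.MAX_VALUE if none happened yet.
    public static long avgIntervalMs(long collections, long uptimeMs) {
        return collections <= 0 ? Long.MAX_VALUE : uptimeMs / collections;
    }

    public static void main(String[] args) {
        long uptimeMs = ManagementFactory.getRuntimeMXBean().getUptime();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            System.out.printf("%s: collections=%d, avg interval=%d ms, GC time=%.2f%% of uptime%n",
                    gc.getName(), count, avgIntervalMs(count, uptimeMs),
                    100 * gcTimeFraction(gc.getCollectionTime(), uptimeMs));
        }
    }
}
```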
The next step was to find the optimal gap between the container memory size and MaxHeapSize. By default, even with InitialRAMPercentage=100, the JVM only commits the memory without touching it, so it is not yet backed by physical pages. Linux allows a process to allocate more virtual memory than there is physical memory, so the container fails later, when the memory is actually mapped (when the JVM first writes to it). -XX:+AlwaysPreTouch changes this behavior by touching the heap pages at startup. Unfortunately, some memory is still mapped lazily, but an out-of-memory termination now happens much sooner. After several attempts I ended up with the following formula: MaxHeapSize = container memory size - 1024 MB, for containers with 8 GB of memory or more. For example, for a container with 8192 MB of memory we use -XX:MaxHeapSize=7168m.
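Putting the flags from this post together, a small helper can derive the heap size from the container memory and build the JVM options. It is a sketch with hypothetical names; it applies only the formula above, so it rejects containers smaller than 8 GB instead of guessing a gap for them.

```java
import java.util.List;

public class JvmFlags {
    private static final long GAP_MB = 1024;

    // Formula from this post, only used for containers with 8 GB of memory or more.
    public static long maxHeapMb(long containerMb) {
        if (containerMb < 8192) {
            throw new IllegalArgumentException("formula only used for containers with >= 8192 MB");
        }
        return containerMb - GAP_MB;
    }

    // Options collected from this post: Parallel GC, fixed heap size, no adaptive sizing, pre-touch, GC logging.
    public static List<String> flags(long containerMb) {
        long heapMb = maxHeapMb(containerMb);
        return List.of(
                "-XX:+UseParallelGC",
                "-XX:MaxHeapSize=" + heapMb + "m",
                "-XX:InitialHeapSize=" + heapMb + "m",
                "-XX:-UseAdaptiveSizePolicy",
                "-XX:+AlwaysPreTouch",
                "-Xlog:gc*");
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", flags(8192)));
    }
}
```

For an 8192 MB container this prints -XX:+UseParallelGC -XX:MaxHeapSize=7168m -XX:InitialHeapSize=7168m -XX:-UseAdaptiveSizePolicy -XX:+AlwaysPreTouch -Xlog:gc*.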
For further optimization we are considering changing -XX:NewRatio to decrease the Young Generation size and reduce GC time, but that depends on the lifetime of objects in the application.
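When experimenting with -XX:NewRatio, it can help to work backwards from a target Young Generation size. This sketch assumes young = heap / (NewRatio + 1) and ignores alignment; the class name, the lower bound of 1 and the 1024 MB target are illustrative assumptions, not recommendations.

```java
public class NewRatioPlanner {
    // Assumes young = heap / (NewRatio + 1), ignoring alignment.
    public static long youngMb(long heapMb, int newRatio) {
        return heapMb / (newRatio + 1);
    }

    // Integer NewRatio whose Young Generation approximately matches the target; never below 1.
    public static int newRatioFor(long heapMb, long targetYoungMb) {
        long ratio = Math.round((double) heapMb / targetYoungMb) - 1;
        return (int) Math.max(1, ratio);
    }

    public static void main(String[] args) {
        long heapMb = 7168;
        for (int ratio = 2; ratio <= 6; ratio++) {
            System.out.printf("NewRatio=%d -> young=%d MB%n", ratio, youngMb(heapMb, ratio));
        }
        // 1024 MB is an illustrative target only.
        System.out.println("NewRatio for ~1024 MB young: " + newRatioFor(heapMb, 1024));
    }
}
```

Whether a smaller Young Generation actually reduces GC time still has to be checked against the -Xlog:gc* output, since it depends on how long objects live.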
As I mentioned before, I haven't found a good guide with a detailed explanation of the parameters and the tuning steps (the best resource I found is vm-options-explorer). It would be great if you could share your knowledge and results.