DEV Community

Jeisson Florez


Off-heap memory in Java

The heap area is one of the most important parts of the JVM architecture, since it stores all the objects created in a JVM instance. However, there are cases in which it is convenient to store data outside of it. In this post we will see how this can be achieved and look at some implementations that do it.

Introduction

First, let's take a quick look at the JVM architecture.

[Figure: JVM architecture diagram]

As we can see, the heap lives in the Runtime Data Area, which contains the areas used during the execution of a program. Some of these areas are per thread, while others, such as the heap, are shared by the whole JVM instance. The garbage collector must also be taken into account, because it is key to understanding how memory is managed in the heap.
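A quick way to see part of this layout from inside a running program is the standard `java.lang.management` API. The minimal sketch below (the class name `HeapVsNonHeap` is just an illustrative choice) prints the heap and non-heap usage the JVM reports. In HotSpot, "non-heap" covers JVM-managed areas such as Metaspace and the code cache; it does not include memory allocated by NIO direct buffers.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical demo class: prints how much heap and non-heap memory the JVM reports
class HeapVsNonHeap {
    // Heap: where class instances and arrays are allocated (managed by the GC)
    static MemoryUsage heapUsage() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    // Non-heap: other JVM-managed areas (in HotSpot, e.g. Metaspace and the code cache)
    static MemoryUsage nonHeapUsage() {
        return ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage();
    }

    public static void main(String[] args) {
        MemoryUsage heap = heapUsage();
        MemoryUsage nonHeap = nonHeapUsage();
        // getMax() returns -1 when no maximum is defined
        System.out.printf("heap:     used=%,d committed=%,d max=%,d%n",
                heap.getUsed(), heap.getCommitted(), heap.getMax());
        System.out.printf("non-heap: used=%,d committed=%,d max=%,d%n",
                nonHeap.getUsed(), nonHeap.getCommitted(), nonHeap.getMax());
    }
}
```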

A formal definition of the heap area is:

The Java Virtual Machine has a heap that is shared among all Java Virtual Machine threads. The heap is the run-time data area from which memory for all class instances and arrays is allocated. The heap is created on virtual machine start-up. Heap storage for objects is reclaimed by an automatic storage management system (known as a garbage collector); objects are never explicitly deallocated. The Java Virtual Machine assumes no particular type of automatic storage management system, and the storage management technique may be chosen according to the implementor’s system requirements. The heap may be of a fixed size or may be expanded as required by the computation and may be contracted if a larger heap becomes unnecessary. The memory for the heap does not need to be contiguous.

Source: JVM Specification

Since all the data stored in the heap is subject to garbage collection, the more data the heap holds, the more time the garbage collector tends to spend processing it. Why should this matter to us? Each garbage collector cleans up the heap in its own way, but they all rely, to a greater or lesser extent, on Stop-The-World pauses: moments during which all application threads are suspended while the collector does part of its work (for traditional collectors, most of it).
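To put a number on that cost, the following sketch reads the cumulative collection count and time exposed by the standard `GarbageCollectorMXBean` interface before and after filling a list with boxed `Long` values. `GcTimeDemo` and the element count are illustrative assumptions, and `getCollectionTime()` reports approximate accumulated collection time, which is not the same thing as pure Stop-The-World pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: measures GC activity caused by many small heap objects
class GcTimeDemo {
    // Returns {collection count, accumulated collection time in ms}, summed over all collectors
    static long[] gcCountAndTime() {
        long count = 0;
        long timeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the value is undefined for a collector
            count += Math.max(0, gc.getCollectionCount());
            timeMs += Math.max(0, gc.getCollectionTime());
        }
        return new long[] {count, timeMs};
    }

    public static void main(String[] args) {
        long[] before = gcCountAndTime();
        List<Long> boxed = new ArrayList<>();
        for (long i = 0; i < 2_000_000; i++) {
            boxed.add(i); // autoboxing: almost every element is a separate heap object
        }
        long[] after = gcCountAndTime();
        System.out.println("elements:     " + boxed.size());
        System.out.println("GC runs:      " + (after[0] - before[0]));
        System.out.println("GC time (ms): " + (after[1] - before[1]));
    }
}
```

The numbers will vary with the collector, heap size, and JDK version, so treat them as a rough indicator rather than a benchmark.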

That said, garbage collection algorithms do a great job of cleaning up very quickly, but near real-time applications may not be able to afford these pauses. In those cases, or when the available physical memory is less than what the application needs, moving that data off the heap becomes an option.

Off-heap memory

Off-heap memory refers to memory allocated directly from the operating system, outside the heap managed by the JVM. It can live in physical memory, be backed by disk (as with memory-mapped files), or both. Because the data lives outside the JVM's object model, it has to be serialized to be written and deserialized to be read, so performance depends on the buffer, the serialization process, and the disk speed (if applicable).
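As a minimal sketch of what that serialization step can look like with plain NIO, the code below writes a small value into a buffer created with `ByteBuffer.allocateDirect`, whose storage lives outside the Java heap, and reads it back. The `Point` type and the byte layout are hypothetical and chosen only for illustration; real libraries use their own formats.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: manual serialization into memory outside the Java heap
class DirectBufferSerialization {
    // Hypothetical value type used only for illustration
    static final class Point {
        final long id;
        final String label;

        Point(long id, String label) {
            this.id = id;
            this.label = label;
        }
    }

    // Layout: [id: 8 bytes][label length: 4 bytes][label: UTF-8 bytes]
    static void write(ByteBuffer buf, Point p) {
        byte[] label = p.label.getBytes(StandardCharsets.UTF_8);
        buf.putLong(p.id);
        buf.putInt(label.length);
        buf.put(label);
    }

    // Reads back a Point written by write(); every read creates a new heap object
    static Point read(ByteBuffer buf) {
        long id = buf.getLong();
        byte[] label = new byte[buf.getInt()];
        buf.get(label);
        return new Point(id, new String(label, StandardCharsets.UTF_8));
    }

    static Point roundTrip(Point p) {
        // allocateDirect reserves the buffer's storage outside the Java heap
        ByteBuffer buf = ByteBuffer.allocateDirect(1024);
        write(buf, p);
        buf.flip(); // switch the buffer from writing to reading
        return read(buf);
    }

    public static void main(String[] args) {
        Point back = roundTrip(new Point(42L, "off-heap"));
        System.out.println(back.id + " " + back.label); // 42 off-heap
    }
}
```

Note that each `read` copies the data back into a new `Point` on the heap: that copy is the serialization cost the paragraph above refers to.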

Benefits

  • Reduction of garbage collection pressure.
  • Large memory size, depending on the implementation.
  • Memory that can be shared among JVM processes running on the same machine.
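Sharing memory between processes usually relies on memory-mapped files. The sketch below assumes a simple layout of four `long` slots (`SLOTS`, `SIZE`, and the helper names are hypothetical): a writer maps a file with `FileChannel.map` in `READ_WRITE` mode and stores values, and a reader maps the same file in `READ_ONLY` mode. Both sides run in one JVM here for simplicity; in a real setup the reader could be a separate process mapping the same path.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class: sharing data through a memory-mapped file
class MappedFileSharing {
    static final int SLOTS = 4;                 // room for 4 long values
    static final int SIZE = SLOTS * Long.BYTES; // mapped region size in bytes

    static Path createSharedFile() {
        try {
            Path file = Files.createTempFile("shared", ".bin");
            file.toFile().deleteOnExit();
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writer side: maps the file read-write and stores up to SLOTS values
    static void writeValues(Path file, long... values) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Mapping beyond the current end of file grows the file to SIZE bytes
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, SIZE);
            for (int i = 0; i < values.length; i++) {
                map.putLong(i * Long.BYTES, values[i]);
            }
            map.force(); // flush the changes to the underlying file
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reader side: in a real setup this could be another JVM process mapping the same file
    static long readValue(Path file, int slot) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, SIZE);
            return map.getLong(slot * Long.BYTES);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = createSharedFile();
        writeValues(file, 10L, 20L, 30L);
        System.out.println(readValue(file, 2)); // 30
    }
}
```

The `force()` call matters for cross-process use: the `FileChannel.map` documentation leaves it open whether other programs that map the same file see `READ_WRITE` changes before they are propagated to the file.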

Considerations

  • The serialization process has an impact on performance.
  • Manual memory management is hard and error-prone (ask C devs 😅).

Usage

How to use off-heap memory depends on the developers and the business case: we can either write our own implementation using the Java NIO API, which allows us to allocate memory manually, or use one of the implementations already available.
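For the "own implementation" route, a minimal sketch with the Java NIO API could look like the hypothetical `OffHeapLongArray` below: a fixed-size array of `long` values backed by a direct `ByteBuffer`, so only the small wrapper object lives on the heap while the values themselves do not.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Objects;

// Hypothetical fixed-size array of longs whose data lives outside the Java heap
class OffHeapLongArray {
    private final ByteBuffer buffer;
    private final int length;

    OffHeapLongArray(int length) {
        this.length = length;
        // ByteBuffer uses int offsets, so a single buffer holds at most ~2 GB;
        // multiplyExact throws ArithmeticException instead of silently overflowing
        this.buffer = ByteBuffer.allocateDirect(Math.multiplyExact(length, Long.BYTES))
                .order(ByteOrder.nativeOrder());
    }

    long get(int index) {
        return buffer.getLong(offset(index));
    }

    void set(int index, long value) {
        buffer.putLong(offset(index), value);
    }

    int length() {
        return length;
    }

    // Bounds-checks the index and converts it to a byte offset
    private int offset(int index) {
        return Objects.checkIndex(index, length) * Long.BYTES;
    }

    public static void main(String[] args) {
        OffHeapLongArray numbers = new OffHeapLongArray(1_000_000);
        for (int i = 0; i < numbers.length(); i++) {
            numbers.set(i, i);
        }
        long sum = 0;
        for (int i = 0; i < numbers.length(); i++) {
            sum += numbers.get(i);
        }
        System.out.println(sum); // 499999500000
    }
}
```

One design consequence worth keeping in mind: memory obtained from `allocateDirect` is released only after the `ByteBuffer` object itself is garbage collected, and the total amount is capped by the JVM (see `-XX:MaxDirectMemorySize` in HotSpot). This approach therefore reduces GC pressure without giving full manual control over the memory.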

In this post we will take a general look at a library that provides off-heap implementations of some of the most common data structures used in Java.

  • Chronicle-Map: Chronicle Map is an in-memory key-value store designed for low-latency and/or multi-process applications.

Let's do a simple test. The scenario is an application that generates 30 million random numbers, puts them in a Set, and sums them up afterwards.

  • Project repo: GitHub - off-heap-tests
  • Max heap size: 2 GB
  • JDK: OpenJDK 64-Bit Server VM Microsoft-18724
  • Physical memory: 16 GB
private final Random random = new Random(); // java.util.Random field assumed by sumNumbers

long sumNumbers(Set<Long> numbers) throws InterruptedException {
    for (int i = 0; i < 30_000_000; i++) {
        numbers.add(random.nextLong());
        // Pause 1 s every million insertions (30 pauses, ~30 s in total) to have time to check jconsole
        if (i % 1_000_000 == 0) Thread.sleep(1000);
    }
    // The sum of random longs may overflow and wrap around; only the work done matters here
    return numbers.stream().reduce(0L, Long::sum);
}

public static void main(String[] args) throws InterruptedException {
    var start = Instant.now();
    new Main().executeTest();
    var end = Instant.now();
    var timeMilli = Duration.between(start, end).toMillis(); // java.time.Duration
    System.out.println("Time to get finished in ms: " + timeMilli);
}

HashSet Java implementation

The first test uses the standard HashSet implementation:

void executeTest() throws InterruptedException {
    final Set<Long> set = new HashSet<>();
    System.out.println(sumNumbers(set));
}

[Figures: jconsole screenshots of heap memory usage and GC activity during the HashSet test]

The HashSet run took about 58,000 to 59,000 ms to finish.

ChronicleSet implementation

ChronicleSet provides a builder that takes the element type and the maximum number of entries, and uses them to size the memory it allocates.

void executeTest() throws InterruptedException {
    // The builder sizes the off-heap storage for up to 30 million Long entries
    final var set = ChronicleSetBuilder.of(Long.class)
        .entries(30_000_000)
        .create();
    System.out.println(sumNumbers(set));
}

[Figures: jconsole screenshots of heap memory usage and GC activity during the ChronicleSet test]

The ChronicleSet run took about 52,000 to 53,000 ms to finish. Also notice the change in the heap memory used and the reduced GC activity. Keep in mind that both timings include the roughly 30 seconds spent in Thread.sleep.

Conclusions

Off-heap memory is a good option when the data stored in the heap is huge and we need to reduce the time spent in garbage collection, or when an application needs more memory than is physically available and using disk space is acceptable. However, it is worth remembering that managing memory allocation directly is not an easy task and can lead to issues that are hard to deal with.

In what other cases do you think off-heap memory can be used?

If you liked this post, you can find more at https://jeisson.dev/blog/ and follow me on Twitter @jeissonflorez29 👋

Top comments (2)

Darius Juodokas

I agree that it's all pretty and smells nice, but only until you have to think about limits.

How much memory will you assign a container, if you are using off-heap?
How will you know which limits were breached and you got the JVM OOMKilled?
Did you spin up one too many threads? Did your off-heap memory store get one byte too large?

This poses a problem because, while it's easy to estimate and monitor the heap, it's very difficult to monitor off-heap memory. In fact, if you go ahead and enable NMT (Native Memory Tracking) for monitoring or whatever, you will add 2-3 instructions to each malloc, and in the end your off-heap implementation will be much slower than one where you let the GC do the allocations.

The awesomeness of the Heap-full implementation of your (or any other) algorithm is that you can apply limits to it, AND you can monitor it. Meaning, you can
a) predict, when you're about to get an OOME
b) make an educated guess about the reason for an OOME by analysing retrospective metrics

None of that applies to off-heap. Off-heap is like sailing uncharted waters: you can do anything you like in there, but bear in mind that no laws protect you. Here be dragons!

I strongly believe that playing with off-heap is a VERY dangerous matter and should only be considered when all the other options are exhausted.

Jeisson Florez

Hi Darius, thanks for reading and commenting.

It's true that managing memory outside the "standards" that the JVM provides brings other concerns and problems to think about. However, off-heap is not a replacement for heap memory in all cases, and it should be evaluated carefully depending on the case.

As I mentioned in the post, it could be worth trying when we are dealing with a lot of data stored in memory, or even when that data exceeds the physical memory and using disk is an option.

Finally, there are many options for addressing design problems and I just wanted to show one that I found interesting IMHO.