A B Vijay Kumar

Posted on Jan 16, 2022

Building GraalVM Native Image of a Polyglot Java+numpy application

One of GraalVM's greatest features is that it provides a universal runtime for code written in different languages. This opens up a huge opportunity to reuse existing, tested, and hardened code without rewriting it in the target language. It is especially handy when code is tough to migrate, or when the language it is written in has features that make it the obvious choice for the job. For example, Python and R are known for their rich libraries and for the simplicity they bring to building data science and machine learning applications.

Before I get into the actual topic, let me introduce GraalVM. Here are some blog posts on GraalVM that I published earlier.

Episode 1: “The Evolution” - Java JIT Hotspot & C2 compilers
Episode 2: “The Holy Grail” - GraalVM
Java Serverless on Steroids with fn+GraalVM Hands-On
This post provides a hands-on example of how to build a serverless application using the fn project and run it on GraalVM.

I also blogged about what is inside my book, Supercharge Your Applications with GraalVM.

You can check out the book at these links

In this blog, we will explore how to use the GraalVM Polyglot library to call a Python program that uses NumPy from a Java application.

The Python program performs a simple analysis of a dataset that I picked from Kaggle (https://www.kaggle.com/rashikrahmanpritom/heart-attack-analysis-prediction-dataset).

This dataset has records of people who had a heart attack and various important information about these patients, which will help us do some data analysis to identify patterns.

Before we get hands-on, we need to understand the Polyglot architecture of GraalVM. GraalVM comes with a framework called “Truffle”, which allows polyglot interoperability on GraalVM.

GraalVM Polyglot Architecture — Truffle

Truffle is an open-source library that provides a framework for implementing language interpreters. Guest languages implemented on Truffle can use the Graal compiler to generate high-performance code. Truffle also provides a tools framework that helps integrate modern diagnostic, debugging, and analysis tools.

Let’s understand how Truffle fits into the overall GraalVM ecosystem. Along with interoperability between the languages, Truffle also provides embeddability. Interoperability allows the calling of code between different languages, while embeddability allows the embedding of code written in different languages in the same program.

Language interoperability is critical for the following reasons:

Different programming languages are built to solve different problems, and they come with their own strengths. For example, we use Python and R extensively for machine learning and data analytics, and we use C/C++ for high-performance mathematical operations. Imagine if we could reuse that code as it is, either by calling it from a host language (such as Java) or by embedding it within the host language. This increases the reusability of the code and lets us use an appropriate language for the task at hand, rather than rewriting the logic in different languages.
Large migration projects, where we are moving from one language to another, can be carried out in phases if multiple programming languages can interoperate. This brings down the risk of migration considerably.

The following figure illustrates how to run applications written in other languages on GraalVM:

In the figure, we can see GraalVM, which is the JVM together with the Graal JIT compiler. On top of that, we have the Truffle framework. Truffle has two major components. They are as follows:

Truffle API: The Truffle API is the language implementation framework that guest language implementers can use to build a Truffle interpreter for their language. Truffle provides a sophisticated API for Abstract Syntax Tree (AST) rewriting. The guest language code is converted to an AST so that it can be optimized and run on GraalVM. The Truffle API also provides an interoperability framework between languages that implement it.
Truffle optimizer: The Truffle optimizer provides an additional layer of speculative optimization using partial evaluation.

Above the Truffle layer, we have the guest languages. These are JavaScript, R, Ruby, and other languages that implement the Truffle Language Implementation framework. Finally, we have the application that runs on top of the guest language runtime. In most cases, application developers don’t have to worry about changing their code to run on GraalVM. Truffle makes it seamless by providing a layer in between.

Truffle provides the API that the individual interpreters implement to rewrite the code into ASTs. The AST representation is later converted to the Graal intermediate representation, which Graal executes and optimizes just in time. Guest language programs run on top of the Truffle interpreter implementation of their respective language. To read more about how Truffle works, please refer to my book.
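To make the idea of AST rewriting a little more concrete, here is a toy sketch in Python. It is not the Truffle API: Truffle is a Java framework, and the real thing involves node replacement and partial evaluation. The class names Literal and Add and their method names are hypothetical and exist only for this illustration. The Add node starts out uninitialized, specializes itself for integers the first time it runs, and falls back to a generic implementation if that speculation later turns out to be wrong.

```python
class Literal:
    """Leaf node that simply returns a constant value."""

    def __init__(self, value):
        self.value = value

    def execute(self):
        return self.value


class Add:
    """Addition node that rewrites its own implementation based on observed operand types."""

    def __init__(self, left, right):
        self.left = left
        self.right = right
        # Start unspecialized; the first execution decides which version to use.
        self.impl = self._uninitialized

    def execute(self):
        return self.impl(self.left.execute(), self.right.execute())

    def _uninitialized(self, a, b):
        # First run: speculate on the operand types seen so far.
        if isinstance(a, int) and isinstance(b, int):
            self.impl = self._int_add
        else:
            self.impl = self._generic_add
        return self.impl(a, b)

    def _int_add(self, a, b):
        # Guard: if the speculation no longer holds, fall back to the generic version.
        if not (isinstance(a, int) and isinstance(b, int)):
            self.impl = self._generic_add
            return self.impl(a, b)
        return a + b

    def _generic_add(self, a, b):
        return a + b


tree = Add(Literal(1), Add(Literal(2), Literal(3)))
print(tree.execute(), tree.impl.__name__)   # 6 _int_add

tree.right = Literal(2.5)                   # operand type changes
print(tree.execute(), tree.impl.__name__)   # 3.5 _generic_add
```

The point of the sketch is only the shape of the technique: the tree observes what it actually executes and rewrites itself accordingly, which gives a compiler a stable, specialized tree to optimize.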

Hands-On — Java & Python Interoperability

The dataset has various columns. The key ones are age, sex, chest pain, cholesterol levels, etc. The following is a screenshot of the dataset

Let's now build a simple NumPy module (in Python) that selects the records of people who have level 3 chest pain and calculates their average age. We will then call this Python method from Java.

Step 1: Setup Environment

Let's start by installing GraalVM. You can refer to the GraalVM documentation for instructions on installing it on your target OS. I always prefer using Visual Studio Code, as it provides a great integrated environment and helps manage the environment and different versions of GraalVM with ease.

You can install GraalVM on Visual Studio Code as an extension. Find below the screenshot of where you can find it on Visual Studio Code

You can install either Community or Enterprise (or both, as VSCode lets you keep multiple environments and provides a very easy way to switch between them). In my case, I am installing the Community edition.

Once I install it, VSCode also helps to set the respective environment variable. This ensures that the integrated terminal points to the right version of the GraalVM. (This can be easily switched with other versions)

Once the environment is set, we also need to install the other optional runtimes. We will need the Python, LLVM, and Native Image runtimes. You should be able to install them by clicking the “+” button

Once all the optional runtimes are installed, you can check the versions in the VSCode integrated terminal.

Let's now create a virtual environment. Instead of using the standard python command, we will be using graalpython. The following is the command to create a virtual environment with graalpython

graalpython -m venv ab_venv

To activate the virtual environment, we execute the following command

source ab_venv/bin/activate

Let's set the python environment variable

export GRAAL_PYTHONHOME=$GRAALVM_HOME/languages/python

Let's now install numpy. Once again, we will be using the graalpython command line to install the package. The following is the command for installing numpy

graalpython -m ginstall install numpy

Step 2: Build and test Python application

The following is the Python code. It's a very simple numpy API call that calculates the average and returns the values dataOfPeopleWith3ChestPain and averageAgeofPeopleWith3ChestPain

You can find the latest code in my GitHub repository (see References).

As you can see it is a very simple python application. We are loading the CSV file (dataset that we downloaded from Kaggle), we are then performing a simple statistical calculation and returning the average age of the people who had level 3 chest pain before a heart attack.
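The kind of computation described above can be sketched as follows. This is a minimal illustration and not the code from the repository. The file name heart.csv, the helper average_age_for_chest_pain, and the column positions (age in column 0, chest pain type in column 2) are assumptions based on the Kaggle CSV header, so check them against your copy of the dataset.

```python
import numpy as np

# Assumed column positions in the Kaggle CSV (age, sex, cp, ...).
AGE_COL = 0
CHEST_PAIN_COL = 2


def average_age_for_chest_pain(data, level=3):
    """Return (matching rows, mean age) for rows whose chest pain value equals `level`."""
    rows = data[data[:, CHEST_PAIN_COL] == level]
    if rows.shape[0] == 0:
        # No matching records: report NaN instead of averaging an empty array.
        return rows, float("nan")
    return rows, float(rows[:, AGE_COL].mean())


def heartAnalysis(csv_path="heart.csv"):
    # skiprows=1 skips the header line of the CSV file.
    data = np.loadtxt(csv_path, delimiter=",", skiprows=1)
    rows, average_age = average_age_for_chest_pain(data, 3)
    print(average_age)
    return average_age
```

Keeping the filtering in a helper that takes an array makes it easy to try the logic on a few hand-written rows before pointing heartAnalysis() at the real file.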

To check if our application is running, we will use graalpython

graalpython heartAnalysis.py

You should be able to see the output from the application. Here is what I can see.

Now that we know our python application is running and we have exposed the heartAnalysis() method, let's build a Java application to call this method.

Step 3: Build and Test Java Application

Below is the Java application that calls the Python method we developed in Step 2.

Let's walk through this Java code

Lines 1–4: We are importing the following Java libraries

java.io.File: as we will be loading the python source code file into the application
org.graalvm.polyglot.Context: GraalVM provides a polyglot context that helps with interoperability between code written in different languages. Please refer to the API doc here.
org.graalvm.polyglot.Source: This class represents the source code and its contents. We will be using this object to access the Python methods. Please refer to the API doc here.
org.graalvm.polyglot.Value: This class represents the value that can be passed between the host and guest languages. In this case, the host is Java and the guest is Python. Please refer to the API doc here.

Line 8: We build and initialize the polyglot context object and grant it complete access. This object will help us load and run the Python code.

Lines 10–11: We load the Python source code into the context and build it.

Line 13: We look up the method definition through the context's bindings object. In our case, we get a reference to the heartAnalysis() Python method.

Lines 15–18: We invoke the method and print the results.

Let us now compile the Java code and run it

javac HeartAnalysisJava.java

java HeartAnalysisJava

Here is the screenshot of my terminal, after compiling and running the Java program.

You will see three outputs. The first comes from the print(averageAgeofPeopleWith3ChestPain) statement inside the Python heartAnalysis() function. The second comes from the Java code invoking the heartAnalysis() Python method, and the last is the data that the Java code received from the Python code, which we print with System.out.println on line 16.

Now we have a working Java application that invokes a Python method. Let's now build a native image of this Java code.

Step 4: Building Native Image

Ensure that the Native Image runtime is installed. If it is not, you can install it using the VSCode GraalVM plugin, as shown in the screenshot below, or with the GraalVM Updater utility. Please refer to the documentation here

To build the native image, let's execute the following command

native-image --language:python -Dorg.graalvm.launcher.relative.python.home=$GRAALVM_HOME/languages/python -Dorg.graalvm.launcher.relative.llvm.home=$GRAALVM_HOME/languages/llvm HeartAnalysisJava

The GraalVM native-image command-line tool is used to build the native image.

The --language:python argument lets the native image builder know that we will be calling Python code, and ensures that Python is available as a language in the image. The other two arguments tell the native-image builder where to find the Python runtime and the LLVM runtime.

This generates a binary file heartanalysisjava, and we can run the application directly by executing the following command

./heartanalysisjava

The following is the screenshot of the build.

That's it for now. I hope you had fun playing around with GraalVM polyglot and building native images. I have gone into a great level of detail on how GraalVM works in my book, Supercharge Your Applications with GraalVM.

Hope this was helpful. Keep safe, have fun, until next time :-D

References

My GitHub Repository — https://github.com/abvijaykumar/graalvm-numpy-polyglot
Installing GraalVM — https://www.graalvm.org/docs/getting-started/linux/
GraalVM — https://www.graalvm.org/

DEV Community
