Sean Policarpio

Posted on Dec 15, 2023

Un-JAR-ing your Java apps for Docker

#java #maven #docker

After a small detour at work, I've found myself returning to the world of Java and its virtual machine.

Recently, I was tasked with preparing our backend application in a Docker image for deployment. Having done something similar with a Scala backend before, I was confident I could replicate a strategy I had learned, but with Java build tools instead. Succinctly, the strategy was to create an incremental and layered image by not using a Java JAR, but instead copying and directly executing compiled Java bytecode. After further expanding on the motivation to not use a JAR, I will demonstrate how I did this in Maven using standard plugins.

Why not use a JAR?

Having the ENTRYPOINT or CMD of your Dockerfile execute something like the following does work:

java -jar your_app.jar

So why not stick with that approach? First of all, a JAR within a Docker image is redundant: both are essentially archive formats. Quite simply, there is no need to package your code in a JAR if the unit of execution is the Docker image itself.

But there are more important benefits that become apparent when you start running your app in production.

Easier image inspection and debugging

With every code change you and your team make, those changes are likely reflected in your version control system. But those same changes are hidden from view if the JAR that bundles them is copied into a Docker image. I've found it extremely useful to be able to inspect the code within Docker images without needing to extract JARs. This is especially true when things go awry in production and you need to do this in your container environment (e.g. AWS, Google Cloud Platform). This could be anything from inspecting class code using javap to viewing bundled resources like .yaml, .properties, or .hocon config files to confirm settings or flags, something I've done several times in the past. If a container's state is frequently changing, it's especially helpful not to have to re-extract the JAR archive every time the container is restarted.

If you happen to bundle source code in your JARs, it is also useful to have it unbundled in your container when debugging. Ideally, you would have something like a git SHA for reference when comparing with version control (e.g. by tagging the Docker image with the SHA). But sometimes it's still helpful to dig into the image to sanity check exactly what is in there.
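As a rough sketch of the kind of inspection described above, the following helper lists the config-style resources sitting in an unbundled application directory. The function name list_config_files is hypothetical, and the file extensions are simply the ones mentioned earlier; adjust both to your project.

```shell
# List config-style resources (.yaml, .properties, .hocon) under a directory.
# `list_config_files` is a hypothetical helper name used for illustration only.
list_config_files() {
    find "$1" -type f \( -name '*.yaml' -o -name '*.properties' -o -name '*.hocon' \) | sort
}
```

Against a running container, the same find expression could be passed to docker exec, for example docker exec <container> find /app -type f -name '*.properties', where <container> is a placeholder for your container name or ID. Keep in mind that javap is a JDK tool, so it may not be present in a JRE-only image; in that case you can copy the class files out with docker cp and inspect them locally.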

Diffs

Intuitively, it would also be beneficial if, every time the image was built, only the modified code and resources were copied. This would reduce the Docker layer size between builds, likely to the order of bytes or kilobytes. Unfortunately, Docker has a long-standing issue: it cannot optimize subsequent image builds by detecting and copying only the files that have changed (as rsync does). If this ever improves, it would be another benefit of unbundling JARs: smaller Docker images, faster repository pushes and pulls, and less remote storage required.

In any case, being able to tell what's different between two builds is valuable, and easier without a JAR. You can use diff to compare your application directory in two images, but you'd need to do some cp'ing to get everything localised first. Fortunately, a tool like container-diff makes this much easier. The following shows an example of a diff that reveals the single file modified and recompiled between two revisions of my Docker image:

% ./container-diff-darwin-amd64 diff daemon://java-backend-api:1 daemon://java-backend-api:2 --type=file     

-----File-----

These entries have been added to java-backend-api:1: None

These entries have been deleted from java-backend-api:1: None

These entries have been changed between java-backend-api:1 and java-backend-api:2:
FILE                                             SIZE1        SIZE2
/app/io/policarp/service/DemoServer.class        2K           2.2K
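For comparison, the manual route mentioned above (diff plus some cp'ing) might look something like the sketch below. The function names export_app_dir and compare_dirs are hypothetical helpers, and the /app path is the application directory used in this post; treat it as an outline under those assumptions rather than a polished script.

```shell
# Copy the /app directory out of an image into a local directory.
# `docker cp` works on containers, not images, so a stopped container is
# created first and removed afterwards. The destination should not exist yet,
# otherwise docker cp nests the copy inside it as <dest>/app.
export_app_dir() {
    image="$1"; dest="$2"
    container_id=$(docker create "$image") || return 1
    status=0
    docker cp "$container_id:/app" "$dest" || status=$?
    docker rm "$container_id" >/dev/null
    return "$status"
}

# Report files that differ between two local directory trees.
# diff exits with 1 when it finds differences, which is not an error here.
compare_dirs() {
    diff -rq "$1" "$2" || [ "$?" -eq 1 ]
}
```

With both images available locally, you could then run export_app_dir java-backend-api:1 ./image-1, export_app_dir java-backend-api:2 ./image-2, and finally compare_dirs ./image-1 ./image-2.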

Additionally, unless you are copying an "uber" or "shaded" JAR, you will likely need to copy all individual dependencies into your Docker image too. The approach I'll demonstrate shows how you can gain the same insight into third-party libraries by not keeping them bundled as JARs either.

Prepping your pom.xml

In my last Scala project, I used the sbt-assembly plugin to prep my code for copying and execution in Docker. Assuming you have a single project pom, the following excerpt shows the two plugins I used to achieve the same outcome in Maven.

<plugins>
    ...
    <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <version>3.6.0</version>
        <configuration>
            <descriptorRefs>
                <descriptorRef>jar-with-dependencies</descriptorRef>
            </descriptorRefs>
        </configuration>
    </plugin>

    <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>exec-maven-plugin</artifactId>
        <version>3.1.1</version>
        <executions>
            <execution>
                <!-- Provides an execution to unzip the packaged JAR for use in a Docker image -->
                <id>unzip-assembly-for-docker</id>
                <configuration>
                    <executable>unzip</executable>
                    <arguments>
                        <argument>-quo</argument> <!-- quiet, update, overwrite -->
                        <argument>target/${project.build.finalName}-jar-with-dependencies.jar</argument>
                        <argument>-d</argument>
                        <argument>target/docker-image-target</argument>
                    </arguments>
                </configuration>
            </execution>
        </executions>
    </plugin>
    ...
</plugins>

maven-assembly-plugin is first used to create a JAR bundle of the application code and all the dependencies using the pre-configured descriptor jar-with-dependencies.
exec-maven-plugin is then used to unzip the assembly JAR for copying into Docker.

exec-maven-plugin is used just so we can keep everything inside Maven. This is helpful not only for developer ergonomics, but also for keeping CI/CD pipelines simple. The caveat is that the environment where Maven runs must have the unzip command-line tool installed, which fortunately is usually the case.

With the above configured, you can simply run the following to prepare your application for Docker:

mvn package assembly:single exec:exec@unzip-assembly-for-docker

package will JAR your compiled app,
assembly:single will assemble your package JAR with the dependency JARs, and finally,
exec will call the unzip command to extract everything into the target/docker-image-target directory, ready for Docker.
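Before building the image, it can be worth checking that the extraction produced what you expect. The sketch below is one way to do that; has_class is a hypothetical helper that maps a fully qualified class name to its .class file path and checks that the file exists.

```shell
# Succeeds when the .class file for a fully qualified class name exists
# under the given directory. `has_class` is a hypothetical helper name.
has_class() {
    dir="$1"; class_name="$2"
    # io.policarp.service.DemoServer -> io/policarp/service/DemoServer
    class_path=$(printf '%s' "$class_name" | tr '.' '/')
    [ -f "$dir/$class_path.class" ]
}
```

For the project in this post, that would be has_class target/docker-image-target io.policarp.service.DemoServer, which could gate the docker build step in a CI script.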

As demonstrated earlier with my compiled DemoServer Java class, the maven-assembly-plugin similarly unpacks all third-party dependency JARs into the assembly, so their contents land in the image as individual files and you can diff library updates between Docker image builds. This can prove useful when trying to determine exactly what has changed between builds, even if you already have library updates recorded in version control. And if Docker ever optimises copying so that only changed files are transferred, builds that include library updates would benefit too.

Creating the Docker image

With all your compiled code in a target directory, your Dockerfile can be as simple as the following:

FROM eclipse-temurin:21.0.1_12-jre-alpine

WORKDIR /app

# add and run-as unprivileged user
RUN addgroup --system app-user-docker
RUN adduser --system --disabled-password --no-create-home app-user-docker app-user-docker
USER app-user-docker

# copy the extracted JAR code on to the image
COPY target/docker-image-target /app

EXPOSE 8080
ENTRYPOINT ["java", "io.policarp.service.DemoServer"]

Next time you're tasked with building a Java app in Docker, consider the approach I've demonstrated here. It's worth noting that if there is any reason you don't like it and/or find issues with unbundled JARs, going back (and forth) is not burdensome. Finally, in case you are wondering, this works just fine with Spring Boot.
