How to Read Java Bytecode (with examples)

#java #jvm #tutorial

Cover image courtesy of divinetechygirl

When working in the JVM ecosystem, it's worth spending some time understanding what happens behind the scenes. Even at a basic level, we should be able to explain in simple words what the JVM is, how compilation works, what bytecode is, and how to read it.

In this tutorial, we are going to take a 10,000-foot view of the JVM, cover some basic concepts, and learn how to read the bytecode of a simple program.

Let's start.

What is the JVM?

In simple words, the JVM is an engine that reads compiled code in the format defined by the Java Virtual Machine Specification and executes it on the current machine. The main advantage of this approach is cross-platform compatibility: the compiled code, which is called bytecode, is designed to be platform agnostic.

That means code compiled on a Linux machine and code compiled on a Windows machine should run on the JVM either way. We can copy the compiled .class files from Linux to Windows, or vice versa, and run them there without issues.

In other words, when you install Java on your Windows PC, the java tool uses a platform-specific runtime and a JIT compiler to run your code on Windows. The javac tool, on the other hand, compiles your .java files to the generic bytecode format.

The bytecode itself is a format defined by the Java Virtual Machine Specification. The features available depend on the version of the platform. Changes to the platform are standardized through JSRs (Java Specification Requests), and each OpenJDK release is planned as a set of JEPs (JDK Enhancement Proposals). Here is the list of JEPs delivered in OpenJDK 9, for example:



102: Process API Updates
110: HTTP 2 Client
143: Improve Contended Locking
158: Unified JVM Logging
165: Compiler Control
193: Variable Handles
197: Segmented Code Cache
199: Smart Java Compilation, Phase Two
200: The Modular JDK
201: Modular Source Code
211: Elide Deprecation Warnings on Import Statements
212: Resolve Lint and Doclint Warnings
213: Milling Project Coin
214: Remove GC Combinations Deprecated in JDK 8
215: Tiered Attribution for javac
216: Process Import Statements Correctly
217: Annotations Pipeline 2.0
219: Datagram Transport Layer Security (DTLS)
220: Modular Run-Time Images
221: Simplified Doclet API
222: jshell: The Java Shell (Read-Eval-Print Loop)
223: New Version-String Scheme
224: HTML5 Javadoc
225: Javadoc Search
226: UTF-8 Property Files
227: Unicode 7.0
228: Add More Diagnostic Commands
229: Create PKCS12 Keystores by Default
231: Remove Launch-Time JRE Version Selection
232: Improve Secure Application Performance
233: Generate Run-Time Compiler Tests Automatically
235: Test Class-File Attributes Generated by javac
236: Parser API for Nashorn
237: Linux/AArch64 Port
238: Multi-Release JAR Files
240: Remove the JVM TI hprof Agent
241: Remove the jhat Tool
243: Java-Level JVM Compiler Interface
244: TLS Application-Layer Protocol Negotiation Extension
245: Validate JVM Command-Line Flag Arguments
246: Leverage CPU Instructions for GHASH and RSA
247: Compile for Older Platform Versions
248: Make G1 the Default Garbage Collector
249: OCSP Stapling for TLS
250: Store Interned Strings in CDS Archives
251: Multi-Resolution Images
252: Use CLDR Locale Data by Default
253: Prepare JavaFX UI Controls & CSS APIs for Modularization
254: Compact Strings
255: Merge Selected Xerces 2.11.0 Updates into JAXP
256: BeanInfo Annotations
257: Update JavaFX/Media to Newer Version of GStreamer
258: HarfBuzz Font-Layout Engine
259: Stack-Walking API
260: Encapsulate Most Internal APIs
261: Module System
262: TIFF Image I/O
263: HiDPI Graphics on Windows and Linux
264: Platform Logging API and Service
265: Marlin Graphics Renderer
266: More Concurrency Updates
267: Unicode 8.0
268: XML Catalogs
269: Convenience Factory Methods for Collections
270: Reserved Stack Areas for Critical Sections
271: Unified GC Logging
272: Platform-Specific Desktop Features
273: DRBG-Based SecureRandom Implementations
274: Enhanced Method Handles
275: Modular Java Application Packaging
276: Dynamic Linking of Language-Defined Object Models
277: Enhanced Deprecation
278: Additional Tests for Humongous Objects in G1
279: Improve Test-Failure Troubleshooting
280: Indify String Concatenation
281: HotSpot C++ Unit-Test Framework
282: jlink: The Java Linker
283: Enable GTK 3 on Linux
284: New HotSpot Build System
285: Spin-Wait Hints
287: SHA-3 Hash Algorithms
288: Disable SHA-1 Certificates
289: Deprecate the Applet API
290: Filter Incoming Serialization Data
291: Deprecate the Concurrent Mark Sweep (CMS) Garbage Collector
292: Implement Selected ECMAScript 6 Features in Nashorn
294: Linux/s390x Port
295: Ahead-of-Time Compilation
297: Unified arm32/arm64 Port
298: Remove Demos and Samples
299: Reorganize Documentation

How to read Bytecode?

Let's get a little more practical and read the bytecode of a simple program.

For the purposes of this tutorial I will be using IntelliJ IDEA with the ASM Bytecode Outline plugin, but you can also use VS Code together with javap.

Create a new Java project using the editor dialog.
Create a new Java file called DetermineOS.java with the following code:



public class DetermineOS {

    public static void main(String[] args) {

        String strOSName = System.getProperty("os.name");

        System.out.print("Display the current OS name example.. OS is ");
        if(strOSName != null)
        {
            if(strOSName.toLowerCase().contains("linux"))
                System.out.println("Linux");
            else
                System.out.print("not Linux");
        }
    }
}

The above code retrieves the os.name system property and checks whether it contains the string linux. Depending on the result, it prints a different message.

Click on View -> Show Bytecode with Jclasslib

In the panel that appears on the right-hand side we can see some information about the bytecode. Let's go through the sections one by one:

General Information: This section shows the class file version (which identifies the Java release the class was compiled for), the number of entries in the constant pool, the access flags for this class, and some other counters.

Constant Pool: Every class file carries a constant pool, an indexed table of the constants the class uses. It holds literal values, strings, and symbolic references to types, fields, and methods. Each entry is referenced by its index. In our example we can see that the string os.name is stored at index #33.
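
The class file version and the constant pool count shown in the panel sit at the very start of every .class file, so they can also be read with a few lines of plain Java. The following is a minimal sketch, not part of the original tutorial: the names ClassFileHeader, Header and readHeader are made up for illustration, and it assumes the class file can be found as a resource next to the class.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ClassFileHeader {

    /** The first fields of a class file, in the order the JVM specification lays them out. */
    static final class Header {
        final int magic;             // always 0xCAFEBABE for a valid class file
        final int minorVersion;
        final int majorVersion;      // e.g. 52 for Java 8, 53 for Java 9
        final int constantPoolCount; // number of constant pool entries plus one

        Header(int magic, int minorVersion, int majorVersion, int constantPoolCount) {
            this.magic = magic;
            this.minorVersion = minorVersion;
            this.majorVersion = majorVersion;
            this.constantPoolCount = constantPoolCount;
        }
    }

    /** Reads the header of the class file that the given class was loaded from. */
    static Header readHeader(Class<?> type) {
        String binaryName = type.getName();
        // Resource names are resolved relative to the package of the class.
        String resource = binaryName.substring(binaryName.lastIndexOf('.') + 1) + ".class";
        try (InputStream raw = type.getResourceAsStream(resource)) {
            if (raw == null) {
                throw new IllegalStateException("No class file found for " + binaryName);
            }
            DataInputStream in = new DataInputStream(raw);
            int magic = in.readInt();               // u4 magic
            int minor = in.readUnsignedShort();     // u2 minor_version
            int major = in.readUnsignedShort();     // u2 major_version
            int poolCount = in.readUnsignedShort(); // u2 constant_pool_count
            return new Header(magic, minor, major, poolCount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Header h = readHeader(ClassFileHeader.class);
        System.out.printf("magic=0x%08X version=%d.%d constant_pool_count=%d%n",
                h.magic, h.majorVersion, h.minorVersion, h.constantPoolCount);
    }
}
```

Passing DetermineOS.class instead of ClassFileHeader.class should report the same kind of values the panel shows for our example.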

Interfaces+Fields: These sections display any interface and field declarations. Since we didn't specify any, they are empty.
Methods: This section contains the juice of the bytecode. Our program currently defines a single main method. When javac compiles this code it creates two entries, one for the <init> method and one for the main method. Let's explore each of them:
<init>: The bytecode that is emitted for this method is the following:



0 aload_0
1 invokespecial #1 <java/lang/Object.<init>>
4 return

In the JVM, every constructor of a class is compiled to a method named <init>. If the source does not define a constructor, the compiler supplies a default one. The above instructions are the minimum needed for it. aload_0 pushes the reference stored in local variable 0 of the current frame onto the operand stack. In a constructor or instance method, that slot holds this. The next instruction, invokespecial #1, is a special instance call that invokes the method referenced by constant pool entry #1, the superclass constructor <java/lang/Object.<init>>, on that reference. Then we return.
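
One way to confirm from plain Java that the compiler supplies a constructor is to ask reflection for it. This is a minimal sketch: the class names DefaultConstructorDemo and NoExplicitConstructor are hypothetical and only serve the illustration.

```java
import java.lang.reflect.Constructor;

public class DefaultConstructorDemo {

    // Hypothetical class for illustration: its source declares no constructor at all.
    static class NoExplicitConstructor {
    }

    /** Returns every constructor present in the compiled class, declared or compiler-supplied. */
    static Constructor<?>[] constructorsOf(Class<?> type) {
        return type.getDeclaredConstructors();
    }

    public static void main(String[] args) {
        for (Constructor<?> constructor : constructorsOf(NoExplicitConstructor.class)) {
            System.out.println(constructor + " takes "
                    + constructor.getParameterCount() + " parameter(s)");
        }
    }
}
```

Even though the source never mentions a constructor, reflection reports one with no parameters. That is the method the bytecode viewer lists as <init>.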

main: The bytecode that is emitted for this method is the following:



 0 ldc #5 <os.name>
 2 invokestatic #6 <java/lang/System.getProperty>
 5 astore_1
 6 getstatic #2 <java/lang/System.out>
 9 ldc #7 <Display the current OS name example.. OS is >
11 invokevirtual #4 <java/io/PrintStream.print>
14 aload_1
15 ifnull 49 (+34)
18 aload_1
19 invokevirtual #8 <java/lang/String.toLowerCase>
22 ldc #9 <linux>
24 invokevirtual #10 <java/lang/String.contains>
27 ifeq 41 (+14)
30 getstatic #2 <java/lang/System.out>
33 ldc #11 <Linux>
35 invokevirtual #12 <java/io/PrintStream.println>
38 goto 49 (+11)
41 getstatic #2 <java/lang/System.out>
44 ldc #13 <not Linux>
46 invokevirtual #4 <java/io/PrintStream.print>
49 return

This is more complicated, but if you look at it carefully you can map it back to the source code. The Bytecode panel helps us here by providing links for each instruction. Here are explanations for a few of them:



0 ldc #5

Pushes item #5 from the constant pool, which is "os.name", onto the operand stack.



2 invokestatic #6

Calls the static method referenced at #6, which is <java/lang/System.getProperty>. The call pops the argument we pushed before from the operand stack and pushes the return value in its place.



5 astore_1

Pops the reference from the top of the operand stack and stores it in local variable 1 of the current frame. This is the result of the previous invokestatic call. So we have completed the following operation:

String strOSName = System.getProperty("os.name");
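
The three instructions explained so far can be traced by hand with a toy model of a frame. The sketch below is only an illustration of how the operand stack and the local variables interact. It is not how a real JVM is implemented, and the names OperandStackSketch and runFirstThreeInstructions are made up.

```java
import java.util.Deque;
import java.util.LinkedList;

public class OperandStackSketch {

    /** Replays offsets 0, 2 and 5 of main and returns the local variable slots afterwards. */
    static Object[] runFirstThreeInstructions() {
        // LinkedList is used because, unlike ArrayDeque, it accepts null values.
        Deque<Object> operandStack = new LinkedList<>();
        Object[] locals = new Object[2]; // slot 0 = args, slot 1 = strOSName

        // 0: ldc #5  -> push the constant "os.name"
        operandStack.push("os.name");

        // 2: invokestatic #6  -> pop the argument, call the method, push the result
        String key = (String) operandStack.pop();
        operandStack.push(System.getProperty(key));

        // 5: astore_1  -> pop the result into local variable 1
        locals[1] = operandStack.pop();

        return locals;
    }

    public static void main(String[] args) {
        Object[] locals = runFirstThreeInstructions();
        System.out.println("local 1 (strOSName) = " + locals[1]);
    }
}
```

After the third step the operand stack is empty again and slot 1 holds the value of os.name, which is the state the source line leaves behind.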



15 ifnull 49

A conditional branch. The previous instruction, aload_1, pushed the value of local variable 1 (the one written by astore_1) onto the stack. If that value is null, execution jumps to offset 49; otherwise it continues with the next instruction. Note that the numbers on the left are byte offsets into the method's code, not line numbers. Offset 49 in our bytecode is:



49 return

You can continue reading the bytecode and understand what invokevirtual, getstatic or ifeq mean.
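
To get a feel for the branch instructions, it can help to follow the offsets a given input would visit. The sketch below hard-codes the jump targets from the listing of main (ifnull 49, ifeq 41, goto 49) and collapses the straight-line instructions between them. It is an illustration under those assumptions rather than a real interpreter, and BranchTrace and trace are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class BranchTrace {

    /** Returns the offsets visited from offset 14 onwards for a given os.name value. */
    static List<Integer> trace(String osName) {
        List<Integer> visited = new ArrayList<>();
        int pc = 14; // offset of the first aload_1
        while (true) {
            visited.add(pc);
            switch (pc) {
                case 14: // aload_1: push strOSName
                    pc = 15;
                    break;
                case 15: // ifnull 49: jump if the pushed reference is null
                    pc = (osName == null) ? 49 : 18;
                    break;
                case 18: // offsets 18-24: toLowerCase() and contains("linux"), collapsed
                    pc = 27;
                    break;
                case 27: // ifeq 41: jump if contains returned false (0)
                    pc = osName.toLowerCase().contains("linux") ? 30 : 41;
                    break;
                case 30: // offsets 30-35: println("Linux"), collapsed
                    pc = 38;
                    break;
                case 38: // goto 49: skip the else branch
                    pc = 49;
                    break;
                case 41: // offsets 41-46: print("not Linux"), then fall through to 49
                    pc = 49;
                    break;
                case 49: // return
                    return visited;
                default:
                    throw new IllegalStateException("Unexpected offset " + pc);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("null       -> " + trace(null));
        System.out.println("Linux      -> " + trace("Linux"));
        System.out.println("Windows 10 -> " + trace("Windows 10"));
    }
}
```

A null value goes straight from offset 15 to the return at 49, a Linux value passes through the goto at 38, and anything else lands on offset 41.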

Attributes: Information about the actual source file is displayed here. For example, we can see the name of the source file, stored as a reference into the constant pool.

Note: If you want to see the code using the javap tool you can issue the following command:



$ javap -c out.production.jvm-experiments.DetermineOS

I've named my project jvm-experiments, so you need to adjust the path to match your own project name.

More information

That's it for now. I hope you liked what we did today and now understand the basic building blocks of the JVM and bytecode. If you want to learn more about the JVM ecosystem, I can recommend the following resources:

JVM Specification: This is the de facto reference for everything that has to do with the JVM platform.
JProfiler: Profiling your Java applications like a boss.
Tutorials Point: Quick and easy tutorials