This post offers a brief introduction to how Java executes code written in the Java language under the hood.
Here are the topics I am going to explore: the Java compiler and the Java Virtual Machine (JVM).
Java Compiler
Is Java a compiled-language or an interpreted-language?
Kinda like both! The reason lies within the compilation process of Java.
In many other languages, the compiler converts the source code into machine-specific code, and the machine then executes the instructions in that machine code.
But in Java, the Java compiler does not convert Java source code directly into machine code (i.e. binary). Instead, it converts the source code into an intermediate code called bytecode. The Java Virtual Machine (JVM) then executes that bytecode by interpreting it, and it also uses a Just-In-Time (JIT) compiler to compile frequently executed parts of the bytecode into native code (machine code) at run time. Therefore, Java is both a compiled and an interpreted language.
javac is the Java compiler, which ships as part of the Java Development Kit (JDK).
It transforms the source code located in .java files into .class files, which contain the bytecode for that source code.
This is not limited to Java: basically any language can provide its own compiler that turns source code into valid bytecode, which can then be executed on the JVM (Kotlin and Scala do this, for example).
If you have multiple classes in a single .java file, the compiler will generate a .class file for each class.
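To make the last point concrete, here is a minimal sketch of a single source file that declares two top-level classes. The file name Shapes.java and both class names are made up for this illustration. Compiling it with javac Shapes.java should leave two files behind, Shapes.class and Square.class.

```java
// Shapes.java (hypothetical file name, used only for this sketch)
// Compiling this one file with `javac Shapes.java` produces
// one .class file per class: Shapes.class and Square.class.

class Shapes {
    // Delegates to the second class declared in the same file.
    static int areaOfSquare(int side) {
        return new Square(side).area();
    }

    public static void main(String[] args) {
        System.out.println("Area: " + areaOfSquare(4));
    }
}

class Square {
    private final int side;

    Square(int side) {
        this.side = side;
    }

    int area() {
        return side * side;
    }
}
```

Only one top-level class per file may be public, and a public class must match the file name, which is why both classes here are package-private.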
Java Virtual Machine
After javac compiles the source code to bytecode, the JVM executes it. This is called the program run phase.
The JVM is divided into three main subsystems.
- Classloading Subsystem
- Runtime Data Areas
- Execution Engine
Other than that, it includes the Native Method Libraries, which are platform-specific executable code (written in C/C++) contained in shared libraries or DLLs, and the Java Native Interface (JNI), which is the interface that the Execution Engine uses to interact with the Native Method Libraries.
Classloading Subsystem
The Classloading Subsystem is responsible for loading, linking, and initializing the .class files generated by javac.
Loading
Java classes aren't loaded into memory all at once. They get loaded when they are required by an application (dynamic loading). Classes are loaded with the help of three class loaders.
Bootstrap Classloader - This loader is responsible for loading the core classes such as java.lang.Object, java.lang.Class, and java.lang.ClassLoader from the bootstrap classpath, which is rt.jar in Java 8 and earlier. This Classloader is the parent of all the other Classloaders.
Extension Loader - This loader continues the loading process by loading the classes that are an extension of the standard core Java classes. These classes are available to all applications running on the platform (i.e. JRE). Since Java 9, its role is taken by the platform class loader.
Application Loader - The loading ends with the user-defined classes, which reside on the application-level classpath, as specified by the CLASSPATH environment variable or the -classpath option.
The above classloaders follow the Delegation Hierarchy Algorithm while loading classes.
What is Delegation Hierarchy Algorithm?
When a Classloader is requested to load a class, the Classloader will delegate the request to the parent Classloader.
For example, if the JVM is requested to load a class, the Application Classloader will delegate it to the Extension Classloader. Then the Extension Classloader will delegate it to the Bootstrap Classloader. If the Bootstrap Classloader is unsuccessful in loading the class, then the Extension Classloader will try to load it. Only if the Extension Classloader also fails to find the class will the Application Classloader try to load it.
If the class is not found even after the Application Classloader tries to load it, an error is thrown (a ClassNotFoundException or a NoClassDefFoundError, depending on how the class was requested).
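The delegation chain can be inspected from ordinary Java code. The sketch below is only an illustration: the class name LoaderChain and its helper methods are made up, while getClassLoader(), getParent(), and Class.forName() are standard API. The loader names it prints depend on the Java version and on how the program is launched, so no output is shown.

```java
import java.util.ArrayList;
import java.util.List;

class LoaderChain {
    // Walks from the given loader up through its parents.
    // The bootstrap loader has no ClassLoader object; HotSpot represents it as null.
    static List<String> chainOf(ClassLoader loader) {
        List<String> names = new ArrayList<>();
        for (ClassLoader cl = loader; cl != null; cl = cl.getParent()) {
            names.add(cl.getClass().getName());
        }
        names.add("bootstrap (represented as null)");
        return names;
    }

    // Returns false when no loader in the chain can find the class.
    static boolean canLoad(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The chain for a user-defined class ends at the bootstrap loader.
        chainOf(LoaderChain.class.getClassLoader()).forEach(System.out::println);

        // Core classes come from the bootstrap loader, so this typically prints null.
        System.out.println(String.class.getClassLoader());

        System.out.println(canLoad("java.util.ArrayList"));      // found by a parent loader
        System.out.println(canLoad("com.example.DoesNotExist")); // hypothetical name, not found
    }
}
```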
Linking
Linking a class involves the following operations:
Verification - Ensures the bytecode is structurally correct.
Preparation - Memory is allocated for static variables and the default values are assigned to them.
Resolution - Symbolic references are replaced with direct references to the actual classes, fields, and methods.
Initialization
This is the final phase of the Classloading Subsystem. Here, all static variables are assigned the values given in the code and the static blocks are executed, in the order they appear in the class. Once the initial class has been initialized, its main() method can run, and the other classes it uses then go through loading, linking, and initialization in turn.
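The difference between the default values assigned during preparation and the real values assigned during initialization can be observed with a small experiment. This is a sketch under my own naming (InitOrderDemo, Config, and the events list are all hypothetical). It also shows that a class is only initialized on first use.

```java
import java.util.ArrayList;
import java.util.List;

class InitOrderDemo {
    static final List<String> events = new ArrayList<>();

    static class Config {
        // Runs first: 'limit' has only been prepared, so peek() sees the default 0.
        static int early = peek();
        // Runs second: 'limit' now receives the value written in the code.
        static int limit = 42;

        // Runs last, because static initializers execute in textual order.
        static {
            events.add("static block: early=" + early + ", limit=" + limit);
        }

        static int peek() {
            return limit;
        }
    }

    static List<String> run() {
        events.add("before first use of Config");
        int limit = Config.limit; // first use triggers initialization of Config
        events.add("after first use: limit=" + limit);
        return events;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
        // before first use of Config
        // static block: early=0, limit=42
        // after first use: limit=42
    }
}
```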
Runtime Data Area
The JVM creates multiple runtime data areas. Some of them are created and destroyed with the JVM and some get created when a new thread is created and destroyed when the respective thread ends.
There are five major data areas in the JVM.
Method Area
The simplest type of memory to manage. This is a shared resource: there is only one Method Area per JVM. It holds class-level data such as static variables, constants (in the runtime constant pool), and the code of methods.
Heap Area
The least organized and most dynamic data area. This is a resource that is shared by all threads. The Heap is used to dynamically allocate and deallocate memory for class instances (objects) and arrays. Special operations such as new are needed to allocate heap storage. The memory assigned to objects is never explicitly deallocated; this space is reclaimed by the garbage collector (discussed later). The memory assigned for the Heap does not need to be contiguous. Deallocation may leave "holes" in the heap (a.k.a. fragmentation).
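As a rough illustration of heap allocation, the sketch below allocates a few arrays with new and reports heap usage through the standard Runtime API. The class name HeapDemo and its helpers are made up for this example, and the reported numbers are approximate and vary between runs and JVMs, so none are shown.

```java
class HeapDemo {
    // Approximate number of heap bytes currently in use.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Every 'new' below allocates an array on the shared heap.
    static byte[][] allocateBlocks(int count, int blockSize) {
        byte[][] blocks = new byte[count][];
        for (int i = 0; i < count; i++) {
            blocks[i] = new byte[blockSize];
        }
        return blocks;
    }

    public static void main(String[] args) {
        long before = usedHeapBytes();
        byte[][] blocks = allocateBlocks(16, 1024 * 1024); // about 16 MB
        long after = usedHeapBytes();

        System.out.println("Max heap:       " + Runtime.getRuntime().maxMemory());
        System.out.println("Used before:    " + before);
        System.out.println("Used after:     " + after);
        System.out.println("Blocks held:    " + blocks.length);
        // There is no 'delete': once 'blocks' is no longer reachable,
        // the garbage collector is free to reclaim the memory.
    }
}
```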
Stack Area
For every thread, a separate runtime stack is created. Therefore, data stored on the stack is thread-safe, unlike data in the Method Area and Heap Area. For every method call, one entry, called a Stack Frame, is pushed onto the stack. A Stack Frame is divided into three sub-entities.
Local Variable Array - Stores the method's local variables and their corresponding values.
Operand Stack - Acts as a runtime workspace for any intermediate operations that need to be performed.
Frame Data - All symbols corresponding to the method are stored here. The catch block information is also stored here.
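To get a feel for how the Local Variable Array and the Operand Stack work together, here is a toy model of a single frame. It is a deliberately simplified sketch, not how a real JVM is implemented: the class ToyFrame and its types are invented, and only four instructions are modeled, named after the real bytecode instructions iload, iadd, imul, and ireturn.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ToyFrame {
    enum Op { ILOAD, IADD, IMUL, IRETURN }

    static final class Insn {
        final Op op;
        final int arg; // local variable index for ILOAD, unused otherwise

        Insn(Op op, int arg) {
            this.op = op;
            this.arg = arg;
        }
    }

    // Executes the instructions against one frame: a local variable array
    // plus an operand stack used as scratch space for intermediate values.
    static int execute(Insn[] code, int[] locals) {
        Deque<Integer> operandStack = new ArrayDeque<>();
        for (int pc = 0; pc < code.length; pc++) {
            Insn insn = code[pc];
            switch (insn.op) {
                case ILOAD:
                    operandStack.push(locals[insn.arg]);
                    break;
                case IADD: {
                    int right = operandStack.pop();
                    int left = operandStack.pop();
                    operandStack.push(left + right);
                    break;
                }
                case IMUL: {
                    int right = operandStack.pop();
                    int left = operandStack.pop();
                    operandStack.push(left * right);
                    break;
                }
                case IRETURN:
                    return operandStack.pop();
            }
        }
        throw new IllegalStateException("code ended without IRETURN");
    }

    // Models the body of: int f(int a, int b, int c) { return (a + b) * c; }
    static int evalAddThenMultiply(int a, int b, int c) {
        Insn[] code = {
            new Insn(Op.ILOAD, 0),
            new Insn(Op.ILOAD, 1),
            new Insn(Op.IADD, 0),
            new Insn(Op.ILOAD, 2),
            new Insn(Op.IMUL, 0),
            new Insn(Op.IRETURN, 0),
        };
        return execute(code, new int[] {a, b, c});
    }

    public static void main(String[] args) {
        System.out.println(evalAddThenMultiply(2, 3, 4)); // prints 20
    }
}
```

The loop counter pc plays the role of the PC Register here: it always points at the instruction being executed.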
PC Registers
Each thread has a separate PC Register that holds the address of the JVM instruction it is currently executing.
Native Method Stacks
For each thread, a Native Method Stack will be created to hold the native method information provided by the Native Method Libraries.
Execution Engine
After the bytecode is loaded into memory and the Runtime Data Areas are allocated, the bytecode is executed by the Execution Engine. The Execution Engine consists of three subsystems.
Interpreter
The interpreter starts executing bytecode quickly, since there is no compilation step, but it executes slowly. If a method is called multiple times, the interpreter interprets it again every time.
JIT Compiler
The Just-In-Time (JIT) compiler identifies the hotspots of the code, i.e. the code that is executed (and therefore interpreted) repeatedly, and compiles that code into native code (machine-specific code), which improves performance. The JIT compiler consists of the following components.
Intermediate Code Generator - Produces intermediate code for optimization.
Code Optimizer - Optimizes the intermediate code generated above, for example by eliminating common sub-expressions, translating stack operations into register operations, and reducing memory accesses through register allocation.
Target Code Generator - Generates machine code (native code).
Profiler - Finds hotspots in the bytecode.
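A hotspot is easy to create on purpose. The sketch below (class and method names are my own) calls one small method many thousands of times, which is the kind of code the JIT compiler is expected to pick up. On a HotSpot JVM, running it as java -XX:+PrintCompilation HotLoop should list sumOfSquares among the compiled methods, although the exact output and timing depend on the JVM version and settings.

```java
class HotLoop {
    // A small, frequently called method: a likely candidate for JIT compilation.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    public static void main(String[] args) {
        long checksum = 0;
        // Calling the method repeatedly makes it "hot".
        for (int round = 0; round < 20_000; round++) {
            checksum += sumOfSquares(1_000);
        }
        // Printing the result keeps the work from being optimized away entirely.
        System.out.println(checksum);
    }
}
```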
Garbage Collector
Collects and removes unreferenced objects (inaccessible objects / orphans). Garbage collection can also be requested manually by calling System.gc(), but this is only a request: the JVM may run the collection immediately, later, or not at all.
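The behaviour of System.gc() can be probed with a WeakReference, which does not keep its target alive. This is a minimal sketch with made-up names (GcRequestDemo and its two methods). The first check is guaranteed, because a strongly reachable object is never collected; the second depends on whether the JVM honours the request, so its result is not something to rely on.

```java
import java.lang.ref.WeakReference;

class GcRequestDemo {
    // An object that is still strongly referenced must survive any collection.
    static boolean reachableObjectSurvives() {
        Object strong = new Object();
        WeakReference<Object> ref = new WeakReference<>(strong);
        System.gc(); // only a request
        return ref.get() == strong;
    }

    // An unreferenced object MAY have been collected after the request.
    static boolean unreachableObjectCollected() {
        WeakReference<Object> ref = new WeakReference<>(new Object());
        System.gc(); // the JVM is free to ignore or delay this
        return ref.get() == null;
    }

    public static void main(String[] args) {
        System.out.println("Reachable object survived: " + reachableObjectSurvives());
        System.out.println("Unreachable object collected: " + unreachableObjectCollected());
    }
}
```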
Thanks for reading.
See you in the next post!
Top comments (4)
I'd like to add that the System.gc() call just requests garbage collection. The actual GC may occur before that call, immediately, some time later or may not occur at all.
Thanks a lot for your input and your comment 😇
Nicely put together summary! ❤️ Can use this as a blueprint for better understanding JVM.
Thank you and glad to hear that ❤️