JVM Architecture

#java #jvm #jre

Bytecode is executed by the JRE (Java Runtime Environment). The JRE contains an implementation of the JVM (Java Virtual Machine), which loads the bytecode, verifies it, and executes it.
JVM – A virtual machine is a software implementation of a physical machine. Java was developed around the concept of WORA (Write Once, Run Anywhere): compiled code runs on a VM rather than directly on the hardware. The compiler (javac) compiles a .java source file into a .class file containing bytecode, then that .class file is given to the JVM, which loads and executes it.
Below is the Architecture of JVM:

JVM is divided into 3 main subsystems:
 ClassLoader Subsystem
 Runtime Data Area
 Execution Engine

ClassLoader Subsystem - Java's dynamic class loading functionality is handled by the ClassLoader subsystem. It loads, links, and initializes a class when the class is first referenced at runtime, not at compile time. During linking, a bytecode verifier checks the class file, and a class can only be used if it is valid.
1.1 Loading – Classes are loaded by this component. The following class loaders take part, and they follow the Delegation Hierarchy Algorithm: each loader first passes the request to its parent and tries to load the class itself only if the parent cannot find it.
• Bootstrap ClassLoader - Responsible for loading the core classes from the bootstrap classpath (rt.jar in Java 8 and earlier). This loader has the highest priority.
• Extension ClassLoader – Responsible for loading the classes inside the ext folder (jre\lib\ext). Since Java 9 its role is taken by the Platform ClassLoader.
• Application ClassLoader - Responsible for loading classes from the application-level classpath, i.e. the path given by the CLASSPATH environment variable, the -cp option, etc.
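The delegation chain can be inspected from inside a running program. The following is a minimal sketch, not part of the original notes; the class name ClassLoaderChainSketch and the helper chainFrom are made up for illustration. It assumes a HotSpot-style JVM where the bootstrap loader is reported as null. The loader class names it prints differ between Java versions.

```java
import java.util.ArrayList;
import java.util.List;

public class ClassLoaderChainSketch {

    // Walks from the given loader up through its parents and records each
    // loader's class name. Assumption: the bootstrap loader is represented
    // by null, so it is recorded here as the literal string "bootstrap".
    static List<String> chainFrom(ClassLoader start) {
        List<String> names = new ArrayList<>();
        for (ClassLoader cl = start; cl != null; cl = cl.getParent()) {
            names.add(cl.getClass().getName());
        }
        names.add("bootstrap");
        return names;
    }

    public static void main(String[] args) {
        // Application (system) class loader followed by its ancestors.
        System.out.println(chainFrom(ClassLoader.getSystemClassLoader()));

        // Core classes such as String are loaded by the bootstrap loader,
        // which getClassLoader() reports as null on common JVMs.
        System.out.println(String.class.getClassLoader());
    }
}
```

The first line of output lists the application loader first and the bootstrap loader last, which is the reverse of the order in which a load request is tried.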
1.2 Linking – Following are the different components under Linking
• Verify – The bytecode verifier checks whether the generated bytecode is valid. If verification fails, a java.lang.VerifyError is thrown.
• Prepare – Memory is allocated for all static variables, and they are assigned their default values.
• Resolve - All symbolic references are replaced with direct references, using the information in the Method Area.
1.3 Initialization – This is the final phase of class loading; here, all static variables are assigned the values given in the code, and the static blocks are executed.
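The difference between the Prepare and Initialization phases can be observed directly. The sketch below is an illustration only (InitOrderSketch, Config and peek are hypothetical names): a static field is read through a method before its own initializer has run, so the default value from the Prepare phase is still visible. It also shows that a class is initialized on first use, not at program start.

```java
import java.util.ArrayList;
import java.util.List;

public class InitOrderSketch {
    static final List<String> EVENTS = new ArrayList<>();

    static class Config {
        // Static initializers run in textual order. peek() runs before
        // 'value' has been assigned, so it sees the default value 0.
        static int seenDuringInit = peek();
        static int value = 42;

        static {
            EVENTS.add("Config static block ran");
        }

        static int peek() {
            return value;
        }
    }

    public static void main(String[] args) {
        EVENTS.add("main started");
        // First use of Config triggers its initialization here.
        System.out.println(Config.seenDuringInit); // prints 0
        System.out.println(Config.value);          // prints 42
        System.out.println(EVENTS); // prints [main started, Config static block ran]
    }
}
```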
Runtime Data Area – The Runtime Data Area is divided into 5 major components:

2.1 Method Area – The method area is also called the class area. It stores per-class data such as the runtime constant pool, field and method information, and method code. All class-level data is stored here, including static variables. There is only one Method Area per JVM and it’s a shared resource.
2.2 Heap Area – All objects, with their instance variables, and all arrays are stored here. There is only one Heap Area per JVM and it’s a shared resource. Since the Method and Heap Areas are shared by multiple threads, the data stored in them is not thread-safe.
2.3 Stack Area – For every thread, a separate runtime stack is created. For every method call, one entry is pushed onto the stack; this entry is called a Stack Frame. Local variables, parameters and return addresses all live in stack memory. The stack never stores objects, only primitive values and object references. The Stack Area is thread-safe since it’s not a shared resource. The Stack Frame is divided into three sub-entities:
• Local Variable Array – Holds the method's parameters and local variables together with their values.
• Operand Stack - Acts as the runtime workspace for any intermediate operation the method needs to perform.
• Frame Data - All symbols corresponding to the method are stored here. If an exception occurs, the catch block information is maintained in the frame data.
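The statement that the stack holds references while the objects themselves live on the heap has a visible consequence when passing arguments. The following is a small sketch with made-up names (StackReferenceSketch, mutate, reassign): each method gets its own frame with its own copy of the reference, so changing the object is visible to the caller, but reassigning the local variable is not.

```java
public class StackReferenceSketch {

    // 'sb' is a local variable in this method's frame. It holds a copy of the
    // caller's reference, so both refer to the same object on the heap.
    static void mutate(StringBuilder sb) {
        sb.append(" world"); // changes the shared heap object
    }

    // Reassigning 'sb' only changes the slot in this frame's local variable
    // array. The caller's reference is untouched.
    static void reassign(StringBuilder sb) {
        sb = new StringBuilder("replaced");
    }

    static String demo() {
        StringBuilder text = new StringBuilder("hello");
        mutate(text);
        reassign(text);
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "hello world"
    }
}
```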
2.4 Program Counter (PC) Registers - Each thread has its own PC register, which holds the address of the currently executing instruction. Once the instruction is executed, the PC register is updated to point to the next instruction.
2.5 Native Method Stacks - The Native Method Stack holds information for native methods, i.e. methods written in a language other than Java. For every thread, a separate native method stack is created. Its contents are specific to the native platform. For example, if we're running the JVM on Windows, it will contain Windows-related information. Likewise, if we're running on Linux, it will contain Linux-related information.

Execution Engine - The bytecode, which is assigned to the Runtime Data Area, is executed by the Execution Engine. The Execution Engine reads the bytecode and executes it piece by piece.
3.1 Interpreter - The interpreter starts executing bytecode quickly because it needs no compilation step, but it executes slowly. Its disadvantage is that when one method is called multiple times, it has to be interpreted again every time.
3.2 JIT Compiler – The JIT (Just In Time) Compiler compiles bytecode to machine code at runtime and improves the performance of Java applications. This way the JIT compiler neutralizes the disadvantage of the interpreter. The execution engine uses the interpreter to execute the bytecode, but when it finds repeated code, it uses the JIT compiler, which compiles the bytecode of that method to native code. This native code is used directly for repeated method calls, which improves the performance of the system.
Of course, JIT compilation requires processor time and memory. When the JVM first starts up, lots of methods are called. Compiling all of these methods would slow start-up significantly, even if the program ultimately achieved good performance. Methods are therefore not compiled when they are called the first time. For each method, the JVM maintains a call count, which is incremented every time the method is called. The method is interpreted until its call count exceeds the JIT compilation threshold. This threshold is chosen by the JVM developers to balance start-up time against long-term performance. As a result, very frequently used methods are compiled soon after the JVM has started, and less frequently used methods are compiled later. After a method is compiled, its call count is reset to zero, and subsequent calls to the method increment its call count again.
When the call count of a method reaches a JIT recompilation threshold, the JIT compiler compiles the method a second time, applying more optimizations than in the previous compilation. This process is repeated until the maximum optimization level is reached. In this way the most frequently used methods receive the most optimization, which maximizes the performance benefit of using the JIT compiler.

For example, let’s say the JIT recompilation threshold = 2. After a method is compiled, its call count is reset to zero and subsequent calls to the method increment its call count. When the call count reaches 2 (i.e. the JIT recompilation threshold), the JIT compiler compiles the method a second time, applying more optimizations.
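The counting scheme described above can be traced with a toy model. This is only a simulation of the bookkeeping, not how any real JVM is implemented. The class JitCounterSketch, the compilation threshold of 3 and the maximum level of 3 are assumptions made for illustration; only the recompilation threshold of 2 comes from the example above. For simplicity, the sketch compiles when the count reaches the threshold in both cases.

```java
import java.util.ArrayList;
import java.util.List;

public class JitCounterSketch {
    static final int COMPILE_THRESHOLD = 3;   // hypothetical value
    static final int RECOMPILE_THRESHOLD = 2; // value from the example in the text
    static final int MAX_LEVEL = 3;           // hypothetical maximum optimization level

    private int callCount = 0;
    private int level = 0; // 0 means the method is still interpreted
    private final List<String> log = new ArrayList<>();

    // Models one call of the method: bump the counter and, if the relevant
    // threshold is reached, "compile" one level higher and reset the counter.
    void call(int callNumber) {
        callCount++;
        int threshold = (level == 0) ? COMPILE_THRESHOLD : RECOMPILE_THRESHOLD;
        if (level < MAX_LEVEL && callCount >= threshold) {
            level++;
            callCount = 0;
            log.add("call " + callNumber + ": compiled at level " + level);
        }
    }

    static List<String> simulate(int calls) {
        JitCounterSketch method = new JitCounterSketch();
        for (int i = 1; i <= calls; i++) {
            method.call(i);
        }
        return method.log;
    }

    public static void main(String[] args) {
        // With the values above, compilation happens on calls 3, 5 and 7.
        simulate(10).forEach(System.out::println);
    }
}
```

After the maximum level is reached, further calls no longer trigger compilation in this model, which mirrors the statement that recompilation repeats until the maximum optimization level is reached.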

Following are the subcomponents involved as part of JIT Compiler:
• Intermediate Code Generator – Produces intermediate code
• Code Optimizer – Responsible for optimizing the intermediate code generated above
• Target Code Generator – Responsible for Generating Machine Code or Native Code
• Profiler – A special component responsible for finding hotspots, i.e. methods that are called many times.
3.3 Garbage Collector – Garbage collection is the process by which the JVM removes unreferenced (unused) objects from the heap to reclaim heap space. A collection can be requested by calling System.gc(), but its execution is not guaranteed. The garbage collector only manages objects that were created on the heap.
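The following sketch illustrates both points: System.gc() is only a request, and an object that is still referenced is never collected. The class name GcHintSketch is made up for illustration. A WeakReference is used because it lets the program observe whether an object has been collected without keeping it alive.

```java
import java.lang.ref.WeakReference;

public class GcHintSketch {

    // An object that is still strongly referenced must survive any collection,
    // so its weak reference is never cleared while 'kept' is in use.
    static boolean strongObjectSurvives() {
        Object kept = new Object();
        WeakReference<Object> weakKept = new WeakReference<>(kept);
        System.gc(); // a request only; the JVM may ignore it
        return weakKept.get() == kept; // 'kept' is still used here
    }

    public static void main(String[] args) {
        // Nothing else refers to this object, so it is eligible for collection.
        WeakReference<Object> weakDropped = new WeakReference<>(new Object());
        System.gc();

        // May print true or false, depending on whether a collection ran.
        System.out.println("unreferenced object cleared: " + (weakDropped.get() == null));

        // Always true, because the object is still referenced.
        System.out.println("referenced object kept: " + strongObjectSurvives());
    }
}
```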

Java Native Interface (JNI) - JNI interacts with the Native Method Libraries and makes the native libraries available to the Execution Engine.
Native Method Libraries - This is a collection of native libraries, typically written in C or C++, that are required by the Execution Engine.

The most important JVM Components related to performance are: Heap, JIT (Just In Time) Compiler and Garbage collector.