Understanding bytecode interpreters

Alexandre Plateau
C++ fanatic, video game maker, working on my own scripting language
・6 min read

Have you ever wondered how Java or Python work? What is a "virtual machine"? If those questions stir your interest, you've come to the right place!

If everything goes according to plan, this will be part of a series of articles on understanding and making bytecode interpreters (we will even see a bit of how you could compile a language to bytecode).

Why should I care about VMs?

A virtual machine enables high portability: compile once, run everywhere.

You can get performance nearly equal to that of native programs, compared to tree-walking interpreters, which have to hop from one node to another, often non-contiguous in memory, leading to really poor performance.

Also, it's a lot of fun to make one, because we will be designing our own instruction set (which can be seen as an assembly language) and creating our own "virtual CPU" to run it. In a way, making a VM is like making a small, specialized virtual computer!

Basic concepts

A virtual machine (or bytecode interpreter) works on a linear instruction flow. A linear program gives us much better performance than a non-linear one, because we don't have to fetch data 5 KiB after the current position, then 2 MiB before it... The CPU cache will love it, as memory accesses become more predictable.

Also, with such type of programs, we have only two things we can do:

  • execute the current instruction, then move on to the next one;
  • jump to an arbitrary position in the bytecode (useful for loops and conditions).


In a high-level language, an instruction would be something like if (condition) then .... That is way too verbose for bytecode; we want to keep it small. In bytecode, we would have something like PUSH 5; PUSH 4; EQ; POP_JUMP_IF_TRUE 15 (compare 4 and 5, and jump to address 15 if the comparison was truthy), each uppercase word being an instruction.

Nota bene: instructions take a fixed number of arguments, so that when we read an instruction like PUSH we know how many bytes to read from the program before moving on to the next instruction.

To make that even smaller, we could write P5 P4 E PJIT15. But we would still be wasting space, and fetching a variable-length instruction (how do we know whether to read a single char like P or several like PJIT?) isn't efficient. Instead, we can write bytes, with one byte per instruction.

| Instruction name | Value |
| --- | --- |
| PUSH | 0x01 |
| EQ | 0x02 |
| POP_JUMP_IF_TRUE | 0x03 |
Our code will now look like this: \x01\x00\x05\x01\x00\x04\x02\x03\x00\x0f! (Assuming we write numbers on a fixed number of bytes, here 2.)

Let's break this down:

\x01 \x00\x05   ->   PUSH 0x0005
\x01 \x00\x04   ->   PUSH 0x0004
\x02            ->   EQ
\x03 \x00\x0f   ->   POP_JUMP_IF_TRUE 0x000f
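This fetch-decode step can be sketched in a few lines of Python (the opcode values are the ones from the table above; the function and names are my own, and a real VM would decode inline in its execution loop):

```python
# How many operand bytes follow each opcode, and a name for each opcode.
OPERAND_BYTES = {0x01: 2, 0x02: 0, 0x03: 2}
NAMES = {0x01: "PUSH", 0x02: "EQ", 0x03: "POP_JUMP_IF_TRUE"}

def decode(bytecode):
    """Walk the byte string and return (mnemonic, operand) pairs."""
    i, out = 0, []
    while i < len(bytecode):
        op = bytecode[i]
        i += 1
        width = OPERAND_BYTES[op]
        operand = None
        if width:
            # Operands are stored on a fixed number of bytes (2 here).
            operand = int.from_bytes(bytecode[i:i + width], "big")
            i += width
        out.append((NAMES[op], operand))
    return out

print(decode(b"\x01\x00\x05\x01\x00\x04\x02\x03\x00\x0f"))
# [('PUSH', 5), ('PUSH', 4), ('EQ', None), ('POP_JUMP_IF_TRUE', 15)]
```

Because every opcode has a known operand width, the decoder never guesses where the next instruction starts; that is exactly the benefit of fixed-length arguments mentioned above.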

Now you may wonder how EQ works, and why we have PUSH and POP_... instructions. For this short example, I assumed we were using a stack-based virtual machine. Basically, we have a LIFO (last in, first out) stack onto which we push values; instructions can use this stack to retrieve values (pop them off), do stuff with them, and push a result back (or not).
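Here is a minimal sketch of such a stack-based execution loop in Python, handling just the three opcodes from the example (the opcode values come from the table above; the Python structure and names are mine):

```python
def run(bytecode):
    """Execute PUSH / EQ / POP_JUMP_IF_TRUE and return the final stack."""
    stack, ip = [], 0
    while ip < len(bytecode):
        op = bytecode[ip]
        if op == 0x01:  # PUSH: read a 2-byte operand and push it
            stack.append(int.from_bytes(bytecode[ip + 1:ip + 3], "big"))
            ip += 3
        elif op == 0x02:  # EQ: pop two values, push 1 if equal, else 0
            b, a = stack.pop(), stack.pop()
            stack.append(1 if a == b else 0)
            ip += 1
        elif op == 0x03:  # POP_JUMP_IF_TRUE: pop, jump if truthy
            dest = int.from_bytes(bytecode[ip + 1:ip + 3], "big")
            ip = dest if stack.pop() else ip + 3
    return stack

# 4 != 5, so EQ pushes 0, the jump is not taken, and the stack ends empty.
print(run(b"\x01\x00\x05\x01\x00\x04\x02\x03\x00\x0f"))  # []
```

Note how `ip` (the instruction pointer) is the only thing that moves: executing an instruction advances it, and a jump simply overwrites it.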

The stack

As mentioned before, the stack is a special, finite space on which we put values so we can operate on them later, for example when:

  • adding two numbers
  • calling a user-defined function
  • calling a builtin function
  • ...

We can use the stack to hold temporary values: in a = 4 + 5 * 12, 5 * 12 is a temporary, and so is 4 + 5 * 12 until it is popped and stored in the variable a.
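Concretely, a = 4 + 5 * 12 could compile to the following sequence (the instruction names are illustrative; we will define a similar set later in this article), with the stack shown after each step:

```
PUSH 4       stack: [4]
PUSH 5       stack: [4, 5]
PUSH 12      stack: [4, 5, 12]
MUL          stack: [4, 60]    ; 5 * 12 was a temporary
ADD          stack: [64]       ; so was 4 + 60
STORE a      stack: []         ; the result is popped into a
```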

To operate on this stack with more freedom, we can introduce special instructions such as DUP, to duplicate the value on top of the stack, and ROT, to swap the two values on top of the stack.
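On a Python list used as a stack, those two instructions might behave like this (a sketch; the instruction names come from the article, the implementation is mine):

```python
def dup(stack):
    """DUP: duplicate the value on top of the stack."""
    stack.append(stack[-1])

def rot(stack):
    """ROT: swap the two values on top of the stack."""
    stack[-1], stack[-2] = stack[-2], stack[-1]

s = [4, 5]
rot(s)
print(s)  # [5, 4]
dup(s)
print(s)  # [5, 4, 4]
```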

Example of a stack



Every CPU has a finite set of registers, and since our VM is inspired by the architecture of real computers, could we (or should we) have registers in our implementation? What would the benefits be?

A stack-based VM needs to pop and push very often, for example when running recursive functions, while a register-based one can be more easily optimized using a JIT (just-in-time) compiler: our physical computers are register based, making the translation quite "straightforward".

Generating code for a stack-based VM is easier than for a register-based one, because the stack is large and we don't need to keep track of which registers are occupied. Generated bytecode for a stack-based VM is also smaller, since instructions often fit in a single byte; we don't need to encode which registers each instruction uses. However, even with larger instructions (meaning a longer load time per instruction), register-based VMs are often faster than stack-based ones (even without a JIT compiler).
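To make the trade-off concrete, here is how c = a + b might look in both models (the instruction names are illustrative, not from any real VM):

```
; stack based: four one-byte opcodes, plus the symbol operands
LOAD a
LOAD b
ADD
STORE c

; register based: a single instruction, but it must encode 3 registers
ADD r3, r1, r2    ; r3 = r1 + r2
```

The stack version does more pushing and popping; the register version has fatter instructions but far fewer of them.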

For a more in-depth analysis, go check this Stack Overflow thread.

In this series, we will only target stack-based VMs because they are easier to build, and while we aim for good performance, near-native performance isn't the goal here: reaching it would take years of micro-benchmarks and micro-optimizations (still doable, of course, but not our goal).

From language to bytecode

Let's take this small snippet from an imaginary language:

let a = 4 + 5 * 12
let foo = (a, b, c) {
    let d = a + b
    print(d - c, c)
    if (d > 0)
        return 2 * a + b * b - 4 * a * c
    return 0
}
foo(a, 2 * a, 3 * a)

If we break this down, we have:

  • arithmetic operations (+, -, *)
  • variable declarations
  • function declarations, with arguments
  • conditions
  • return operations
  • builtin calls (print)
  • user-defined function calls

In terms of instructions, we could go with these:

| Name | Job | Value |
| --- | --- | --- |
| ADD | Add 2 values from the stack and push the result | 0x01 |
| SUB | Subtract 2 values from the stack and push the result | 0x02 |
| MUL | Multiply 2 values from the stack and push the result | 0x03 |
| STORE `<name>` | Store the top of the stack in the given name | 0x04 |
| LOAD `<name>` | Load the variable name from the current scope and push it onto the stack | 0x05 |
| GT | Push the result of TS > TS1 onto the stack, TS being the top of the stack and TS1 the value below it (both values are popped) | 0x06 |
| RET | Quit the current function, with the return value on top of the stack | 0x07 |
| POP_JUMP_IF_TRUE `<dest>` | Pop the top of the stack; if it's truthy, jump to the absolute bytecode address dest, otherwise do nothing | 0x08 |
| JUMP `<dest>` | Jump to the given absolute bytecode address | 0x09 |
| CALL `<argc>` | Call TS with argc arguments from the stack | 0x0a |

A lot is going on here:

  • we have STORE and LOAD, both taking a variable name, implying the existence of a scoping system to keep track of variables. It also implies that we need a way to encode a variable name after a fixed-length instruction
  • we have RET to quit the context of a function and go back to the previous one, implying that we need a way to keep track of the function call context/history (the same goes for CALL)
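The scoping idea behind STORE and LOAD can be sketched with a Python dict as the scope (the opcodes are the ones from the table; the tuple-based program encoding and the extra PUSH opcode, 0x10, are my own simplifications):

```python
def execute(program, scope=None):
    """Run a list of (opcode, argument) pairs; return the final scope."""
    scope = {} if scope is None else scope
    stack, ip = [], 0
    while ip < len(program):
        op, arg = program[ip]
        if op == 0x10:                      # PUSH: a literal value
            stack.append(arg)
        elif op == 0x01:                    # ADD
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == 0x03:                    # MUL
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == 0x04:                    # STORE <name>
            scope[arg] = stack.pop()
        elif op == 0x05:                    # LOAD <name>
            stack.append(scope[arg])
        ip += 1
    return scope

# let a = 4 + 5 * 12
print(execute([(0x10, 4), (0x10, 5), (0x10, 12),
               (0x03, None), (0x01, None), (0x04, "a")]))
# {'a': 64}
```

A real implementation would decode the names from bytes and handle nested scopes for function calls, but the store/load mechanics are the same.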

To solve the problem of fixed-length variable names, we will introduce a symbol table at the top of our bytecode, keeping track of all the variable names declared in a program. This way, we can reference every variable by a fixed-length number! Our symbol table for this code could look like this:

| Symbol | ID |
| --- | --- |
| a | 0 |
| foo | 1 |
| b | 2 |
| c | 3 |
| d | 4 |
| print | 5 |

A small problem arises with print: it's a builtin symbol, so having it at an arbitrary symbol id isn't practical. A solution could be to always give it the same identifier, with all user-defined symbols starting after it. Now our symbol table will look like this:

| Symbol | ID |
| --- | --- |
| a | 1 |
| foo | 2 |
| b | 3 |
| c | 4 |
| d | 5 |

Notice that print isn't listed in it anymore! Our VM will have to know that every symbol identifier greater than or equal to 1 is user-defined, and that the ones below (here, only 0, for print) are builtins.
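Building such a table is straightforward; here's one way to do it (a sketch, assuming a single print builtin and user-defined ids starting right after the builtins):

```python
BUILTINS = ["print"]            # builtins keep fixed, implicit ids: 0..N-1
USER_START = len(BUILTINS)      # first id available for user-defined symbols

def build_symbol_table(names):
    """Assign increasing ids to user-defined names, skipping builtins."""
    table = {}
    for name in names:
        if name in BUILTINS:
            continue            # builtins are not stored in the table
        if name not in table:
            table[name] = USER_START + len(table)
    return table

print(build_symbol_table(["a", "foo", "b", "c", "d", "print"]))
# {'a': 1, 'foo': 2, 'b': 3, 'c': 4, 'd': 5}
```

This reproduces the table above: print is resolved by its fixed builtin id, and everything else gets a fixed-length number the bytecode can embed after STORE and LOAD.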

As for the context tracking for function calls, we will see that in the next article. ;)
