DEV Community

SaptakBhoumik

Create your own bytecode VM from scratch

Hello everyone! In this tutorial I want to show you how to create your own bytecode VM. We will be making a register VM: a VM whose instructions read their operands from, and store their results in, a small set of numbered registers.

What is a bytecode VM?

A bytecode VM is a program that executes bytecode, a compact binary encoding of instructions. This is similar to what your CPU does when you run a native program, but unlike a native executable, bytecode is machine independent: it runs anywhere the VM itself runs. Many languages use a bytecode VM because it is usually faster than walking the AST and simpler than generating native machine code.

Let's start

  • Include the necessary header files
#include <iostream>
#include <vector>
  • Define the registers where the results will be saved.
enum Reg{
    r1,
    r2,
    r3,
    r4,
    r5,
    r6,
};

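You can add as many registers as you want, as long as the register array in the VM (m_memory, defined further down) is large enough to hold them.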
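One way to keep the register list and its storage in sync is a sentinel enumerator at the end of the enum. The sketch below is only an illustration of that idea: REG_COUNT, RegisterFile, is_valid_register and the extra registers r7 and r8 are names I am assuming here, they are not part of the VM built in this tutorial.

```cpp
#include <array>

// Hypothetical variant of the Reg enum with two extra registers.
enum Reg {
    r1, r2, r3, r4, r5, r6,
    r7,         // added for illustration
    r8,         // added for illustration
    REG_COUNT   // not a register: its value is the number of registers above
};

// Storage with exactly one int slot per register (assumed name).
using RegisterFile = std::array<int, REG_COUNT>;

// Returns true when `index` refers to a real register (assumed helper).
bool is_valid_register(int index) {
    return index >= 0 && index < REG_COUNT;
}
```

With this layout, adding a register above REG_COUNT resizes the storage automatically, and declaring it as `RegisterFile regs{};` starts every register at zero.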

  • Define the opcodes our VM supports. Opcodes are instructions that tell the VM what to do.
enum Opcode{
    OP_PRINT,
    OP_LOAD,
    OP_MOV
};

The VM we are making is really simple, so it has only 3 opcodes. You should try adding your own.
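As an example of adding your own opcode, here is a rough sketch of what an addition instruction could look like. OP_ADD, its operand order (destination, left, right) and the function run_with_add are my assumptions for illustration. To keep the sketch small, it is a free function that returns the printed text instead of writing to std::cout, and like the tutorial VM it trusts the bytecode to be well formed.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

enum Reg { r1, r2, r3, r4, r5, r6 };

// OP_ADD is a hypothetical fourth opcode added for this sketch.
enum Opcode { OP_PRINT, OP_LOAD, OP_MOV, OP_ADD };

// Executes `code` and returns everything OP_PRINT produced, one value per line.
// No operand or register bounds checks are done here.
std::string run_with_add(const std::vector<int>& code) {
    int memory[6] = {};          // six registers, all starting at zero
    std::ostringstream out;
    std::size_t pc = 0;
    while (pc < code.size()) {
        switch (code[pc]) {
            case OP_PRINT: {     // OP_PRINT <reg>
                out << memory[code[pc + 1]] << '\n';
                pc += 2;
                break;
            }
            case OP_LOAD: {      // OP_LOAD <reg> <value>
                memory[code[pc + 1]] = code[pc + 2];
                pc += 3;
                break;
            }
            case OP_MOV: {       // OP_MOV <dst> <src>
                memory[code[pc + 1]] = memory[code[pc + 2]];
                pc += 3;
                break;
            }
            case OP_ADD: {       // OP_ADD <dst> <lhs> <rhs>
                int dst = code[pc + 1];
                int lhs = code[pc + 2];
                int rhs = code[pc + 3];
                memory[dst] = memory[lhs] + memory[rhs];
                pc += 4;         // one opcode plus three operands
                break;
            }
            default:
                out << "Unknown opcode: " << code[pc] << '\n';
                return out.str();
        }
    }
    return out.str();
}
```

The only new part is the OP_ADD case: it reads three register operands, stores the sum in the first one, and advances pc by 4. A program such as `OP_LOAD, r1, 40, OP_LOAD, r2, 2, OP_ADD, r3, r1, r2, OP_PRINT, r3` would leave 42 in r3 and print it.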

  • Actual VM code
class VM{
    private:
        std::vector<int> m_code;//stores the code
        int m_memory[6] = {};//stores the data (one slot per register, zero-initialised)
    public:
        VM(std::vector<int> code){
            m_code = code;
        }
        void run(){
            size_t pc = 0;
            while(pc < m_code.size()){
                switch(m_code[pc]){
                    case OP_PRINT:{
                        //OP_PRINT <reg>
                        //print the value of the register
                        auto reg=m_code[pc+1];//reg is the register that has the required data
                        auto data=m_memory[reg];//data is the value of the register
                        std::cout << data << std::endl;
                        pc += 2;
                        break;
                    }
                    case OP_LOAD:{
                        //OP_LOAD <reg> <value>
                        //load the value into the register
                        auto reg=m_code[pc+1];//reg is the register where the data will be stored
                        auto data=m_code[pc+2];//data is the value that will be stored
                        m_memory[reg] = data;
                        pc += 3;
                        break;
                    }
                    case OP_MOV:{
                        //OP_MOV <reg1> <reg2>
                        //copy the value of reg2 into reg1 (reg2 keeps its value)
                        auto reg1=m_code[pc+1];//reg1 is the register where the data will be stored
                        auto reg2=m_code[pc+2];//reg2 is the register that has the required data
                        m_memory[reg1] = m_memory[reg2];
                        pc += 3;
                        break;
                    }
                    default:
                        std::cout << "Unknown opcode: " << m_code[pc] << std::endl;
                        return;
                }
            }
        }
};

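NOTE: This VM is meant for learning, so it is not very fast or practical. For example, it does not check that a register operand is in range or that an instruction has all of its operands.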
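If you want to make the VM a bit more robust, one option is to check the bytecode before running it. The following is a minimal sketch under my own assumptions: validate and kRegisterCount are hypothetical names, and the error messages are made up for this example. It walks the code the same way run() does, but only checks opcodes, operand counts and register indices.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum Opcode { OP_PRINT, OP_LOAD, OP_MOV };

const int kRegisterCount = 6;  // assumed to match the size of m_memory

// Hypothetical helper: returns an empty string if `code` is well formed,
// otherwise a short description of the first problem found.
std::string validate(const std::vector<int>& code) {
    auto is_reg = [](int v) { return v >= 0 && v < kRegisterCount; };
    std::size_t pc = 0;
    while (pc < code.size()) {
        std::size_t operands = 0;
        switch (code[pc]) {
            case OP_PRINT: operands = 1; break;
            case OP_LOAD:  operands = 2; break;
            case OP_MOV:   operands = 2; break;
            default:
                return "unknown opcode at offset " + std::to_string(pc);
        }
        // The instruction needs `operands` more ints after the opcode.
        if (code.size() - pc <= operands) {
            return "truncated instruction at offset " + std::to_string(pc);
        }
        // The first operand of every opcode is a register.
        if (!is_reg(code[pc + 1])) {
            return "bad register at offset " + std::to_string(pc + 1);
        }
        // OP_MOV also takes a register as its second operand.
        if (code[pc] == OP_MOV && !is_reg(code[pc + 2])) {
            return "bad register at offset " + std::to_string(pc + 2);
        }
        pc += 1 + operands;
    }
    return "";
}
```

Calling something like this once before run() keeps the interpreter loop itself free of checks, which is one possible design; checking inside each case would work too.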

  • Driver code

int main(){
    std::vector<int> code = {
        OP_LOAD , r1 , 1 ,//$r1 = 1
        OP_LOAD , r2 , 2 ,//$r2 = 2
        OP_MOV , r3 , r1 ,//$r3 = $r1
        OP_PRINT , r3 ,//print $r3
        OP_PRINT , r2 ,//print $r2
        OP_PRINT , r1 ,//print $r1
    };
    auto vm=VM(code);
    vm.run();
}
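The comments in the driver code describe each instruction by hand. A small disassembler can produce the same kind of listing from the bytecode itself, which helps when debugging larger programs. This is just a sketch: disassemble and reg_name are hypothetical helpers, the text format simply mirrors the comments above, and it assumes the bytecode has no missing operands.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum Reg { r1, r2, r3, r4, r5, r6 };
enum Opcode { OP_PRINT, OP_LOAD, OP_MOV };

// Hypothetical helper: renders register index 0 as "$r1", 1 as "$r2", and so on.
std::string reg_name(int index) {
    return "$r" + std::to_string(index + 1);
}

// Hypothetical helper: returns one line of text per instruction in `code`.
// Stops at the first unknown opcode, like the VM's run() does.
std::string disassemble(const std::vector<int>& code) {
    std::string text;
    std::size_t pc = 0;
    while (pc < code.size()) {
        switch (code[pc]) {
            case OP_PRINT:  // OP_PRINT <reg>
                text += "print " + reg_name(code[pc + 1]) + "\n";
                pc += 2;
                break;
            case OP_LOAD:   // OP_LOAD <reg> <value>
                text += reg_name(code[pc + 1]) + " = " +
                        std::to_string(code[pc + 2]) + "\n";
                pc += 3;
                break;
            case OP_MOV:    // OP_MOV <reg1> <reg2>
                text += reg_name(code[pc + 1]) + " = " +
                        reg_name(code[pc + 2]) + "\n";
                pc += 3;
                break;
            default:
                text += "unknown opcode " + std::to_string(code[pc]) + "\n";
                return text;
        }
    }
    return text;
}
```

Printing `disassemble(code)` for the program in main gives one line per instruction, starting with `$r1 = 1` and ending with `print $r1`.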

Full code

#include <iostream>
#include <vector>

enum Reg{
    r1,
    r2,
    r3,
    r4,
    r5,
    r6,
};

enum Opcode{
    OP_PRINT,
    OP_LOAD,
    OP_MOV
};

class VM{
    private:
        std::vector<int> m_code;//stores the code
        int m_memory[6] = {};//stores the data (one slot per register, zero-initialised)
    public:
        VM(std::vector<int> code){
            m_code = code;
        }
        void run(){
            size_t pc = 0;
            while(pc < m_code.size()){
                switch(m_code[pc]){
                    case OP_PRINT:{
                        //OP_PRINT <reg>
                        //print the value of the register
                        auto reg=m_code[pc+1];//reg is the register that has the required data
                        auto data=m_memory[reg];//data is the value of the register
                        std::cout << data << std::endl;
                        pc += 2;
                        break;
                    }
                    case OP_LOAD:{
                        //OP_LOAD <reg> <value>
                        //load the value into the register
                        auto reg=m_code[pc+1];//reg is the register where the data will be stored
                        auto data=m_code[pc+2];//data is the value that will be stored
                        m_memory[reg] = data;
                        pc += 3;
                        break;
                    }
                    case OP_MOV:{
                        //OP_MOV <reg1> <reg2>
                        //copy the value of reg2 into reg1 (reg2 keeps its value)
                        auto reg1=m_code[pc+1];//reg1 is the register where the data will be stored
                        auto reg2=m_code[pc+2];//reg2 is the register that has the required data
                        m_memory[reg1] = m_memory[reg2];
                        pc += 3;
                        break;
                    }
                    default:
                        std::cout << "Unknown opcode: " << m_code[pc] << std::endl;
                        return;
                }
            }
        }
};


int main(){
    std::vector<int> code = {
        OP_LOAD , r1 , 1 ,//$r1 = 1
        OP_LOAD , r2 , 2 ,//$r2 = 2
        OP_MOV , r3 , r1 ,//$r3 = $r1
        OP_PRINT , r3 ,//print $r3
        OP_PRINT , r2 ,//print $r2
        OP_PRINT , r1 ,//print $r1
    };
    auto vm=VM(code);
    vm.run();
}

Compiling and running it

Compile it with clang++ file.cpp -o output, then run it with ./output. You should see 1, 2 and 1 printed to the console, each on its own line.

Conclusion

As you can see, a simple VM is really easy to implement. I encourage you to implement your own VM and extend it. Thanks for reading :)