Marcell Cruz

Posted on Jan 12, 2023 • Edited on Jan 17, 2023

Assembly For JavaScript Developers

#javascript #assembly #linux #debugging

Why Should You Care?
Lines Of Code vs Instructions
Assembly Is Not A Language?
Differences Between Hardware
Why am I writing This?
Fibonacci Code
Naming the concepts
Something Like A Function
Passing Data To The Function
Creating Scope
The Stack, Another Way Of Saving Data.
Using The Stack To Save Data And Create Scope
Returning Data To The Caller
If Statements In Assembly
Conditional Logic
Recursion
Putting Everything Together
Final Code
IO In Assembly
Final Code With IO
How To Debug Assembly
Code

If you only code in high level languages and have no idea what your computer does with your code, this post is for you, no previous knowledge of assembly is required just basic knowledge of any popular language, if you can read and understand the following JavaScript code, you're good to go.

function fib(n) {
  if(n < 3) {
    return n - 1;
  }
  return fib(n - 1) + fib(n - 2);
}

fib(1); // return 0
fib(2); // returns 1
fib(3); // also returns 1 (0 + 1)
fib(4); // returns 2 (1 + 1) and so on...

We're gonna implement this in assembly breaking down each JavaScript concept and implementing the same concept in assembly.

The main problem that I've had learning assembly is that is boring, and the reason why is boring is because you have to understand a lot to do a little, and the thing that you end up doing is not that interesting, so I decided to write this post that tries a different approach.

Why Should You Care?

If you're a JavaScript developer you not going to use assembly in your daily routine and knowing assembly is not gonna make you a better JavaScript developer in any meaningful way, so why should you care?
Well, Knowing assembly will increase your understanding of computing and make you more interested in it, it'll open doors in other areas that might grab your interest, and the most important part, it'll build your narrative about programming, you need to be able to tell a history with the knowledge that you have, to be interested in it, learning assembly will increase your context appreciation and understanding of the whole field, if you feel like you're repeating yourself learning the same concepts over and over just with different names said by different people, maybe it's time to learn something new to expand your view and reignite your interest in programming, at the end of the day you can't be good at something that you don't enjoy and have no interested in, so allow yourself the opportunity of sparkling your interested once more, if you don't know C you can't fully appretiate javascript, if you don't know assembly you can't fully appretiate C, it's important to know where the things that you use come from to understand why they exist, to build meaning, Knowing how to do something is not the same as understanding it, you need a lot of context to understand something, you need to connect the dots, the more context you have the easier it's for you to understand other things in the same area and the deeper is your understanding, you can explain the "why", why something was built the way it was and "why" it's not a good idea to do it differently, what are the downside of a solution, you level of enjoyment and interest in the field also grows, because your mind have a lot more room to wonder in that particular field, so let's get to it.

Lines Of Code vs Instructions

Your computer can't execute the code that you write, let x = 99 your computer has no idea what the previous line of code means, and what do to with it, this line of code needs to be translated to instruction that your CPU can actually execute, the code that you write is called subject code or source code this code needs to be translated to object code real instructions that your CPU can execute, one line of code in a high level language gets translated to multiple instructions that a CPU can actually execute. computers are super dumb, they do not understand any of these high level concepts, they can only do very simple things like:

Move values to different places in memory
Add and subtract numbers
Jump to different instructions

And that's pretty much it, the hardware on your computer is very limited. don't get me wrong, it's amazing that we have circuitry that can actually do these things, but there's a chasm between the concepts in a high level language and what the hardware can actually do. it's the job of the person designing the language to bridge that chasm and implement these high level concepts like functions, recursion and scope using only these very simple tools and that's what we gonna do in this post, we gonna reverse engineer a solution in a high level language(JavaScript) to a low level language(assembly)

Assembly Is Not A Language?

Assembly is not a language, assembly is a family of languages, just like LISP is also a family of languages, if all the code that you write maps directly to a instruction that a machine can do, this language can be called an assembly language, the name "assembly" comes from the fact that what the machine is executing is not exactly the instruction, the machine can execute multiple instructions at once and these instructions are assembled in to binary, the following table illustrates how that happens

So multiple assembly instructions gets assembled to one 16, 32 or 64 bit chunk of data(0000001000000100) that a computer can execute, for example the jump instruction in assembly might be represented by the last 4 bits in these chunks, the first 3 bits represents the move instructions, and so on... you got the idea, you don't need to understand any of this to write assembly, the assembler is smart enough to do it.

An assembly language is made for a specific CPU, since different CPU can do different things and execute different instructions, the difference between what a CPU can do makes the CPU more suited for different tasks, if you need to do a lot of multiplication, using a CPU that can do multiplication on hardware will be a lot better than one that can only add.

Differences Between Hardware

Depending on your task, difference instruction will be more important, and that dictates what instructions will be implemented on hardware, creating chips that can execute complex instructions is expensive but also make the execution of those instructions much faster, that's why people use GPU's for AI, mining crypto and games, GPU's have expensive instructions implemented on hardware, so things that would take a lot of instructions in a regular CPU can be done with a single GPU instruction, GPU makers release documents with these instructions so assembly creators can create a assembly language that maps to these instructions, or even higher level languages like shader languages for example, the following is a pdf document with the instructions for the AMD R700 family if you're curious https://developer.amd.com/wordpress/media/2012/10/R700-Family_Instruction_Set_Architecture.pdf

Why am I writing This?

Most of the tutorials online focus on teaching assembly bottom up, show the instruction tell what the instruction does, then show another instruction and so on, but I think this way of teaching assembly is much more complicated than it needs to be, assembly is not like other high level languages so knowing what a specific instruction do has no value for a python or Javascript developer, because that instruction doesn't map to anything that you already know, saying that MOV moves numbers to registers has no meaning to a high level language developer, the first thing that he's going to think is, what is a register and why would I move numbers to it? So instead of doing this we're going to implement a function that returns the value of a number in a fibonacci sequence, first we're going to write the code in Javascript, then we're going to translate the code to assembly and conceptually "map" each part to the high level code.

Fibonacci Code

Each number in the Fibonacci sequence is the sum of the previous two values expect for the first two values which are 0 and 1
Let's see the javaScript implementation again.

function fib(n) {
  if(n < 3) {
    return n - 1;
  }
  return fib(n - 1) + fib(n - 2);
}

fib(1); // return 0
fib(2); // returns 1
fib(3); // also returns 1 (0 + 1)
fib(4); // returns 2 (1 + 1) and so on...

Ok now let's talk about the " high level concepts" used in this implementation, the idea is the following.

name the concepts
find similar concepts in assembly
translate the concepts to assembly

Naming the concepts

The first concept is "functions", function fib().. is a
function in JavaScript, we can think about what properties does a function have, let's number the properties and think about them.

A function can be called from anywhere in the code.
You can pass data to the function.
The function have a scope.
The function can return data to the caller code execution resumes right after the function call, after the function returns, now we think about something in assembly that is conceptually similar, something that can emulate what we want to do.

Something Like A Function

There are no functions in assembly, but we have something close to it called procedures, if you think about a function in another way, a function is a jump in the program execution, in our example the first line executed was 7: fib(1) this line prompt a jump in the execution to line 1: function fib(n) { in assembly you can name a specific line and tell the program to jump to that line and keep executing the code.

global _start

_start: call Proc
        ret

Proc: push 99
      ret

Code execution starts at _start which is a procedure, the line at start and all following lines will be executed unless we have something to change the next line to be executed, in this case we call another procedure called Proc there's nothing special about the name, it's just a label that's going to be replaced by the address in memory of the next instruction to be executed.

When we run call Proc code execution will jump to line 7, ok so now we have a way of jumping between different parts of the code, the first property that we want, let's look at what is happening.

Above is the code that we just wrote being executed, the first column is the memory address of that line of code, code execution start at the first address, the address is incremented to the next address, and the code on that address is executed, except when we use call, call changes the value of the next line to be executed, so code execution jumps to the instruction push 99, the next line is ret which works like return but can not return values, after that, code execution resumes after the call instruction, ok that's one part of the puzzle now we have a way of changing the execution flow, case 1 is complete.

Passing Data To The Function

Now that we have something that we can call like a function, how do we pass data to it? most of what a higher level language do is managing memory, the biggest benefit of using lanuguages like javascript and python is that you don't need to manage memory yourself, most runtime errors are caused by bad memory allocation, either by the language, OS or the developer in most computers and OSs you have at least 3 ways of saving temporary data that is being used in a process, you have registers, the stack and the heap, registers are the easiest one to understand and use in assembly, an register is a integrated circuit that can save bits, it's hardware, and you have a limited number of those in your computers, a lot of register are used for other tasks by the CPU, so it's not safe to use them, that's why you hear people talking about general purpose registers these are the registers that're safe to be used by your running program, the following image is a list of almost all the registers in an modern computer.

Since a register is hardware, registers are global, even between different programs, if you change the value of a register in one part of your program and then call another library or procedure that reads or change this registers you gonna affect your program or the library that you're using, so it's always a good idea to keep the number of registers that you use to a minimum, just like we do with global variables, The road to programming hell is paved with global variables.

How can we save something in a register?

That's when the MOV instruction comes in, you can move a value to a register MOV RAX, 1 here we're moving the value 1 to the RAX register, and we can do that to save a value and then read it in another part of the program, which is similar to passing data to a function, we can do something like this.

global _start

_start: mov rax, 1
        call Procedure
        ret

Procedure: 
        add rax, 1 ; now the value on rax will be 2 
        ret

In the snippet above we save the value on rax, call the procedure, then read the value from that variable and do something with it, simulating passing values to a function, so 2 is also done.

Creating Scope

Another useful function property is scopes, variables created in a function only exist on that scope and if you recursively call that function passing new values, updating these values won't affect other scopes, this is a key property of functions and is the main reason why they're so useful.

function fib(x) {
  ....
  fib(2);
  ....
}

fib(1);

So can we do something like this in assembly? How can we simulate scope?

As you might have guessed we don't have the concept of a scope in modern CPUs, so we have to simulate one, if you're paying attention the first thing that you would think about would be to use the registers to save and load different scopes depending on what you're executing at that moment, the problem is that we have a very limited number of registers so if we do that we're going to run out of registers to save values pretty fast, they're global as well so abusing registers is not a good idea, OK, so the other general options to save things are the stack and the heap, let's talk about the stack.

The Stack, Another Way Of Saving Data.

Another way of saving data is the stack, the beautiful thing about the stack is it's simplicity,when you run your program the OS reserves some memory for the program to run and sets the address of the beginning of that memory to a register rsp, when you push a value to the stack, it saves this value to the current memory address saved on rsp and increases the memory address to the next one in the stack and saves it on rsp again, when you pop a value, it saves the value on whatever register you tell it to and decreases the rsp memory address number again.

We could do all of that ourselves but x86_64 comes with two instructions that already do all of that for us, push and pop, let's see that happening step by step.

This is the code that we gonna run

global _start

_start: push 99 
        push 99
        pop rax
        pop rbx
        ret

We can push a value to the stack using push, push 99 will push the value 99 to the stack, 99 is 63 in hexadecimal

And this is the stack before running anything

The first column is the memory addresses and each line is the value in memory, when we run push 99 the stack changes

Now we can see our 99 or 63 in hexadecimal there, if we run the next instruct push 99 again, 63 is added to the stack

As you can see we add to the stack in chunks of 64 bits, because my computer is 64 bits if you're using a 32 bit computer you would see a word half the size as the one we're seeing here, you can read more about what word means here here, but basically a word is the size of the data that makes sense to work on a given the computer architecture, you can work with 32 bit registers in a 64 bit computer, but that's going to make your life harder in most cases, let's keep executing our code to see what happens when we pop a value from the stack, pop rax remove the value from the stack and save that value on a register, rax in this case, that's one of the reasons why the default size of the word matters because if we tried to save a 64 bit value in a 32 bit register bad things would happen as you can imagine, rax is a 64 bit register so we can safely do that, let's see what happens to the stack after we pop that value.

Now we only have one 63 again and the value of the registers that points to the top of the stack(RSP) was decreased, if we ran the next pop we gonna remove the next value from the stack and the stack will be in the same state as it was before we started the program, RSP will point to the beginning of the stack. the OS allocates memory for the program and decides which address the stacks starts on.

Using The Stack To Save Data And Create Scope

Now that we know how the stack works, we can use the stack to save and retrieve all register values before calling another procedure, by doing that we can create scope between the different procedure calls, let's see the easiest possible example of that in practice.

_start: mov rax, 1 ; the value that we pass to Procedure
        call Procedure
        ret

Procedure: 
        push rax ; saves rax on the stack
        call ChangeParam ;call a procedure that changes rax
        pop rax ;restore rax original value from the stack 
        ret

ChangeParam:
        add rax, 1; now the value on rax is 2
        ret

Saving the values that we're working on in the stack and restoring them creates scope in the procedure, we could have done the same thing on ChangeParam that's usually the way that we do when we call external libraries, we need to restore the values before calling them, great, now we have a way of creating scope, 3 is done.

Returning Data To The Caller

Well this one is pretty easy, since everything is global in assembly, the caller just needs to read the value in the right place after the call, so if inside the procedure we save the result of the procedure in the RBX register we just need to read the value from that register after the procedure

_start: call Procedure
        mov rax, rbx ; now rbx has the result of proc
        ret

Procedure: 
        mov rbx, 99 
        ret

And with that we're done, now let's analyze the next line of javascript code and do the same thing that we done for the function.

If Statements In Assembly

if(n < 3) {

This is the next executed JavaScript, as you might imagined by now, Assembly x86 doesn't have if statements either, The main function that we want to simulate is Compare two values and based on the result of the comparison jump to different places

Conditional Logic

This one is pretty straight forward, we have CMP in assembly the way that it works is pretty simple, we compare two values, then we use a conditional jump to jump in specific cases, for example we could compare the value on RAX with 1 then do a conditional jump if they're equal.

_start:
  cmp rax,1
  je Proc

Proc:
  push 99

That's pretty much it, not much to talk about here, we have conditional jumps for each comparison greater than(JG), less than(JL), equal(JE) and for some other special cases.

Recursion

The next line is return n -1 we already implemented the return logic, we just need to save the value in the register and use ret to return from a procedure, and for n - 1 we just need to use decrement dec on the register holding the value of n, the last line of javascript is return fib(n - 1) + fib(n - 2); and we also implemented everything that that is to know about this one, the main thing about this line is the fact that is recursive, and we already implemented the idea of a function with scope, so we only need to call the procedure save the data on the stack and the rest that we did to simulate a function call, Now we can put everything together and implement the solution.

Putting Everything Together

Let's go line by line

Function definition

function fib(n) {

Procedure definition

Fib:

If statement

if(n < 3) {

Compare and jump
rax holds the parameter passed to the function

cmp rax,3 
jl L3 ; jump to L3 if rax is less than 3

Inside the if block

return n - 1;

We jump to another part of the code, decrement the register and save the value to another register before returning, since this is the last case we don't need to push rax to the stack again, jumps works in a different way than procedure calls. when you call a procedure ret gonna return you in the line that you called that procedure, when you jump it can look like a procedure but it isn't, the code belongs to the procedure where you used the jump so ret is going to return from that procedure, it works just like a if statement in this case.

L3:
     dec rax
     mov rbx, rax
     ret

return fib(n - 1) + fib(n - 2);

This one is complicated

push rax ; saves the value of rax, rax is n in this context
dec rax  ; decrements rax 
call Fib ; call procedure now with the modified rax value
pop rax  ; pops the original value of rax from the stack
push rbx ; push the result from the previous Fib call 
dec rax  ; decrements rax two times ` n - 2`
dec rax
call Fib ; calls procedure now with the modified rax value
pop rcx  ; pop the result from the *first* Fib call
add rbx, rcx ; add the result from the first Fib call with the second one
ret

As you can see simulating scope is the trickiest part but it's not that hard if you take some time to think about it, we're just using the stack to save stuff that we want to use later and not change the things that we're working right now, the stack is used a lot in this manner, "you don't want to use right now but needs this value later? put it on the stack"

Final Code

global _start
_start:
    mov rax,5
    call Fib 

    ; Some boilerplate code to return control to the OS
    mov eax,1       ; Code for Exit Syscall
    mov ebx,0       ; Return a code of zero 
    int 80H             ; Make kernel call
    ret

Fib:
    cmp rax,3
    jl L3 
    push rax
    dec rax
    call Fib
    pop rax
    push rbx
    dec rax  
    dec rax
    call Fib
    pop rcx
    add rbx, rcx
    ret

L3:  
    dec rax
    mov rbx, rax
    ret

in the case above 5 is the parameter that the fib function receives so the returned value should be 3, 3 is the fifth element in the fibonacci sequence.
if we ran our code and inspects the value of rbx at the end of the execution we get 3

IO In Assembly

One thing that is clearly missing is the ability to pass arguments to our program and to print out the results, I didn't touch on this before because is not really assembly, but we can do that to make our program more useful, the way that something like print("hello world") works is by calling a function of the Kernel that knows how to print things on the screen, we also need to read the arguments passed to the function, transform the numbers in to characters and the characters in to numbers by offsetting the values in the ascii table , I'm not gonna explain this here because the post is already pretty big, the following is the code with IO, you can pass the index in the fibonacci that you want, and the code returns the value in the sequence.

Final Code With IO

%macro  ExitProg 0
    mov eax,1       ; Code for Exit Syscall
    mov ebx,0       ; Return a code of zero 
    int 80H         ; Make kernel call
%endmacro

section .data
    newline db 10,0

section .bss
    printSpace resb 8
    digitSpace resb 100
    digitSpacePos resb 8

section .text

global _start
_start:
    mov rdi, qword [rsp + 16] ; Get the third item in the stack, which is the first argument passed to the program
    movzx rax, byte [rdi]   ; Get the first character 
    sub rax, 48             ; Convert from ASCII to decimal
    call Fib                ; after call Fib `rbx` has the result
    mov rax, rbx            ; moves the result to rax
    call printRAX           ; call procedure to print the result
    mov rax, newline        ; move new line to rax
    call print              ; prints the new line
    ExitProg

Fib:
    cmp rax,3
    jl L3 
    push rax
    dec rax
    call Fib
    pop rax
    push rbx
    dec rax  
    dec rax
    call Fib
    pop rcx
    add rbx, rcx
    ret

L3:  
    dec rax
    mov rbx, rax
    ret

print:
    mov [printSpace], rax
    call printLoop
    ret

printLoop:
    mov cl, [rax]
    cmp cl, 0
    je endPrintLoop
    inc rbx
    inc rax
    jmp printLoop

endPrintLoop:
    mov rax, 1 
    mov rdi, 0 
    mov rsi, [printSpace]
    mov rdx, rbx
    syscall
    ret

printRAX:
    mov rcx, digitSpace
    mov [digitSpacePos], rcx

printRAXLoop:
    mov rdx, 0
    mov rbx, 10
    div rbx
    push rax
    add rdx, 48

    mov rcx, [digitSpacePos]
    mov [rcx], dl
    inc rcx
    mov [digitSpacePos], rcx

    pop rax
    cmp rax, 0
    jne printRAXLoop

printRAXLoop2:
    mov rcx, [digitSpacePos]

    mov rax, 1
    mov rdi, 1
    mov rsi, rcx
    mov rdx, 1
    syscall

    mov rcx, [digitSpacePos]
    dec rcx
    mov [digitSpacePos], rcx

    cmp rcx, digitSpace
    jge printRAXLoop2
    ret

How To Debug Assembly

You can use gdb to debug Assembly but you need to configure it properly, this is the configuration that I use, besides that , to start debugging run

gdb fibonacci

if you needs to pass parameters to the program

gdb --args fibonacci 5

after that you can break on the _start procedure

b _start

run the program

run

Then you can use nexti to go to the next instruction and stepi to step inside a procedure.

i r <register_name> shows the value of a register e.g i r rax

We can print address values of registers and symbols with x/x $rsp to print in hexadecimal and x/d to print as a decimal value.

Code

All code is available here

DEV Community