Introduction
I will not focus on history. I assume you are a beginner in assembly language, and I will explain things so that anyone who wants to read assembly code can really understand it. The assembler used in this article is NASM (Netwide Assembler), and we will be coding in x86_64 Linux assembly.
Why Assembly?
Assembly is often called "The Father of Programming Languages" because it serves as a bridge between high-level languages and the raw machine code executed by a computer's hardware, providing precise control over system resources and performance.
So Assembly is worth learning not only for those coming from C/C++, but also for anyone who wants to understand what their code and their computer are doing at the bit level.
Today's code
The code we are looking at today is this one:
section .data
say db "Say something!", 0xA, 0
say_len equ $ - say
input_char db "> ", 0
input_char_len equ $ - input_char
said db "You said: "
said_len equ $ - said
section .bss
input resb 128
section .text
global _start
_start:
mov rax, 1
mov rdi, 1
mov rsi, say
mov rdx, say_len
syscall
mov rax, 1
mov rdi, 1
mov rsi, input_char
mov rdx, input_char_len
syscall
mov rax, 0
mov rdi, 0
mov rsi, input
mov rdx, 128
syscall
mov rax, 1
mov rdi, 1
mov rsi, said
mov rdx, said_len
syscall
mov rax, 1
mov rdi, 1
mov rsi, input
mov rdx, 128
syscall
mov rax, 60
mov rdi, 0
syscall
By the end of this article, you will be able to read it again and understand what it means and how it works. But first we need to cover a few things.
Assembly's code structure
Assembly code is divided into segments/sections (I prefer to say sections), and each section has its own responsibility. The sections are:
- .data: Where initialized data is put (a label here points to a variable).
- .rodata: Where read-only initialized data is put (a label here points to a constant).
- .bss: Where uninitialized data is put (a label here also points to memory, but no initial value is stored).
- .text: Where the actual executable code is put.
Defined and Reserved bytes
This is how things like variables and constants are created.
Defined bytes
Defined bytes are created when you assign a value to an identifier. You need three elements:
- The name of the identifier (label).
- The size in bytes.
- The value.
For example, in C, you might write char* hello = "Hello, World!";. In Assembly, a rough equivalent (without the null terminator that C adds automatically) is:
hello db "Hello, World!"
The db directive stands for "define byte". With db, each element of the value is stored in 1 byte, so each character of the string above takes one byte.
You can also define data in larger units if you need to. These are the available directives:
- db: Define byte. Defines in chunks of 1 byte (8 bits).
- dw: Define word. Defines in chunks of 2 bytes (16 bits).
- dd: Define double-word. Defines in chunks of 4 bytes (32 bits).
- dq: Define quad-word. Defines in chunks of 8 bytes (64 bits).
Reserved bytes
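Reserved bytes set aside space without giving it a value, so the label points to uninitialized memory. Each directive takes the number of units to reserve (for example, resb 128 reserves 128 bytes). These are the directives:
- resb: Reserve byte. Reserves in chunks of 1 byte (8 bits).
- resw: Reserve word. Reserves in chunks of 2 bytes (16 bits).
- resd: Reserve double-word. Reserves in chunks of 4 bytes (32 bits).
- resq: Reserve quad-word. Reserves in chunks of 8 bytes (64 bits).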
Registers
Registers are like small, volatile boxes inside the CPU, each one holding a value. Since we are coding in x86_64, these are the registers of this family:
General Purpose Registers
- RAX: Accumulator (arithmetic operations and function calls).
- RBX: Base (general-purpose, preserved across function calls).
- RCX: Counter (used in loops and some instructions like rep).
- RDX: Data (arithmetic operations and I/O).
- RSI: Source Index (string operations and general-purpose).
- RDI: Destination Index (string operations and general-purpose).
- RBP: Base Pointer (used to access local variables, preserved across function calls).
- RSP: Stack Pointer (points to the top of the stack, used in function calls and flow control).
- R8 to R15: Additional general-purpose registers.
Deeper look into registers
Taking RAX as an example for knowing more about the registers:
- RAX: The full 64-bit register (the R prefix marks the 64-bit extension of EAX). As previously said, it is used in arithmetic operations and function calls.
- EAX: Extended AX. The lower 32 bits of RAX.
- AX: The lower 16 bits of RAX.
- AL: The low 8 bits of AX (its least significant byte).
- AH: The high 8 bits of AX (its most significant byte).
Learn more about least and most significant bit here
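The following sketch shows how writing to each sub-register affects the full RAX. The values are arbitrary and chosen only so the changed bytes are easy to spot. One detail worth knowing: writing to EAX clears the upper 32 bits of RAX, while writing to AX, AL, or AH leaves the rest of the register untouched.

```nasm
section .text
    global _start
_start:
    mov rax, 0x1122334455667788   ; rax = 0x1122334455667788
    mov al, 0xFF                  ; rax = 0x11223344556677FF (only the lowest byte changes)
    mov ah, 0x00                  ; rax = 0x11223344556600FF (only bits 8-15 change)
    mov ax, 0x1234                ; rax = 0x1122334455661234 (only the lowest 16 bits change)
    mov eax, 1                    ; rax = 0x0000000000000001 (upper 32 bits are zeroed)

    ; Exit the program
    mov rax, 60                   ; Syscall number for exit
    mov rdi, 0                    ; Exit status 0
    syscall
```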
System Calls
System calls, or syscalls, are the interface between a user program and the operating system kernel. They allow programs to request services from the kernel, such as reading from or writing to files, allocating memory, or terminating a process. In x86_64 Assembly, the syscall instruction is used to invoke these services.
How System Calls Work
When a program makes a syscall:
- The program sets specific values in registers to indicate the syscall number and its parameters.
- The syscall instruction is executed.
- The operating system processes the request and returns a result, typically in a register.
You can consult the Linux Syscalls Table here.
Registers Used in Syscalls
- RAX: Contains the syscall number (identifies the service to invoke).
- RDI: The first argument for the syscall.
- RSI: The second argument for the syscall.
- RDX: The third argument for the syscall.
- R10: The fourth argument for the syscall.
- R8: The fifth argument for the syscall.
- R9: The sixth argument for the syscall.
- The return value of the syscall is stored in RAX.
Example: Writing to Standard Output
Below is a simple example where the program writes "Hello, World!" to the terminal using the write syscall:
section .data
hello db "Hello, World!", 0xA, 0 ; The message to write, followed by a newline and a null terminator
section .text
global _start
_start:
; Syscall: write
mov rax, 1 ; Syscall number for write (1)
mov rdi, 1 ; File descriptor for standard output (1)
mov rsi, hello ; Address of the message
mov rdx, 15 ; Length of the message
syscall ; Make the syscall
; Syscall: exit
mov rax, 60 ; Syscall number for exit (60)
mov rdi, 0 ; Exit status (0)
syscall ; Make the syscall
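The mov instruction copies a value into a register or a memory location. The source can be another register, a memory location, or an immediate value such as the numbers used above.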
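The return value in RAX is worth using in practice. As a minimal sketch: read returns the number of bytes it actually read (or a negative error code), so a program can echo back exactly that many bytes instead of the whole buffer. The label buf and the local label .done are names chosen for this illustration.

```nasm
section .bss
    buf resb 128            ; Space for the input

section .text
    global _start
_start:
    ; Syscall: read(0, buf, 128)
    mov rax, 0              ; Syscall number for read (0)
    mov rdi, 0              ; File descriptor for standard input (0)
    mov rsi, buf            ; Address of the buffer
    mov rdx, 128            ; Maximum number of bytes to read
    syscall                 ; rax = number of bytes read, or a negative error code

    test rax, rax           ; Set flags according to the value in rax
    jle .done               ; Skip the write if nothing was read or an error occurred

    ; Syscall: write(1, buf, rax)
    mov rdx, rax            ; Length = number of bytes actually read
    mov rax, 1              ; Syscall number for write (1)
    mov rdi, 1              ; File descriptor for standard output (1)
    mov rsi, buf            ; Address of the buffer
    syscall

.done:
    ; Syscall: exit
    mov rax, 60             ; Syscall number for exit (60)
    mov rdi, 0              ; Exit status (0)
    syscall
```

Note that the length is copied into rdx before rax is overwritten with the next syscall number; doing it in the opposite order would lose the return value.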
Labels
Labels in Assembly are identifiers followed by a : (colon). They act as markers in the code, serving as references for jumps, loops, or points to access data. Think of them as "raw functions" or "bookmarks" within your program.
Types of Labels
There are two main types of labels in Assembly:
- Local Labels: Written with a leading . (dot). In NASM, a local label belongs to the most recent non-local label above it, so the same name (such as .loop) can be reused under different labels.
- Global Labels: Ordinary labels without the dot, accessible throughout the file. Marking one with the global directive also makes it visible to the linker and to other files.
Declaring Labels
A label is simply an identifier followed by a colon (:). For example:
start: ; This is a global label
.loop: ; This is a local label
Using labels for control flow
Labels are often used with control flow instructions like jmp (unconditional jump), je (jump if equal), or jnz (jump if not zero). Here's an example of how labels are used in a loop:
section .text
global _start
_start:
mov ecx, 5 ; Set the counter (ecx) to 5
.loop: ; Start of the loop
dec ecx ; Decrement the counter
jnz .loop ; Jump back to .loop if ecx != 0
; Exit the program
mov rax, 60 ; Syscall for exit
mov rdi, 0 ; Exit code 0
syscall
In this example:
- .loop is a local label.
- The program jumps back to .loop while the counter (ecx) is not zero.
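A loop becomes more interesting when its body does something visible. The sketch below prints a line three times; the string "tick" and the label names are invented for this example. It keeps the counter in rbx rather than rcx because the syscall instruction itself overwrites rcx and r11, so a counter held in rcx would be destroyed by the write inside the loop.

```nasm
section .data
    line     db "tick", 0xA       ; The text to print, followed by a newline
    line_len equ $ - line         ; Length of the text in bytes

section .text
    global _start
_start:
    mov rbx, 3                    ; Set the counter (rbx) to 3
.loop:                            ; Start of the loop
    mov rax, 1                    ; Syscall number for write
    mov rdi, 1                    ; File descriptor: 1 (stdout)
    mov rsi, line                 ; Address of the text
    mov rdx, line_len             ; Length of the text
    syscall                       ; Overwrites rcx and r11, but not rbx
    dec rbx                       ; Decrement the counter
    jnz .loop                     ; Jump back to .loop if rbx != 0

    ; Exit the program
    mov rax, 60                   ; Syscall for exit
    mov rdi, 0                    ; Exit code 0
    syscall
```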
Using Global Labels
Global labels can be accessed across files when linking multiple Assembly files. To declare a global label, use the global directive:
File 1: function.asm
section .text
global my_function
my_function:
; Code for the function
ret ; returns to the caller
File 2: call_function.asm
extern my_function ; Declare the external function
section .text
global _start
_start:
call my_function ; Call the external function
; Exit the program
mov rax, 60
mov rdi, 0
syscall
The global _start in the main file exports the _start label so the linker can find it and use it as the program's entry point.
Offset calculation
- say db "Say something!", 0xA, 0:
  - 0xA: Adds a newline character (\n).
  - 0: Null terminator (\0) to mark the string's end.
  This creates the string "Say something!\n\0" in memory.
- say_len equ $ - say:
  - equ: Defines a constant value.
  - $: Represents the current assembly position, which here is the address just after the string ends.
  - $ - say: Calculates the length of the string in bytes by subtracting the starting address (say) from the current address ($).
  This computes the total string length, including the characters, newline (0xA), and null terminator (0). Note that write takes an explicit length and does not need the null terminator; because the terminator is counted in say_len, it is written to the output as well.
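As a quick way to check the arithmetic, the sketch below passes say_len to exit as the status code, so the shell can display it. "Say something!" has 14 characters, plus the newline and the null terminator, which gives 16. This is only an illustration of the calculation, not part of the prompter program.

```nasm
section .data
    say     db "Say something!", 0xA, 0   ; 14 characters + newline + null terminator
    say_len equ $ - say                   ; 14 + 1 + 1 = 16

section .text
    global _start
_start:
    mov rax, 60                           ; Syscall number for exit
    mov rdi, say_len                      ; Exit status = computed length
    syscall
```

After running it, echo $? should show 16. The equ line has to come directly after the data it measures: if another db were placed between say and say_len, $ - say would count those bytes too.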
And we're done.
Now read the code from the beginning again, this time with comments:
prompter.asm:
section .data
; Define initialized data
say db "Say something!", 0xA, 0 ; The string "Say something!" followed by a newline (0xA) and null terminator (0)
say_len equ $ - say ; Calculate the length of the string (current address minus 'say' label)
input_char db "> ", 0 ; The prompt string "> " followed by a null terminator
input_char_len equ $ - input_char ; Calculate the length of the prompt string
said db "You said: ", 0 ; The string "You said: " followed by a null terminator
said_len equ $ - said ; Calculate the length of the "You said: " string
section .bss
; Define uninitialized data
input resb 128 ; Reserve 128 bytes of space for storing user input
section .text
global _start ; Define the program's entry point
_start:
; Display the message "Say something!"
mov rax, 1 ; Syscall number for 'write'
mov rdi, 1 ; File descriptor: 1 (stdout)
mov rsi, say ; Address of the string "Say something!"
mov rdx, say_len ; Length of the string
syscall ; Make the system call
; Display the prompt "> "
mov rax, 1 ; Syscall number for 'write'
mov rdi, 1 ; File descriptor: 1 (stdout)
mov rsi, input_char ; Address of the prompt string "> "
mov rdx, input_char_len ; Length of the prompt string
syscall ; Make the system call
; Read user input (up to 128 bytes)
mov rax, 0 ; Syscall number for 'read'
mov rdi, 0 ; File descriptor: 0 (stdin)
mov rsi, input ; Address to store user input
mov rdx, 128 ; Max number of bytes to read
syscall ; Make the system call
; Display the string "You said: "
mov rax, 1 ; Syscall number for 'write'
mov rdi, 1 ; File descriptor: 1 (stdout)
mov rsi, said ; Address of the string "You said: "
mov rdx, said_len ; Length of the "You said: " string
syscall ; Make the system call
; Display the user input
mov rax, 1 ; Syscall number for 'write'
mov rdi, 1 ; File descriptor: 1 (stdout)
mov rsi, input ; Address of the user input
mov rdx, 128 ; Max number of bytes to display
syscall ; Make the system call
; Exit the program
mov rax, 60 ; Syscall number for 'exit'
mov rdi, 0 ; Exit code: 0
syscall ; Make the system call
Run the code
First we need to assemble it with nasm:
nasm -f elf64 prompter.asm -o prompter.o
Then link the object file with ld:
ld prompter.o -o prompter
And run the code:
./prompter
Final considerations
You learned what Assembly is like and how to use it. But that's only the tip of the iceberg. Thanks for reading and see you in the next article!