Plume

LLVM-IR

Navigation

3 Equivalent Forms
Infinite Virtual Register Set
load and store Instruction
Three Address Form Assignment
Static Single Assignment
Global and Local Identifiers
icmp Instruction

To Be Continued...

3 Equivalent Forms

LLVM-IR has 3 equivalent forms:

  1. textual form (.ll files), for developers to read and edit manually.
  2. binary form (also called bitcode form), for storing on disk (in .bc files).
  3. in-memory form, for compilers and optimizers to use.

These 3 forms can be converted into each other without losing information.
Since the 3 forms are equivalent and only the textual form is human-readable, this article introduces only the textual form.

Infinite Virtual Register Set

LLVM-IR uses a virtual register set with an unlimited number of virtual registers.
When LLVM-IR code is translated to native code, the virtual registers are mapped to the real registers of the target machine.
Since the target machine has a finite number of registers, a real register whose value is no longer referenced is reused for a new virtual register.

Code 1 LLVM-IR Code for Infinite Virtual Register Set:

%a = add i64 9,  8      ; i64 is 64-bit integer type
%b = add i64 %a, 7      ; after ';' are Comments
%c = add i64 %b, %a     ; LLVM-IR's register name must have a prefix: "%"
%d = add i64 %c, %b
%e = add i64 %d, %c
...

Code 2 Pseudo Code After Register Mapping:

AX = add i64 9,  8
BX = add i64 AX, 7
CX = add i64 BX, AX
AX = add i64 CX, BX
BX = add i64 AX, CX
...

Virtual registers most commonly hold scalar values: integers, floating-point numbers and pointers.

`load` and `store` Instruction

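The load instruction reads from memory, and the store instruction writes to memory.
Apart from a few special cases (atomic instructions and intrinsics such as memcpy), they are the only way to access memory: arithmetic instructions operate on registers only.
Both load and store access memory through a pointer. Recent LLVM versions spell every pointer type as ptr; this article uses the older typed spelling such as i64*.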

Code 3 LLVM-IR Code for load and store:

%a = load i64, i64* %ptr      ; load i64 type value from memory pointed by i64* type pointer register %ptr
store  i64 99, i64* %ptr      ; store i64 type value 99 into memory pointed by i64* type pointer register %ptr
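For readers more familiar with Go, the two instructions in Code 3 correspond roughly to reading and writing through a pointer. The function name `loadThenStore` is made up for this sketch.

```go
package main

import "fmt"

// loadThenStore mirrors Code 3: it reads the value behind ptr (load),
// then overwrites it with 99 (store), and returns the value it read.
func loadThenStore(ptr *int64) int64 {
	a := *ptr // like: %a = load i64, i64* %ptr
	*ptr = 99 // like: store i64 99, i64* %ptr
	return a
}

func main() {
	x := int64(7)
	old := loadThenStore(&x)
	fmt.Println(old, x) // 7 99
}
```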

Three Address Form Assignment

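An LLVM-IR instruction that produces a value is written as an assignment-statement: exactly one register on the left side of the =, and one operation with its operands on the right side. Instructions that produce no value, such as store, have no =.
You can see that the assignment-statements in Code 1, Code 2 and Code 3 are all in Three Address Form: one result and at most two operands.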

Static Single Assignment

In LLVM-IR code, each register is defined by exactly one assignment-statement.
A loop can execute that one statement many times, and a phi instruction can list several possible values on the right side of its =. Even so, SSA (Static Single Assignment) and TAF (Three Address Form) significantly simplify the data-flow of LLVM-IR code: every use of a register refers to exactly one definition, so many analyses can get the data-flow information they need without any sophisticated data-flow analysis.

But how can we transform non-SSA source code to SSA LLVM-IR code?

Code 4 non-SSA Go Code:

var a int64 = 99
a = 88
a = 77
a = 66

Code 5 Corresponding SSA LLVM-IR Code:

%a = add i64 0, 99      ; LLVM-IR has no "copy a constant" instruction, so 0 + 99 is used here
%0 = add i64 0, 88
%1 = add i64 0, 77
%2 = add i64 0, 66

When transforming non-SSA source code to SSA LLVM-IR code, the compiler automatically generates register names made of the % prefix and a number.
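The renaming step can be sketched in Go for straight-line code. This is only an illustration under stated assumptions: the `assign` type, the `toSSA` function and the `%name.N` naming scheme are hypothetical (a real compiler may pick numbered names instead), the output is pseudo code rather than valid LLVM-IR, and branches, which need phi instructions, are not handled. Every assignment gets a fresh name, and every use refers to the latest name of the variable it reads.

```go
package main

import "fmt"

// assign is a hypothetical statement "dst = src", where src is either
// a literal or the name of a variable that was assigned earlier.
type assign struct{ dst, src string }

// toSSA rewrites straight-line assignments so that every destination
// is a fresh name and every use refers to the newest version.
func toSSA(prog []assign) []string {
	version := map[string]int{} // variable -> number of definitions seen so far
	out := make([]string, 0, len(prog))
	for _, st := range prog {
		rhs := st.src
		if n, ok := version[rhs]; ok {
			// src names a known variable: use its most recent definition.
			rhs = fmt.Sprintf("%%%s.%d", rhs, n-1)
		}
		lhs := fmt.Sprintf("%%%s.%d", st.dst, version[st.dst])
		version[st.dst]++
		out = append(out, lhs+" = "+rhs)
	}
	return out
}

func main() {
	prog := []assign{{"a", "99"}, {"a", "88"}, {"b", "a"}, {"a", "77"}}
	for _, line := range toSSA(prog) {
		fmt.Println(line)
	}
	// Prints:
	// %a.0 = 99
	// %a.1 = 88
	// %b.0 = %a.1
	// %a.2 = 77
}
```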

Global and Local Identifiers

LLVM-IR has two kinds of identifiers: Global Identifiers and Local Identifiers.
Global Identifiers begin with the @ character; Local Identifiers begin with the % character.
Global Identifiers name Global Variables and Functions.
Local Identifiers name Registers, Labels and User Defined Types.

Global Variables are illustrated later. Labels are similar to the labels in high-level languages.

Code 6 Go Code for Label:

var i int64 = 0
// 'loop' is a label
loop:
i++
println(i)
if i != 10 { goto loop }
// other code

Code 7 LLVM-IR Code for Label:

; 'enter', 'loop', 'end' are labels; the println(i) call of Code 6 is omitted
enter:
    %i = add i64 0, 0      ; LLVM-IR has no "copy a constant" instruction, so 0 + 0 is used here
    br label %loop         ; every block must end with a terminator such as br
loop:
    %i1   = phi i64 [ %i, %enter ], [ %i2, %loop ]     ; if control came from block 'enter' %i1 = %i;     if it came from block 'loop' %i1 = %i2.
    %i2   = add i64 %i1, 1
    %cond = icmp ne i64 %i2, 10      ; if %i2 != 10 %cond is "true";     otherwise %cond is "false"
    br i1 %cond, label %loop, label %end      ; if %cond is "true" jump to label 'loop';     otherwise jump to label 'end'
end:
    ;code for label 'end'

icmp, br and phi instructions will be illustrated later.
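Until then, the way the phi instruction in Code 7 picks its value can be simulated in Go. This is a hedged sketch, not something LLVM generates: the function `runLoop` and the `pred`/`block` variables are made up to track which block is running and which block control came from. The phi selects `%i` on the first pass (coming from 'enter') and `%i2` on every later pass (coming from 'loop').

```go
package main

import "fmt"

// runLoop executes the blocks of Code 7 one at a time. pred remembers the
// block that control came from, which is exactly what a phi inspects.
func runLoop() int64 {
	var i, i1, i2 int64
	pred, block := "", "enter"
	for {
		switch block {
		case "enter":
			i = 0
			pred, block = "enter", "loop"
		case "loop":
			// phi i64 [ %i, %enter ], [ %i2, %loop ]
			if pred == "enter" {
				i1 = i
			} else {
				i1 = i2
			}
			i2 = i1 + 1
			pred = "loop"
			if i2 != 10 { // icmp ne + br
				block = "loop"
			} else {
				block = "end"
			}
		case "end":
			return i2
		}
	}
}

func main() {
	fmt.Println(runLoop()) // 10
}
```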

A struct is a typical example of a User Defined Type.

Code 8 Go Code for Struct:

// 'usertype' is a user defined type name
type usertype struct {
    a int8
    b int16
    c int32
    d int64
}

Code 9 LLVM-IR Code for Struct:

%usertype = type { i8, i16, i32, i64 }      ; '%usertype' is a user defined type name

`icmp` Instruction

icmp compares two values and returns a bool value (type i1).

Syntax:

<result> = icmp <cond> <ty> <op1>, <op2>

<cond> is a keyword that indicates which kind of comparison to perform.

<ty> is the type of <op1> and <op2>.

<op1> and <op2> are the values to be compared. They can be of integer or pointer type (or vectors of these), and they must have the same type. Floating-point values are compared with the separate fcmp instruction.

<cond>s:

  1. eq: equal
  2. ne: not equal
  3. ugt: unsigned greater than
  4. uge: unsigned greater or equal
  5. ult: unsigned less than
  6. ule: unsigned less or equal
  7. sgt: signed greater than
  8. sge: signed greater or equal
  9. slt: signed less than
  10. sle: signed less or equal

Examples:

Code 10 Simple examples for icmp:

%result1 = icmp eq i32 4, 5          ; yields: %result1=false
%result2 = icmp ult i16  4, 5        ; yields: %result2=true
%result3 = icmp sgt i16  4, 5        ; yields: %result3=false
%result4 = icmp sge i16  4, 5        ; yields: %result4=false

Code 11 Complicated examples for icmp:

; Use <cond> ne(not equal) to compare float* type pointer %X and itself.
%result5 = icmp ne float* %X, %X     ; yields: %result5=false

; Use <cond> ule(unsigned less or equal) to compare -4 and 5
; first reinterpret the i16 value -4 as unsigned: 65532, then compare 65532 and 5
; 65532 > 5, so -4 ule 5 is false.
%result6 = icmp ule i16 -4, 5        ; yields: %result6=false
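The difference between the signed and unsigned conditions can be checked in Go. A minimal sketch, with the helper names `icmpULE` and `icmpSLE` made up for illustration: for a 16-bit operand, the bit pattern of -4 reads as 65532 when it is treated as unsigned, so the same two operands give opposite answers under ule and sle.

```go
package main

import "fmt"

// icmpULE mimics `icmp ule i16 a, b`: both operands are reinterpreted
// as unsigned 16-bit integers before they are compared.
func icmpULE(a, b int16) bool { return uint16(a) <= uint16(b) }

// icmpSLE mimics `icmp sle i16 a, b`: the operands are compared as signed.
func icmpSLE(a, b int16) bool { return a <= b }

func main() {
	x := int16(-4)
	fmt.Println(uint16(x))     // 65532
	fmt.Println(icmpULE(x, 5)) // false
	fmt.Println(icmpSLE(x, 5)) // true
}
```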

Code 12 Vector example for icmp:

; icmp also accepts vectors and compares them element by element, so one instruction performs two compares:
; 1. Use <cond> ugt(unsigned greater than) to compare i16 type values 0 and 1; 0 < 1, so 0 ugt 1 is false.
; 2. Use <cond> ugt to compare i16 type values 3 and 2; 3 > 2, so 3 ugt 2 is true.

%result7 = icmp ugt <2 x i16> <i16 0, i16 3>, <i16 1, i16 2>    ; yields: %result7 = <2 x i1> <i1 false, i1 true>
%result8 = extractelement <2 x i1> %result7, i32 1              ; yields: %result8=true

There is no example of a vector compare in LLVM-IR's documentation, so I wrote this one myself. The result is a single register of type <2 x i1> that holds the vector itself; it is neither a pair of registers nor a pointer to a vector. Individual elements can be read with extractelement.
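According to the LLVM Language Reference, an icmp on integer vectors compares them element by element and yields one vector of i1 values with the same number of elements. A minimal Go sketch of that behaviour follows; the function name `icmpUGTVec` is made up, and a fixed-size Go array stands in for the <2 x i16> vector.

```go
package main

import "fmt"

// icmpUGTVec mimics `icmp ugt <2 x i16> a, b`: lane i of the result
// is the unsigned comparison of lane i of a with lane i of b.
func icmpUGTVec(a, b [2]uint16) [2]bool {
	var out [2]bool
	for i := range a {
		out[i] = a[i] > b[i]
	}
	return out
}

func main() {
	fmt.Println(icmpUGTVec([2]uint16{0, 3}, [2]uint16{1, 2})) // [false true]
}
```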

To Be Continued...

  • br Instruction
  • phi Instruction
  • Global Variable
