karapto

Posted on Feb 18, 2023

What is Intel AVX2?

#programming #tutorial #beginners #linux

Overview of Intel AVX2

Intel AVX2 (Advanced Vector Extensions 2) is an x86 instruction set extension developed by Intel that brings the 256-bit SIMD width of the original AVX to integer operations. Because each instruction processes more data, code that uses AVX2 can run significantly faster in data-parallel workloads. AVX2 was introduced with Intel's Haswell microarchitecture (2013) and is supported by most later Intel Core processors, as well as by AMD processors since Excavator.
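To make "more data per instruction" concrete, here is a minimal sketch in C (the language of the example repository used later) that adds two int32 arrays eight elements at a time. It assumes an x86-64 CPU with AVX2 and GCC or Clang. The function name add_i32 is hypothetical, and `__attribute__((target("avx2")))` is a GCC/Clang extension that enables AVX2 code generation for this one function, so the file builds even without -mavx2.

```c
#include <immintrin.h>  /* AVX/AVX2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n).
 * The main loop adds 8 int32 values per iteration with one 256-bit
 * VPADDD; a scalar loop handles the remaining n % 8 elements. */
__attribute__((target("avx2")))
void add_i32(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        /* Unaligned loads/stores: the arrays need no special alignment. */
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        _mm256_storeu_si256((__m256i *)(out + i), _mm256_add_epi32(va, vb));
    }
    for (; i < n; i++) {  /* scalar tail */
        out[i] = a[i] + b[i];
    }
}
```

For example, with n = 11 the loop performs one 8-wide addition and the scalar tail handles the last three elements.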

Main Features of AVX2

AVX2 uses SIMD (Single Instruction, Multiple Data) instructions, which apply one operation to several data elements at once. Its main features are as follows:

256-bit YMM registers

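The 256-bit YMM registers (YMM0 to YMM15 in 64-bit mode) were introduced by the original AVX; the low 128 bits of each YMM register are the corresponding XMM register. AVX, however, used the full 256-bit width mainly for floating-point instructions. AVX2 extends most integer SIMD instructions to 256 bits, so a single instruction can operate on, for example, eight 32-bit or thirty-two 8-bit integers, twice as many as with a 128-bit XMM register.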
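As a hedged illustration of the XMM/YMM relationship, the sketch below loads eight int32 values into one YMM register and sums them by splitting the register into its two 128-bit halves. The function name sum8_i32 is hypothetical; GCC or Clang on x86-64 with AVX2 is assumed, enabled per function with the target attribute.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: returns p[0] + ... + p[7] (wrapping on overflow). */
__attribute__((target("avx2")))
int32_t sum8_i32(const int32_t *p)
{
    __m256i v  = _mm256_loadu_si256((const __m256i *)p); /* one YMM = 8 x int32 */
    __m128i lo = _mm256_castsi256_si128(v);       /* lanes 0-3: the XMM view, no instruction */
    __m128i hi = _mm256_extracti128_si256(v, 1);  /* lanes 4-7 (AVX2 VEXTRACTI128) */
    __m128i s  = _mm_add_epi32(lo, hi);           /* 4 partial sums */
    /* Swap the 64-bit halves and add, then swap adjacent 32-bit pairs and add. */
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, _MM_SHUFFLE(1, 0, 3, 2)));
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, _MM_SHUFFLE(2, 3, 0, 1)));
    return _mm_cvtsi128_si32(s);                  /* lane 0 now holds the total */
}
```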

Fused Multiply-Add (FMA) instructions

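Haswell also introduced FMA instructions (the FMA3 extension). Strictly speaking, FMA is a separate extension with its own CPUID flag (fma), but it ships alongside AVX2 on Intel CPUs. An FMA instruction computes a * b + c as one operation with a single rounding at the end, which is more accurate than a separate multiply and add and can be faster. It operates on single- and double-precision values in XMM and YMM registers.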
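A minimal sketch of FMA on four doubles, assuming GCC or Clang on a CPU that reports both the avx2 and fma flags. The function name fmadd4 is hypothetical; because FMA is its own feature, the target attribute lists both.

```c
#include <immintrin.h>

/* Hypothetical helper: out[i] = a[i] * b[i] + c[i] for i = 0..3,
 * computed by one VFMADD instruction with a single final rounding. */
__attribute__((target("avx2,fma")))
void fmadd4(const double *a, const double *b, const double *c, double *out)
{
    __m256d va = _mm256_loadu_pd(a);
    __m256d vb = _mm256_loadu_pd(b);
    __m256d vc = _mm256_loadu_pd(c);
    _mm256_storeu_pd(out, _mm256_fmadd_pd(va, vb, vc));
}
```

The single rounding is visible with a = b = 1 + 2^-30 and c = -(1 + 2^-29): the exact product is 1 + 2^-29 + 2^-60, so the fused result is 2^-60, whereas rounding a * b to double first would drop the 2^-60 term and give 0.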

Integer instructions

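Integer operations are the core of AVX2. AVX extended mainly floating-point instructions to 256 bits; AVX2 adds 256-bit versions of most of the integer SIMD instructions from SSE2 through SSE4.x, covering addition, subtraction, multiplication, comparisons, shifts, and bitwise logic. It also introduces new instructions, such as per-element variable shifts (VPSLLVD, VPSRLVD) and permutes that move data across the two 128-bit halves of a YMM register (VPERMD, VPERMQ).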
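To show several of these integer operations together, the sketch below applies a simple multiply-xorshift mixing step to eight 32-bit keys at once and provides a scalar version with the same arithmetic. The constant 0x9E3779B1 and the function names mix8_u32 and mix_u32 are illustrative choices, not a recommended hash; GCC or Clang with AVX2 is assumed.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: for 8 keys, h = x * 0x9E3779B1; h ^= h >> 16
 * (all arithmetic mod 2^32). Uses AVX2's 256-bit multiply (VPMULLD),
 * logical right shift (VPSRLD) and XOR (VPXOR). */
__attribute__((target("avx2")))
void mix8_u32(const uint32_t *in, uint32_t *out)
{
    const __m256i k = _mm256_set1_epi32((int)0x9E3779B1u);
    __m256i x = _mm256_loadu_si256((const __m256i *)in);
    __m256i h = _mm256_mullo_epi32(x, k);               /* low 32 bits of x * k */
    h = _mm256_xor_si256(h, _mm256_srli_epi32(h, 16));  /* h ^= h >> 16 */
    _mm256_storeu_si256((__m256i *)out, h);
}

/* Scalar reference computing the same function for one key. */
uint32_t mix_u32(uint32_t x)
{
    uint32_t h = x * 0x9E3779B1u;
    return h ^ (h >> 16);
}
```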

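Gather instructions

AVX2 adds gather instructions (such as VPGATHERDD and VGATHERDPS), which load several elements from non-contiguous memory locations, given as a base address plus a vector of indices, into one register. This can help with table lookups and indirect array access such as a[idx[i]]. AVX2 has no scatter instructions: scattered stores must still be done element by element, and hardware scatter was only added later with AVX-512.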
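A minimal sketch of a table lookup with an AVX2 gather, paired with the plain loop that has to stand in for scatter. The function names are hypothetical; GCC or Clang with AVX2 is assumed, as is int32_t being int (true on mainstream x86-64 compilers).

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: out[i] = table[idx[i]] for i = 0..7 with one
 * VPGATHERDD. Scale 4 = sizeof(int32_t), so indices count elements,
 * not bytes. The gather does no bounds checking: every index must be
 * valid for the table. */
__attribute__((target("avx2")))
void gather8_i32(const int32_t *table, const int32_t *idx, int32_t *out)
{
    __m256i vidx = _mm256_loadu_si256((const __m256i *)idx);
    __m256i v = _mm256_i32gather_epi32((const int *)table, vidx, 4);
    _mm256_storeu_si256((__m256i *)out, v);
}

/* AVX2 has no scatter instruction, so scattered stores use a scalar loop. */
void scatter8_i32(int32_t *table, const int32_t *idx, const int32_t *vals)
{
    for (int i = 0; i < 8; i++) {
        table[idx[i]] = vals[i];
    }
}
```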

Applications of AVX2

AVX2 can result in significant performance improvements in many applications, including:

Image processing

AVX2 is commonly used in image processing. Image processing algorithms typically apply the same operation to large arrays of pixels, and AVX2's packed 8-bit and 16-bit integer instructions map naturally onto pixel data.
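As an image-processing illustration, here is a hedged sketch that brightens an 8-bit grayscale buffer 32 pixels per instruction using unsigned saturating addition. The function name brighten_u8 and the in-place, single-channel layout are assumptions for the example; GCC or Clang with AVX2 is assumed.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pixels[i] = min(pixels[i] + delta, 255).
 * _mm256_adds_epu8 (VPADDUSB) does 32 unsigned saturating byte adds,
 * so bright pixels clip at 255 instead of wrapping around. */
__attribute__((target("avx2")))
void brighten_u8(uint8_t *pixels, size_t n, uint8_t delta)
{
    const __m256i vd = _mm256_set1_epi8((char)delta);
    size_t i = 0;
    for (; i + 32 <= n; i += 32) {
        __m256i p = _mm256_loadu_si256((const __m256i *)(pixels + i));
        _mm256_storeu_si256((__m256i *)(pixels + i), _mm256_adds_epu8(p, vd));
    }
    for (; i < n; i++) {  /* scalar tail with the same saturation */
        unsigned s = (unsigned)pixels[i] + delta;
        pixels[i] = (uint8_t)(s > 255 ? 255 : s);
    }
}
```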

Audio processing

AVX2 is also used in audio processing, where large buffers of samples go through the same filtering, mixing, and format-conversion steps.
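A common audio task is converting 16-bit PCM samples to floating point. The sketch below uses AVX2's sign-extending conversion to handle eight samples per iteration. The function name pcm16_to_float is hypothetical, scaling by 1/32768 is one common convention (other code uses 1/32767), and GCC or Clang with AVX2 is assumed.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts signed 16-bit samples to floats in
 * [-1.0, 1.0). Per iteration: load 8 int16 (128 bits), sign-extend to
 * 8 int32 (AVX2 VPMOVSXWD), convert to float, scale by 1/32768. */
__attribute__((target("avx2")))
void pcm16_to_float(const int16_t *in, float *out, size_t n)
{
    const __m256 scale = _mm256_set1_ps(1.0f / 32768.0f);
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m128i s16 = _mm_loadu_si128((const __m128i *)(in + i));
        __m256i s32 = _mm256_cvtepi16_epi32(s16);  /* sign-extend */
        __m256 f = _mm256_cvtepi32_ps(s32);        /* int32 -> float */
        _mm256_storeu_ps(out + i, _mm256_mul_ps(f, scale));
    }
    for (; i < n; i++) {  /* scalar tail */
        out[i] = (float)in[i] * (1.0f / 32768.0f);
    }
}
```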

Cryptography

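AVX2 can accelerate cryptographic algorithms, which often perform the same arithmetic on many 32- or 64-bit words; for example, stream ciphers such as ChaCha20 can process several blocks in parallel, one per vector lane.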
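As an illustration of lane-parallel cipher code, the sketch below applies the ChaCha20 quarter round from RFC 7539 (now RFC 8439) to eight independent word sets, one per 32-bit lane, next to a scalar reference. It is a teaching sketch, not a complete or audited cipher; the function and macro names are hypothetical, and since AVX2 has no 32-bit vector rotate instruction, each rotation is built from two shifts and an OR. GCC or Clang with AVX2 is assumed.

```c
#include <immintrin.h>
#include <stdint.h>

/* Rotate every 32-bit lane of x left by the constant n (0 < n < 32). */
#define ROTL32X8(x, n) \
    _mm256_or_si256(_mm256_slli_epi32((x), (n)), _mm256_srli_epi32((x), 32 - (n)))

/* Hypothetical helper: ChaCha quarter round on 8 lanes in parallel. */
__attribute__((target("avx2")))
void chacha_qr8(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    __m256i vc = _mm256_loadu_si256((const __m256i *)c);
    __m256i vd = _mm256_loadu_si256((const __m256i *)d);

    va = _mm256_add_epi32(va, vb); vd = _mm256_xor_si256(vd, va); vd = ROTL32X8(vd, 16);
    vc = _mm256_add_epi32(vc, vd); vb = _mm256_xor_si256(vb, vc); vb = ROTL32X8(vb, 12);
    va = _mm256_add_epi32(va, vb); vd = _mm256_xor_si256(vd, va); vd = ROTL32X8(vd, 8);
    vc = _mm256_add_epi32(vc, vd); vb = _mm256_xor_si256(vb, vc); vb = ROTL32X8(vb, 7);

    _mm256_storeu_si256((__m256i *)a, va);
    _mm256_storeu_si256((__m256i *)b, vb);
    _mm256_storeu_si256((__m256i *)c, vc);
    _mm256_storeu_si256((__m256i *)d, vd);
}

/* Scalar reference: the same quarter round on one (a, b, c, d) set. */
static uint32_t rotl32(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

void chacha_qr(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    *a += *b; *d ^= *a; *d = rotl32(*d, 16);
    *c += *d; *b ^= *c; *b = rotl32(*b, 12);
    *a += *b; *d ^= *a; *d = rotl32(*d, 8);
    *c += *d; *b ^= *c; *b = rotl32(*b, 7);
}
```

With the RFC's quarter-round test vector in a lane (a = 0x11111111, b = 0x01020304, c = 0x9b8d6f43, d = 0x01234567), the result should be a = 0xea2a92f4, b = 0xcb1cf8ce, c = 0x4581472e, d = 0x5881c4bb.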

Scientific computing

AVX2 can improve the performance of scientific computing applications, which often involve large arrays and require high-performance arithmetic operations.
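Many scientific kernels reduce to dot products. The hedged sketch below keeps four partial sums in one YMM register and updates them with FMA, then combines them and finishes any leftover elements with a scalar loop. The function name dot_f64 is hypothetical; GCC or Clang on a CPU with the avx2 and fma flags is assumed.

```c
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical helper: returns sum of a[i] * b[i] for i in [0, n). */
__attribute__((target("avx2,fma")))
double dot_f64(const double *a, const double *b, size_t n)
{
    __m256d acc = _mm256_setzero_pd();  /* four partial sums */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        /* acc += a[i..i+3] * b[i..i+3], one rounding per lane */
        acc = _mm256_fmadd_pd(_mm256_loadu_pd(a + i), _mm256_loadu_pd(b + i), acc);
    }
    double lanes[4];
    _mm256_storeu_pd(lanes, acc);
    double sum = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
    for (; i < n; i++) {  /* scalar tail for n % 4 leftovers */
        sum += a[i] * b[i];
    }
    return sum;
}
```

Because the additions happen in a different order than in a plain scalar loop, results for general floating-point inputs can differ from a naive loop in the last bits.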

Sample Code

Let's try some sample code.
Before building the sample programs, check that your PC or laptop supports Intel AVX2:

$ grep avx2 /proc/cpuinfo
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl eagerfpu pni pclmulqdq vmx ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx avx2 hypervisor lahf_lm arat tsc_adjust xsaveopt
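Besides grepping /proc/cpuinfo, a C program can check for AVX2 at run time before taking an AVX2 code path. This sketch uses GCC's `__builtin_cpu_supports` builtin (also available in Clang on x86); the helper names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: 1 if the running CPU reports AVX2, else 0. */
int cpu_has_avx2(void)
{
    /* Only strictly required in code that runs before constructors;
     * harmless to call here. */
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
}

/* FMA has its own feature flag, separate from AVX2. */
int cpu_has_fma(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("fma") != 0;
}

/* Prints a short summary, similar in spirit to the grep check above. */
void print_simd_support(void)
{
    printf("avx2: %s\n", cpu_has_avx2() ? "yes" : "no");
    printf("fma:  %s\n", cpu_has_fma() ? "yes" : "no");
}
```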

If avx2 appears in the flags line, move on to cloning and building the examples.
We will use the repository linked below.

https://github.com/Triple-Z/AVX-AVX2-Example-Code

$ git clone https://github.com/Triple-Z/AVX-AVX2-Example-Code.git
$ cd AVX-AVX2-Example-Code
$ make run
make[1]: Entering directory '/home/user/AVX-AVX2-Example-Code/Initialization_Intrinsics/src'
gcc -c -o ../obj/setzero.o setzero.c -I../include -mavx -mavx2 -mfma -msse -msse2 -msse3 -Wall -O
gcc -o ../bin/setzero ../obj/setzero.o -I../include -mavx -mavx2 -mfma -msse -msse2 -msse3 -Wall -O
gcc -c -o ../obj/set1.o set1.c -I../include -mavx -mavx2 -mfma -msse -msse2 -msse3 -Wall -O
.
.
.
../bin/permute
float:          1.000000, 3.000000, 2.000000, 3.000000
double:         6.000000, 5.000000
float:          1.000000, 3.000000, 2.000000, 3.000000, 1.000000, 3.000000, 2.000000, 3.000000
double:         6.000000, 5.000000, 6.000000, 5.000000
-e
../bin/permute4x64
double:         1.000000, 3.000000, 2.000000, 3.000000
long long int:   1, 3, 2, 3
-e
../bin/permute2f128
float:          3.000000, 3.000000, 3.000000, 3.000000, 0.000000, 0.000000, 0.000000, 0.000000
double:         3.000000, 3.000000, 0.000000, 0.000000
int:            3, 3, 3, 3, 0, 0, 0, 0
-e
../bin/permutevar
float:          1.000000, 3.000000, 2.000000, 3.000000
double:         5.000000, 6.000000
float:          1.000000, 3.000000, 2.000000, 3.000000, 1.000000, 3.000000, 2.000000, 3.000000
double:         5.000000, 6.000000, 5.000000, 6.000000
-e
../bin/permutevar8x32
float:          8.000000, 7.000000, 6.000000, 5.000000, 4.000000, 3.000000, 2.000000, 1.000000
int:            8, 7, 6, 5, 4, 3, 2, 1
-e
../bin/shuffle
float:          5.000000, 7.000000, 15.000000, 16.000000, 1.000000, 3.000000, 11.000000, 12.000000
double:         4.000000, 7.000000, 1.000000, 6.000000
int:            5, 7, 7, 8, 1, 3, 3, 4
char:           0, 9, 0, 9, 0, 10, 0, 10, 0, 11, 0, 11, 0, 12, 0, 12, 0, 5, 0, 5, 0, 6, 0, 6, 0, 7, 0, 7, 0, 8, 8, 8
-e
../bin/shufflehi
short:          16, 15, 14, 13, 9, 11, 11, 12, 8, 7, 6, 5, 1, 3, 3, 4
-e
../bin/shufflelo
short:          13, 15, 15, 16, 12, 11, 10, 9, 5, 7, 7, 8, 4, 3, 2, 1
-e
make[1]: Leaving directory '/home/user/AVX-AVX2-Example-Code/Permuting_and_Shuffling/src'

If you see output like the above, the build succeeded and the example programs ran.
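The permutevar8x32 lines in the log print 8, 7, 6, 5, 4, 3, 2, 1, which is consistent with reversing the values 1 to 8 across all eight lanes. The sketch below shows one way to write such a reversal with `_mm256_permutevar8x32_epi32`; it is not the repository's source, the function name reverse8_i32 is hypothetical, and GCC or Clang with AVX2 is assumed.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: out[i] = in[7 - i] for i = 0..7.
 * VPERMD sets result lane i to source lane idx[i] and, unlike in-lane
 * shuffles such as _mm256_shuffle_epi32, can move data between the
 * two 128-bit halves of the register. */
__attribute__((target("avx2")))
void reverse8_i32(const int32_t *in, int32_t *out)
{
    const __m256i idx = _mm256_setr_epi32(7, 6, 5, 4, 3, 2, 1, 0);
    __m256i v = _mm256_loadu_si256((const __m256i *)in);
    _mm256_storeu_si256((__m256i *)out, _mm256_permutevar8x32_epi32(v, idx));
}
```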

Summary

In conclusion, Intel AVX2 is a powerful instruction set extension that can significantly improve the performance of many applications. Its 256-bit integer SIMD instructions, gather support, and the FMA instructions that accompany it make it a valuable tool for developers optimizing performance on x86 CPUs that support it.

