Bardia Mostafavi

SIMD: parallel processing at the hardware level in C#

SIMD stands for Single Instruction, Multiple Data.

It is a CPU feature that lets a single core perform one operation on multiple pieces of data in a single execution, as the image illustrates:

[Image: SIMD]

For example, if you want to add the elements of two arrays pairwise and store the results in a new array, you can use SIMD instead of a traditional approach such as LINQ or a for loop. Because SIMD is built into the CPU, the work happens at the hardware level, so it can outperform those approaches without resorting to unnecessary multithreading. That does not mean SIMD is always preferable to multithreading, though: each of these techniques has its own use cases, pros, and cons.

SIMD relies on wide CPU registers that hold several values at once, so a single instruction can operate on all of them. For example, a 256-bit register can hold eight 32-bit values. CPUs expose these registers through instruction set extensions such as SSE and AVX, and as far as I know both Intel and AMD CPUs have supported them for many years.
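
To make the register arithmetic concrete, here is a minimal sketch that asks the runtime how wide `Vector<T>` is on the current machine. The class and method names (`LaneInfo`, `Lanes`, `VectorBits`) are made up for this illustration; only `Vector<T>.Count` and `Vector.IsHardwareAccelerated` come from `System.Numerics`.

```csharp
using System;
using System.Numerics;

// Hypothetical helper for illustration; not part of the benchmark.
public static class LaneInfo
{
    // Number of elements of type T that fit in one Vector<T> on this machine.
    public static int Lanes<T>() where T : struct => Vector<T>.Count;

    // Width of Vector<T> in bits: Vector<byte> has exactly one lane per byte.
    public static int VectorBits() => Vector<byte>.Count * 8;

    public static void Main()
    {
        // False means Vector<T> still works, but without SIMD instructions.
        Console.WriteLine($"Hardware accelerated: {Vector.IsHardwareAccelerated}");
        Console.WriteLine($"Vector width: {VectorBits()} bits");
        Console.WriteLine($"int lanes:    {Lanes<int>()}");
        Console.WriteLine($"float lanes:  {Lanes<float>()}");
        Console.WriteLine($"double lanes: {Lanes<double>()}");
    }
}
```

The numbers depend on the CPU and the runtime. On a machine with AVX2 this typically reports 256 bits and 8 `int` lanes, which matches the arithmetic above; with only SSE it would be 128 bits and 4 lanes.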

.NET gives us SIMD-accelerated types in the System.Numerics namespace, such as Vector2, Vector3, Vector4, Matrix3x2, Plane, and the generic Vector<T>, and each of them supports its own specific operations.
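
As a small illustration of these fixed-size types, the sketch below adds two `Vector4` values and takes their dot product. The wrapper class `FixedVectorDemo` and its `Add` method are hypothetical names used only for this example.

```csharp
using System;
using System.Numerics;

// Hypothetical demo class; the Vector4 members used are from System.Numerics.
public static class FixedVectorDemo
{
    // Adds two 4-component vectors component by component.
    public static Vector4 Add(Vector4 a, Vector4 b) => a + b;

    public static void Main()
    {
        var a = new Vector4(1f, 2f, 3f, 4f);
        var b = new Vector4(10f, 20f, 30f, 40f);

        Vector4 sum = Add(a, b);        // components: 11, 22, 33, 44
        float dot = Vector4.Dot(a, b);  // 1*10 + 2*20 + 3*30 + 4*40 = 300

        Console.WriteLine(sum);
        Console.WriteLine(dot);
    }
}
```

These types always have a fixed number of `float` components, which suits geometry and graphics code. For arbitrary arrays of numbers, `Vector<T>` is usually the better fit because its width follows the hardware.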

Note that SIMD operations require the RyuJIT compiler, which is included in .NET Core and in .NET Framework 4.6 and later.
Other languages offer this feature too. So let me wrap up with an example in C#.

Test case: we want to add the 10,000 elements of two arrays pairwise and put the results in a new array. We do this in three different ways and print the BenchmarkDotNet results.

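using System;
using System.Collections.Generic;
using System.Linq;
using System.Numerics;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]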
public class Counter
{
    private readonly int[] _left;
    private readonly int[] _right;

    public Counter()
    {
        _left = Faker.BuildArray(10000);
        _right = Faker.BuildArray(10000);
    }

    [Benchmark]
    public int[] VectorSum()
    {
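        var vectorSize = Vector<int>.Count;
        var result = new int[_left.Length];
        int i = 0;
        // Process full vectors only; loading past the end of the array would throw.
        for (; i <= _left.Length - vectorSize; i += vectorSize)
        {
            var v1 = new Vector<int>(_left, i);
            var v2 = new Vector<int>(_right, i);
            (v1 + v2).CopyTo(result, i);
        }
        // Scalar loop for leftovers when Length is not a multiple of vectorSize.
        for (; i < _left.Length; i++)
        {
            result[i] = _left[i] + _right[i];
        }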
        return result;
    }

    [Benchmark]
    public int[] LinQSum()
    {
        var result = _left.Zip(_right, (l, r) => l + r).ToArray();
        return result;
    }

    [Benchmark]
    public int[] ForSum()
    {
        var result = new int[_left.Length];
        for (int i = 0; i < _left.Length; i++)
        {
            result[i] = _left[i] + _right[i];
        }
        return result;
    }
}

public static class Faker
{
    public static int[] BuildArray(int length)
    {
        var list = new List<int>(length);
        var rnd = new Random(DateTime.Now.Millisecond);
        for (int i = 1; i <= length; i++)
        {
            // Random.Next's upper bound is exclusive, so values fall in 1..98.
            list.Add(rnd.Next(1, 99));
        }
        return list.ToArray();
    }
}

[Image: benchmark results, SIMD vs. for loop]

[Image: benchmark results, SIMD vs. LINQ]

As the benchmark shows, the SIMD version beat both the for loop and LINQ, running about 2x and 27x faster respectively. I deliberately kept these methods simple, readable, and fully managed; unmanaged code can perform even better. In the end, I think it is best to use this feature only once you have confirmed the performance you will actually gain.
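
One step toward lower-level code that still stays safe and managed is to reinterpret the arrays as spans of `Vector<int>` instead of constructing a vector per iteration. This is only a sketch under assumptions: `SpanSum` and `Add` are hypothetical names, it needs a runtime with `Span<T>` and `MemoryMarshal` (.NET Core 2.1 or later), and it has not been benchmarked here, so any gain would have to be measured.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

// Hypothetical variant of the pairwise sum; not the benchmarked code above.
public static class SpanSum
{
    public static void Add(ReadOnlySpan<int> left, ReadOnlySpan<int> right, Span<int> result)
    {
        if (left.Length != right.Length || left.Length != result.Length)
            throw new ArgumentException("All spans must have the same length.");

        // View the int data as whole Vector<int> values without copying.
        // Trailing elements that do not fill a vector are not part of the view.
        ReadOnlySpan<Vector<int>> vLeft = MemoryMarshal.Cast<int, Vector<int>>(left);
        ReadOnlySpan<Vector<int>> vRight = MemoryMarshal.Cast<int, Vector<int>>(right);
        Span<Vector<int>> vResult = MemoryMarshal.Cast<int, Vector<int>>(result);

        for (int i = 0; i < vLeft.Length; i++)
            vResult[i] = vLeft[i] + vRight[i];

        // Scalar loop for the elements the vector view left out.
        for (int i = vLeft.Length * Vector<int>.Count; i < left.Length; i++)
            result[i] = left[i] + right[i];
    }

    public static void Main()
    {
        int[] left = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 };
        int[] right = { 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110 };
        var result = new int[left.Length];

        Add(left, right, result);
        Console.WriteLine(string.Join(", ", result));
        // prints: 11, 22, 33, 44, 55, 66, 77, 88, 99, 110, 121
    }
}
```

Taking spans rather than arrays also lets the caller reuse a result buffer, which would avoid the per-call allocation that the benchmarked methods make.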
