DEV Community

Gustavo Tavares
Gustavo Tavares

Posted on

SPO600 Lab 05 - Algorithm Selection

Hello

In this lab we are supposed to run different versions of the same program and analyze the performance difference.

This program basically will scale a number of samples that will be decided later but that must be enough for the program to run for 20 seconds or more.

Each version of the program has some variance in in the algorithm.
Here are they:
From our lab page

1.  vol0.c is the basic or naive algorithm. This approach multiplies each sound sample by the volume scaling factor, casting from signed 16-bit integer to floating point and back again. Casting between integer and floating point can be expensive operations.

2.  vol1.c does the math using fixed-point calculations. This avoids the overhead of casting between integer and floating point and back again.


3.  vol2.c pre-calculates all 65536 different results, and then looks up the answer for each input value.

4.  vol3.c is a dummy program - it doesn't scale the volume at all. It can be used to determine some of the overhead of the rest of the processing (besides scaling the volume) done by the other programs.


5.  vol4.c uses Single Instruction, Multiple Data (SIMD) instructions accessed through inline assembley (assembly language code inserted into a C program). This program is specific to the AArch64 architecture and will not build for x86_64.

6.  vol5.c uses SIMD instructions accessed through Complier Intrinsics. This program is also specific to AArch64.

Enter fullscreen mode Exit fullscreen mode

In this lab we will use the AArch64 and x86_64 machines provided by the professor.

AArch64

First, as said by the professor, we need to calculate the overhead of a dummy program.
This way we will know how much overhead we have and this way we can decrease it from our other algorithms runs.

So lets set the number of samples so our program can run for over 20 seconds:
header

Then we compile our programs using the make command:

make

And now lets run vol3 (Dummy program) for 5 times and see how long it takes in average.

aarch64vol3

Lets discard the highest and the lowest values and get the average: 26.946 seconds
This is the value of our overhead that we will subtract from the other programs.

Now lets do the same with others:

vol0:

vol0

Again, lets discard the highest and lowest ones and get the average, then we subtract from the dummy overhead.
Our Average is 27.684 seconds.
27.684 - 26.946 (overhead) = 0.738s.
We had 0.738s scaling time.

vol1:

vol1

Repeating the process, we have 27.501s average
Subtracting the overhead we have: 0.555s scaling time

vol2:

vol2

Average: 33.748s
Subtracting the overhead we have: 6.802s scaling time.

vol4:

vol4

Average: 26.763s.
Subtracting the overhead we have: -0.183s scaling time.

vol5:

vol5

Average: 26.729s.
Subtracting the overhead we have: -0.216s scaling time.


I tried to change the Makefile to try to make it run faster by using the -Ofast and enable the vectorization

Ofast

And had a 3 second average difference from the -O2.


Now lets go to x86_64 machine

x86_65

The dummy (vol3) ovearhead value is 23.650s.

Vol0 is the naïve implementation:
The average was 24.158 s.
Minus the overhead, the time scaling is: 0.508s

Vol1 is the fixed point implementation:
The average was 23.736s.
Minus the overhead, the time scaling is: 0.086s

Vol2 is the precalculated implementation:
The average was 24.676sz.
Minus the overhead, the time scaling is: 1.026s


Finally

Lets answer the questions on the c code:
q1

The dummy program will help us measure the overhead, this is the process that every other vol program will go through so we need to know how much it is in order to know how much is the scaling process time.

q2

Those parts are needed because if you don’t use the result of your calculations, the compiler will see it and avoid it as you are not using it anyway, so In the end your program will end up doing nothing at all.

q3

That’s a good question. I would say to make sure it’s a positive number, but to be honest Im not sure. ☹

I could notice that different algorithms produce different outputs. But running the same program multiple times will produce the same result even with different performance times. I would say that the difference results between algorithms, in terms of audio is not big enough for us to perceive.

Thank you for reading!

Top comments (0)