SPO600 - Project Stage 1

Introduction

Hello everyone, my name is Dustin. Today I'd like to talk about my experience working on Project Stage 1, including how I tested the provided programs and how the different volume-scaling algorithms performed.

About this project stage

The project requires us to build and run several implementations of the same volume-scaling task to see whether their performance differs. The professor provided six programs.

vol0.c is the basic or naive algorithm. This approach multiplies each sound sample by the volume scaling factor, casting from signed 16-bit integer to floating point and back again. Casting between integer and floating point can be expensive operations.
vol1.c does the math using fixed-point calculations. This avoids the overhead of casting between integer and floating point and back again.
vol2.c pre-calculates all 65536 different results, and then looks up the answer for each input value.
vol3.c is a dummy program - it doesn't scale the volume at all. It can be used to determine some of the overhead of the rest of the processing (besides scaling the volume) done by the other programs.
vol4.c uses Single Instruction, Multiple Data (SIMD) instructions accessed through inline assembly (assembly language code inserted into a C program). This program is specific to the AArch64 architecture and will not build for x86_64.
vol5.c uses SIMD instructions accessed through compiler intrinsics. This program is also specific to AArch64.
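To make the difference between the first two approaches more concrete, here is a minimal sketch of floating-point scaling next to fixed-point scaling. It is not the professor's code: the function names, the value of VOLUME, and the choice of 256 as the fixed-point scale are my own assumptions for illustration.

```c
#include <stdint.h>

/* Assumed volume setting in percent; the real programs define their own value. */
#define VOLUME 50

/* Naive approach, in the spirit of vol0.c: int16 -> float -> int16. */
int16_t scale_float(int16_t sample)
{
    return (int16_t)((float)sample * VOLUME / 100.0f);
}

/* Fixed-point approach, in the spirit of vol1.c: the factor is stored as an
   integer scaled by 256, so the per-sample work uses only integer math. */
int16_t scale_fixed(int16_t sample)
{
    int32_t factor = (int32_t)(VOLUME / 100.0 * 256.0); /* 128 for 50% */
    return (int16_t)(((int32_t)sample * factor) / 256);
}
```

With these assumptions both functions return 500 for an input of 1000, but only scale_float converts each sample to floating point and back.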

My first prediction was that vol0 would be the slowest because of its casts, that vol1 would be faster than vol0, and that vol3 would be the fastest because it does not scale the volume at all.

Process

First, I ran some tests on both the AArch64 and x86_64 architectures. The results were quite similar on both.

Before actually running the programs, I had to copy the archive to my current directory so that I wouldn't need to move around too much, and then extract it.

To copy the archive to the current directory, I used:
cp /public/spo600-volume-examples.tgz .

To extract the archive, I used:
tar xvf spo600-volume-examples.tgz

Because I copied it to my current working directory, I could test it right there. Next, I used the make command to build the programs.

After make finishes, the built programs appear highlighted in green.

To run a program, I simply type its name (for example ./vol0) and press Enter.

This is the test that I ran on the AArch64 architecture. I also ran the same test on the x86_64 architecture, and the results were pretty much the same.

The screenshot below shows the results I got on the AArch64 architecture. Some of the programs (vol0, vol2, vol4, vol5) print the same result, 481.

Next, to measure performance, I ran each program with the time command.

When we run a program with time, it reports three values: real, user, and sys. In my runs, real and user were about the same and sys was 0 every time. real is the total elapsed (wall-clock) time of the command, user is the CPU time spent running the program's own code in user mode, and sys is the CPU time spent in the kernel on the program's behalf.

After that, I used the command free -m to check memory usage on my current machine. free reports system-wide memory in mebibytes, so it only gives a rough picture of what the program itself uses.

On AArch64 architecture

On x86_64 architecture

Questions found in program comments:

for (x = 0; x < SAMPLES; x++) { ttl=(ttl+out[x])%1000; }
This loop goes through all SAMPLES elements of the out array. On each pass it adds out[x] to ttl and then keeps only the remainder after dividing by 1000.

printf("Result: %d\n", ttl); return 0;
The final value of ttl is then printed with printf.
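The summing loop can be pulled out into a small function to see how it behaves. This is only a sketch: the function name checksum and passing the array and its length as parameters are my own assumptions, not how the provided programs are organized.

```c
#include <stddef.h>
#include <stdint.h>

/* Add up the samples, keeping only the remainder modulo 1000 at each step. */
int checksum(const int16_t *out, size_t samples)
{
    int ttl = 0;
    for (size_t x = 0; x < samples; x++) {
        ttl = (ttl + out[x]) % 1000;
    }
    return ttl;
}
```

One detail worth knowing is that in C the % operator keeps the sign of the left operand, so the printed result can be negative when the running total is negative.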

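Q: What is the purpose of the cast to uint16_t in the next line? precalc[(uint16_t) x] = (int16_t) ((float) x * VOLUME / 100.0);
The cast makes sure the array index is an unsigned 16-bit integer. The sample value x can be negative (from -32768 to 32767), and a negative number cannot be used as an index into precalc. Casting to uint16_t maps the negative values onto 32768 to 65535, so every possible sample gets its own slot in the table.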
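Here is a small sketch of the lookup-table idea built around that line. The function names build_table and scale_lookup, the loop bounds, and the value of VOLUME are my own assumptions; only the precalc assignment comes from the provided code.

```c
#include <stdint.h>

/* Assumed volume setting in percent. */
#define VOLUME 50

/* One entry for each of the 65536 possible 16-bit sample values. */
static int16_t precalc[65536];

/* Fill the table once, before any samples are processed. */
void build_table(void)
{
    for (int x = -32768; x <= 32767; x++) {
        /* (uint16_t) turns -32768..-1 into 32768..65535, so the index is never negative. */
        precalc[(uint16_t)x] = (int16_t)((float)x * VOLUME / 100.0);
    }
}

/* Scaling a sample is now a single array read. */
int16_t scale_lookup(int16_t sample)
{
    return precalc[(uint16_t)sample];
}
```

For example, the sample -1 is stored at index 65535, and after build_table() a lookup of -1000 returns -500 with this VOLUME.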

Q: What's the point of this dummy program? how does it help with benchmarking?
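The dummy program does not scale the volume at all, but it still does the rest of the processing. Timing it shows how much of each run is overhead, so comparing that time with the other programs' times gives a better idea of what the scaling itself costs.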

Q: what is the purpose of these next two lines? in_cursor = in; out_cursor = out;
The purpose of these two lines is to point the input cursor at the start of the in array and the output cursor at the start of the out array.
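The cursor pattern can be sketched in plain C like this. The function name copy_with_cursors, the limit pointer, and the simple copy in the loop body are assumptions for illustration; the real program does its SIMD work where this sketch just copies a sample.

```c
#include <stddef.h>
#include <stdint.h>

/* Walk two arrays with moving pointers ("cursors") instead of an index. */
void copy_with_cursors(const int16_t *in, int16_t *out, size_t samples)
{
    const int16_t *in_cursor = in;       /* start at the first input sample  */
    int16_t *out_cursor = out;           /* start at the first output sample */
    const int16_t *limit = in + samples; /* one past the last input sample   */

    while (in_cursor < limit) {
        *out_cursor++ = *in_cursor++;    /* process one sample, advance both */
    }
}
```

The arrays in and out themselves never change; only the cursors move, which is why they have to be set to the start before the loop.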

Q: should we use 32767 or 32768 in next line? why? vol_int = (int16_t)(VOLUME/100.0 * 32767.0);
I think we should use 32767 because vol_int is a signed 16-bit integer, and its maximum value is 32767.
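A quick sketch shows why the multiplier matters at full volume. The function name volume_to_fixed and taking the volume as a parameter are my own assumptions; the expression inside is the one from the question.

```c
#include <stdint.h>

/* Turn a volume percentage (0..100) into a 16-bit fixed-point factor. */
int16_t volume_to_fixed(int volume_percent)
{
    /* With 32767.0 the largest possible result is 32767, which fits in int16_t.
       With 32768.0, a volume of 100 would produce 32768, which does not fit. */
    return (int16_t)(volume_percent / 100.0 * 32767.0);
}
```

At 100 percent this returns exactly 32767, and at 50 percent it returns 16383.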

Q: what does it mean to "duplicate" values in the next line? __asm__ ("dup v1.8h,%w0"::"r"(vol_int)); // duplicate vol_int into v1.8h
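As far as I understand, v1.8h refers to vector register v1 treated as eight 16-bit lanes, a bit like an array of eight int16_t values. To "duplicate" means that the single value vol_int is copied into every one of those eight lanes, so all eight hold the same volume factor.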
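Since the real instruction only runs on AArch64, here is a plain-C model of what dup does. It is just an illustration: the function name dup_8h and the use of an ordinary array in place of the vector register are my own assumptions.

```c
#include <stdint.h>

/* v1.8h: eight 16-bit ("halfword") lanes in one 128-bit register. */
#define LANES 8

/* Plain-C model of "dup v1.8h, w0": copy one scalar into every lane. */
void dup_8h(int16_t lanes[LANES], int16_t value)
{
    for (int i = 0; i < LANES; i++) {
        lanes[i] = value;
    }
}
```

After the call, all eight lanes hold the same value, which lets one SIMD instruction scale eight samples by the same volume factor at once.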

Thank you for reading.