DEV Community

Thanh Van
Thanh Van

Posted on

SPO600 Algorithm Selection Lab - Project Stage 1

Introduction

The content of this post will be my investigation about the impact of different algorithms, which still produce the same effect. I have watched different type of algorithms, compiled them, and checked what are the differences.

Background of the Project

There are six programs are already provided, each with a different approach to the problem

  • vol0.c is the basic or naive algorithm
  • vol1.c does the math using fixed-point calculations
  • vol2.c pre-calculates all 65536 different results, then looks up the answer for each input value
  • vol3.c is a dummy program - it doesn't scale the volume at all
  • vol4.c uses Single Instruction, Multiple Data (SIMD) instructions accessed through inline assembly
  • vol5.c uses SIMD instructions accessed through Complier Intrinsics

More details about the provided programs can be found here: Project 1

My Prediction

My prediction of the relative performance of each scaling algorithm is vol0 will be the fastest and vol3 will be the slowest. To be honest, this is just my prediction, however, I think they will be the same regarding to the performance.

How I Get Started

Firstly, I have to copy the archive into my directory by using this command:

cp /public/spo600-volume-examples.tgz .
Enter fullscreen mode Exit fullscreen mode

then I have to unzip the archive I just copied by using this command:

tar xvf spo600-volume-examples.tgz
Enter fullscreen mode Exit fullscreen mode

After that, I move to directory where Makefile is contained, and then use make command to build the program. Then I just simply use ./vol0 to run the vol0 and same thing applied to other vol programs.

My first build and test each programs

I did the test on AArch64 architecture firstly, and then I also tested on x84_64 architecture. However, the results were basically the same when I built and tested programs on two architectures. The screenshot below was tested on AArch64 architecture.
first build and test program As we can see, the output of each program is not the same, the result is different for each program I run. I also used time command to see the time of the performance as well performance time

  • real is the total time that the command ran on the system.
  • user is the time it takes to execute the command on the user’s side.
  • sys is the system time it takes to call/execute the command.

Relative memory usage of each program

I use the free -m command to check the relative memory usage of the program on my machine.

Memory usage on AArch64 architecture

memory usage aarch64

Memory usage on x86-64 architecture

memory usage x86_64

Questions marked with Q:

Q: Why is this needed?
        for (x = 0; x < SAMPLES; x++) {
                ttl=(ttl+out[x])%1000;
        }
Enter fullscreen mode Exit fullscreen mode

The reason why we need this loop is because we have to go through all the SAMPLES that we already defined, and then assigned the results out[] to ttl then we could print the output of the program.

Q: Why is this needed?
        printf("Result: %d\n", ttl);
        return 0;
Enter fullscreen mode Exit fullscreen mode

The reason why we need this printf is because there was nothing to print the result of the program vol1. Using printf to print the output for the program

Q: What is the purpose of the cast to unint16_t in the next line?
     precalc[(uint16_t) x] = (int16_t) ((float) x * VOLUME / 100.0);
Enter fullscreen mode Exit fullscreen mode

We casted to unint16_t because it explicitly specified the number of bits, and it also was guaranteed to be an unsigned 16-bit integer.

Q: What's the point of this dummy program? how does it help with benchmarking?
Enter fullscreen mode Exit fullscreen mode

The dummy program does NOT scale the volume. It can be used to determine some of the overhead of the rest of the processing done by the other programs.

Q: should we use 32767 or 32768 in next line? why?
     vol_int = (int16_t)(VOLUME/100.0 * 32767.0);
Enter fullscreen mode Exit fullscreen mode

We should use 32767 since we have already defined a maximum limit for the samples. The samples are starting from the minimum value of an int16_t or a 16-bit signed integer to the maximum value of the 16-bit signed integer.

Q: what is the purpose of these next two lines?
        in_cursor = in;
        out_cursor = out;
        limit = in + SAMPLES;
Enter fullscreen mode Exit fullscreen mode

The purpose of those two lines is to assign the input cursor to an array in and the output cursor to an array out.

Q: what does it mean to "duplicate" values in the next line?
__asm__ ("dup v1.8h,%w0"::"r"(vol_int)); // duplicate vol_int into v1.8h
Enter fullscreen mode Exit fullscreen mode

A duplicate is stored into a vector which will act as an array of equal size. The value to duplicate is %w0 which is the 32-bit register 0. The values to duplicate will be sent into the dup v1.8h.

Discussion (0)