Carmel Cohen
Hooray! You Improved Your Build Script! But Are You SURE?

How can you measure improvements to your build script objectively, ruling out the human factor?

Long builds are one of my team’s main velocity killers, so whenever we can think of a way to reduce the build time, we give it high priority in our backlog.
Sometimes, after we had already celebrated an improvement, we realized that the “improvement” did not achieve the expected results, and in some cases there was even a degradation.
So we started thinking of ways to verify that the changes we make really improve on what we had, without having to wait for CI tool statistics, which only become available after several days.

The Naive Approach

The obvious approach is to run the build script (or the part we are about to optimize), measure the time, apply the change we would like to test, and run it again.
The main problem is that it is really hard to isolate improvements to specific parts of the build, so unless the improvement is dramatic, it is hard to tell whether the change made any difference at all: network conditions, CPU load, disk, memory, and parallel processes can all affect a test run.
Another problem is a bit psychological:
When we make a change, we are already invested in it. So even if we are not sure there is any impact, we convince ourselves that there was an impact and it is just not reflected because the second run was with “more code”, “when there was more load on the machine”, “during a thunderstorm”, etc.

The Right Approach…

Measurement should be as objective as possible. Luckily, code and computers are objective (though I can swear that in some cases my machine was doing things on purpose), so if we manage to rule out the human factor, we can measure ourselves and the code that we create.

Time Measurement

Unix provides a simple command to measure process execution time, called time (I am sure Windows has something similar too). Running:

time npm run build

will run the build command, but it will also measure user time, system time, and CPU usage, and echo the results in the form of:
21.82s user 1.49s system 147% CPU 15.802 total
user - CPU time spent in user mode (your code and the libraries it calls)
system - CPU time spent in kernel mode (system calls)
CPU - the percentage of a CPU the process used (over 100% means it ran on multiple cores)
total - the wall-clock time of the whole run
Running this command from a bash file will print it in a slightly different format (real/user/sys lines), but the data is equivalent.
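If you want to capture just those timing lines from inside a bash script, one common trick (a sketch, not part of the tool; the placeholder command stands in for your real build) is to group the measured command in braces and redirect the group’s stderr, since that is where the shell prints the timing:

```shell
#!/usr/bin/env bash
# Placeholder command; substitute your real build command (e.g. npm run build).
cmd="sleep 1"

# `time` is a shell keyword, so we group the measured command in braces.
# The group's stderr (where bash prints the real/user/sys lines) is
# appended to times.txt, while the command's own output goes to build.log.
{ time $cmd > build.log 2>&1 ; } 2>> times.txt

cat times.txt
```

The real line is the wall-clock time you actually wait for; user and sys are its CPU-time components.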

Note

The time command should not impact the actual execution of the script, but even if it does, as long as you run it on both variations of the code (before your alleged improvement and after it), the comparison is still fair.

Comparing Apples to Apples

To rule out load and hardware factors, we should try to run both variations of the code in parallel on the same machine without impacting one another.

Well, that’s not really possible: if both runs use the same resources, accessing the same Maven/NPM/Bower/Ruby repositories, etc., they impact one another.

The next best thing is to run them one after the other, N times, and measure each run in a loop; the more iterations we run, the more accurate the results (or at least the trends).

The Algorithm

After identifying the way we would like to measure, and the way we would like to create a common environment, we can define an algorithm:

Do N times:

  1. Initialize state A
  2. Run the command we would like to test
  3. Measure and report
  4. Initialize state B
  5. Run the command we would like to test
  6. Measure and report

Create a report and verify that all runs were successful
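The steps above can be sketched as a small bash loop. This is a simplified illustration, not the actual script; all commands and file names here are placeholders:

```shell
#!/usr/bin/env bash
# Sketch of the measurement loop. Substitute your own test command and
# state-initialization commands for the placeholders below.
N=3                     # number of iterations
cmd="sleep 0.1"         # the command to measure (e.g. "npm run build")
init_before="true"      # e.g. "git checkout master"
init_after="true"       # e.g. "git checkout my-change-branch"

: > times.txt           # start with an empty report

for i in $(seq 1 "$N"); do
  echo "########### Run #$i ################" >> times.txt

  echo "-------- Run #$i BEFORE ----------" >> times.txt
  $init_before
  { time $cmd > /dev/null 2>&1 ; } 2>> times.txt

  echo "-------- Run #$i AFTER ----------" >> times.txt
  $init_after
  { time $cmd > /dev/null 2>&1 ; } 2>> times.txt

  echo "########### Run #$i - DONE ################" >> times.txt
done
```

Alternating the two states inside one loop (rather than N runs of A followed by N runs of B) means that slow drifts in machine load hit both variations roughly equally.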

Simple, right? I found myself doing that manually so many times that I decided to create a script to automate it.

And when it was ready, I thought I should share it with the world.

“Measure Script Time” Command

I created a small Github repo with installation instructions and examples, which implements the algorithm above.

The syntax is simple:

measure-script-time [-c <command (npm t)>] [-p <path to results (~/temp/measure-script-time/results)>] [-b <before state>] [-a <after state>] [-n <number of runs (5)>] [-l <number of lines to include in base log>]

Examples:

measure-script-time -c "npm run test" -p ~/tmp/results -b "npm install some-lib@2" -a "npm install some-lib@3" -n 10

=> Will measure the execution time of ‘npm run test’ with version 2 compared to 3 of ‘some-lib’ npm library, and will do it 10 times for each variation

measure-script-time -c "npm run test" -p ~/tmp/results -a ""

=> Will measure the execution time of ‘npm run test’ without any initialization script, without comparing it to anything

measure-script-time -c "mvn clean install" -p ~/tmp/results -b "git checkout master" -a "git checkout my-change-branch"

=> Will measure the execution time of a Maven build in ‘my-change-branch’ branch, compared to the master branch.

Since the number of times was not defined, it will run it 5 times (default value).

MacOS users

To prevent the computer from going to sleep while running the commands above, it is recommended to prefix them with caffeinate -i, which will keep your machine awake as long as the process is running, for example:

caffeinate -i measure-script-time -c "npm run test" -p ~/tmp/results -n 10

The Results

The command will create a few files in the results folder that you defined:

times.txt — a file that includes the execution timing of the runs the script executed, for example:

Running Tests

########### Run #1 ################

-------- Run #1 BEFORE ----------

real 9m33.932s

user 4m51.215s

sys 3m37.940s

-------- Run #1 AFTER ----------

real 3m51.220s

user 2m38.324s

sys 0m51.458s

########### Run #1 - DONE ################

########### Run #2 ################

-------- Run #2 BEFORE ----------

real 8m27.563s

user 4m26.110s

sys 2m10.327s

-------- Run #2 AFTER ----------

real 4m0.233s

user 2m39.229s

sys 0m49.023s

########### Run #2 - DONE ################

########### Run #3 ################

-------- Run #3 BEFORE ----------

real 8m48.793s

user 4m33.974s

sys 3m58.911s

-------- Run #3 AFTER ----------

real 3m43.939s

user 2m35.435s

sys 0m50.671s

########### Run #3 - DONE ################

########### Run #4 ################

-------- Run #4 BEFORE ----------

real 8m1.954s

user 4m25.315s

sys 2m49.181s

-------- Run #4 AFTER ----------

real 3m57.200s

user 2m36.485s

sys 0m48.136s

########### Run #4 - DONE ################

########### Run #5 ################

-------- Run #5 BEFORE ----------

real 9m23.433s

user 4m27.390s

sys 2m49.781s

-------- Run #5 AFTER ----------

real 3m45.769s

user 2m36.588s

sys 0m50.413s

########### Run #5 - DONE ################

########### Run #6 ################

-------- Run #6 BEFORE ----------

real 8m25.364s

user 4m27.561s

sys 2m51.108s

-------- Run #6 AFTER ----------

real 3m38.693s

user 2m42.206s

sys 0m50.544s

########### Run #6 - DONE ################

This is a real example; see how much of an improvement we achieved :)

results.after.txt, results.before.txt — files containing the last n rows (default is 10) of each run, so you can easily check that all runs were successful

logs-full/ — a folder containing the complete logs from the runs of the test command, as “current-run...txt” files holding the full log of each run.

Usually, running the test ~5 times provides decent results (excluding the best and worst results leaves us with 3 “clean” test runs), but since it is fully automated, I can run it even 1000 times overnight.
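If you want a quick average without waiting for the script to grow that feature, you can post-process times.txt yourself. A rough awk sketch (assuming the XmY.Zs format shown above; the sample file here is made up for illustration):

```shell
#!/usr/bin/env bash
# Build a tiny sample in the same format as the "real" lines above.
printf 'real 1m30.000s\nreal 0m30.000s\n' > sample_times.txt

# Convert each XmY.Zs value to seconds and print the average.
awk '/^real/ {
  split($2, t, /[ms]/)          # "1m30.000s" -> t[1]=minutes, t[2]=seconds
  sum += t[1] * 60 + t[2]; n++
}
END { if (n) printf "%d runs, average %.3f s\n", n, sum / n }' sample_times.txt
# prints: 2 runs, average 60.000 s
```

Running it twice, once on the BEFORE lines and once on the AFTER lines, gives you a single number per variation to compare.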

There is A LOT to Improve

I know that the script is not perfect, it would be great for example to support:

  • More output formats like CSV
  • Check how many runs have failed before and after
  • Calculate the average of the successful runs
  • And more…

Not to mention that it was written by a front-end developer with little (more like no) experience in writing Unix/Bash scripts.

I could have used Node, which I am far more experienced with, but I preferred to stay as close as possible to the OS and not limit the script to NodeJS users only.

This all means that it is far from being perfect or optimized for all scenarios.

The script was only aimed to assist me with improving some of the build scripts for my team.

After using it a few times, with great results, I decided to share it and let the community benefit from it (and improve it).

If you have more ideas and improvements, I would be grateful if you create a PR, and we will all benefit from it.

I really hope this helped, even a bit. Just know that if you read everything up to this point and gave the script a try, you have a 0.000000913% better chance of securing your place in heaven; I used the measure-heaven-chance command for that, but that’s a different article ;)
