Bob Lied

PWC 241 Triplets, sorting, and a profiling digression

Perl Weekly Challenge 241 has a couple of tasks that are similar to others we've done, so I wasn't going to spend a lot of time on them, but then I started thinking about optimizations and took off on a tangent about profiling.

Task 1, Arithmetic Triplets, is a challenge to find triplets in a sorted array, similar to another problem we've done recently (week 234, Unequal Triplets). I wrote about it in this blog.

Task 2, Prime Order, is another sorting problem that lends itself to a Schwartzian transform, as in week 238, Persistence Sort. I talked about the approach in this blog.

Task 1

So, to cut to the chase, a solution to Task 1 is shown below. It uses the stereotypical nested-loop pattern for examining combinations of three indexes at a time.

sub triplet($diff, @nums)
{
    my $count = 0;
    for ( my $i = 0 ; $i <= $#nums-2; $i++ )
    {
        for ( my $j = $i+1; $j <= $#nums-1; $j++ )
        {
            for ( my $k = $j+1; $k <= $#nums ; $k++ )
            {
                if (   $nums[$k] - $nums[$j] == $diff
                    && $nums[$j] - $nums[$i] == $diff )
                {
                    $count++;
                }
            }
        }
    }
    return $count;
}
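A quick sanity check: with a difference of 3, the sorted list (0, 1, 4, 6, 7, 10) contains exactly two qualifying triplets, (1, 4, 7) and (4, 7, 10), so the count should be 2.

say triplet(3, 0, 1, 4, 6, 7, 10);   # prints 2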

Task 2

The Task 2 solution uses the factor function from Math::Prime::Util to do the hard part of finding the prime factors, and then applies the Schwartzian-transform trick of attaching extra information to the data, sorting on it, and then discarding the extra information.

use Math::Prime::Util qw/factor/;

# Schwartzian transform, reading from the bottom up:
#   1. pair each number with its count of prime factors
#      (factor() in scalar context returns that count),
#   2. sort by the count, breaking ties by the number itself,
#   3. strip the count back off, keeping only the numbers.
sub primeOrder(@int)
{
    [
        map { $_->[0] }
            sort { $a->[1] <=> $b->[1] || $a->[0] <=> $b->[0] }
                map { [ $_, scalar(factor($_)) ] } @int
    ]
}
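A quick check of the ordering: 11 has one prime factor, 4 has two, and 8 and 27 have three each, so the tie between 8 and 27 falls back to numeric order.

say "@{ primeOrder(11, 8, 27, 4) }";   # prints: 11 4 8 27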

But then ...

As I was checking in Task 1, it occurred to me that there was no reason to even try that third loop if the i-to-j difference didn't work. Simple optimization, right?

    for ( my $i = 0 ; $i <= $#nums-2; $i++ )
    {
        for ( my $j = $i+1; $j <= $#nums-1; $j++ )
        {
            next unless $nums[$j] - $nums[$i] == $diff;

            for ( my $k = $j+1; $k <= $#nums ; $k++ )
            {
                if ( $nums[$k] - $nums[$j] == $diff )
                {
                    $count++;
                }
            }
        }
    }

And then it occurred to me that we could exploit the fact that the numbers are sorted: once the difference gets bigger than $diff, there is no point in going any further down the j or k indexes. Another simple optimization, right?

    for ( my $i = 0 ; $i <= $#nums-2; $i++ )
    {
        for ( my $j = $i+1; $j <= $#nums-1; $j++ )
        {
            my $dj = $nums[$j] - $nums[$i];

            # Input is sorted, so stop once the difference is too big.
            last if $dj > $diff;
            next unless $dj == $diff;

            for ( my $k = $j+1; $k <= $#nums ; $k++ )
            {
                my $dk = $nums[$k] - $nums[$j];
                last if $dk > $diff;
                if ( $dk == $diff )
                {
                    $count++;
                }
            }
        }
    }

And then I remembered that my intuition for optimization is absolutely horrible. Could it turn out that executing those extra tests would cost more time than would be saved? Let's try some profiling.

TL;DR: The best general strategy for performance seems to be getting out of the outer loops as soon as possible. Letting Perl handle the array accesses by shifting is a less obvious but worthwhile micro-optimization. Any optimization at all is a huge win over the brute-force approach of nested loops.

As it turns out, this past week I had just read a blog about profiling and I had it in mind to refresh my memory anyway.

The first order of business was to create some data more interesting than the simple test cases in the challenge.

The data we use could have an interesting effect on performance. With dense data (say, 1000 values, each between 0 and 999), there could be a lot of triplets if the difference is small, or very few if it is large. With sparse data (1000 values whose range runs into the millions), the probability of finding any triplets at all is remote. Best to have a couple of different data sets. A quick shell/Perl one-liner creates 1000 random values in sorted order for each case:

perl -wE 'say int(rand(1000)) for  0..999' | sort -n > jdense
perl -wE 'say int(rand(10000000)) for 0..999' | sort -n > jsparse

Let's take a gross first cut. Put each of the three solutions in separate files, named ch-1.a.pl, ch-1.b.pl, and ch-1.c.pl. Run each one on sparse and dense data, and get a rough idea of performance by using the shell's time feature:

$ time perl ch-1.a.pl -d 50 $(<jsparse)
0

real    0m11.313s
user    0m11.129s
sys 0m0.101s

Doing that for several combinations gives us a table of real time results like this:

Algorithm   dense, diff=50   sparse, diff=50   dense, diff=5
ch-1.a.pl   11.280s          11.313s           11.327s
ch-1.b.pl    0.228s           0.184s            0.253s
ch-1.c.pl    0.197s           0.153s            0.129s

Of course, these are single runs and should be repeated to get averages, but at first glance the news is good. That first idea of avoiding the third nested loop was a fantastic optimization. The second idea was not as spectacular, but it seems to help.

There's one more optimization that turns out to be interesting. If we walk the outer loop by shifting elements off the array instead of indexing, $i holds the value directly and we save some array arithmetic and indirect access. The outer loop looks like:

    while ( defined(my $i = shift @nums ) )
    {
        for my $j ( 0 .. ($#nums-1) )
        {
            my $dj = $nums[$j] - $i;
            . . .
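Roughly, the whole shift-based variant ("Shift i" in the benchmarks below) might look like the sketch that follows; it folds in the same early-stop checks as before, and the sub name tripletShift is just for illustration.

sub tripletShift($diff, @nums)
{
    my $count = 0;
    # Take each $i value off the front; what remains of @nums is everything after it.
    while ( defined(my $i = shift @nums) )
    {
        for my $j ( 0 .. $#nums-1 )
        {
            my $dj = $nums[$j] - $i;
            last if $dj > $diff;       # sorted input: differences only grow
            next unless $dj == $diff;

            for my $k ( $j+1 .. $#nums )
            {
                my $dk = $nums[$k] - $nums[$j];
                last if $dk > $diff;
                $count++ if $dk == $diff;
            }
        }
    }
    return $count;
}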

To get something closer to statistical validity, we would have to repeat each function many times. Fortunately, the Benchmark module gives us an easy way to do that. Let's reorganize our functions into a single file, with each version under a different name. I ended up with six versions.
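The harness itself is little more than a call to Benchmark's cmpthese. Here's a minimal sketch, assuming the six variants are subs named after the labels in the table (those sub names, the $diff value, and the file handling are illustrative):

use Benchmark qw(cmpthese);

chomp( my @nums = <> );   # sorted test data, e.g. from jsparse or jdense
my $diff = 50;

# Signatures copy @nums into each sub, so the shift-based variants
# don't destroy the shared data between iterations.
cmpthese( -10, {                # run each variant for at least 10 CPU seconds
    'Early k stop' => sub { tripletEarlyK ($diff, @nums) },
    'No k loop'    => sub { tripletNoK    ($diff, @nums) },
    'Shift i,j'    => sub { tripletShiftIJ($diff, @nums) },
    'Shift i'      => sub { tripletShift  ($diff, @nums) },
    'Early j stop' => sub { tripletEarlyJ ($diff, @nums) },
    'Early exit'   => sub { tripletEarly  ($diff, @nums) },
});

On the sparse data, from slowest to fastest, the versions show up in this order: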

               Rate Early k stop No k loop    Shift i,j    Shift i      Early j stop Early exit
Early k stop 14.0/s           --          -0%         -48%        -100%        -100%        -100%
No k loop    14.0/s           0%           --         -48%        -100%        -100%        -100%
Shift i,j    26.9/s          93%          93%           --         -99%         -99%         -99%
Shift i      3125/s       22294%       22250%       11506%           --          -6%          -6%
Early j stop 3333/s       23787%       23740%       12280%           7%           --           0%
Early exit   3333/s       23787%       23740%       12280%           7%           0%           --
  • I left out the original brute-force version. At more than 10 seconds per call, it was too painfully slow to include in comparisons where hundreds or thousands of calls are being made.
  • Early k stop -- This version tries to end the innermost loop early. It's too little, too late; optimizations that avoid the inner loop entirely do better.
  • No k loop -- This version skips the inner loop entirely when the j-to-i difference has already failed.
  • Shift i,j -- This version replaces both the outer loop and the second loop with shifting instead of indexing. But to make that work for the second loop, we have to operate on a copy of the array, which eats into performance.
  • Shift i -- This version only replaces the outer i loop with shifting. It is an optimization on top of the 'early j stop' version.
  • Early j stop -- This is the simple optimization of ending the second loop early when the difference can no longer be achieved.
  • Early exit -- This is the version that ends both the j and k loops when the difference has become too large to possibly succeed.

This is the result for the sparse data, where the number of triplets found is very small and the differences quickly become large, so we give up on the j loop almost immediately. Minimizing the second loop and avoiding the inner loop entirely is a huge win.

What if we use dense data, where many triplets can be expected to be found? Here's an example benchmark comparison for that:

               Rate No k loop    Early j stop Early k stop Shift i,j    Early exit   Shift i     
No k loop    8.34/s           --         -36%         -40%         -59%         -85%         -89%
Early j stop 13.0/s          56%           --          -6%         -36%         -76%         -82%
Early k stop 13.9/s          67%           6%           --         -32%         -75%         -81%
Shift i,j    20.5/s         146%          57%          48%           --         -63%         -72%
Early exit   55.0/s         560%         322%         296%         168%           --         -26%
Shift i      74.0/s         787%         467%         433%         261%          34%           --

Interestingly, although Early exit is still a winning strategy, adding the optimization that avoids array indexing (Shift i) is a significant performance booster.

The best general strategy for performance seems to be getting out of the outer loops as soon as possible. Letting Perl handle the array accesses by shifting is a less obvious but worthwhile micro-optimization. Any optimization at all is a huge win over the brute-force approach of nested loops.
