DEV Community

Bob Lied
Bob Lied

Posted on

PWC 207: She's givin' me H citatations

Week 207 from the Perl Weekly Challenge is upon us. This week we have some regular expression fun and pretty easy array manipulation with an academic context.

Task 1 Keyboard word

This task states:

You are given an array of words.
Write a script to print all the words in the given array that
can be typed using alphabet on only one row of the keyboard.
Let us assume the keys are arranged as below:
Row 1: qwertyuiop
Row 2: asdfghjkl
Row 3: zxcvbnm

This is asking us to match words against the criterion that all the letters in the word come from a set of characters. That's pretty easily a character class from a regular expression. The entire word must match the letters, so we'll want word-boundary markers in the regular expression, too.

Besides the regular expression, we have to design some code that passes over the list of words, and for each word, tries each possible row of the keyboard.

First, let's write a function that does the second part: decide if a word can be formed from one row of the keyboard.

sub isKeyboardWord($word)
{
    state @Keyboard = ( qr/[qwertyuiop]+/, qr/[asdfghjkl]+/, qr/[zxcvbnm]+/ );
    return any { $word =~ /\A$_\Z/ } @Keyboard;
}
Enter fullscreen mode Exit fullscreen mode

Let's take this apart.

  • sub isKeyboardWord($word) -- I use 5.36, which has function signatures. $word is a string.
  • state @Keyboard -- This array is going to define the rows of the keyboard. I use state here so that it only gets initialized once the first time the function is called.
  • ( qr/[qwertyuiop]+/,...) -- Each element of the keyboard array is a regular expression that matches one or more of a set of characters for that row.
  • return any {...} @Keyboard -- The any function from List::Util will return true as soon as the code block succeeds. The advantage of using any instead of grep is that any will stop as soon as it succeeds, while grep would continue pointlessly testing other elements of the array. It's a trivial optimization here, but worth keeping in mind for larger arrays.
  • $word =~ /\A$_\Z/ -- Here's our regular expression match. It reads as "match from the beginning of the string, one or more characters from a keyboard row, to the end of the string." The pattern from @Keyboard is in $_. The \A anchors the expression to the beginning of the string; \Z anchors the end of the string.

Now that we have a predicate function to test whether any particular word works, we need a loop over a list of words that maps this predicate unto each word. Here's the next level:

sub keyboardWord($list)
{
    my @result = grep { isKeyboardWord(lc trim $_) } $list->@*;
    return \@result;
}
Enter fullscreen mode Exit fullscreen mode

Notes on this function:

  • sub keyboardWord($list) -- our function signature is taking the list of words as an array reference. I prefer to pass aggregate structures by reference, because my experience with C, C++, and Java made me aware of the performance implications of copying aggregates as function parameters.
  • my @result = grep {} $list->@* -- selecting words from a list by matching some condition is the definition of grep, so let's do that. $list->@* is postfix dereferencing; I could have used @$list, but since I adopted postfix, I like to use it consistently.
  • { isKeyboardWord(lc trim $_) } -- The condition for grep is using the function we wrote above. I inserted some string conditioning here: lc is lower-case transformation; and trim is a recent experimental built-in to remove white space around the word. trim is probably unnecessary, but I like to have excuses to use new stuff. The incantation to use trim is no warnings "experimental::builtin"; use builtin qw/trim/;
  • return \@result -- I return a reference to the array of matching words. Not shown here is the testing rig around this; using a reference makes it slightly more convenient to write a test.

This gives us a function to use, but we also want to use this script from the command line. The problem asks for output parenthesized and quoted, like ("Alaska","dad"). It's a one-liner to do that, but worth explaining.

say "(", join(", ", map { qq("$_") } keyboardWord(\@ARGV)->@*), ")";
Enter fullscreen mode Exit fullscreen mode

Let's start from the right end.

  • \@ARGV -- perl has helpfully given us all the command-line arguments in @ARGV, but our function wants a reference.
  • keyboardWord(\@ARGV)->@* -- here we call our function, and it returns a reference to an array. Postfix notation lets us expand the reference when we need it, as we do here to make it the target list for the map.
  • map { qq("$_") } -- for each word in the result, transform it by adding double quotes around it. The qq operator lets us add quotes without needing backslash escapes.
  • join(",", ...) -- now that we have an array of quoted strings, make a comma-separated list in one string.
  • say "(", ..., ")" -- the final step is to put exterior parentheses around the list

Task 2 H-Index

There's a bit of context that goes with this task, but the programming task is actually pretty simple. The problem statement:

You are given an array of integers containing citations a
researcher has received for each paper.

Write a script to compute the researcher’s H-Index. For more
information please checkout the wikipedia page.

The H-Index is the largest number h such that h articles
have at least h citations each. For example, if an author has
five publications, with 9, 7, 6, 2, and 1 citations (ordered
from greatest to least), then the author’s h-index is 3,
because the author has three publications with 3 or more
citations. However, the author does not have four publications
with 4 or more citations.

(For readers without a background in US popular culture, the title of this blog refers to a hit song from the 1960's by the Beach Boys. The lyrics of the song "Good Vibrations" include "I'm pickin' up good vibrations / She's giving me excitation". H citations? Get it? This is why I'm a software developer and not a comedian.)

Throwing aside the academic posturing, what this amounts to is sorting an array descending, and then finding the first index of the array where x[i] < i.

sub H_index(@cite)
{
    my @sorted = sort { $b <=> $a } @cite;
    my $h = first { $sorted[$_] < $_+1 } 0 .. $#sorted;
    return $h;
}
Enter fullscreen mode Exit fullscreen mode
  • sub H_index(@cite) -- So after I went on about how I prefer references, here I am using a full array instead. Consistency is the hobgoblin of small minds.
  • sort { $b <=> $a } -- All the examples are sorted, but let's not assume that. This is the usual idiomatic numeric sort in descending order, which we're going to save in a temporary variable.
  • first {..} 0 .. $#sorted -- Use the range operator to generate the list of indices for the array. The first function from List::Util will walk down the list until the condition matches.
  • { $sorted[$_] < $_+1 } -- And here's our condition. There's a +1 in here because array indexing is zero-based, but the H-Index definition is 1-based.

Top comments (0)