Discussion on: 100 Languages Speedrun: Episode 47: Raku (Perl 6) Regular Expressions

One (hopefully helpful) tip and one comment:

First, the tip: in !!($n ~~ /^ <:N> ** {1..6} $ /), you can replace the "not not" (!!) double negation with ?, the boolean context operator.
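A minimal sketch of that tip, assuming a hypothetical six-digit input in $n (the value is made up for illustration; the regex is the one quoted above):

```raku
# Hypothetical input; only the regex comes from the post.
my $n = '123456';

# Negating twice coerces the Match (or Nil) result to a Bool.
my $double-negation = !!($n ~~ /^ <:N> ** {1..6} $ /);

# The ? prefix does the same coercion in one step.
my $boolean-context = ?($n ~~ /^ <:N> ** {1..6} $ /);

say $double-negation === $boolean-context;   # True
```

Both spellings produce the same Bool; ? simply states the intent directly.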

Second, the comment: I don't think I agree with your claim that \d would be better off matching only ASCII digits. You gave the example of IP addresses, so let's start there – it may be context dependent, but I'd argue that https://①.①.①.① is a valid IP address. At the very least, it's one that I can navigate to in my browser (Firefox).

More broadly, it seems to me that I'd often want \d to match any digit. For example, when applications require that user passwords contain a digit, they're typically doing so to increase the password's security. But "password๓" is much less likely to be in an attacker's dictionary than "password3" is; rejecting the former but accepting the latter strikes me as perverse at best. (Of course, neither password is decent.)
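To make the password example concrete, here is a small sketch comparing \d with an explicit ASCII-only class. The password list and variable names are just illustrative assumptions, not anything from the post:

```raku
# Illustrative passwords: an ASCII digit, a Thai digit (U+0E53), and no digit.
my @passwords = 'password3', 'password๓', 'password';

for @passwords -> $pw {
    my $any-digit   = ?($pw ~~ / \d /);         # any Unicode decimal digit
    my $ascii-digit = ?($pw ~~ / <[0..9]> /);   # only the characters 0 through 9
    say $pw, ' | \d: ', $any-digit, ' | <[0..9]>: ', $ascii-digit;
}
```

With a "must contain a digit" rule written as <[0..9]>, the Thai-digit password would be rejected even though it satisfies the spirit of the rule.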

In fact, I'd go further than that: I'd claim that a \d that matches only 0..9 is more likely to cover up bugs than to prevent them. The only time that \d ought to match 0..9 but ought not to match other digits is when the programmer is expecting ASCII input but is actually getting UTF-8 input. But the solution there is to reject non-ASCII input (e.g., test that it matches /^<:ascii>+$/ in Raku) – not to silently fail to match non-ASCII digits. IMO, a more limited definition of \d just hides the problem of not realizing that you're dealing with non-ASCII text (or, put differently, the problem of not correctly handling non-ASCII text).
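As a rough sketch of "reject non-ASCII input up front", here is one way to do it. Note that this checks codepoints directly rather than using the regex mentioned above, and that is-ascii is a hypothetical helper name, not a built-in:

```raku
# Hypothetical helper: True only if every codepoint is in the ASCII range.
sub is-ascii(Str $s --> Bool) {
    so $s.ords.all < 128;
}

for '1.1.1.1', '①.①.①.①' -> $input {
    if is-ascii($input) {
        say $input, ' accepted: safe to parse with ASCII-only assumptions';
    }
    else {
        say $input, ' rejected: contains non-ASCII characters';
    }
}
```

Validating at the boundary like this makes the "unexpected non-ASCII text" problem visible as an explicit rejection, instead of leaving it to show up as a quiet non-match deeper in the code.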

In any event, I enjoyed the post and am looking forward to the one on grammars :)