A regular expression can replace multiple conditionals, loops and string functions, making the code simpler. A one-line regex can look elegant and be much more readable.
I am sharing some examples here. The first three are PHP and JavaScript problems with their solutions, followed by a RegEx solution.
The other three examples are about employing regex in an SQL database, the Apache and Nginx web servers, and the Linux shell.
Table of Contents
- Time to read an article
- Gmail username validation
- IP address validation
- RegExp in SQL
- RegEx in Apache, Nginx webserver
- Linux Shell
Example 1:
Time to read an article
According to a study in the Journal of Memory and Language (M. Brysbaert), we read 238 words per minute. This function returns the number of minutes needed to read the input text.
function minutesToRead($text){
$total_words = str_word_count($text);
$minutes_to_read = round($total_words / 238);
return max($minutes_to_read, 1);
}
echo minutesToRead($content) . ' min read';
Instead of counting words with a string function, we can count the whitespace characters (\s) in the text, which approximates the number of words. We can also use \w+ to count the words themselves.
PHP (regex)
function minutesToRead($text){
$total_words = preg_match_all('/\s/', $text, $match);
return max(round($total_words / 238), 1);
}
JavaScript (regex)
function minutesToRead(text){
const word_count = (text.match(/\s/g) || []).length; // match() returns null when nothing matches
return Math.max(Math.round(word_count / 238), 1);
}
PHP preg_match_all matches all occurrences. In JavaScript, the global flag g is used to get all matches.
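The \w+ alternative mentioned above can be sketched as follows. This is a minimal illustration rather than a replacement for the functions above; the name minutesToReadByWords and the wordsPerMinute parameter are assumptions made for this example. Note that \w+ treats a contraction such as "don't" as two words, so the count is still approximate.

```javascript
// Hypothetical helper: counts runs of word characters instead of whitespace.
function minutesToReadByWords(text, wordsPerMinute = 238) {
  // match() returns null when there are no matches, so fall back to an empty array.
  const words = text.match(/\w+/g) || [];
  return Math.max(Math.round(words.length / wordsPerMinute), 1);
}

console.log(minutesToReadByWords('A short sentence.') + ' min read');
```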
If the text has HTML tags, use PHP strip_tags to remove them. In JavaScript, use one of these regular expressions to strip tags.
/<[\w\s"-.=%#;'“”!?…{}()\d:\/]+>/g
OR
/<[^<]+>/g
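As a rough sketch, the second expression can be combined with replace to strip tags before counting. The function name stripTags is an assumption for this example, and a pattern like this is only suitable for estimating reading time, not for sanitizing untrusted HTML.

```javascript
// Hypothetical helper: removes anything that looks like an HTML tag.
function stripTags(html) {
  return html.replace(/<[^<]+>/g, '');
}

console.log(stripTags('<p>Hello <b>world</b></p>')); // prints "Hello world"
```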
Example 2:
Gmail username validation
A username input needs checks for these rules:
- begins with an English letter
- only contains English letters, digits and dot (.)
- minimum 6, maximum 30 characters long
A non-regex solution would need separate code blocks for each rule: converting the string to an array, using the filter function and several conditionals to implement all validation rules in the code.
For brevity, I will go straight to the solution using regular expression.
PHP
function isValidUsername($username){
return preg_match("/^[a-z][a-z0-9.]{5,29}$/i", $username) === 1;
}
JavaScript
function usernameIsValid(username){
return /^[a-z][a-z0-9.]{5,29}$/i.test(username);
}
- ^[a-z] ensures the username begins with a letter in the range a-z.
- [a-z0-9.] checks that the rest of the username contains only letters, digits and dots.
- {5,29} validates that the length of the string is in the allowed range (the first letter plus 5 to 29 more characters).
- The i flag is used for a case-insensitive match.
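A quick way to check the rules is to run the pattern against a few inputs. The sample usernames below are made up for illustration; the function is the same one shown above.

```javascript
function usernameIsValid(username) {
  return /^[a-z][a-z0-9.]{5,29}$/i.test(username);
}

// Hypothetical sample inputs, one per rule.
console.log(usernameIsValid('john.doe'));   // true: starts with a letter, 8 characters
console.log(usernameIsValid('1john.doe'));  // false: starts with a digit
console.log(usernameIsValid('john_doe'));   // false: underscore is not allowed
console.log(usernameIsValid('abc'));        // false: shorter than 6 characters
```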
Example 3:
IP address validation
An IPv4 address is a sequence of four 8-bit integers (from 0 to 255, the largest 8-bit integer) separated by dots (.).
Examples:
- 192.168.0.1 is a valid IPv4 address
- 255.255.255.255 is a valid IPv4 address
- 257.100.92.101 is not a valid IPv4 address because 257 is too large to be an 8-bit integer
- 255.100.81.160.172 is not a valid IPv4 address because it contains more than four integers
- 1..0.1 is not a valid IPv4 address because it's not properly formatted
- 17.233.00.131 and 17.233.01.131 are not valid IPv4 addresses as both contain leading zeros
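JavaScript (without regular expressions)
function isIPv4Address(inputString) {
  const ip = inputString.split('.');
  // Each part must be an integer from 0 to 255 in canonical form (no leading zeros, signs or spaces).
  return ip.length === 4 && ip.every((e) => {
    const n = Number(e);
    return Number.isInteger(n) && n >= 0 && n <= 255 && String(n) === e;
  });
}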
PHP filter_var has an IP validator, so we do not need to write a regex here.
PHP
filter_var("192.168.00.1", FILTER_VALIDATE_IP, FILTER_FLAG_IPV4);
JavaScript (regex)
function isIPv4Address(inputString) {
const ip = inputString.split('.');
if(ip.length !== 4) {return false};
return ip.every(e => /^([1-9]?[0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])$/.test(e));
}
The IP address is split on dots into four strings. The regular expression validates that each string is an 8-bit integer. Unlike the non-regex solution, there is no string-to-integer conversion.
- [1-9]?[0-9] matches numbers from 0 to 99
- 1[0-9][0-9] matches numbers from 100 to 199
- 2[0-4][0-9] matches numbers from 200 to 249
- 25[0-5] matches numbers from 250 to 255
- | is OR
- ^ and $ mark the beginning and end of the string being tested
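The per-octet pattern can also be combined into one expression that validates the whole address without splitting it first. This is a sketch of an alternative, not the approach used above; the names OCTET, IPV4_PATTERN and isIPv4AddressSingleRegex are assumptions for this example.

```javascript
// One octet: 0-99, 100-199, 200-249 or 250-255, with no leading zeros.
const OCTET = '(?:[1-9]?[0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])';

// One octet followed by exactly three ".octet" groups, anchored at both ends.
const IPV4_PATTERN = new RegExp(`^${OCTET}(?:\\.${OCTET}){3}$`);

function isIPv4AddressSingleRegex(inputString) {
  return IPV4_PATTERN.test(inputString);
}

console.log(isIPv4AddressSingleRegex('192.168.0.1'));    // true
console.log(isIPv4AddressSingleRegex('17.233.00.131'));  // false
```

Splitting first keeps the regex shorter and easier to read; the single expression trades that for one fewer step.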
Example 4:
RegExp in SQL
For example, to shorten the first name to an initial in the name column of a table.
MySQL query
SELECT
id,
name,
REGEXP_REPLACE(name, '(.{1})([a-z]*)(.*)$','$1\.$3') AS REGEXP_name
FROM students;
Result
id   name             REGEXP_name
33   Lesa Barnhouse   L. Barnhouse
38   Kurtis Saulters  K. Saulters
40   Charisse Lake    C. Lake
- (.{1}) group 1 matches the first character of the name
- ([a-z]*) group 2 matches the letters up to the space
- (.*) group 3 matches the rest of the name up to the end
- $1\.$3 prints the value of group 1, a dot (.) and the value of group 3
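To see how the three groups behave outside the database, the same pattern can be tried in JavaScript. This is only an illustration: JavaScript's replace and MySQL's REGEXP_REPLACE are different engines, the i flag is an assumption standing in for a case-insensitive match, and abbreviateFirstName is a hypothetical name.

```javascript
// Hypothetical helper mirroring the SQL pattern: keep group 1, drop group 2, keep group 3.
function abbreviateFirstName(name) {
  return name.replace(/(.{1})([a-z]*)(.*)$/i, '$1.$3');
}

console.log(abbreviateFirstName('Lesa Barnhouse')); // prints "L. Barnhouse"
```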
Note: MySQL's regular expression support is not extensive, and some character class tokens differ: for example, the POSIX class [:alpha:] is used where other engines offer shorthand such as \w. More details are in the MySQL RegExp manual and O'Reilly's cookbook.
Example 5:
RegEx in Apache, Nginx webserver
For example, a blog with the URI articles.php?article_id=123 uses article_id to display the requested article. Suppose the blog switches to SEO-friendly URIs like articles/category/title-of-article_123.html. Virtually, every article now has a separate page with the id and relevant keywords in its name.
The web server can use a regex to match the id in the new SEO URLs, pass it to the script as a parameter, and serve the script's output for that URL.
Apache2
RewriteRule "_([0-9]+)\.html$" "/articles.php?article_id=$1"
Nginx
rewrite "_([0-9]+)\.html$" "/articles.php?article_id=$1";
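What the rule captures can be checked in JavaScript with the same pattern, written with the dot escaped so that it matches only a literal dot. This sketch illustrates the capture group, not how either web server evaluates rules; extractArticleId and SEO_URL_PATTERN are hypothetical names.

```javascript
// Captures the digits between the last underscore and the ".html" suffix.
const SEO_URL_PATTERN = /_([0-9]+)\.html$/;

// Hypothetical helper: returns the captured id as a string, or null if the path does not match.
function extractArticleId(path) {
  const match = path.match(SEO_URL_PATTERN);
  return match ? match[1] : null;
}

console.log(extractArticleId('/articles/category/title-of-article_123.html')); // prints "123"
```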
Example 6:
Linux shell
Regex can save the hassle of opening a file and searching or scrolling for a directive or setting in it. Instead, use a regular expression to match a text pattern in a file and get the matching lines straight in the terminal.
To find out the value of the AllowOverride directive in the Apache configuration file:
grep -C 2 'AllowOverride' /etc/apache2/apache2.conf
The -C 2 flag adds two lines of context around each match, and AllowOverride matches that literal text. The command outputs:
<Directory /var/www/>
Options Indexes FollowSymLinks
AllowOverride None
Require all granted
</Directory>
To find the PHP maximum upload file size without opening the long configuration file php.ini:
grep 'upload.*size' /usr/local/etc/php/php.ini
This outputs upload_max_filesize = 2M
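To see why only that line is returned, the same pattern can be applied line by line in JavaScript. The configuration lines below are made-up sample data, not the contents of a real php.ini, and grepLines is a hypothetical helper that only mimics the basic behaviour of grep.

```javascript
// Made-up sample lines for illustration only.
const sampleConfig = [
  'file_uploads = On',
  'upload_max_filesize = 2M',
  'max_file_uploads = 20',
  'post_max_size = 8M',
].join('\n');

// Hypothetical helper: keeps the lines that match the pattern, as grep does.
function grepLines(text, pattern) {
  return text.split('\n').filter((line) => pattern.test(line));
}

// "upload" must be followed, anywhere later on the same line, by "size".
console.log(grepLines(sampleConfig, /upload.*size/)); // prints [ 'upload_max_filesize = 2M' ]
```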
More information about grep is available on the GNU grep site and in its manual page.
Conclusion
Learning some basic regex and exploring different use cases can help you build a knowledge of the possibilities regex brings.
Knowing where to use regular expressions in coding and problem-solving can help to write efficient code. Elegant, readable code is a bonus.
I will write a second article about regex basics. If you have any comments or a better regex, please share.
Header photo by Michael Dziedzic