Regular Expressions in D

Jesse Phillips ・2 min read

Meta

In a previous post about string parsing, I noted that regular expressions were available, but did not dive into an example.

Intro

A regular expression (regex) is a way of defining a search pattern. Searching text for a pattern is a very common operation in programming, and regex comes with an age-old joke.

I had a problem which I solved with regular expressions

Now you have two problems

What does that mean

To learn what regular expressions look like and how they work, I suggest the help site regularexpressions.info. But you can keep reading to learn about them and how to take advantage of them in D.

Format Phone Number

I wanted to start with something simple, but anything too simple did not seem useful. In this example I show how to change one US phone number format into another.

import std.regex;
void main()
{
    auto data = "555.876.9846";

    auto re = regex(r"(\d{3})\.(\d{3})\.(\d{4})");

    auto ans = replaceAll(data, re, "($1) $2-$3");

    assert(ans == "(555) 876-9846");
}
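
As a small extension of the example above, here is a sketch of how the same pattern behaves when the input holds more than one phone number. The input text is made up for illustration, and replaceFirst is a std.regex function the post itself does not use; it is shown only for contrast with replaceAll.

```d
import std.regex;

void main()
{
    auto re = regex(r"(\d{3})\.(\d{3})\.(\d{4})");

    // Hypothetical input containing two numbers in the dotted format.
    auto text = "home 555.876.9846, work 555.123.4567";

    // replaceAll rewrites every match in the input.
    assert(replaceAll(text, re, "($1) $2-$3")
        == "home (555) 876-9846, work (555) 123-4567");

    // replaceFirst rewrites only the first match and leaves the rest alone.
    assert(replaceFirst(text, re, "($1) $2-$3")
        == "home (555) 876-9846, work 555.123.4567");
}
```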

The phone number example combines several ideas besides the regular expression itself; the sections below take them one at a time.

String Literal

string literal

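One thing that can be problematic is that the programming language gives the backslash its own meaning inside string literals, where escape codes such as \n represent unprintable characters. A regular expression relies on backslashes too, so in an ordinary literal each one has to be doubled. D provides the wysiwyg string literal r"", which leaves backslashes alone; without it the pattern above would look like the first line here instead of the second: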

"(\\d{3})\\.(\\d{3})\\.(\\d{4})"
r"(\d{3})\.(\d{3})\.(\d{4})"

The r"" form is still awkward if the pattern needs a double quote. D also has a wysiwyg literal delimited by backticks, which is awkward in turn if the pattern needs a backtick. And D provides... Alright, that is enough; you can follow the link above.
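
To make the comparison concrete, here is a minimal sketch showing that the three spellings produce the same string. The variable names are only for illustration.

```d
void main()
{
    // Ordinary literal: every backslash has to be doubled.
    auto escaped = "(\\d{3})\\.(\\d{3})\\.(\\d{4})";

    // Wysiwyg literals: backslashes are taken as written.
    auto wysiwyg = r"(\d{3})\.(\d{3})\.(\d{4})";
    auto backtick = `(\d{3})\.(\d{3})\.(\d{4})`;

    assert(escaped == wysiwyg);
    assert(wysiwyg == backtick);

    // The backtick form can hold double quotes without escaping.
    auto quoted = `"(\d+)"`;
    assert(quoted == "\"(\\d+)\"");
}
```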

Regex

std.regex

Digits

\d

The backslash introduces a special "code", in this case d, which stands for a digit.

Other common codes

  • \w - word character (alphanumeric, plus underscore)
  • \s - whitespace
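
A quick way to try these codes is matchFirst from std.regex, which returns the first match in the input. The sketch below uses made-up input strings; hit is the text of the whole match and empty reports that nothing matched.

```d
import std.regex;

void main()
{
    // \d matches a single digit; the first one in the input is '4'.
    assert(matchFirst("order 42 shipped", regex(r"\d")).hit == "4");

    // \w\s\w: a word character, one whitespace character, a word character.
    assert(matchFirst("hello world", regex(r"\w\s\w")).hit == "o w");

    // No whitespace in the input, so \s finds nothing.
    assert(matchFirst("helloworld", regex(r"\s")).empty);
}
```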

Repeat

\d{3}

Here I'm using an explicit number of instances. It will only match if three digits are written in a row. Other common repeat codes:

  • \d* - zero or more
  • \d+ - one or more
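
The difference between the repeat codes is easiest to see side by side. This is a small sketch with made-up inputs; the letters a and b in some patterns are literal characters added only to pin down where the digits may appear.

```d
import std.regex;

void main()
{
    // {3}: exactly three digits in a row.
    assert(matchFirst("a1234", regex(r"\d{3}")).hit == "123");
    assert(matchFirst("a12", regex(r"\d{3}")).empty);

    // +: one or more digits, as many as are available.
    assert(matchFirst("a1234", regex(r"\d+")).hit == "1234");

    // *: zero or more, so "ab" with no digits in between still matches.
    assert(matchFirst("ab", regex(r"a\d*b")).hit == "ab");
    assert(matchFirst("a12b", regex(r"a\d*b")).hit == "a12b");

    // With + at least one digit is required between a and b.
    assert(matchFirst("ab", regex(r"a\d+b")).empty);
}
```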

Grouping

Regular expressions allow matched content to be stored in groups. This is done with parentheses.

r"(\d{3})"

This creates match group 1, which the replacement string refers to as $1.
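
Groups are also available outside of a replacement string. As a sketch, the result of matchFirst can be indexed by group number, where index 0 is the whole match and 1, 2, 3 are the parenthesized groups in order:

```d
import std.regex;

void main()
{
    auto re = regex(r"(\d{3})\.(\d{3})\.(\d{4})");
    auto m = matchFirst("555.876.9846", re);

    assert(m[0] == "555.876.9846"); // the entire match
    assert(m[1] == "555");          // first group
    assert(m[2] == "876");          // second group
    assert(m[3] == "9846");         // third group
}
```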

Escape Sequence

\.

In regular expressions the dot . has a special meaning: match any character. I wanted to match the literal character, so I escape the special processing with a backslash, just as I escape into a special meaning with \d.
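
To see why the escape matters, here is a short sketch comparing an unescaped dot with an escaped one. The names loose and strict and the input strings are made up for illustration.

```d
import std.regex;

void main()
{
    auto loose = regex(r"\d{3}.\d{3}");   // . matches any character
    auto strict = regex(r"\d{3}\.\d{3}"); // \. matches only a literal dot

    // The unescaped dot happily accepts an 'x' as the separator.
    assert(!matchFirst("555x876", loose).empty);

    // The escaped dot rejects it and accepts only a real dot.
    assert(matchFirst("555x876", strict).empty);
    assert(!matchFirst("555.876", strict).empty);
}
```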