Gurkirat Singh

Introduction and History of Regular Expression

Hello World! Welcome to the "Regular Expressions" series, where we tackle the intimidating syntax that has spawned numerous memes among developers. Don't worry, though! As we move forward, I assure you that you'll gain the confidence to craft your own elegant regular expressions by the end of this journey.

The cover image is taken from the Midjourney.ai Discord server.

In this series, I will use https://regex101.com to share patterns with you, so that the examples are not tied to a specific programming language, which could be a barrier for some learners.

Note: Wherever code is required, I will use C++.

What are Regular Expressions?

A regular expression is a text pattern that describes a set of strings. Programs use such patterns to search, match, and validate text.

I can't think of a single modern application that doesn't make use of them, either directly or indirectly. If you have ever typed gibberish into the email field of a website, you will have seen an "invalid email format" message.

Under the hood, your input is typically checked against a regex pattern similar to the following.

[a-z0-9.-]@[a-z0-9]{2,}\.[a-z]{2,}

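You can see how this regex works here. It is a deliberately minimal pattern that matches only simple email addresses; validating real-world addresses correctly is considerably more complex, so do not treat it as production-ready. Please bear with me if it appears overwhelming. With practice, such expressions can be written in less than 10 seconds.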

Why the Name "Regular Expression"?

The name "regular expression" originates from the mathematical concept of regular languages, which were first studied in the field of formal language theory. A regular expression is a notation for describing a regular language, that is, a set of strings that follow a given pattern, hence the name.

The Problem that Led to the Development of RegExp

The concept was introduced in the 1950s (long before most of us were born) by the mathematician Stephen Kleene, as a notation for describing regular languages. Since then, regular expressions have become a standard feature of many programming languages.

In the 1960s, Ken Thompson implemented regular expressions in the QED text editor to help with text processing tasks such as searching and editing. This is generally regarded as the first use of regular expressions in a practical application.

Why Should You Learn Regular Expressions?

Learning regular expressions pays off in many fields, such as:

  • data analysis
  • software development
  • content management
  • scientific research

Their versatility allows you to define complex patterns for finding specific words or phrases, extracting data from structured text, and performing advanced search and replace operations.

Additionally, mastering regular expressions helps prevent costly mistakes by providing a precise and controlled approach to text processing. With a solid understanding of regular expressions, you can confidently handle challenging text manipulation tasks, ensuring accuracy, reliability, and improved productivity.

Since you're here, I'm assuming you're ready to understand and use those unwieldy strings of brackets and question marks in your code.

Flavours of RegExp

There is no single standard that every implementation follows when deciding which text patterns are and are not regular expressions. The creators of different languages have had different ideas about how regular expressions should look, so we are now stuck with a whole spectrum of regular expression flavours. A flavour is the particular regex syntax and behaviour implemented by a programming language or library.

Fortunately, the wheel was not reinvented every time: the syntax of most modern regular expression engines can be traced back to the Perl programming language.

Regular expression flavours are commonly built into scripting languages, while other programming languages rely on dedicated libraries for regex support. JavaScript offers built-in support for regular expressions using the literal syntax /expr/ or the RegExp object. Python, on the other hand, provides regular expressions through the re module of its standard library.

Contact Me

Email: tbhaxor at proton dot me
LinkedIn: @tbhaxor
Twitter: @tbhaxor

Top comments (3)

Paul J. Lucas

The regex you showed is insufficient for validating e-mail addresses. See here for a more correct one.

Gurkirat Singh

Hi Paul! Thanks for sharing this information.

It is the first post so I am keeping things simple :D.

Paul J. Lucas

It's OK to keep things simple as long as you acknowledge (in writing) that reality is more complex. You should never give readers the impression that code is correct when it isn't.