Hello dear readers!
Once I was asked to write an expression calculator. And my first impulse was to write a classical calculator based on two stacks, also known as shunting yard algorithm. In this approach, one stack is used for values and the other for operators. It's a classical solution but I wanted to do something more interesting and inspiring and fun...
Intro
We have a mathematical expression like: 2 + 2 * (2 + (2))
and we need to calculate the result. There are some problems:
- Resolving brackets
- Resolving operation priorities
Let's do it with regular expressions!
Bracket expander
The regular expression could be easily used for finding some patterns in the string.
The first simple patter is a pattern for expanding brackets:
Like this: \(\d+\)
This pattern detects an open bracket followed by any number of digits and finally followed with a closed bracket.
This simple regular expression is able to catch the expression: (42)
.
But how could we extract the number and reject the brackets?
Hopefully, the regular expressions have the ability to catch results into the named groups.
Let's update our regex example: \((?<number>\d+)\)
With this regex, we could find numbers with brackets and take only a number.
Addition operation
Let's check more complex operations. Addition.
For adding two numbers we have to write a regular expression to find the number plus number.
(?<numberA>\d+)\+(?<numberB>\d+)
Here we have two named capture groups and a "hardcoded" plus symbol.
This expression allows us to detect expressions like 1+1
.
Multiplication
What's the difference between addition and multiplication? Actually only an operation. So, we use almost the same regular expression to detect multiplication. And the same for division and subtraction
Let's create an expression solver
Let's assume we've got an expression 2*2+(2+2)
At this moment we have got a bunch of regular expressions and small operation evaluators.
In a simple way we've got regular expressions for:
*( number ) => number
*numberA + numberB => sum
*numberA - numberB => sub
*numberA * numberB => mul
*numberA / numberB => div
Step by step calculation
2*2+(2+2)
- 2*2+(2+2)
- 4+(2+2)
- 4+(2+2)
- 4+(4)
- 4+(4)
- 4+4
- 4+4 => 8
How about priorities?
It's a common question while we are talking about expression evaluators.
And now we could use an interesting feature of regular expressions: Negative lookahead.
The main idea is to avoid matching num + num surrounding with high priority operators (* /).
A bit simplified pattern:
(?![*/])(?<numberA>-?\d+\.?\d*)\+(?<numberB>-?\d+\.?\d*)(?![*/])
Easy as pie. (c) Daddy Pig :-)
How to add constants?
The most common constant is Pi., (3.14). In my solution, we could support constants in the same manner as the opening brackets.
(?i)PI
The regular expression "?i" is a case insensitive switch.
How to add math function support?
What is the main difference between a constant Pi and, for example, Sinus function?
Theis only one difference - brackets. And we should construct a regular expression to detect text like Sin(number)
.
One important moment - we already have an open bracket function. And we also have a potentially conflicted Sin() function. And we have a simple way to solve this kind of problem - expression order!
Let's check an example:(Sin(Pi))+1
To evaluate this function correctly we have to apply regexp substitutions in a particular order:
- Replace Pi (Sin(Pi))+1 => (Sin(3.14))+1; Match: Pi
- Calculate Sin(3.14) (Sin(3.14))+1 => (0.00)+1; Match: Sin(3.14)
- Open brackets (0.00)+1 => 0.00+1; Match: (0.00)
- Calculate Sum 0.00+1 => 1.00; Match: 0.00+1
Additional possibilities
I show only small examples with calculating only a simple math expression. Let imagine that our expression is a real DSL (Domain Specific Language) where Sin(x) is not a fast, but really time-consuming function. In this case, we could use the regular expressions ability to give us not only the first but all possible matches. For example, for function Sin(0.5)+Sin(0.5)
we detect two Sin function at once. As a result, we could use a parallel calculation.
Experiments!
To continue my experiments, I wrote a classical two-stacks-based evaluator. After I created a random expression generator and run both calculators over the randomized expression until both could calculate expression without errors.
I got expected results:
*( 7 * 6 ) * 2 = 84
*4 ( - ( 5 ) ) = -1
*3 * ( 8 ) * 8 = 192
*7 + ( 1 - 8 ) * 9 = -56
But! also a completely incorrect expression like this:
*8 3 + - ( 9 ) = 2
Regular expression based calculator allows us to see the whole evaluation path step by step:
8 3 + - ( 9 ) => 8 3 + -9; Match: ( 9 )
8 3 + -9 => 8*-6; Match: **3 + -9*
8-6 => 2; Match: 8-6
The classical two stacks algorithm interprets this expression as
(3+8)-9
with the same result 2.
Conclusions
It's possible to create a simple calculator-like expression evaluator based on a regular expression. It's fun.
Classical shunting yard algorithm could evaluate erroneous expressions.
Links
Fun with calculators github repository
P.S. Feel free to comment/contact me if you like this work.
Top comments (0)