DEV Community

Albert Wu

Originally published at albertywu.com

An even simpler JavaScript tokenizer

What’s the easiest way you know of to tokenize an arithmetic expression in JavaScript? Let’s say you’re building a calculator application and want this to happen:

console.log(
 tokenize('100-(5.4 + 2/3)*5')
)
// ['100', '-', '(', '5.4', '+', '2', '/', '3', ')', '*', '5']

Before you reach into your npm module bag-o-tricks, realize that this can be done in one line of JavaScript using a little-known feature of the string split method. Behold:

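'100-(5.4+2/3)*5'
  .split(/(-|\+|\/|\*|\(|\))/) // split on operators and parens, keeping them
  .map(s => s.trim())          // strip whitespace around each piece
  .filter(s => s !== '')       // drop empty strings left between adjacent separators
// ['100', '-', '(', '5.4', '+', '2', '/', '3', ')', '*', '5']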
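The opening snippet calls a tokenize function that is never actually defined. A minimal sketch, assuming you simply want to wrap the one-liner in a reusable function (OPERATOR_SPLIT is an illustrative name, not something from the original post):

```javascript
// Matches -, +, /, * and both parentheses; the outer capturing group
// makes split keep each matched separator in the result.
const OPERATOR_SPLIT = /(-|\+|\/|\*|\(|\))/;

function tokenize(expression) {
  return expression
    .split(OPERATOR_SPLIT)
    .map(s => s.trim())       // remove spaces around numbers, e.g. '5.4 ' -> '5.4'
    .filter(s => s !== '');   // drop empty strings between adjacent separators
}

console.log(tokenize('100-(5.4 + 2/3)*5'));
// ['100', '-', '(', '5.4', '+', '2', '/', '3', ')', '*', '5']
```

Because the trim step runs after the split, spaces in the input (as in '5.4 + 2') do not need to appear in the regex at all.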

Excuse me? What’s that hot mess inside the split function? Let’s break it down step by step using a few examples of increasing complexity:


Example 1: s.split(/-/)

Pretty obvious: this splits the string s wherever it sees the minus sign -, and the separators themselves are discarded.

'3-2-1'.split(/-/)
// ["3", "2", "1"]
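One detail worth noticing early, because it explains a later step: when a separator appears at the very start or end of the string, or two separators sit side by side, split returns empty strings in those positions. That is why the final expression ends with .filter(s => s !== ''). A quick check:

```javascript
// No capturing group: the separators are simply dropped.
const plain = '3-2-1'.split(/-/);
console.log(plain); // ['3', '2', '1']

// Leading, trailing, or doubled separators leave empty strings behind.
const edges = '-3--2-'.split(/-/);
console.log(edges); // ['', '3', '', '2', '']
```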

Example 2: s.split(/(-)/)

The only difference from the previous example is the pair of parentheses in the regex, which create a capturing group. Here’s the key point of the entire article: if the regular expression contains capturing parentheses around the separator, then each time the separator is matched, the results of the capturing group are spliced into the output array.

'3-2-1'.split(/(-)/)
// ["3", "-", "2", "-", "1"]
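A quick way to confirm that it is the capturing, rather than the parentheses alone, that matters is to swap in a non-capturing group (?:...), which groups the pattern without recording the match:

```javascript
// Capturing group: each matched separator is spliced into the result.
const captured = '3-2-1'.split(/(-)/);
console.log(captured);    // ['3', '-', '2', '-', '1']

// Non-capturing group: same grouping, nothing captured, so the
// separators are discarded exactly as in Example 1.
const nonCaptured = '3-2-1'.split(/(?:-)/);
console.log(nonCaptured); // ['3', '2', '1']
```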

Example 3: s.split(/(-|\+)/)

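This builds on the previous example by adding support for the addition symbol \+. The backslash \ is required because + on its own is a regex quantifier (“one or more”); escaping it matches a literal plus sign. The vertical pipe | is the alternation operator, which acts like an OR (match - or +).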

'3-2-1+2+3'.split(/(-|\+)/)
// ["3", "-", "2", "-", "1", "+", "2", "+", "3"]
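Why wrap the whole alternation in a single pair of parentheses rather than giving each symbol its own group? Because split splices in the result of every capturing group on every match, and a group that did not take part in that particular match contributes undefined. A quick comparison:

```javascript
// One group per symbol: each match splices in BOTH groups, and the
// group that did not match contributes undefined.
const twoGroups = '3-2+1'.split(/(-)|(\+)/);
console.log(twoGroups); // ['3', '-', undefined, '2', undefined, '+', '1']

// One group around the whole alternation: exactly one entry per separator.
const oneGroup = '3-2+1'.split(/(-|\+)/);
console.log(oneGroup);  // ['3', '-', '2', '+', '1']
```

With the two-group version, the later .map(s => s.trim()) step would throw a TypeError on the undefined entries, so the single-group form is the one to use.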

The Final Boss (tying everything together)
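Putting the pieces together, here is the final expression run one stage at a time on the example input, logging the intermediate arrays. The variable names are illustrative, and the character-class pattern at the end is a swapped-in alternative, not the regex used in this post:

```javascript
const input = '100-(5.4+2/3)*5';
const OPERATORS = /(-|\+|\/|\*|\(|\))/;

// Step 1: split, keeping every operator and parenthesis as its own element.
const raw = input.split(OPERATORS);
console.log(raw);
// ['100', '-', '', '(', '5.4', '+', '2', '/', '3', ')', '', '*', '5']

// Step 2: trim whitespace (a no-op for this input, which has no spaces).
const trimmed = raw.map(s => s.trim());

// Step 3: drop the empty strings produced where two separators are adjacent.
const tokens = trimmed.filter(s => s !== '');
console.log(tokens);
// ['100', '-', '(', '5.4', '+', '2', '/', '3', ')', '*', '5']

// Alternative: a character class lists the same six symbols without
// alternation; '-' comes first so it is treated as a literal.
const viaClass = input.split(/([-+*\/()])/).map(s => s.trim()).filter(s => s !== '');
console.log(JSON.stringify(viaClass) === JSON.stringify(tokens)); // true
```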

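Hopefully, you now have all the tools needed to understand .split(/(-|\+|\/|\*|\(|\))/). It is Example 3 with four more alternatives: \/ (division), \* (multiplication), and \( and \) (parentheses). Each needs a backslash because *, ( and ) are regex metacharacters and an unescaped / would end the regex literal. After the split, .map(s => s.trim()) strips spaces around each piece, and .filter(s => s !== '') drops the empty strings that split leaves between adjacent separators such as -( and )*. Hope that made sense! Let me know in the comments if you liked this article, or ping me on Twitter!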
