0xc0Der

Building a simple word counting parser using pari.

In this post, I'll implement a simple parser that counts the words and lines in its input.

First, we need to define what we consider whitespace.

// The backslashes are escaped because this string is interpolated
// into the pattern passed to the `char` parser later.
const whitespace = ' \\r\\n\\t\\f\\v';
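To see what the escaped backslashes do, here is a small sketch in plain JavaScript. It assumes (the post does not confirm this) that `char` treats its argument like a regular-expression character class, so the built-in `RegExp` is used as a stand-in; `wsPattern` and `wordPattern` are illustrative names only.

```javascript
// Same string as in the post: each `\\` is one literal backslash,
// so the string holds the two characters `\` and `n`, not a newline.
const whitespace = ' \\r\\n\\t\\f\\v';

// Assumption: `char` compiles its argument roughly the way RegExp does.
// RegExp is only a stand-in here, not pari's implementation.
const wsPattern = new RegExp(`^[${whitespace}]$`);
const wordPattern = new RegExp(`^[^${whitespace}]$`);

console.log(whitespace.length);      // 11: one space plus five two-character escapes
console.log(wsPattern.test('\n'));   // true
console.log(wordPattern.test('a'));  // true
console.log(wordPattern.test('\t')); // false
```

Had the string been written with single backslashes, it would contain the control characters themselves rather than their escape sequences.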

Next, we define what a word is:

A word is a sequence of non-whitespace characters.

The word and space parsers follow directly from that definition.

import { char, oneOrMore } from 'pari';

// ...

const wordChar = char(`[^${whitespace}]`);
const wsChar = char(`[${whitespace}]`);

const word = oneOrMore(wordChar);
const space = oneOrMore(wsChar);
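As a point of comparison, the same definition of a word can be written as a plain regular expression, independent of pari. This is only a sketch for sanity-checking results; `countWordsByRegex` is a hypothetical helper name, not part of the library.

```javascript
// A word is a maximal run of non-whitespace characters,
// using the same character set as the `whitespace` constant.
const WORD_RUN = /[^ \r\n\t\f\v]+/g;

function countWordsByRegex(input) {
    // `match` returns null when nothing matches, hence the fallback.
    const runs = input.match(WORD_RUN);
    return runs === null ? 0 : runs.length;
}

console.log(countWordsByRegex('hello  world\nfoo')); // 3
console.log(countWordsByRegex('   '));               // 0
```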

So, how do we keep count? We need to define a custom parser state by extending State.

import { State, ... } from 'pari';

class CounterState extends State {
    #wordsCount = 0;
    #linesCount = 0;

    // State must have a `clone` method.
    clone() {
        const state = new CounterState(
            this.input,
            this.index,
            this.status
        );

        state.#wordsCount = this.#wordsCount;
        state.#linesCount = this.#linesCount;

        return state;
    }

    get wordsCount() {
        return this.#wordsCount;
    }

    get linesCount() {
        return this.#linesCount;
    }

    withIncWords() {
        this.#wordsCount += 1;
        return this;
    }

    withIncLines() {
        this.#linesCount += 1;
        return this;
    }
}

// ...
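The `clone` requirement presumably exists so that a parser can work on a copy without disturbing the original state; that is an assumption about pari's internals. The sketch below uses a hypothetical `StandInState` base class in place of pari's `State` (including a made-up default `status`) to show that counters on a clone evolve independently.

```javascript
// Hypothetical stand-in for pari's `State`; only the three fields
// that the post's `clone` method reads are modelled here.
class StandInState {
    constructor(input, index = 0, status = 'ok') {
        this.input = input;
        this.index = index;
        this.status = status; // the default value is an assumption
    }
}

class CounterState extends StandInState {
    #wordsCount = 0;

    clone() {
        const state = new CounterState(this.input, this.index, this.status);
        // Private fields are accessible on other instances of the same class.
        state.#wordsCount = this.#wordsCount;
        return state;
    }

    get wordsCount() {
        return this.#wordsCount;
    }

    withIncWords() {
        this.#wordsCount += 1;
        return this;
    }
}

const original = new CounterState('a b').withIncWords();
const copy = original.clone().withIncWords();

console.log(original.wordsCount); // 1
console.log(copy.wordsCount);     // 2
```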

In the space parser, we increase the line count by one whenever we encounter a newline character, and increase the word count by one at the end of each run of (possibly multiple consecutive) whitespace characters.

// ...

const space = oneOrMore(wsChar.ok(state => 
    state.charAt(state.index - 1) == '\n'
        ? state.withIncLines()
        : state
)).ok(state => state.withIncWords());

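In the word parser, we need to handle one edge case: the end of input. If the input ends with a word rather than whitespace, no space parser runs after it, so the last word and the last line are counted here instead.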

//...

const word = oneOrMore(wordChar.ok(state =>
    state.charAt(state.index) == ''
        ? state.withIncWords().withIncLines()
        : state
));
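The two counting rules can also be restated as a plain loop, which makes it easier to predict what the parser should report. This is a rough sketch that does not use pari; `countWithRules` is a hypothetical helper that only approximates how the `ok` callbacks are applied.

```javascript
const WHITESPACE = ' \r\n\t\f\v';

// Hypothetical helper: applies the same counting rules with a plain loop.
function countWithRules(input) {
    let words = 0;
    let lines = 0;

    for (let i = 0; i < input.length; i++) {
        const isSpace = WHITESPACE.includes(input[i]);
        const next = input.charAt(i + 1); // '' past the end, like the post's check
        const runEnds = next === '' || WHITESPACE.includes(next) !== isSpace;

        if (isSpace) {
            if (input[i] === '\n') lines += 1; // every newline counts as a line
            if (runEnds) words += 1;           // a whitespace run closes a word
        } else if (next === '') {
            words += 1;                        // input ends inside a word
            lines += 1;                        // and on an unterminated line
        }
    }

    return { words, lines };
}

console.log(countWithRules('hello world\n')); // { words: 2, lines: 1 }
console.log(countWithRules('hello world'));   // { words: 2, lines: 1 }
```

In this sketch, input that begins with whitespace gets one extra word, because the leading run is treated as closing a word: `countWithRules(' a')` reports 2 words. Whether the pari version behaves the same way depends on details this sketch only approximates.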

Finally, we define our word counter parser and pass it a state with an input.

// ...
import { firstOf, ... } from 'pari';

const wc = oneOrMore(firstOf([word, space]));

const input = ...;

const result = wc.process(
    new CounterState(input)
);

// print word and line counts
console.log(
    result.wordsCount,
    'words',
    result.linesCount,
    'lines'
);
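To build some intuition for how `oneOrMore(firstOf([word, space]))` walks the input, here is a toy version of the combinators in plain JavaScript. Everything in it is hypothetical: the function-based parser shape, `charIn`, `charNotIn`, and the `-1` failure value are invented for illustration and are not how pari is implemented.

```javascript
// Hypothetical mini-combinators; NOT pari's implementation.
// A parser is a function: (input, index) => new index, or -1 on failure.
const charIn = (set) => (input, i) =>
    i < input.length && set.includes(input[i]) ? i + 1 : -1;
const charNotIn = (set) => (input, i) =>
    i < input.length && !set.includes(input[i]) ? i + 1 : -1;

// Apply `parser` repeatedly; succeed only if it matched at least once.
const oneOrMore = (parser) => (input, i) => {
    let end = parser(input, i);
    if (end === -1) return -1;
    for (let next = parser(input, end); next !== -1; next = parser(input, end)) {
        end = next;
    }
    return end;
};

// Try each parser in order and return the first success.
const firstOf = (parsers) => (input, i) => {
    for (const parser of parsers) {
        const end = parser(input, i);
        if (end !== -1) return end;
    }
    return -1;
};

const ws = ' \r\n\t\f\v';
const word = oneOrMore(charNotIn(ws));
const space = oneOrMore(charIn(ws));
const wc = oneOrMore(firstOf([word, space]));

console.log(wc('hello world\n', 0)); // 12: the whole input is consumed
console.log(wc('', 0));              // -1: oneOrMore needs at least one match
```

Each pass of the outer `oneOrMore` consumes one full word or one full whitespace run, so the two parsers alternate until the input is exhausted.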

Thank you for reading 😄. If you have any questions, do not hesitate to leave a comment.
