This post is part 5 of the series "Learning Functional Programming using JavaScript". When the next part will be published, a link to part 6 will be added here in the top.
This is a continuation of the Functional Programming challenge I gave myself:
Write a JSON parser in JavaScript, using
the parser combinator principles from the blogpost of Scott Wlaschin.
At this point, we are able to parse null
, true
, false
, strings and
numbers. In this post we will complete the initial implementation by adding
support for arrays and objects.
In the previous post we rewrote the .reduce
to use our own. To promote this
behavior, I wanted to get rid of all calls to 'objects'. So I added a new rule
to the ESLint config in my package.json
:
{
"eslintConfig": {
"rules": {
"no-restricted-syntax": [
"error",
{
"selector": "CallExpression[callee.type=MemberExpression][callee.property.name!='log']",
"message": "Not allowed to call object members, except console.log()"
}
]
}
}
}
In production code, I would use system functions rather than rolling my own, but here we are just doing it for
science!learning functional programming concepts. I will reflect on things like this in a later post.
While this test for console.log
is not fool proof, it will start warning about
.map
, .join
and .concat
. So lets start rewriting them:
// before
const toList = mapResult((result) => [result[0]].concat(result[1]));
// after
const toList = mapResult((result) => [...[result[0]], ...result[1]]);
Be sure to wrap result[0]
into an array. In our parser results, this toList
is to flatten a list that has the form of: [a, [b, [c, d]]
. So each time it is
a list being appended to a single value.
// before
const toString = mapResult((result) => result.join(""));
// after
const reduceWithStart = (reducer) => (start) => (tail) =>
doReduce(reducer)(start)(tail);
const join = (a) => (b) => String(a) + String(b);
const toString = mapResult(reduceWithStart(join)(""));
I created a reduceWithStart
that is a variant of the already made reduce
(see
previous post), but this time you can provide a start value for the result. This
is so that empty lists are converted to empty strings.
Considering we extracted 2 helper functions, the code of the toString
is
actually a bit shorter!
// before
const stringParser = (string) =>
addLabel(string)(chain([...string].map((char) => characterParser(char))));
// after
const map = (transform) => ([head, ...tail]) =>
head ? [transform(head), ...map(transform)(tail)] : [];
const stringParser = (string) =>
addLabel(string)(chain(map(characterParser)([...string])));
Here I made a map
function, just like we created the reduce
before (see
previous post). There are other functions using .map
that I needed to convert:
// before
const escapedCharParser = choice(
specialCharacters.map(([match, result]) => parseStringResult(match)(result))
);
// after
const escapedCharParser = choice(
map(([match, result]) => parseStringResult(match)(result))(specialCharacters)
);
Apart from the small change in order (transform before the list) the setup is
basically the same.
// before
const anyOf = (string) => choice([...string].map(characterParser)([...string]));
// after
const anyOf = (string) => choice(map(characterParser)([...string]));
So that was not that hard, and now we are able to enforce another rule!
Now, onto parsing arrays!
Parsing an JSON array
As Scott Wlashin explains in his excellent series,
there is a problem with parsing an array as can be seen in the EBNF notation:
value = object | array | number | string | "true" | "false" | "null";
array = "[", [ value, { ",", value } ], "]";
A value could be an array
, and an array consists of value
! there is a
circular reference here. How can you write a parser that needs to parse itself
as well?
Scott Wlashin fixes this by using a
forward declaration. A
forward declaration is a placeholder, that will be replaced later on.
I tried to follow this pattern in my implementation as well:
const forwardReference = (impl = () => [FAILED, "Unfulfilled"]) => [
(stream) => impl(stream),
(update) => (impl = update),
];
const [valueParser, updateValueParserRef] = forwardReference();
console.log(valueParser("hello")); // [ Symbol(Failed), 'Unfulfilled' ]
updateValueParserRef(characterParser("h"));
console.log(valueParser("hello")); // [ 'h', [ 'e', 'l', 'l', 'o' ] ]
The linter didn't complain, my implementation worked, so I was happy! Time to
start parsing that array! (No worries, this will be done properly later 😉)
These are some examples we should be able to parse:
[null, 1, 3 , [ true ] ]
[ "string [ , ", false]
As you can see from these examples, in and around the brackets []
and the
seperator ,
we can have all kinds of whitespace that does not matter. So we
need some way to ignore those:
const whitespace = " \n\t";
const whitespaceParser = many(anyOf(whitespace));
const ignoreTrailingSpaces = (parser) => andThenLeft(parser)(whitespaceParser);
const arrayStart = ignoreTrailingSpaces(characterParser("["));
const arrayEnd = ignoreTrailingSpaces(characterParser("]"));
const arraySep = ignoreTrailingSpaces(characterParser(","));
console.log(arrayStart("[ null, 1 ]")); // [ '[', [ 'n', 'u', ...
Nice! The spaces are ignored and the next character is waiting to be parsed. Now
we can define our array value parser that makes use of the forward reference:
const arrayValue = ignoreTrailingSpaces(valueParser);
Now we need some parser that can parse a list, divided by seperators. By
combining parsers, we are looking for something like this:
const arrayValues = sepBy(arraySep)(arrayValue);
That ideally would ignore the seperators in the result, and make a flat list of
the result of the values.
Let's build one:
const sepBy = (sepParser) => (parser) => andThen(parser)(sepParser);
const dummyArrayValueParser = sepBy(arraySep)(nullParser);
console.log(dummyArrayValueParser("null, null, null"));
// [ [ null, ',' ],
// [ 'n', 'u', 'l', 'l', ',', ' ', ' ', ' ', 'n', 'u', 'l', 'l' ] ]
After a seperator, we always need to find another element:
const sepBy = (sepParser) => (parser) =>
andThen(parser)(andThenRight(sepParser)(parser));
const dummyArrayValueParser = sepBy(arraySep)(nullParser);
console.log(dummyArrayValueParser("null, null, null"));
// [ [ null, null ], [ ',', ' ', ' ', ' ', 'n', 'u', 'l', 'l' ] ]
But those separators and values can be repeated N times:
const sepBy = (sepParser) => (parser) =>
andThen(parser)(many(andThenRight(sepParser)(parser)));
const dummyArrayValueParser = sepBy(arraySep)(nullParser);
console.log(dummyArrayValueParser("null, null, null"));
// [ [ null, [ null, null ] ], [] ]
Yes! now to flatten that list, by using toList
:
const sepBy = (sepParser) => (parser) =>
toList(andThen(parser)(many(andThenRight(sepParser)(parser))));
But what if the array would be empty?
const dummyArrayValueParser = sepBy(arraySep)(nullParser);
console.log(dummyArrayValueParser(""));
// [ Symbol(Failed), "Error parsing 'null':", 'Unexpected EOF' ]
console.log(dummyArrayValueParser("null"));
// [ [ null ], [] ]
So our implementation would require at least one value. So we rename the
function to sepBy1
and create a new one that would accept 'no values'
const sepBy1 = (sepParser) => (parser) =>
toList(andThen(parser)(many(andThenRight(sepParser)(parser))));
const sepBy = (sepParser) => (parser) =>
orElse(sepBy1(sepParser)(parser))(resultParser([]));
So now our array parser is complete!
const arrayValues = sepBy(arraySep)(arrayValue);
const arrayParser = addLabel("array")(
between(arrayStart)(arrayEnd)(arrayValues)
);
We can test it out by fulfilling our forward reference by calling updateValueParserRef
:
updateValueParserRef(
choice([
nullParser,
boolParser,
numberParser,
quotedStringParser,
arrayParser,
])
);
// parsing our examples:
console.log(valueParser("[null, 1, 3 , [ true ] ]"));
// [ [ null, 1, 3, [ true ] ], [] ]
console.log(valueParser('[ "string [ , ", false]'));
// [ [ 'string [ , ', false ], [] ]
We can now already parse a decent subset of JSON. The only thing missing right now, is
object
s.
Parsing JSON objects
The EBNF definition for a JSON object is as follows:
object = "{", [ string, ":", value, { ",", string, ":", value } ], "}";
So, "Between" brackets, are key-value pairs "seperated" by a comma. The
key-value pair is a String, ":", value. I think we already have all ingredients
for this one! Maybe we need to add some code to produce the end result, but
parsing wise we should already be close!
const objectStart = ignoreTrailingSpaces(characterParser("{"));
const objectEnd = ignoreTrailingSpaces(characterParser("}"));
const objectPairSep = ignoreTrailingSpaces(characterParser(","));
const objectKeyValSep = ignoreTrailingSpaces(characterParser(":"));
const objectKey = ignoreTrailingSpaces(quotedStringParser);
const objectValue = ignoreTrailingSpaces(valueParser);
const objectKeyValue = andThen(andThenLeft(objectKey)(objectKeyValSep))(
objectValue
);
const objectParser = between(objectStart)(objectEnd)(
sepBy(objectPairSep)(objectKeyValue)
);
console.log(objectParser('{ "hello": "world", "fp": true }'));
// [ [ [ 'hello', 'world' ], [ 'fp', true ] ], [] ]
The default result is already a list of key-value pairs. Now to put them into an
object:
const toObject = doReduce((result) => ([key, value]) => ({
...result,
[key]: value,
}))({});
const objectParser = mapResult(toObject)(
between(objectStart)(objectEnd)(sepBy(objectPairSep)(objectKeyValue))
);
console.log(objectParser('{ "hello": "world", "fp": true }'));
// [ { hello: 'world', fp: true }, [] ]
Now to update our value choice, so that objects are included:
updateValueParserRef(
choice([
nullParser,
boolParser,
numberParser,
quotedStringParser,
arrayParser,
objectParser, // new!
])
);
const jsonParser = (stream) =>
onSuccess(
andThenRight(whitespaceParser)(valueParser)(stream)
)(([result, remaining]) =>
remaining.length > 0
? [FAILED, "Unexpected characters:", remaining]
: result
);
// data is our challenge data, see post 1
console.log(jsonParser(data)); // Works!
I added a jsonParser
to check after parsing if there are no remaining
characters left. If there are, return an error. If not, return the parsed
result.
Now I only need to build our get
function to get value from our data, to
complete the challenge, and we need to check if we can tick a box for all the
rules we applied.
Getting data out of our JSON structure
In the definition of the challenge, there were 3 calls that we should be able to
do:
{
"goals": [
"get(data)(\"using.disallowed.0\") should result in \"No dependencies\"",
"get(data)(\"points.full-json\") should result in 1000",
"get(data)(\"jsonTypes.2\") should result in false"
]
}
The thing is, the argument from the get
is also a string that needs to be
parsed! (separted by .
dots).
And we made just the tools to do that...
const pathElementParser = toString(some(satisfy((c) => c !== ".")));
const split = sepBy(characterParser("."))(pathElementParser);
const get = (data) => (path) =>
doReduce((result) => (item) => result && result[item])(data)(split(path)[0]);
const jsonGet = get(parsedData);
console.log(jsonGet("using.disallowed.0"));
console.log(jsonGet("points.full-json"));
console.log(jsonGet("jsonTypes.2"));
Challenge completed!
But there were some things still bothering me. I did an assignment in the
forward reference, and that felt like cheating. But first lets see how we score
on the restrictions we applied, by looking at the defined ESLint rules of our
package.json
...
{
"eslintConfig": {
"globals": {
"Symbol": "readonly",
"Array": "readonly",
"String": "readonly",
"Number": "readonly",
"console": "readonly"
},
"rules": {
"no-console": "off",
"no-use-before-define": "error",
"no-eval": "error",
"no-implied-eval": "error",
"no-restricted-globals": ["error", "JSON"],
"max-statements": ["error", 1, { "ignoreTopLevelFunctions": false }],
"complexity": ["error", { "max": 3 }],
"arrow-body-style": ["error", "as-needed"],
"no-restricted-syntax": [
"error",
{
"selector": "FunctionExpression",
"message": "Please use Lambda notation () =>"
},
{
"selector": "IfStatement",
"message": "Try using ternary operator: true ? 1 : 2"
},
{
"selector": "VariableDeclaration[kind=let],VariableDeclaration[kind=var]",
"message": "Only use constants"
},
{
"selector": "ArrowFunctionExpression[params.length > 1]",
"message": "Only 1 argument allowed. Please use currying"
},
{
"selector": "CallExpression[callee.type=MemberExpression][callee.property.name!='log']",
"message": "Not allowed to call object members, except console.log()"
}
]
}
}
}
I was checking for usage of const
, but I never assigned rules that assignments
where forbidden. I was also missing my "No self referencing recursion" I wanted
to try (see first post).
Time to extend the linting rules some more, but we'll leave that for a next post.
Conclusion
The code so far:
- A full JSON parser (only skipped hex chars in strings)
- A function to grab values from the JSON structure
- Check the full code here
What I learned so far:
- Still the code is more the tools to parse than a specific implementation. I like this approach, it makes most of the code re-usable. The 'get' function is an example of reuse.
- I followed the original blog of parser combinators quite literally. It helped a great deal in the implementation, but the forward reference felt weird. I had find a way to do it without an assignment in there.
What did I like:
- I finished the parser! Everything worked. I never though I would be able to implement it. So far I spent 2 evenings on it.
- It felt nice to break loose from the functions already provided in the language, and do an own implementation for it. Its not something you would do in production code, but for a challenge like this, it is cool to also do the most basics of tool building.
What did I not like:
- The forward reference. I have to find a better way to do this. (Right now it is cheating by reassigning the incoming function argument).
Next time:
- Removing the self referencing recursion
- Fixing the forward reference
Top comments (0)