DEV Community

Will BL

Let's write a tiny JSON parser in Kotlin! Part 0: Understanding JSON

Nearly every language has some support for JSON: many have official libraries for dealing with it, and for those that don't there are plenty of third-party libraries - you're often spoiled for choice.

With all these options, you'd almost never have to parse JSON yourself. But let's do that anyway!

What is JSON Made of?

JSON is built up of various elements - we can start with the simplest of these and work our way up until we can parse entire documents.

On the JSON website, there are diagrams that show how each element of the JSON specification is defined. These can really help you wrap your head around the grammar, so they'll be reproduced in this post.

Whitespace

A diagram showing the grammar for whitespace

Let's start with the simplest element - whitespace is a sequence of zero or more spaces, line feeds (\n), carriage returns (\r), and horizontal tabs (\t). No other characters count as whitespace in JSON.
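As a rough sketch of what that rule means in code, here's one way to skip over whitespace. The names isJsonWhitespace and skipWhitespace are made up for this illustration and aren't necessarily what the parser in this series will use.

```kotlin
// Hypothetical helper: JSON whitespace is exactly these four characters.
fun isJsonWhitespace(c: Char): Boolean =
    c == ' ' || c == '\n' || c == '\r' || c == '\t'

// Hypothetical helper: returns the index of the first character at or after
// `start` that is not JSON whitespace (or input.length if there is none).
fun skipWhitespace(input: String, start: Int = 0): Int {
    var i = start
    while (i < input.length && isJsonWhitespace(input[i])) i++
    return i
}
```

Since "zero or more" includes zero, the function happily returns start unchanged when there's nothing to skip.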

Strings

A diagram showing the grammar for a JSON string

A string is a sequence of zero or more characters enclosed in double quotes ("). Any character can be put within a string, except for the following, which must be escaped:

  • double quotes (") [Escaped with \"]
  • backslash (\) [Escaped with \\]
  • backspace [Escaped with \b]
  • form feed [Escaped with \f]
  • line feed [Escaped with \n]
  • carriage return [Escaped with \r]
  • horizontal tab [Escaped with \t]

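In addition, the escape code \/ resolves to a forward slash (/), and a backslash followed by u followed by four hex digits resolves to the character whose Unicode value is given by those hex digits. Any other control character (anything below U+0020) can't appear raw in a string and has to be written in this \u form.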
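To make the escape rules concrete, here's a minimal sketch of decoding a JSON string. The function name parseJsonString and its Pair return type are assumptions for this example only. It also treats each \u escape as a single UTF-16 unit and doesn't check that surrogate pairs are well-formed.

```kotlin
// Hypothetical helper: decodes the JSON string whose opening quote is at `start`.
// Returns the decoded text and the index just past the closing quote.
fun parseJsonString(input: String, start: Int = 0): Pair<String, Int> {
    require(start < input.length && input[start] == '"') { "Expected '\"' at index $start" }
    val out = StringBuilder()
    var i = start + 1
    while (i < input.length) {
        val c = input[i]
        when {
            c == '"' -> return out.toString() to i + 1
            c == '\\' -> {
                require(i + 1 < input.length) { "Unterminated escape" }
                when (val e = input[i + 1]) {
                    '"', '\\', '/' -> out.append(e)
                    'b' -> out.append('\b')
                    'f' -> out.append('\u000C')
                    'n' -> out.append('\n')
                    'r' -> out.append('\r')
                    't' -> out.append('\t')
                    'u' -> {
                        require(i + 6 <= input.length) { "Truncated \\u escape" }
                        val hex = input.substring(i + 2, i + 6)
                        require(hex.all { it in '0'..'9' || it in 'a'..'f' || it in 'A'..'F' }) {
                            "Invalid hex digits in \\u escape"
                        }
                        out.append(hex.toInt(16).toChar())
                        i += 4 // the four hex digits; the backslash and 'u' are skipped below
                    }
                    else -> throw IllegalArgumentException("Invalid escape \\$e")
                }
                i += 2 // the backslash and the escape letter
            }
            c < ' ' -> throw IllegalArgumentException("Unescaped control character at index $i")
            else -> {
                out.append(c)
                i++
            }
        }
    }
    throw IllegalArgumentException("Unterminated string")
}
```

Returning the end index alongside the decoded text is just one design choice: it lets a caller carry on parsing from wherever the string stopped.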

Numbers

Numbers are pretty complicated:

A diagram showing the grammar for a JSON number

They consist of an optional - if the number is negative, followed by a digit. If the digit is nonzero, any number of further digits can follow. If it is zero, there can be no more digits in the integer part and we move on.

There can then be a point (.), which must be followed by one or more digits representing a decimal fraction.

Afterwards, there can be an e (or E), followed by an optional sign for the exponent (+ or -) and then one or more digits for the exponent, for scientific notation-style numbers.
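If the diagram is hard to follow, it can help to see the same grammar restated as a regular expression. This is only a compact way to check the rules, not how the parser in this series has to work, and the names JSON_NUMBER and isJsonNumber are invented for this sketch.

```kotlin
// Hypothetical names; the regex restates the number diagram piece by piece.
val JSON_NUMBER = Regex(
    "-?" +                  // optional minus sign
    "(0|[1-9][0-9]*)" +     // a lone zero, or a nonzero digit followed by any digits
    "(\\.[0-9]+)?" +        // optional fraction: a point and at least one digit
    "([eE][+-]?[0-9]+)?"    // optional exponent: e or E, optional sign, at least one digit
)

// True only when the whole text is a single valid JSON number.
fun isJsonNumber(text: String): Boolean = JSON_NUMBER.matches(text)
```

With this, inputs like 0, -12.5e+3 and 1E10 are accepted, while 01, 1., .5, 1e and +1 are all rejected.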

😰 Wow, that's a lot. Numbers were definitely the hardest part of this parser to write for me. But don't worry if you don't understand it yet - you'll become well-acquainted with the grammar when writing the parser!

Values

A diagram showing the grammar for a JSON value

In JSON, a value is whitespace, followed by one of:

  • a string
  • an object
  • an array
  • a number
  • the boolean value true
  • the boolean value false
  • null

followed by whitespace.
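One handy consequence of this rule is that, once the leading whitespace is out of the way, the very next character is enough to tell which kind of value is coming. Here's a small sketch of that idea. ValueKind and peekValueKind are hypothetical names, and the string case is included because the grammar on the JSON website lists strings as values too.

```kotlin
// Hypothetical enum naming each alternative a value can be.
enum class ValueKind { STRING, NUMBER, OBJECT, ARRAY, TRUE, FALSE, NULL }

// Hypothetical helper: skips leading JSON whitespace, then decides which
// kind of value starts there. It does not validate the rest of the value.
fun peekValueKind(input: String): ValueKind {
    val rest = input.trimStart(' ', '\n', '\r', '\t')
    return when {
        rest.startsWith('"') -> ValueKind.STRING
        rest.startsWith('{') -> ValueKind.OBJECT
        rest.startsWith('[') -> ValueKind.ARRAY
        rest.startsWith("true") -> ValueKind.TRUE
        rest.startsWith("false") -> ValueKind.FALSE
        rest.startsWith("null") -> ValueKind.NULL
        rest.isNotEmpty() && (rest[0] == '-' || rest[0] in '0'..'9') -> ValueKind.NUMBER
        else -> throw IllegalArgumentException("No JSON value found")
    }
}
```

A real parser would then hand off to a dedicated routine for that kind of value and skip the trailing whitespace afterwards.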

Arrays

A diagram showing the grammar for a JSON array

Arrays are nice and simple. An array is a pair of square brackets containing either nothing but whitespace, or one or more values with a comma after every value but the last.
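Here's a cut-down sketch of the array rule on its own. To keep it short, the only values it understands are true, false, null and nested arrays, which is a simplification made just for this example. TinyArrayParser is a hypothetical name.

```kotlin
// Hypothetical sketch: parses arrays whose elements are true, false, null,
// or nested arrays. Other value types are deliberately left out.
class TinyArrayParser(private val text: String) {
    private var pos = 0

    fun parse(): List<Any?> {
        val result = parseArray()
        skipWhitespace()
        require(pos == text.length) { "Unexpected trailing input at index $pos" }
        return result
    }

    private fun skipWhitespace() {
        while (pos < text.length &&
            (text[pos] == ' ' || text[pos] == '\n' || text[pos] == '\r' || text[pos] == '\t')
        ) pos++
    }

    private fun expect(c: Char) {
        require(pos < text.length && text[pos] == c) { "Expected '$c' at index $pos" }
        pos++
    }

    private fun parseArray(): List<Any?> {
        expect('[')
        val items = mutableListOf<Any?>()
        skipWhitespace()
        // Empty array: brackets with only whitespace between them.
        if (pos < text.length && text[pos] == ']') {
            pos++
            return items
        }
        while (true) {
            items.add(parseValue())
            skipWhitespace()
            // A comma means another value must follow, so trailing commas fail.
            if (pos < text.length && text[pos] == ',') pos++ else break
        }
        expect(']')
        return items
    }

    private fun parseValue(): Any? {
        skipWhitespace()
        return when {
            text.startsWith("true", pos) -> { pos += 4; true }
            text.startsWith("false", pos) -> { pos += 5; false }
            text.startsWith("null", pos) -> { pos += 4; null }
            text.startsWith("[", pos) -> parseArray()
            else -> throw IllegalArgumentException("Unsupported value at index $pos")
        }
    }
}
```

For example, TinyArrayParser("[ true , [null, false] ]").parse() gives a list holding true and a nested list, while "[true,]" is rejected because a comma must be followed by another value.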

Objects

A diagram showing the grammar for a JSON object

An object is a collection of zero or more string : value pairs, surrounded by curly brackets, with a comma after every pair but the last. Whitespace is allowed around each string, and an empty object can contain nothing but whitespace.
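The object rule can be sketched in much the same way. To stay short, this example only accepts keys without escape sequences, and the only values it understands are true, false, null and nested objects. Both are simplifications for illustration, and TinyObjectParser is a hypothetical name.

```kotlin
// Hypothetical sketch: parses objects whose keys are plain strings (no escapes)
// and whose values are true, false, null, or nested objects.
class TinyObjectParser(private val text: String) {
    private var pos = 0

    fun parse(): Map<String, Any?> {
        val result = parseObject()
        skipWhitespace()
        require(pos == text.length) { "Unexpected trailing input at index $pos" }
        return result
    }

    private fun peek(): Char? = text.getOrNull(pos)

    private fun skipWhitespace() {
        while (peek() == ' ' || peek() == '\n' || peek() == '\r' || peek() == '\t') pos++
    }

    private fun expect(c: Char) {
        require(peek() == c) { "Expected '$c' at index $pos" }
        pos++
    }

    private fun parseObject(): Map<String, Any?> {
        expect('{')
        val members = linkedMapOf<String, Any?>()
        skipWhitespace()
        // Empty object: braces with only whitespace between them.
        if (peek() == '}') {
            pos++
            return members
        }
        while (true) {
            skipWhitespace()
            val key = parseKey()
            skipWhitespace()
            expect(':')
            members[key] = parseValue() // a repeated key overwrites the earlier one here
            skipWhitespace()
            // A comma means another pair must follow, so trailing commas fail.
            if (peek() == ',') pos++ else break
        }
        expect('}')
        return members
    }

    // Simplified key: everything up to the next double quote, no escapes allowed.
    private fun parseKey(): String {
        expect('"')
        val end = text.indexOf('"', pos)
        require(end >= 0) { "Unterminated key" }
        val key = text.substring(pos, end)
        require('\\' !in key) { "Escapes are not supported in this sketch" }
        pos = end + 1
        return key
    }

    private fun parseValue(): Any? {
        skipWhitespace()
        return when {
            text.startsWith("true", pos) -> { pos += 4; true }
            text.startsWith("false", pos) -> { pos += 5; false }
            text.startsWith("null", pos) -> { pos += 4; null }
            text.startsWith("{", pos) -> parseObject()
            else -> throw IllegalArgumentException("Unsupported value at index $pos")
        }
    }
}
```

The structure mirrors the array rule closely: the main differences are the curly brackets and the string and colon that come before each value.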

Closing thoughts

A few elements of the grammar can seem complicated at first (looking at you, numbers...), but those diagrams can be really useful. You may also want to take a look at:
