Nearly every language has some support for JSON: many have official libraries for dealing with it, and for those that don't there are plenty of third-party libraries - you're often spoiled for choice.
With all these options, you'd almost never have to parse JSON yourself. But let's do that anyway!
JSON is built up of various elements - we can start with the simplest of these and work our way up until we can parse entire documents.
On the JSON website, there are diagrams that show how each element of the JSON specification is defined. These can really help you wrap your head around the grammar, so they'll be reproduced in this post.
Let's start with the simplest element: whitespace. Whitespace is a string of zero or more spaces, line feeds (\n), carriage returns (\r), and horizontal tabs (\t).
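As a rough illustration, skipping whitespace could look something like the sketch below. Python is used here purely for illustration, and the names WHITESPACE and skip_whitespace are made up for this example.

```python
# The four characters the JSON grammar treats as whitespace.
WHITESPACE = " \n\r\t"


def skip_whitespace(text: str, pos: int = 0) -> int:
    """Return the index of the first non-whitespace character at or after pos."""
    while pos < len(text) and text[pos] in WHITESPACE:
        pos += 1
    return pos
```

Because whitespace is "zero or more" characters, the function never fails: if there is nothing to skip, it simply returns the position it was given.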
A string is a sequence of characters enclosed within double quotes ("). Any character can be put within a string, except for the following, which must be escaped:
- double quotes (") [Escaped with \"]
- backslash (\) [Escaped with \\]
- backspace [Escaped with \b]
- form feed [Escaped with \f]
- line feed [Escaped with \n]
- carriage return [Escaped with \r]
- horizontal tab [Escaped with \t]
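In addition, the escape code \/ resolves to a forward slash (/), and a backslash followed by u and four hex digits (for example, \u0041 for A) resolves to the character at the Unicode code point specified by those hex digits. Any other control character (below U+0020) can only appear in a string in this \u form.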
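To make the string rules a little more concrete, here is a minimal sketch of a string parser in Python. The names ESCAPES and parse_string are hypothetical, and the sketch assumes the caller has already positioned pos on the opening quote. It does not combine \u surrogate pairs into a single character, which a complete parser would want to do.

```python
from typing import Tuple

# Single-character escapes and the characters they resolve to.
ESCAPES = {'"': '"', "\\": "\\", "/": "/",
           "b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t"}
HEX_DIGITS = "0123456789abcdefABCDEF"


def parse_string(text: str, pos: int = 0) -> Tuple[str, int]:
    """Parse a JSON string starting at text[pos].

    Returns the decoded string and the index just past the closing quote.
    """
    if pos >= len(text) or text[pos] != '"':
        raise ValueError(f"expected a double quote at index {pos}")
    pos += 1
    chars = []
    while pos < len(text):
        ch = text[pos]
        if ch == '"':
            return "".join(chars), pos + 1
        if ch == "\\":
            pos += 1
            if pos >= len(text):
                break  # input ended in the middle of an escape
            esc = text[pos]
            if esc == "u":
                digits = text[pos + 1:pos + 5]
                if len(digits) != 4 or any(d not in HEX_DIGITS for d in digits):
                    raise ValueError(f"bad \\u escape at index {pos}")
                chars.append(chr(int(digits, 16)))
                pos += 5  # skip the 'u' and its four hex digits
                continue
            if esc not in ESCAPES:
                raise ValueError(f"unknown escape at index {pos}")
            chars.append(ESCAPES[esc])
        elif ch < " ":
            raise ValueError(f"unescaped control character at index {pos}")
        else:
            chars.append(ch)
        pos += 1
    raise ValueError("unterminated string")
```

Returning the new position alongside the value is one common design choice: it lets whatever called parse_string carry on reading from where the string ended.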
Numbers are pretty complicated:
They consist of an optional minus sign (-) if the number is negative, followed by a digit. If that digit is nonzero, any number of further digits can follow. If it is zero, no more digits can follow and we move on.
There can then be a point (.), after which there must be one or more digits representing a decimal fraction.
Afterwards, there can be an exponent marker (e or E), followed by an optional sign for the exponent (+ or -). After this, there must be one or more digits for the exponent, for scientific notation-style numbers.
😰 Wow, that's a lot. Numbers were definitely the hardest part of this parser to write for me. But don't worry if you don't understand it yet - you'll become well-acquainted with the grammar when writing the parser!
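One way to see the whole number grammar at a glance is to write it as a regular expression. The sketch below does that in Python. Note that this is a shortcut for illustration, not the character-by-character approach a hand-written parser would normally take, and the names NUMBER and parse_number are made up for this example.

```python
import re

NUMBER = re.compile(
    r"-?"                 # optional minus sign
    r"(?:0|[1-9][0-9]*)"  # a lone zero, or a nonzero digit followed by more digits
    r"(?:\.[0-9]+)?"      # optional fraction: a point and at least one digit
    r"(?:[eE][+-]?[0-9]+)?"  # optional exponent: e or E, optional sign, digits
)


def parse_number(text, pos=0):
    """Parse a JSON number starting at text[pos].

    Returns the number (int or float) and the index just past it.
    """
    match = NUMBER.match(text, pos)
    if match is None:
        raise ValueError(f"expected a number at index {pos}")
    literal = match.group()
    # Treat anything with a fraction or an exponent as a float.
    if any(c in literal for c in ".eE"):
        return float(literal), match.end()
    return int(literal), match.end()
```

Each line of the pattern corresponds to one of the steps described above, which makes it a handy cross-check when writing the same logic by hand.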
In JSON, a value is whitespace, followed by one of:
- a string
- an object
- an array
- a number
- the boolean value true
- the boolean value false
- null
followed by whitespace.
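A handy property of this grammar is that the first non-whitespace character is enough to tell which kind of value comes next. Here is a small Python sketch of that idea. The function name classify_value is hypothetical, and it only peeks at the input rather than parsing it.

```python
WHITESPACE = " \n\r\t"


def classify_value(text: str, pos: int = 0) -> str:
    """Report which kind of JSON value starts at or after text[pos]."""
    # Skip the leading whitespace that may precede any value.
    while pos < len(text) and text[pos] in WHITESPACE:
        pos += 1
    if pos >= len(text):
        raise ValueError("expected a value, found end of input")
    ch = text[pos]
    if ch == '"':
        return "string"
    if ch == "{":
        return "object"
    if ch == "[":
        return "array"
    if ch == "-" or ch in "0123456789":
        return "number"
    # The three literals are matched as whole words.
    for literal in ("true", "false", "null"):
        if text.startswith(literal, pos):
            return literal
    raise ValueError(f"unexpected character {ch!r} at index {pos}")
```

A full parser would use the same check to decide which element parser to hand the input to.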
Arrays are nice and simple: an array is a collection of zero or more values enclosed in square brackets, with a comma after every value but the last. An empty array may contain nothing but whitespace between the brackets.
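The array rule translates fairly directly into a loop. The Python sketch below is one possible shape for it, with made-up names throughout. To keep it self-contained, the parse_value passed in is a toy that only understands non-negative integers and nested arrays; a real parser would plug in its full value parser instead.

```python
WHITESPACE = " \n\r\t"


def skip_whitespace(text, pos):
    while pos < len(text) and text[pos] in WHITESPACE:
        pos += 1
    return pos


def parse_array(text, pos, parse_value):
    """Parse a JSON array at text[pos], using parse_value for each element.

    Returns the list and the index just past the closing bracket.
    """
    if pos >= len(text) or text[pos] != "[":
        raise ValueError(f"expected '[' at index {pos}")
    pos = skip_whitespace(text, pos + 1)
    items = []
    if pos < len(text) and text[pos] == "]":
        return items, pos + 1  # empty array
    while True:
        value, pos = parse_value(text, pos)
        items.append(value)
        pos = skip_whitespace(text, pos)
        if pos >= len(text):
            raise ValueError("unterminated array")
        if text[pos] == "]":
            return items, pos + 1
        if text[pos] != ",":
            raise ValueError(f"expected ',' or ']' at index {pos}")
        pos = skip_whitespace(text, pos + 1)


def parse_value(text, pos):
    """Toy value parser: non-negative integers and nested arrays only."""
    pos = skip_whitespace(text, pos)
    if pos < len(text) and text[pos] == "[":
        return parse_array(text, pos, parse_value)
    end = pos
    while end < len(text) and text[end] in "0123456789":
        end += 1
    if end == pos:
        raise ValueError(f"expected a value at index {pos}")
    return int(text[pos:end]), end
```

Since a value must follow every comma, a trailing comma such as [1,] is rejected: the toy parse_value finds a bracket where it expected a value and raises an error.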
An object is a collection of zero or more string : value pairs, surrounded by curly brackets, with commas after every pair but the last.
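Objects follow the same pattern as arrays, with a key and a colon in front of each value. Below is a hedged Python sketch with hypothetical names. To stay short, parse_key ignores escape sequences and parse_value only handles non-negative integers and nested objects, so treat it as an outline of the control flow rather than a complete implementation.

```python
WHITESPACE = " \n\r\t"


def skip_whitespace(text, pos):
    while pos < len(text) and text[pos] in WHITESPACE:
        pos += 1
    return pos


def parse_key(text, pos):
    """Toy string parser: reads up to the next quote, no escapes handled."""
    if pos >= len(text) or text[pos] != '"':
        raise ValueError(f"expected a string key at index {pos}")
    end = text.find('"', pos + 1)
    if end == -1:
        raise ValueError("unterminated string")
    return text[pos + 1:end], end + 1


def parse_object(text, pos, parse_value):
    """Parse a JSON object at text[pos]; returns (dict, index past '}')."""
    if pos >= len(text) or text[pos] != "{":
        raise ValueError(f"expected '{{' at index {pos}")
    pos = skip_whitespace(text, pos + 1)
    members = {}
    if pos < len(text) and text[pos] == "}":
        return members, pos + 1  # empty object
    while True:
        key, pos = parse_key(text, pos)
        pos = skip_whitespace(text, pos)
        if pos >= len(text) or text[pos] != ":":
            raise ValueError(f"expected ':' at index {pos}")
        value, pos = parse_value(text, pos + 1)
        members[key] = value  # a repeated key overwrites the earlier one
        pos = skip_whitespace(text, pos)
        if pos >= len(text):
            raise ValueError("unterminated object")
        if text[pos] == "}":
            return members, pos + 1
        if text[pos] != ",":
            raise ValueError(f"expected ',' or '}}' at index {pos}")
        pos = skip_whitespace(text, pos + 1)


def parse_value(text, pos):
    """Toy value parser: non-negative integers and nested objects only."""
    pos = skip_whitespace(text, pos)
    if pos < len(text) and text[pos] == "{":
        return parse_object(text, pos, parse_value)
    end = pos
    while end < len(text) and text[end] in "0123456789":
        end += 1
    if end == pos:
        raise ValueError(f"expected a value at index {pos}")
    return int(text[pos:end]), end
```

Storing the pairs in a dict is a choice made for this sketch; it means that when a key is repeated, only the last value is kept.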
A few elements of the grammar can seem complicated at first (looking at you, numbers...), but those diagrams can be really useful. You may also want to take a look at: