A web security story from 2008: silently securing JSON.parse

#cybersecurity #security #javascript #webdev

My 8 year old is doing a report for school on cyber security so I thought I'd dig up an old report for a break-the-web level security vulnerability that we silently fixed and which, as far as I know, has never been disclosed.

Back in 2008, JSON.parse was not part of the JavaScript language. It was a separate library downloaded from json.org that used JavaScript eval to unpack data.

// In the third stage we use the eval function to compile the text into a
// JavaScript structure. The "{" operator is subject to a syntactic ambiguity
// in JavaScript: it can begin a block or an object literal. We wrap the text
// in parens to eliminate the ambiguity.

                j = eval("(" + text + ")");

That code from a modern version of json2.js wraps the JSON source text in parentheses, because {} is a statement block in JavaScript but ({}) is an object constructor.

But a side-effect of using eval is that, if the source text is something like doVeryBadThings() then the JavaScript engine will happily do those very bad things, a classic arbitrary code execution vulnerability.

In computer security, arbitrary code execution (ACE) is an attacker's ability to run any commands or code of the attacker's choice on a target machine or in a target process.

Luckily, json2.js did a bunch of regular expression checks to make sure that text contained valid JSON, allowing commands that construct a value but not more powerful commands.

Unluckily, JSON is not a subset of JavaScript in the semantic sense. There are important differences.

To understand those differences, let's begin at the end with a change to the JavaScript language definition that I argued for based on this research.

Here's the changed specification text. It mirrors some explanatory text from Unicode §6.2.

7.1 Unicode Format-Control Characters
The Unicode format-control characters (i.e., the characters in category “Cf” in the Unicode Character Database such as LEFT-TO-RIGHT MARK or RIGHT-TO-LEFT MARK) are control codes used to control the formatting of a range of text in the absence of higher-level protocols for this (such as mark-up languages).
It is useful to allow these in source text to facilitate editing and display.
The format control characters maybe used in identifiers, within comments, and within string literals and regular expression literals.

And to the right of that is some deleted specification text.

7/2/2008 Deleted: anywhere in the source text of an ECMAScript program. These characters are removed from the source text before applying the lexical grammar. Since these characters are removed before processing string and regular expression literals, one must use a Unicode escape sequence (see 7.6) to include a Unicode format-control character inside a string or regular expression literal

JavaScript allows these control flow characters, known as [Cf], in identifiers because they're important for proper presentation of identifiers, especially those in cursive writing systems like عنصر in the Perso-Arabic alphabet. But they're a bit of a blind spot for many developers.

They were "removed from the source text before applying the lexical grammar." That means before the JavaScript parser has broken the source code into tokens, so

before it pairs quotation marks (") that start and end quoted string values, and
before it pairs delimiters like /* and */ or groups sequences like // and += that have no analogue in JSON.

JSON's specification does not have an equivalent clause. json.org simply says:

A string is a sequence of zero or more Unicode characters, wrapped in double quotes, using backslash escapes.

I realized that, by putting a [Cf] character between a backslash and a quote character I could get JavaScript's eval to find a different end of string than the JSON grammar would as expressed in the regular expressions that vet the JSON text. And that would allow me to sneak code into a JSON string that eval would execute.

I sent an email to the JSON maintainer, Douglas Crockford, with a proof of concept:

</p> \ \ \ \ \