DEV Community

Abdullah Ali
Abdullah Ali

Posted on

The Holey Array Problem

One of the "features" of JavaScript I hate the most is "holey" arrays. If you're unsure what that is, consider the following:

const array = [1, 2, 3];
Enter fullscreen mode Exit fullscreen mode

That is what is called a "packed" array. Elements are contiguous and the array is made up of one element type: number.

On the C++ side: when using V8 (aka Node.js), and under the hood, this array is actually stored as PACKED_SMI_ELEMENTS, which is a way to store small integers in memory, and arguably the most efficient out of the myriad of ways V8 will store your arrays.

Now consider this innocuous line of code:

array.push(3.14); // push a floating point number to the array.
Enter fullscreen mode Exit fullscreen mode

On the C++ side: your array has just been transformed from PACKED_SMI_ELEMENTS (integers) to PACKED_DOUBLE_ELEMENTS(doubles) in memory. It became slightly different, but is still tightly packed and performant. This transformation is irreversible.

Nothing has changed on the JavaScript side.

Moving on to the next step:

array.push('Hello world!'); // push a string to the array
Enter fullscreen mode Exit fullscreen mode

On the C++ side: your array has just been irreversibly transformed again. This time from PACKED_DOUBLE_ELEMENTS to PACKED_ELEMENTS. A PACKED_ELEMENTS array can hold any JavaScript value; but has to sacrifice much more memory space to represent itself compared to a SMI or Double array.

Now let us proceed to the next line of code:

console.log(array.length); // 5
array[9] = true;
console.log(array.length); // 10
Enter fullscreen mode Exit fullscreen mode

This is allowed in JavaScript, right? You may assign to an arbitrary index in the array, and your array will be padded. So what happens on the C++ side?

On the C++ side: your array has been irreversibly transformed yet again, this time to HOLEY_ELEMENTS. Much, much slower to work with; and you've just made the V8's JIT (Just-In-time compiler) optimisations much harder, as it will be unable to optimise your program to a large extent.

It's worthy of note that calling new Array(n) or Array(n) will always create this type of array and slow down your code.

But why stop here? Let me introduce Satan's special data structure:

array[999] = 'HAIL SATAN! ♥'
Enter fullscreen mode Exit fullscreen mode

On the C++ side: your array has just transformed from HOLEY_ELEMENTS to DICTIONARY_ELEMENTS, and you've summoned a demon that can no longer be banished.

Let me quote the V8 source code directly:

  // The "slow" kind.
  DICTIONARY_ELEMENTS,

From JavaScript's point of view: your array just became a dictionary, or in other words: a plain object. The literal worst case scenario of JavaScript arrays.

Why this is dangerous:

  • Such an operation will silently succeed and never throw an error.
  • Any form of loop-based enumeration or attempt at serialisation will most likely crash your server.
  • The array's keys will silently be converted to strings.
  • The array will still serialise to an array, not an object. (JSON.stringify will try padding all empty indices using nulls)
  • Array.isArray(array) will return true for DICTIONARY_ELEMENTS arrays.

If you try to call JSON.stringify on the array above, this is what you'll get:

[1,2,3,3.14,"Hello world!",null,null,null,null,true,null,null,null,null,null,null,null,null,null,null,null,null,null,...,null,null,null,null,"HAIL SATAN! ♥"]
Enter fullscreen mode Exit fullscreen mode

How this can be used against you:

Consider the following example of a REST API using express to manipulate a todo list:

// Naïve example of holey array potential vulnerability

class Todos {

  constructor(username, items) {
    this.username = username;
    this.items = items || Todos.load(username);
  }

  // add a new todo
  add(todo) {
    this.items.push(todo);
    return this.items.length - 1;
  }

  // update an existing todo
  update(index, todo) {
    // index is expected to be an integer
    // we're making the mistake of accepting an arbitrary/unbounded index here though
    // this operation will succeed silently, and node won't throw any errors with a huge index.
    // e.g. if an attacker passes 10000000, the program won't crash or show signs of instability, the array will silently become "DICTIONARY_ELEMENTS".
    this.items[index] = todo;
    return index;
  }

  remove(index) {
    return this.items.splice(index, 1);
  }

  // another common case:
  // you're keeping a list of todos and want to give the user the ability to reorder items.
  swap(i1, i2) {
    const temp = this.items[i1];
    this.items[i1] = this.items[i2];
    this.items[i2] = temp;
  }

  // load a list of the user's previously saved todos
  // we’re not using a database for simplicity’s sake
  static load(username) {
    const userPath = path.join('data', this.username + '.json');
    if (fs.existsSync(userPath) {
      return JSON.parse(fs.readFileSync(userPath, 'utf8'));
    }
    return [];
  }

  // this saves the array back to disk as JSON when the request is ending
  // holey/dictionary arrays with absurd indices will pad empty ranges with `null`.
  // this could result a multi-gigabyte file if passed a holey/dictionary array with a big enough (sparse) index in them. Most likely we’ll run out of memory first because the resulting string will be too big.
  save() {
    fs.writeFileSync(path.join('data', this.username + '.json'), JSON.stringify(this.items));
  }

}

app.use((req, res, next) => {
  // initialise/load previous todos
  req.todos = req.todos || new Todos(req.session.username);
  next();
});

// add new todo
app.post('/todos/new', (req, res, next) => {
  if (req.body.payload)
    res.json({ index: req.todos.add(req.body.payload) });
  else
    res.status(500).json({ error: 'empty input' });
});

/// update existing todo (vulnerable to unbound indices!)
app.post('/todos/:idx/update', (req, res, next) => {
  if (req.body.payload)
    res.json(req.todos.update(parseInt(req.params.idx, 10), req.body.payload));
  else
    res.status(500).json({ error: 'empty input' });
});



// save current todo list after request
// a better idea is to override res.end() via a thunk though.
app.use((req, res, next) => {
  next();
  req.todos.save();
});
Enter fullscreen mode Exit fullscreen mode

Here’s an example malicious request: POST /todos/10000000/update payload="hi"

You now have an invisible problem (10000000 element dictionary array) in memory, when the request ends it will try to write out a huge JSON file, or your server will run out of memory trying to serialise the array to a string.

For further reading on V8 internals:

https://v8project.blogspot.com/2017/09/elements-kinds-in-v8.html
https://v8project.blogspot.com/2017/08/fast-properties.html

Top comments (7)

Collapse
 
amenadiel profile image
Felipe Figueroa

Why does array[999] = 'HAIL SATAN! ♥' convert the array to a plain object? Is it because of the index or because the special character?

Collapse
 
voodooattack profile image
Abdullah Ali

The index. The content has no bearing on the matter.

When you poke enough holes into the array, V8 deems it as “sparse enough” to get this treatment.

Collapse
 
mohamedlamineallal profile image
Mohamed Lamine Allal

Tank you for the article. Holey make a better allocation against PACKED one (specially matter when the array get big). Against slower operation and access performance. What's the solution to avoid both pitfalls. Is TypedArrays the solution ? Or they have there pitfalls? Typed arrays aren't HOLEY.

Collapse
 
hugoheneault profile image
Hugo Heneault

Great one! One quick question: what are the solutions? :-)

Collapse
 
voodooattack profile image
Abdullah Ali • Edited

The solution is to guard against unbounded inputs when writing to arrays.

You can use if statements to check if the value is within bounds, and throw an error otherwise.

Alternatively, a very simple solution is to use the % modulo operator to constrain the index like this:

array[input % array.length] = value; // will never go out of bounds
Enter fullscreen mode Exit fullscreen mode

With the method above, any out-of-bounds input will be "wrapped around" in the range of 0:N.

For example, if the length of the array is 20 and you supply 10: 10 % 20 = 10, so nothing really changes when the supplied input is within bounds.

But if you supply an out-of-bounds index, say 25: 25 % 20 = 5, and it remains within bounds.

It's worthy to note that this approach also guards against negative inputs, because -25 % 20 equals 15.

Edit: Fixed my numbers. Still waiting on my coffee, sorry.

Collapse
 
yurist38 profile image
Yuri Drabik

Great article! Good to know it, thanks!

Collapse
 
abdulghani200 profile image
Abdul Ghani

Thanks for sharing such a valuable information ❤️