DEV Community

Peter Benjamin (they/them)

Type Manipulation Vulnerabilities

Recently, I read a Type Manipulation vulnerability article that piqued my interest. So, I set out to understand it better and bring back some learning to the community.

Introduction

At a very high level, virtually every mainstream programming language has some notion of types, which the interpreter or the compiler relies on to translate human-readable code into something the machine can execute. Dynamic languages, static languages, interpreted languages and compiled languages all have types; they differ mainly in when those types are checked.

Here are some examples:

// javascript
let firstName = "Peter";
typeof firstName; // returns 'string'
# python
firstName = "Peter"
type(firstName) # returns <class 'str'> (Python 2 prints <type 'str'>)
# ruby
first_name = "Peter"
first_name.class # returns String
// go
firstName := "Peter"
reflect.TypeOf(firstName) // returns string

Great, programming languages have types! So, what's the problem?

Well, say you want to take some user-supplied input and send it to the server for some processing, which includes evaling the input (a common scenario for server-side rendering/templating engines).

So, you say to yourself: "That's fine as long as I sanitise the user input for characters that would allow them to escape or run arbitrary code on my server, right?"

Right, but let's consider the following scenario...

The Sanitisation

For this contrived example, let's check for and sanitise 3 known "bad" characters in our users' input:

  • < should be encoded to &lt;
  • > should be encoded to &gt;
  • & should be encoded to &amp;

So, we write this function:

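// Global regexes, so that every occurrence is replaced and not only the first one.
const AMP = /&/g;
const LT = /</g;
const GT = />/g;

// No "g" flag here: a global regex keeps state in lastIndex,
// which makes repeated .test() calls unreliable.
const ESCAPECHARS = /[&<>]/;

function sanitise(input) {
  if (typeof input === "string") {
    if (!ESCAPECHARS.test(input)) {
      // input is good
      return input;
    }
    // input is bad, sanitise ("&" first, so the entities added below are not re-encoded)
    return input
      .replace(AMP, "&amp;")
      .replace(LT, "&lt;")
      .replace(GT, "&gt;");
  }
  // anything that is not a string is returned untouched
  return input;
}

sanitise('good input');                             // returns 'good input'
sanitise('<script>alert("bad input")</script>');    // returns '&lt;script&gt;alert("bad input")&lt;/script&gt;'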

If we accepted this input from a client-side application, it might look like:

// server.js
const http = require("http");
const url = require("url");
const querystring = require("querystring");

// sanitise logic here...

const requestHandler = (req, resp) => {
  let parsedURL = url.parse(req.url);
  let userInput = querystring.parse(parsedURL.query);
  let sanitised = sanitise(userInput.foo);
  resp.end(JSON.stringify(sanitised));
};

const server = http.createServer(requestHandler);

const PORT = 3000;

// listen() does not pass an error to its callback; failures arrive as "error" events.
server.on("error", err => console.log(`server failed: ${err}`));

server.listen(PORT, () => {
  console.log(`server listening on ${PORT}`);
});

Run it with $ node server.js and browse to localhost:3000
The response will be an empty page, because no foo parameter was supplied.

What about: localhost:3000/?foo=bar
Response:

"bar"

OK. Now, the moment of truth: localhost:3000/?foo=<bar>
Response:

"&lt;bar&gt;"

At this point, you might be feeling confident about our sanitisation.

The Manipulation

But, what about: localhost:3000/?foo=bar&foo=<script>alert(1)</script>
Response:

[
  "bar",
  "<script>alert(1)</script>"
]

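Uh-oh! We bypassed the sanitisation logic entirely. Because the foo parameter appears twice, querystring.parse returns an array instead of a string, and sanitise only encodes its input when typeof input === 'string'. Every other type falls through to the final return and comes back untouched.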
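To see why the type check is skipped, it helps to look at what Node's querystring.parse hands back when a key repeats. This is a minimal sketch; the variable names single and repeated are only illustrative.

```javascript
const querystring = require("querystring");

// One occurrence of the key: the value is a plain string.
const single = querystring.parse("foo=bar");
console.log(typeof single.foo); // 'string'

// Repeated key: the values are collected into an array.
const repeated = querystring.parse("foo=bar&foo=<script>alert(1)</script>");
console.log(typeof repeated.foo);         // 'object'
console.log(Array.isArray(repeated.foo)); // true
```

Note that typeof reports 'object' for arrays, so a check against 'string' alone says nothing about what else might arrive.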

Interestingly, if this data is consumed on the client side (e.g. displaying the contents of document.location.href to the user), this would constitute a DOM-based Cross-Site Scripting attack (DOM-XSS for short), but that's a topic for another article.

But, if this data is to be processed, or perhaps even evaled on the server (e.g. server-side rendering), then an attacker could break out of your logic and run arbitrary code on your server, like:

require("child_process").exec(/* steal API tokens, SSH keys, secrets/passwords ...etc */);

If you’re interested, this article goes deeper into how this gets exploited.

There are a few ways we can mitigate and defend against this type of bug in dynamic languages. Let's examine them...

Potential Solutions

Your instinct might be to solve this by just checking for Array object types in the if statement and you would be right, but let's examine a few other options.

  • You may consider disallowing non-string types. In our scenario, this might look like:
function sanitise(input) {
  if (typeof input !== 'string') {
    throw new TypeError('sanitise input is not a string.');
  }
  // ...
}
  • You may consider normalizing all data types. In our scenario, this might look like:
function sanitise(input) {
  if (Array.isArray(input)) { /* ... */ }
  if (typeof input === 'number') { /* ... */ }
  if (typeof input === 'string') { /* ... */ }
  // ... so on
}
  • You may consider a hybrid approach where you would disallow some types and handle some other types. This is arguably the most fragile approach.
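The first two options can be sketched more fully as follows. This is only one possible shape, not a definitive implementation: the names ENTITIES, encode, sanitiseStrict and sanitiseNormalised are made up for this example, and the normalising variant assumes that only strings, numbers and arrays of those are expected, throwing for anything else.

```javascript
// Encode the three characters in a single pass so "&" is never double-encoded.
const ENTITIES = { "&": "&amp;", "<": "&lt;", ">": "&gt;" };

function encode(str) {
  return str.replace(/[&<>]/g, ch => ENTITIES[ch]);
}

// Option 1: reject anything that is not a string.
function sanitiseStrict(input) {
  if (typeof input !== "string") {
    throw new TypeError("sanitise input is not a string.");
  }
  return encode(input);
}

// Option 2: normalise the types we expect, refuse the rest.
function sanitiseNormalised(input) {
  if (Array.isArray(input)) return input.map(sanitiseNormalised);
  if (typeof input === "number") return String(input);
  if (typeof input === "string") return encode(input);
  throw new TypeError(`unsupported input type: ${typeof input}`);
}

console.log(sanitiseNormalised(["bar", "<script>alert(1)</script>"]));
// [ 'bar', '&lt;script&gt;alert(1)&lt;/script&gt;' ]
```

Both variants throw when foo is missing altogether (undefined), so a request handler using them would need to catch the error and answer with something like a 400 response.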

Lessons Learned

  • The core of this bug is improperly handling/sanitising user input.
  • In dynamic languages, like JavaScript, we declare functions without specifying the data types of their parameters, which means the onus is on us, software developers and engineers, to make sure we are handling the right types of data at all times to prevent edge cases like these.
  • Unit testing and peer reviews are critical.
  • Statically-typed languages mitigate these types of bugs, to some degree, at compile time, but they're still vulnerable to a variation of this bug, often referred to as a deserialization vulnerability.
