DEV Community

Breno Vitório

API3:2019 - Excessive Data Exposure

Hey there! How's your day going so far? Hope you are doing great!

For the third post of our OWASP API Security Top 10 series, it is time to talk about Excessive Data Exposure! Hope you guys like it 🤗

🏞️ What is Sensitive Data?

Excessive data exposure is an issue that touches several related topics, such as database performance. However, this whole series is about API security, and when it comes to the security side of excessive data exposure, sensitive data always comes up.

Okay, so what the flip is sensitive data? Generally speaking, we can consider sensitive data to be anything that would cause damage to a company if publicly exposed. Some examples are listed below:

  • SSH keys
  • Database credentials
  • Users' billing information
  • Biometric and health records

Of course, not all of these things will show up in API responses. The most common cases of excessive data exposure in APIs usually involve a more specific type of sensitive data, which we call PII (Personally Identifiable Information).

Personally Identifiable Information is any data that, once publicly available, could be used directly or indirectly to identify a person. Of the examples listed above, the last two items count as PII.

⚙️ Getting Sensitive Data From APIs

Now, I am going to show two different cases of API3:2019, one for a GraphQL API and one for a REST API.

📚 GraphQL Example

As a first practical example, let's take the GraphQL API of a fictional game, whose introspection defines the following query:

query {
    user(id: 123) {
        username
        level
        rankingPosition
    }
}

Basically, it receives a number corresponding to a user's ID and returns their username, level, and position in the ranking. Theoretically, if the API really behaves the way the introspection says, these three attributes are all we can retrieve about a user.

The introspection also defines the query below, which makes it possible to retrieve some of your own user's information:

query {
    myUser {
        username
        level
        rankingPosition
        address {
            street
            number
            postCode
        }
        creditCard
    }
}

Notice that the myUser query returns more attributes than the previous one: it also returns your address and your credit card. Returning a credit card number in plain text might, by itself, already be enough to call this excessive data exposure, but it can get worse.
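Even for the user's own data, the API rarely needs to send the full card number back. A common mitigation is to mask it on the server before building the response. The sketch below assumes the card number is stored as a string of digits; `mask_card_number` is a hypothetical helper, not part of any GraphQL library.

```python
def mask_card_number(card_number: str, visible: int = 4) -> str:
    """Replace all but the last `visible` digits with '*'."""
    digits = "".join(ch for ch in card_number if ch.isdigit())
    if len(digits) <= visible:
        # Too short to reveal anything safely: mask it completely.
        return "*" * len(digits)
    return "*" * (len(digits) - visible) + digits[-visible:]


# Hypothetical myUser payload, built on the server before it is sent.
my_user = {
    "username": "User1337",
    "creditCard": mask_card_number("4111 1111 1111 1111"),
}
print(my_user["creditCard"])  # ************1111
```

The client still gets enough to show "card ending in 1111", while the full number never travels over the wire.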

Both queries retrieve data about a specific user, so although the GraphQL introspection defines them differently, user and myUser may well have been implemented in a similar way, with very similar code.

Assuming that they "share code" with each other, an attacker might try calling user and requesting creditCard as one of the response fields, like this:

query {
    user(id: 1337) {
        creditCard
    }
}

If it works, the attacker can retrieve the credit card of any user they want. In other words, our fictional game has a business logic error that leads to excessive data exposure.
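One plausible way this happens is a resolver helper shared by both queries that returns whatever fields are requested. The sketch below uses plain Python dictionaries instead of a real GraphQL server library; the `USERS` data, `resolve_fields`, `PUBLIC_USER_FIELDS`, and both `resolve_user_*` functions are hypothetical. The fixed version gives the `user` query its own server-side allowlist, so the restriction is enforced in code rather than only described by the introspection.

```python
# Hypothetical in-memory data; a real API would read this from a database.
USERS = {
    123: {"username": "Player123", "level": 42, "rankingPosition": 7,
          "creditCard": "4111111111111111"},
    1337: {"username": "User1337", "level": 99, "rankingPosition": 1,
           "creditCard": "5555555555554444"},
}

# Assumed allowlist: fields any player may see about another player.
PUBLIC_USER_FIELDS = {"username", "level", "rankingPosition"}


def resolve_fields(user: dict, requested: list) -> dict:
    """Shared helper: copies every requested field that exists on the user."""
    return {name: user[name] for name in requested if name in user}


def resolve_user_vulnerable(user_id: int, requested: list) -> dict:
    # `user` reuses the same helper as `myUser`, so creditCard leaks.
    return resolve_fields(USERS[user_id], requested)


def resolve_user_fixed(user_id: int, requested: list) -> dict:
    # Reject anything outside the public allowlist before touching the data.
    forbidden = set(requested) - PUBLIC_USER_FIELDS
    if forbidden:
        raise PermissionError(f"fields not allowed: {sorted(forbidden)}")
    return resolve_fields(USERS[user_id], requested)


print(resolve_user_vulnerable(1337, ["creditCard"]))
# {'creditCard': '5555555555554444'}
try:
    resolve_user_fixed(1337, ["creditCard"])
except PermissionError as err:
    print(err)  # fields not allowed: ['creditCard']
```

In a real GraphQL server, the equivalent fix is usually to have user return a separate public type that simply has no creditCard field, so the query above is rejected during validation before any resolver runs.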

📚 REST Example

For the REST example, let's go back to the screens presented in the last post. We have this login page:

Login page, asking for email and password

And when you submit your credentials, a new page is generated:

Page containing a hello message

Let's say that the new page makes the following request to obtain your name and display your "hello" message:

GET /users/welcome HTTP/1.1
Host: api.example.com
Authorization: Bearer sup3r_t0k3n_h3r3

And for the request above, this is the response:

HTTP/1.1 200 OK
Content-Type: application/json

{
    "firstName": "Naruto",
    "lastName": "Uzumaki",
    "username": "User1337",
    "email": "dattebayo@mail.com",
    "password": "HashedVersionOfRamen123"
}

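Notice that the response not only has more data than necessary (ideally it should contain only the username), but it also contains the user's password. Even a hashed password should never leave the server, since an attacker who gets it can try to crack it offline. One more case of API3:2019 😣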
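When testing an API for this issue, it can help to scan each JSON response for keys that look sensitive. The sketch below is a naive, hypothetical check: the `SENSITIVE_KEYS` set is an assumption you would tune for your own API, and matching on key names alone will miss sensitive values stored under innocent-looking names.

```python
import json

# Assumed key names worth flagging; adjust this set for your own API.
SENSITIVE_KEYS = {"password", "creditcard", "email", "ssn", "token", "secret"}


def find_sensitive_keys(payload, path="$"):
    """Recursively walk decoded JSON and return the paths of suspicious keys."""
    hits = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}"
            if key.lower() in SENSITIVE_KEYS:
                hits.append(child)
            hits.extend(find_sensitive_keys(value, child))
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            hits.extend(find_sensitive_keys(item, f"{path}[{index}]"))
    return hits


# The response body from the /users/welcome example.
body = """{"firstName": "Naruto", "lastName": "Uzumaki", "username": "User1337",
 "email": "dattebayo@mail.com", "password": "HashedVersionOfRamen123"}"""
print(find_sensitive_keys(json.loads(body)))  # ['$.email', '$.password']
```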

Cases like this happen when the endpoint implementation is something like:

user = db.run('SELECT * FROM users WHERE id = 1337')

return user

Basically, it just picks up everything related to the user and throws it into the response, without filtering out what is unnecessary or what is sensitive. A similar implementation using an ORM would be just:

user = User::find(1337)

return user
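The ORM version leaks for the same reason: the whole model object gets serialized. A common remedy is a separate response schema that copies only the allowed fields. The sketch below uses a plain Python dataclass as a stand-in for an ORM model (it is not any real ORM's API); `User` and `WelcomeResponse` are hypothetical names.

```python
from dataclasses import asdict, dataclass


@dataclass
class User:
    """Stand-in for an ORM model: it holds every column, secrets included."""
    id: int
    first_name: str
    last_name: str
    username: str
    email: str
    password: str


@dataclass
class WelcomeResponse:
    """Response schema: only what the welcome page displays."""
    username: str

    @classmethod
    def from_user(cls, user: User) -> "WelcomeResponse":
        return cls(username=user.username)


user = User(1337, "Naruto", "Uzumaki", "User1337",
            "dattebayo@mail.com", "HashedVersionOfRamen123")

# Roughly what serializing the whole model (`return user`) would expose.
print(asdict(user)["password"])  # HashedVersionOfRamen123
# Only the allowlisted field reaches the client.
print(asdict(WelcomeResponse.from_user(user)))  # {'username': 'User1337'}
```

Because the response type lists its fields explicitly, a column added to the model later will not reach the client unless someone deliberately adds it to the response schema too.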

You can, of course, filter the data in the database query itself or elsewhere in the API code. If you don't, and this kind of endpoint relies exclusively on the client side to hide what shouldn't be displayed, it can lead to huge problems: the client may not show the extra fields, but anyone inspecting the raw response can still read them.
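Filtering in the database itself can be as simple as naming the columns you need instead of using SELECT *. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `users` table and its columns are assumptions mirroring the JSON response above. It also passes the ID as a query parameter rather than hardcoding it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be read by column name
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT,"
    " username TEXT, email TEXT, password TEXT)"
)
conn.execute(
    "INSERT INTO users VALUES (?, ?, ?, ?, ?, ?)",
    (1337, "Naruto", "Uzumaki", "User1337", "dattebayo@mail.com",
     "HashedVersionOfRamen123"),
)


def welcome_vulnerable(user_id: int) -> dict:
    # SELECT * copies every column, password included, into the response.
    row = conn.execute("SELECT * FROM users WHERE id = ?", (user_id,)).fetchone()
    return dict(row)


def welcome_filtered(user_id: int) -> dict:
    # Only the column the page actually displays leaves the database.
    row = conn.execute(
        "SELECT username FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return dict(row)


print(welcome_vulnerable(1337)["password"])  # HashedVersionOfRamen123
print(welcome_filtered(1337))  # {'username': 'User1337'}
```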

📔 External materials

As my goal with this series is just to explain what each flaw is while I'm learning about them all, I would like to suggest some materials about data exposure issues, so you can better understand the details:

https://github.com/OWASP/API-Security/blob/master/2019/en/src/0xa3-excessive-data-exposure.md

https://salt.security/blog/api3-2019-excessive-data-exposure

https://portswigger.net/support/using-burp-to-test-for-sensitive-data-exposure-issues
