Hey there! How's your day going so far? Hope you are doing great!
For the third post of our OWASP API Security Top 10 series, it is time to talk about Excessive Data Exposure! Hope you guys like it 🤗
🏞️ What is Sensitive Data?
Excessive data exposure touches on several topics, database performance among them. However, this whole series is about API security, and when it comes to the security side of excessive data exposure, sensitive data always comes up.
Okay, so what the flip is sensitive data? Generally speaking, sensitive data is anything that would cause damage to a company if it were publicly exposed. Some examples are listed below:
- SSH keys
- Database credentials
- Users' billing information
- Biometric and health records
Of course, not all of these things will show up in API responses. The most common cases of excessive data exposure in APIs involve a more specific type of sensitive data, which we call PII (Personally Identifiable Information).
Personally Identifiable Information is any data that, once publicly available, could be used directly or indirectly to identify a person. From the examples listed above, we can consider the last two items as being PII.
⚙️ Getting Sensitive Data From APIs
Now I am going to show two different cases of API3:2019 (Excessive Data Exposure), one for GraphQL APIs and one for REST APIs.
📚 GraphQL Example
As a first practical example, let's take the GraphQL API of a fictional game, whose introspection defines the following query:
query {
  user(id: 123) {
    username
    level
    rankingPosition
  }
}
Basically, it receives a number that corresponds to the ID of a user, and returns their username, level and position in the ranking. Theoretically, if the API really follows what the introspection says, these three attributes are all we can retrieve about a user.
The introspection also defines the query below, from which it is possible to retrieve some of your own user's information:
query {
  myUser {
    username
    level
    rankingPosition
    address {
      street
      number
      postCode
    }
    creditCard
  }
}
Notice that the query myUser returns more attributes than the previous one: it also returns your address and your credit card. Returning a credit card in plain text is, by itself, arguably enough to say that we have an excessive data exposure happening, but it can get worse.
Both queries retrieve data for a specific user. Because of that, even though the GraphQL introspection defines them differently, user and myUser were possibly implemented in a similar way, with very similar code.
Assuming that they share code, an attacker might try to call user and request creditCard as a response attribute, like this:
query {
  user(id: 1337) {
    creditCard
  }
}
If it works, this means that the attacker is able to retrieve the credit card of any user they want. In other words, this means that our fictional game has a business logic error leading to excessive data exposure.
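One way to guard against this is to decide, field by field, who is allowed to read what, instead of letting both queries share one unrestricted lookup. The snippet below is only a minimal, framework-free sketch of that idea in Python. The names USERS, PUBLIC_FIELDS, OWNER_ONLY_FIELDS and resolve_user are hypothetical and do not come from any GraphQL library, and the data is made up for the fictional game.

```python
# Hypothetical in-memory "database" for the fictional game.
USERS = {
    123: {
        "username": "player_one",
        "level": 7,
        "rankingPosition": 42,
        "address": {"street": "Main St", "number": 1, "postCode": "00000"},
        "creditCard": "0000-0000-0000-0000",
    },
    1337: {
        "username": "User1337",
        "level": 99,
        "rankingPosition": 1,
        "address": {"street": "Leaf Rd", "number": 7, "postCode": "11111"},
        "creditCard": "1111-1111-1111-1111",
    },
}

# Fields anyone may read, and fields only the owner of the account may read.
PUBLIC_FIELDS = {"username", "level", "rankingPosition"}
OWNER_ONLY_FIELDS = {"address", "creditCard"}


def resolve_user(requester_id, target_id, requested_fields):
    """Return the requested fields, refusing any the requester may not read."""
    user = USERS[target_id]
    allowed = set(PUBLIC_FIELDS)
    if requester_id == target_id:
        # Only the account owner gets access to the private fields.
        allowed |= OWNER_ONLY_FIELDS
    denied = set(requested_fields) - allowed
    if denied:
        raise PermissionError(f"not allowed to read: {sorted(denied)}")
    return {field: user[field] for field in requested_fields if field in user}
```

With this check in place, user 123 asking for the creditCard of user 1337 raises a PermissionError, while user 1337 asking for their own creditCard still works. The point is that the authorization decision lives on the server and is made per field, no matter which query reaches the shared code.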
📚 REST Example
For the REST example, let's get back to those screens presented in the last post. We have this login page:
And when you submit your credentials, a new page is generated:
Let's say that the new page makes the following request, in order to obtain your name and display your "hello" message:
GET /users/welcome HTTP/1.1
Host: api.example.com
Authorization: Bearer sup3r_t0k3n_h3r3
And for the request above, this is the response:
HTTP/1.1 200 OK
Content-Type: application/json

{
  "firstName": "Naruto",
  "lastName": "Uzumaki",
  "username": "User1337",
  "email": "dattebayo@mail.com",
  "password": "HashedVersionOfRamen123"
}
Notice that the response not only has more data than necessary (ideally it should contain only the username), but it also contains the user's password (hashed, but still sensitive). One more case of API3:2019 😣
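If you want to spot this kind of thing while testing an API, a quick check is to compare the keys of a response against what the page actually needs. The snippet below is a rough sketch in Python using only the standard library. EXPECTED_FIELDS, SENSITIVE_FIELDS and audit_response are hypothetical names, and the list of sensitive key names is just an assumption for this example.

```python
import json

# Fields the welcome page actually needs (assumption: only the username).
EXPECTED_FIELDS = {"username"}
# Hypothetical list of key names that should never leave the server.
SENSITIVE_FIELDS = {"password", "creditCard"}


def audit_response(body):
    """Return (extra, sensitive) key names found in a JSON object body."""
    data = json.loads(body)
    extra = set(data) - EXPECTED_FIELDS
    sensitive = set(data) & SENSITIVE_FIELDS
    return sorted(extra), sorted(sensitive)


# Body of the response shown above.
response_body = """
{
  "firstName": "Naruto",
  "lastName": "Uzumaki",
  "username": "User1337",
  "email": "dattebayo@mail.com",
  "password": "HashedVersionOfRamen123"
}
"""

extra, sensitive = audit_response(response_body)
print("fields the page does not need:", extra)
print("sensitive fields:", sensitive)
```

A check like this only looks at top-level key names, so it is a starting point for manual review rather than a real detector.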
Cases like this happen when the endpoint implementation is something like:
user = db.run('SELECT * FROM users WHERE id = 1337')
return user
Basically, it just picks up everything related to the user and throws it into the response, without filtering out what is not needed and without filtering out sensitive data. A similar implementation using an ORM would be just:
user = User::find(1337)
return user
You can, of course, implement a filter in the database itself, or elsewhere in the API code. If you do not, an endpoint that relies exclusively on the client side to filter the data may lead to huge problems.
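As an illustration of filtering on the server side, the sketch below selects only an allowlist of columns instead of using SELECT *. It uses Python's built-in sqlite3 module with an in-memory database. The table schema, the WELCOME_FIELDS allowlist and the get_welcome_payload function are assumptions made for this example, not the real implementation of any API.

```python
import sqlite3

# Only these columns are ever allowed into the welcome response.
WELCOME_FIELDS = ("username",)


def get_welcome_payload(conn, user_id):
    """Fetch only the allowlisted columns for one user; dict or None."""
    # Column names come from a constant, never from user input.
    columns = ", ".join(WELCOME_FIELDS)
    row = conn.execute(
        f"SELECT {columns} FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if row is None:
        return None
    return dict(zip(WELCOME_FIELDS, row))


# Demo with an in-memory database and a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, "
    "last_name TEXT, username TEXT, email TEXT, password TEXT)"
)
conn.execute(
    "INSERT INTO users VALUES (?, ?, ?, ?, ?, ?)",
    (1337, "Naruto", "Uzumaki", "User1337", "dattebayo@mail.com",
     "HashedVersionOfRamen123"),
)

print(get_welcome_payload(conn, 1337))
```

Because the password and email never leave the database query, there is nothing for the client to filter and nothing for an attacker to read from the response. The user ID is also passed as a query parameter instead of being pasted into the SQL string.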
📔 External materials
As my goal with this series is just to explain what each flaw is while I'm learning about them all, I would like to suggest some materials about data exposure issues, so you can better understand the details:
https://github.com/OWASP/API-Security/blob/master/2019/en/src/0xa3-excessive-data-exposure.md
https://salt.security/blog/api3-2019-excessive-data-exposure
https://portswigger.net/support/using-burp-to-test-for-sensitive-data-exposure-issues