DEV Community

Gajus Kuizinas


Stack Overflow is leaking user emails

I am developing GitSpo, a "Google Alerts" service for developers. I have not figured out exactly what it is yet, but it is growing fast and people like it. A big part of GitSpo is aggregating data from different social networks, such as Twitter, LinkedIn, and Stack Overflow. That is how I noticed something odd: Stack Overflow's default profile pictures are served by Gravatar.

For those of you not familiar with it, Gravatar is a service that allows you to associate an image (an avatar) with your email address. Other websites (e.g. Stack Overflow) can then use that image to display an avatar for people who sign up with that address. A user's avatar is located by hashing their email. For example, my email is gajus@gajus.com, so anyone who has my email can generate my Gravatar URL:

https://www.gravatar.com/avatar/74a5bd659b3a8af09a336a932eebe3b1

Which will load my avatar:

[Image: Gajus Kuizinas's Gravatar avatar]

The service was launched in 2004 (Automattic acquired it in 2007) and grew rapidly, at least in part due to it being the default avatar provider for comments left on WordPress sites. It is a neat idea: upload an avatar once and have it follow you around the Internet. Update your Gravatar, and your avatar updates across all websites. Unfortunately, the hashing algorithm they chose is not particularly safe.

MD5

The Gravatar image is looked up by MD5 hashing a trimmed, lower-case representation of your email, i.e. md5('gajus@gajus.com') === '74a5bd659b3a8af09a336a932eebe3b1'. MD5 is a fast hash, so computing large numbers of candidate hashes is cheap. Using MD5 to hash private data was a bad choice even at the time. Today, there are MD5 databases that contain over 90 trillion hashes. Furthermore, as most emails contain only a narrow range of characters (/^[a-z@\-.]+$/) and you can assume their ending (popular email domains like @gmail.com), far fewer permutations need to be pre-hashed.
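
A minimal Python sketch of the lookup key described above, assuming the MD5 scheme and the https://www.gravatar.com/avatar/ prefix from the example URL. The function names are illustrative only, not part of any Gravatar SDK.

```python
import hashlib


def gravatar_hash(email: str) -> str:
    """MD5 hex digest of the trimmed, lower-cased email address."""
    normalized = email.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def gravatar_url(email: str) -> str:
    """Avatar URL; no secret is involved, so anyone with the email can build it."""
    return "https://www.gravatar.com/avatar/" + gravatar_hash(email)


# Case and surrounding whitespace are normalized away, so both lines print the same URL.
print(gravatar_url(" Gajus@Gajus.com "))
print(gravatar_url("gajus@gajus.com"))
```

The function is deterministic and unsalted: the same email always yields the same 32-character hash on every site that embeds it, which is exactly what makes the hash usable as an identifier.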

As an experiment, I took the hashes from 1000 Stack Overflow profiles and ran them through one of the MD5 'decryption' services, which recovered 721 emails (a 72% success rate).
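
The 'decryption' services are really lookups of precomputed hashes. The sketch below is not the service used in this experiment; it is a hypothetical illustration of why such lookups succeed: derive a few local parts from a public username, append popular domains, hash each candidate, and compare against the target hashes. The domain list, the username heuristics, and the function names are all assumptions.

```python
import hashlib
from itertools import product

# Hypothetical guess list; real lookup tables are vastly larger.
COMMON_DOMAINS = ["gmail.com", "yahoo.com", "hotmail.com", "outlook.com"]


def md5_hex(email: str) -> str:
    """Gravatar-style hash: MD5 of the trimmed, lower-cased email."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def candidate_local_parts(username: str) -> list[str]:
    """A few plausible local parts derived from a public display name."""
    base = username.strip().lower()
    variants = {
        base,
        base.replace(" ", ""),
        base.replace(" ", "."),
        base.replace(" ", "_"),
    }
    return sorted(v for v in variants if v)


def recover_emails(target_hashes: set[str], usernames: list[str]) -> dict[str, str]:
    """Map every target hash that one of the candidates reproduces to that candidate."""
    found: dict[str, str] = {}
    for username in usernames:
        for local, domain in product(candidate_local_parts(username), COMMON_DOMAINS):
            email = f"{local}@{domain}"
            digest = md5_hex(email)
            if digest in target_hashes:
                found[digest] = email
    return found


targets = {md5_hex("jane.doe@gmail.com")}
print(recover_emails(targets, ["Jane Doe"]))
```

Even this tiny candidate set covers common name-based addresses, which illustrates how a restricted alphabet and predictable domains shrink the search.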

However, recovering the emails is not the interesting use case. A lot of developer emails are already semi-public: GitHub user emails can be obtained from public profiles, commit logs, license files, or even comments in the code. As GitSpo has an index of all public GitHub users and repositories, I was able to extract the associated email addresses, hash them, and match them to Stack Overflow profiles. All 1000 of them.
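
Matching known addresses to profiles can be sketched as a hash join: hash every known email once, then look up the hash embedded in each avatar URL. GitSpo's actual pipeline is not described in the post, so this is only an illustration; the regular expression assumes avatar URLs shaped like the example above (32 hex characters after /avatar/, optionally followed by a query string), and the profile IDs are hypothetical.

```python
import hashlib
import re

# Assumes URLs like https://www.gravatar.com/avatar/<32 hex chars>?s=128
AVATAR_HASH = re.compile(r"gravatar\.com/avatar/([0-9a-f]{32})")


def md5_hex(email: str) -> str:
    """Gravatar-style hash: MD5 of the trimmed, lower-cased email."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def match_profiles(avatar_urls: dict[str, str], known_emails: list[str]) -> dict[str, str]:
    """Return {profile_id: email} for each avatar whose hash matches a known email."""
    # Build the index once, so each profile lookup is a single dict access.
    index = {md5_hex(email): email for email in known_emails}
    matches: dict[str, str] = {}
    for profile_id, url in avatar_urls.items():
        m = AVATAR_HASH.search(url.lower())
        if m and m.group(1) in index:
            matches[profile_id] = index[m.group(1)]
    return matches


h = md5_hex("dev@example.com")
urls = {"u1": f"https://www.gravatar.com/avatar/{h}?s=128&d=identicon"}
print(match_profiles(urls, ["Dev@Example.com"]))
```

Because no guessing is involved, every profile whose owner used a known address matches, which is why this approach does better than a generic lookup table.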

It is worth noting that Stack Overflow is not the only service that uses Gravatar (WordPress, HootSuite, TechDirt, and Disqus, to name a few others). Stack Overflow simply stood out because it is a developer resource, and it surprised me that this slipped through the cracks.

There is not much Stack Overflow can do about it today: many copies of its website are floating around the Internet. However, it would be best to stop relying on Gravatar for new users joining the site.

Top comments (10)

Gajus Kuizinas

The point I was trying to make in the article is not that your email is leaked, but that your identity is compromised. A lot of Stack Overflow users are anonymous (or think they are). The only visible information about them is a made-up username. This article simply alerts them that their identity is hidden in their default avatar.

Ben Halpern

If one were to report this to Gravatar and it were fixable in some way, it seems like any current emails are leaked for good, eh?

I wonder whether we even live in a world where anyone can expect that their email won't be discoverable by someone determined to find it.

Pacharapol Withayasakpunt

It's a Gravatar problem, actually. You might report it at en.gravatar.com/support/

"MD5 databases that contain over 90 trillion hashes."

It doesn't seem to work with my email, even though it is quite guessable.

Sam

"There is not much Stack Overflow can do about it today"

I guess they cannot erase history, but going forward they could do the same thing we do at Discourse: proxy the image and serve a copy.

I discussed this many, many years ago: meta.stackexchange.com/questions/2...

Surprised SO never got around to proxying these things.
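
A rough sketch of the proxying idea, assuming a server that fetches the Gravatar image once and serves the stored copy under an opaque, random per-user path, so the email hash never reaches the browser. The class, the /avatars/ path layout, and the injectable fetch function are hypothetical; this is not Discourse's implementation.

```python
import hashlib
import secrets
import urllib.request
from typing import Callable, Optional


def default_fetch(url: str) -> bytes:
    """Download the image from Gravatar (server side only)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


class AvatarProxy:
    """Serve cached Gravatar images under random tokens instead of email hashes."""

    def __init__(self, fetch: Callable[[str], bytes] = default_fetch) -> None:
        self._fetch = fetch
        self._token_by_user: dict[str, str] = {}
        self._email_by_token: dict[str, str] = {}
        self._cache: dict[str, bytes] = {}

    def public_path(self, user_id: str, email: str) -> str:
        """Path that pages embed; it reveals nothing about the email."""
        token = self._token_by_user.get(user_id)
        if token is None:
            token = secrets.token_urlsafe(16)
            self._token_by_user[user_id] = token
        if self._email_by_token.get(token) != email:
            # Email changed (or first call): drop any stale cached copy.
            self._cache.pop(token, None)
        self._email_by_token[token] = email
        return f"/avatars/{token}"

    def serve(self, token: str) -> Optional[bytes]:
        """Fetch from Gravatar on the first request, then serve the stored copy."""
        if token not in self._email_by_token:
            return None
        if token not in self._cache:
            email = self._email_by_token[token]
            digest = hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()
            self._cache[token] = self._fetch("https://www.gravatar.com/avatar/" + digest)
        return self._cache[token]
```

A production version would also persist the copies and refresh them periodically (for example from a background worker) so that Gravatar updates still propagate.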

Edward Williams

I don't think there was really any expectation of privacy with Gravatar. If it means the same avatar follows you around each site... I mean... even using the same handle increases the chance that pieces of what you say can be put together to expose your identity anyway.

Nishchal Gautam

I don't use Gravatar that way. I store a picture URL in my DB, download the Gravatar image into my own storage, and have a worker that keeps the copies updated.

Paweł Kowalski

;-)))

Ahmad Awais ⚡️

🤯🤯🤯

Priyab Dash

I guess Stack Overflow would have this in their terms. I feel the only way to fix it now is to use a throwaway account.

Waylon Walker

Interesting story, thanks for sharing.