I am developing GitSpo, a "Google Alerts" for developers. I have not figured out exactly what it is, but it is growing fast and people like it. A big part of GitSpo is aggregating data from different social networks, such as Twitter, LinkedIn, and Stack Overflow. That is when I noticed something odd: Stack Overflow default user profiles use Gravatar.
For those of you not familiar, Gravatar is a service that allows you to associate an image (an avatar) with your email. That image can then be used by other websites (e.g. Stack Overflow) to display an avatar for people signing up on their website. A user's avatar is found by hashing their email. For example, my email is gajus@gajus.com. Anyone who has my email can generate a Gravatar URL:
https://www.gravatar.com/avatar/74a5bd659b3a8af09a336a932eebe3b1
Which will load my avatar:
The service was launched in 2007 and grew rapidly, at least in part due to it being the default avatar for comments left on WordPress sites. It is a neat idea: upload an avatar once and have it follow you around the Internet. Update your Gravatar, and your avatar updates across all websites. Unfortunately, the hashing algorithm they've chosen is not particularly safe.
A Gravatar image URL is generated by MD5 hashing a trimmed, lower-case representation of your email, i.e. md5('gajus@gajus.com') === '74a5bd659b3a8af09a336a932eebe3b1'. MD5 is a fast hash, which makes computing hashes in bulk cheap. Using MD5 to hash private data was a bad choice even at the time. Today, there are MD5 databases that contain over 90 trillion hashes. Furthermore, as most emails contain only a narrow range of characters (/^[a-z@\-.]+$/) and you can assume their ending (popular email domains like @gmail.com), relatively few permutations need to be pre-hashed.
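The hashing step described above can be sketched in a few lines of Python. This is a minimal illustration rather than Gravatar's own code: the function names `gravatar_hash` and `gravatar_url` are made up for this example, and the URL shape is simply the one shown earlier in the article.

```python
import hashlib


def gravatar_hash(email: str) -> str:
    """Return the MD5 hex digest of a trimmed, lower-cased email address."""
    # Normalisation as described in the article: trim whitespace, lower-case.
    normalized = email.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def gravatar_url(email: str) -> str:
    """Build an avatar URL in the same shape as the one shown above."""
    return "https://www.gravatar.com/avatar/" + gravatar_hash(email)


if __name__ == "__main__":
    # example.com is a placeholder domain, not a real user's address.
    print(gravatar_url("  Someone@Example.com "))
```

Because the normalisation is deterministic and no secret is involved, anyone holding an email address can reproduce the same URL.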
As an experiment, I picked hashes of 1000 Stack Overflow profiles and used one of the MD5 'decryption' services, which gave me 721 emails (a 72% success rate).
However, the interesting use case is not getting the emails. A lot of developer emails are already semi-public, e.g. GitHub user emails can be obtained from their public profile, commit logs, license files, or even comments in the code. As GitSpo has an index of all public GitHub users and repositories, I was able to extract the associated email addresses, hash them and match them to Stack Overflow. All 1000 of them.
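The matching step can be sketched as a plain dictionary lookup. This is an assumption-laden illustration, not GitSpo's actual pipeline: `match_profiles`, `hash_from_avatar_url`, the profile identifiers, and the sample addresses are all hypothetical.

```python
import hashlib
import re
from typing import Dict, Iterable, Optional


def gravatar_hash(email: str) -> str:
    """MD5 hex digest of a trimmed, lower-cased email address."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def hash_from_avatar_url(url: str) -> Optional[str]:
    """Pull the 32-character hex digest out of an avatar URL, if present."""
    match = re.search(r"/avatar/([0-9a-f]{32})", url)
    return match.group(1) if match else None


def match_profiles(
    known_emails: Iterable[str],
    avatar_urls_by_profile: Dict[str, str],
) -> Dict[str, str]:
    """Map profile identifiers to known emails whose hash matches the avatar."""
    # Hash every known email once, so each profile costs a single lookup.
    lookup = {gravatar_hash(email): email for email in known_emails}
    matches = {}
    for profile, url in avatar_urls_by_profile.items():
        digest = hash_from_avatar_url(url)
        if digest is not None and digest in lookup:
            matches[profile] = lookup[digest]
    return matches


if __name__ == "__main__":
    # Hypothetical data: placeholder emails and made-up profile names.
    emails = ["alice@example.com", "bob@example.com"]
    profiles = {
        "anonymous-coder": "https://www.gravatar.com/avatar/"
        + gravatar_hash("bob@example.com")
        + "?s=64",
    }
    print(match_profiles(emails, profiles))
```

No brute forcing is needed here: the work is linear in the number of known emails plus the number of profiles.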
It is worth noting that Stack Overflow is not the only service that is using Gravatar (WordPress, HootSuite, TechDirt, Disqus, just to name a few others). Stack Overflow simply stood out because it is a developer resource and it surprised me that this slipped through the cracks.
There is not much Stack Overflow can do about it today, as many copies of their website are floating around the Internet. However, it would be best to stop relying on Gravatar as a service for new users that are joining the system.
Top comments (10)
The point I was trying to make in the article is not that your email is leaked, but that your identity is compromised. A lot of Stack Overflow users are anonymous (or think they are anonymous). The only visible information about them is the made-up username. This article simply alerts them that their identity is hidden in their default avatar.
If one were to report this to Gravatar and it were fixable in some way, seems like any current emails are leaked for good, eh?
I wonder whether we live in a world where anyone can expect that their email won't be discoverable by anyone determined to find it.
This is a Gravatar problem, actually. You might report it: en.gravatar.com/support/
Seems not to work with my email, even though it is quite guessable.
I guess they cannot erase history, but going forward they could do the same thing we do at Discourse. You proxy the image and serve a copy.
I discussed this many, many years ago: meta.stackexchange.com/questions/2...
Surprised SO never got around to proxying these things.
I don't think there was really any expectation of privacy with Gravatar. If it means the same avatar follows you around each site... I mean... even using the same handle increases the chance that pieces of what you say can be put together and expose your identity anyway.
I don't use Gravatar that way. I have a picture URL in my DB; the Gravatar image is downloaded and put in our own storage, and a worker keeps the copies updated.
;-)))
🤯🤯🤯
I guess Stack Overflow would have this in their terms. I feel the only way to fix it now is to have a throwaway account.
Interesting story, thanks for sharing.