So ever since starting to run the surveys, I've hoped that I would never have to write the dreaded "data leak" post. But sadly today is the day I need to address this issue.
An encryption key that makes it possible to decrypt publicly-available encrypted email addresses and link them to survey responses was mistakenly committed to a public GitHub repo.
- This is a human error, not a malicious attack.
- The leak is now closed.
- You are affected if you answered the State of JS or State of CSS surveys in 2020 or earlier (the 2021 JS and CSS surveys are not affected).
- So far there is no evidence that the mistake was actually exploited, but I'll keep monitoring the situation.
- Passwords were not affected as they use a completely separate hashing mechanism.
This situation resulted from three separate mistakes:
- Two years ago, I decided to add email hashes (or so I thought) to the publicly available survey response datasets (for surveys up until 2020; the 2021 datasets had not been published yet) so that they could serve as IDs and make it possible to track how a given respondent's answers evolved over time.
- An open-source contributor contributed the function that generates those "hashes" and used a 2-way encryption function. Somehow, over time, I came to assume that it was a 1-way hashing function instead.
- About a month ago, another open-source contributor committed private credentials (which included the encryption function's key) to a public repo while working on a separate project. Although the contributor noticed the issue and scrubbed the history right away, the faulty commit apparently remained reachable on its own as a "ghost commit" outside of any branch.
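To make the difference between the two approaches concrete, here is a minimal sketch of what a 1-way respondent ID could look like. It is not the code the surveys actually use: the `respondent_id` function, the normalization step, and the example key are all assumptions made for illustration. It uses a keyed hash (HMAC-SHA256), which gives the same ID for the same email every time but has no "decrypt" operation.

```python
import hashlib
import hmac


def respondent_id(email: str, secret_key: bytes) -> str:
    """Derive a stable pseudonymous ID from an email with a keyed 1-way hash.

    Hypothetical helper: the name and the normalization are illustrative only.
    """
    # Normalize so that "Jane@Example.com " and "jane@example.com" match.
    normalized = email.strip().lower().encode("utf-8")
    # HMAC-SHA256 is one-way: the output cannot be decrypted back to the email.
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


# Placeholder key for the example; a real key would live outside the repo.
key = b"example-secret-not-a-real-key"

id_2019 = respondent_id("Jane@Example.com ", key)
id_2020 = respondent_id("jane@example.com", key)

# The same respondent gets the same ID across survey years.
same_person = id_2019 == id_2020
```

Even with this approach, a leaked key would still let someone check whether a specific email they already know appears in the dataset, so the key would need to stay private. What it rules out is directly decrypting every ID back into an address.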
Both because of the holidays and because I didn't realize the consequences of the leak right away, the encryption key stayed accessible, in theory, for about a month.
The risks to survey respondents are twofold:
- Someone could use the dataset to generate an email list used for spamming purposes.
- Someone could link personal data (salary, etc.) to the email address you used.
The "good" news is that the repo the key was committed to is very low traffic and had no forks, watchers, or stars, making it less likely that ill-intentioned people randomly stumbled on the encryption key.
Moreover, even with the key in hand an attacker would've had to then figure out where the key was being used (which happens in a separate repo); what it was being used for; and where the relevant encrypted emails were made available; none of which is obvious unless one is already familiar with the project.
So while I don't have any way to tell with certainty if anybody actually went through the process of decrypting the encrypted emails and correlating responses with them, I personally think the probability of this happening is fairly low. But I apologize for not being able to give you more certainty.
Here is what I've done so far:
- Stopped using the leaked encryption key.
- Made the repo private so that the encryption key is no longer accessible.
- Taken down the public datasets containing the encrypted emails until I can re-upload versions without them.
Note: if you happen to have a copy of the datasets or are hosting a mirror, please get in touch or delete your copies if you can!
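If you maintain a copy and would rather clean it than delete it, the idea is simply to drop the encrypted email field from every response. The sketch below assumes the responses are loaded as a list of dictionaries and that the field is called `emailHash`; that name is a guess for illustration, so check it against the actual dataset before relying on it.

```python
# Hypothetical field name; the real datasets may use a different key.
ENCRYPTED_EMAIL_FIELD = "emailHash"


def strip_field(records, field=ENCRYPTED_EMAIL_FIELD):
    """Return copies of the records without the given top-level field."""
    return [
        {key: value for key, value in record.items() if key != field}
        for record in records
    ]


# Made-up sample rows, only to show the shape of the transformation.
sample = [
    {"emailHash": "abc123", "year": 2019, "country": "FR"},
    {"emailHash": "def456", "year": 2020, "country": "JP"},
]

cleaned = strip_field(sample)
```

This only removes a top-level field and leaves the original records untouched, so a nested copy of the value would need separate handling.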
In the future, I will also focus on making it possible to complete the survey without having to provide an email, which is something that survey respondents have often asked for.
Ironically enough, the leak happened in the process of migrating the survey app to a newer, more robust codebase in order to make it easier to change the way accounts work.
The surveys are an open-source project, created in the open by a mostly-volunteer group of contributors from around the world. And while this can sometimes make it tougher to properly coordinate and avoid situations like this one, I also think being community-driven is one of the project's major strengths.
So while it's totally understandable if a leak like this one makes you question sharing any data with us in the future, I hope you'll be able to give the project another chance.
And if you're not fully comfortable sharing personal information just yet, here's a reminder that you can always skip any question in any survey. Using an email alias that can't easily be tied back to you might also put you more at ease.
I deeply apologize again, and if you have any questions about this whole thing, just leave a comment here and I'll do my best to answer.