*Knowledge dependencies for understanding this article: beginner's SQL (including ON DELETE CASCADE) and beginner's database design.
The first thing I want to do is acknowledge that the inspiration and one of the resources for this article was a conversation I had with Don Omondi. He's a wonderful teacher.
Ok, so what is the GDPR? It's a privacy and security law passed by the EU that affects any company that targets or collects data related to people in the EU. It has had an extensive effect on cookies (and still does, as cookies evolve), and it has also affected database design.
One of the rights the GDPR gives people in the EU is the right to erasure. This means that, under certain circumstances, they have the right to have their personal data erased. It's also known as the 'right to be forgotten.'
What impact does this have on database design? Say you're running a site where users can repost other users' posts. User Sally has written a post that user Jenna has reposted.
Now, Sally wants to delete her account on your site.
If you've set up your tables so that ON DELETE CASCADE deletes Sally's data and all the associated data, you've also deleted the content of Jenna's repost.
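To make the problem concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schema is an assumption for illustration: `users`, `posts`, and a hypothetical `reposts` table whose rows point at the original post. The point is only that ON DELETE CASCADE follows the foreign keys from Sally's user row to her post, and from her post to Jenna's repost.

```python
import sqlite3

# Hypothetical schema: a repost is a row that points at the original post.
SCHEMA = """
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE posts (
    id        INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    body      TEXT NOT NULL
);
CREATE TABLE reposts (
    id          INTEGER PRIMARY KEY,
    reposter_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    post_id     INTEGER NOT NULL REFERENCES posts(id) ON DELETE CASCADE
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when this is on
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)", [(1, "Sally"), (2, "Jenna")])
conn.execute("INSERT INTO posts (id, author_id, body) VALUES (?, ?, ?)", (1, 1, "Sally's post"))
conn.execute("INSERT INTO reposts (id, reposter_id, post_id) VALUES (?, ?, ?)", (1, 2, 1))

def reposted_bodies(conn, reposter_id):
    """Return the text of every post the given user has reposted."""
    rows = conn.execute(
        "SELECT p.body FROM reposts r JOIN posts p ON p.id = r.post_id "
        "WHERE r.reposter_id = ?",
        (reposter_id,),
    ).fetchall()
    return [body for (body,) in rows]

print(reposted_bodies(conn, 2))  # ["Sally's post"]

# Hard delete: Sally's user row -> her post -> Jenna's repost, all gone.
conn.execute("DELETE FROM users WHERE id = 1")
print(reposted_bodies(conn, 2))  # []
```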
What are you going to do?
Enter the soft delete.
Soft deletes have a lot of advantages, and many websites use them for many reasons.
Soft deletes allow you to un-delete easily:
"Without the soft delete in place, a delete() call on an object will delete the record from the table using a DELETE statement. With the soft delete in place, an UPDATE statement is sent instead (that sets the deletedAt field to the current time)." source
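The quoted behaviour can be sketched in plain SQL through Python's sqlite3 module. This is not any particular ORM's API. The `deleted_at` column stands in for the quoted `deletedAt` field, and `soft_delete`, `restore`, and `visible_posts` are hypothetical helper names. Every normal read has to filter on `deleted_at IS NULL`, which is the main ongoing cost of this design.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (
    id         INTEGER PRIMARY KEY,
    body       TEXT NOT NULL,
    deleted_at TEXT  -- NULL means "not deleted"
);
INSERT INTO posts (id, body) VALUES (1, 'hello'), (2, 'world');
""")

def soft_delete(conn, post_id):
    """'Delete' a post by stamping deleted_at (an UPDATE) instead of issuing DELETE."""
    now = datetime.now(timezone.utc).isoformat()
    conn.execute("UPDATE posts SET deleted_at = ? WHERE id = ?", (now, post_id))

def restore(conn, post_id):
    """Un-delete: clearing the timestamp makes the row visible again."""
    conn.execute("UPDATE posts SET deleted_at = NULL WHERE id = ?", (post_id,))

def visible_posts(conn):
    """Normal reads must skip soft-deleted rows."""
    return [body for (body,) in conn.execute(
        "SELECT body FROM posts WHERE deleted_at IS NULL ORDER BY id")]

soft_delete(conn, 1)
print(visible_posts(conn))  # ['world']
print(conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0])  # 2: the row is still stored
restore(conn, 1)
print(visible_posts(conn))  # ['hello', 'world']
```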
So you have to decide what deleting Sally's account means now that you're using a soft delete. If you've implemented a soft delete to prevent things like the content of Jenna's repost disappearing, you might call it a 'deactivation' instead. Sally can come back and retrieve all her data at any time by reactivating her account, and the content of Jenna's repost can stick around.
Then the GDPR comes out.
If your database is set up like this, you can't easily comply with the right to be forgotten. How are you going to solve this problem?
You might license the content that users write under the Creative Commons Attribution ShareAlike license. That way you can keep the content, while the user can effectively delete their personal info: you remove their info from your database and anonymize their content. Stack Overflow has taken this route. For an example of what this looks like, take this user on Stack Overflow:
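This is not Stack Overflow's actual implementation. It is a minimal sketch of the general idea, using Python's sqlite3 module: overwrite the personal columns of the user row in place, so every post that references the row keeps working. The column names and the `user<id>` placeholder format are assumptions. The sketch only scrubs profile fields. Personal details written inside post bodies would need separate handling, and whether a pseudonym derived from an internal id is "anonymous enough" is a legal question the code cannot settle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (
    id           INTEGER PRIMARY KEY,
    display_name TEXT NOT NULL,
    email        TEXT,
    bio          TEXT
);
CREATE TABLE posts (
    id        INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES users(id),
    body      TEXT NOT NULL
);
INSERT INTO users VALUES (1, 'Sally', 'sally@example.com', 'Cat person');
INSERT INTO posts VALUES (1, 1, 'How to write a soft delete');
""")

def anonymize_user(conn, user_id):
    """Erase personal fields in place; the row and every post pointing at it survive."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE users SET display_name = ?, email = NULL, bio = NULL WHERE id = ?",
            (f"user{user_id}", user_id),  # hypothetical placeholder name format
        )

anonymize_user(conn, 1)
row = conn.execute(
    "SELECT u.display_name, u.email, p.body "
    "FROM posts p JOIN users u ON u.id = p.author_id"
).fetchone()
print(row)  # ('user1', None, 'How to write a soft delete')
```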
The GDPR is awesome for a lot of reasons-- it's introduced protection for the data of EU citizens, and along with this protection has come exciting new problems to solve. I'm sure there are myriad ways to protect the rights of users under the GDPR. I'd love to hear other solutions in the comments. 😀
Top comments (7)
To maintain referential integrity in your database and prevent cascaded deletes when a user claims the GDPR right to be forgotten, you can anonymize the user's account: overwrite anything that could trace the account back to that specific person, such as personal info, username, IP addresses, etc. This way, you can retain the records in your database (for whatever reason) and comply with the GDPR.
Yes, I think that's how StackOverflow is using their license on the content.
Although, I wonder how this type of thing would work for content like selfies-- seems like anonymizing usernames wouldn't cut it in that scenario.
I'm interested to know as well. Let's say we apply this to e-commerce and the user wants to be forgotten. Then what? You certainly can't delete their orders.
One option is to decouple personal information from non-personal information. In this case, I guess it would work like your example: we can see that the order was placed by user xyz, but we see no personal information associated with that user. It gets trickier with information you cannot delete, for example IP addresses, shipping information, and payment information. You obviously need this information for various things, including any future disputes.
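A minimal sketch of that decoupling, using Python's sqlite3 module. The table and column names are hypothetical. Orders reference an opaque customer id, and everything that identifies the person lives in a separate profile table that can be deleted on its own. This does not resolve the harder case raised above. If shipping or payment details must be retained, they cannot simply live in the deletable profile table, and how long to keep them is a policy decision the schema cannot make.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Non-personal side: an opaque customer id and the orders themselves.
CREATE TABLE customers (
    id INTEGER PRIMARY KEY
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total_cents INTEGER NOT NULL
);
-- Personal side: kept in its own table so it can be deleted independently.
CREATE TABLE customer_profiles (
    customer_id      INTEGER PRIMARY KEY REFERENCES customers(id),
    full_name        TEXT NOT NULL,
    email            TEXT NOT NULL,
    shipping_address TEXT NOT NULL
);
INSERT INTO customers VALUES (42);
INSERT INTO orders VALUES (1, 42, 1999);
INSERT INTO customer_profiles VALUES (42, 'Pat Doe', 'pat@example.com', '1 Main St');
""")

def forget_customer(conn, customer_id):
    """Drop the personal profile; orders stay linked to the now-anonymous customer id."""
    with conn:
        conn.execute("DELETE FROM customer_profiles WHERE customer_id = ?", (customer_id,))

forget_customer(conn, 42)
orders = conn.execute(
    "SELECT o.id, o.customer_id, o.total_cents, p.full_name "
    "FROM orders o LEFT JOIN customer_profiles p ON p.customer_id = o.customer_id"
).fetchall()
print(orders)  # [(1, 42, 1999, None)]
```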
Good thought! Would that fall outside the right to erasure under Article 17(1)(a), since the data is still needed in relation to the purpose for which it was collected? gdpr-info.eu/art-17-gdpr/
This discussion goes beyond the database. Deleting data from disk often just marks it as unused, so the old data can still be recovered unless you overwrite it with zeros. Even then, some tools may be able to recover data (based on wear patterns on the hard drive, I believe), which is why many drive formatting tools overwrite the whole drive with zeros many times.
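Inside the database file itself, one concrete knob is SQLite's `secure_delete` pragma, which tells SQLite to overwrite deleted content with zeros. Below is a minimal sketch, assuming SQLite accessed through Python's sqlite3 module; other databases have different equivalents, or none. The example value is hypothetical. As far as I know, this only covers free space inside the main database file. It does not reach copies elsewhere, such as the rollback journal, backups, or the filesystem and drive-level remnants described above.

```python
import os
import sqlite3
import tempfile

SECRET = "sally@example.com"  # hypothetical personal data to erase

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.db")
    conn = sqlite3.connect(path)
    # Ask SQLite to zero out deleted content inside the database file.
    conn.execute("PRAGMA secure_delete = ON")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.execute("INSERT INTO users (email) VALUES (?)", (SECRET,))
    conn.commit()

    conn.execute("DELETE FROM users WHERE email = ?", (SECRET,))
    conn.commit()
    conn.close()

    # Scan the raw bytes of the database file for the deleted value.
    with open(path, "rb") as f:
        still_in_file = SECRET.encode() in f.read()

print(still_in_file)
```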
I'm always curious where, according to the law, the line for "good enough" lies.
Yes, I wonder about this too. The GDPR has a list of defined terms but I can't find any related to deletion. gdpr-info.eu/art-4-gdpr/