DEV Community

Cover image for When and Why to migrate from spreadsheet to a Database
hridyesh bisht for AWS Community Builders

Posted on

When and Why to migrate from spreadsheet to a Database

As data becomes a focal parameter to consider before developing your solution. One must ask questions like

  1. When and why do we use a spreadsheet
  2. When and why do we shift to a database over a spreadsheet.

This blog will help you answer these questions.

Q. What is data?

Data is raw information. For example, your daily consumption of coffee. It is raw information about the amount of coffee you have consumed, but if you analyse it and gain insights from it. A few example would be,

  1. Types of coffee beans or coffee flavour
  2. How much sugar do you put into the coffee

Image credits: https://ciscocanada.files.wordpress.com
/2013/09/cisco_blog_canada_coffee.png

Now that we have differentiated between information and data.

There are many formats to store and transfer data, these formats depend on the type of data. For example, one might write coffee ingredients directly on a piece of paper i.e unstructured or write it in a .csv file i.e structured or a combination of both i.e semi-structured.

Image credits: http://programmerprodigycode.files.wordpress.com/2022/03/ecd9e-1sbcb7tf8jjwzchdtt_sodw.png

When to use a spreadsheet and when to shift?

So when deciding if a spreadsheet still makes sense, the first thing to talk about is the structure of the data. Are we dealing with a simple list? Are there nested values? So we use spreadsheets for flat data or structured or data represented using rows and columns.

Another parameter is the size of the data.

Imagine having to pass that giant spreadsheet to your colleagues and multiple people are gonna have to be working on it, maybe at the same time, then you start to have a lot more trouble. The set of following questions are going to arise

  1. Who is going to own the data?
  2. How do you make sure that there are not two edits going through at the same time? And maybe you have to merge them at the end or account for discrepancies between multiple copies.

Maybe your company works with a solution that supports multi-tenancy, a big shout out to Google Cloud or Office 365 here for supporting multi-tenancy. But managing conflicts, managing traceability of multiple people managing the data starts to grow out of control. When you introduce different permission sets to edit different fields or different sections of the spreadsheet, that's when, in my opinion, spreadsheets fall over the fastest.

Image credits: https://www.dynamicselect.co.uk/uploads/1/2/5/5/125589844/spreadsheetproblems_orig.png

When thinking about if a spreadsheet makes sense, you need to think of how the data is gonna be used. Spreadsheets are at their strongest when you can see a lot of columns all at once to get a good feel for the data.

When to shift away from spreadsheet

When multiple people in multiple systems are accessing and editing the same data, the integrity of the data becomes questionable. And if you're in a company that handles health care or financial information in particular, who added it and why is very important for when an auditor knocks on your door. So potentially understanding who is editing the data, what is that in the data, where did data come from, that is known as data integrity on a more complex level that's known as data lineage.

Databases vary much than that spreadsheets enable you to monitor, trace, control, and manage your data quality and integrity, especially because many of the databases come with built-in audit logs on a per-field basis. Typically what a database supports are complex reporting requirements.

Whereas a spreadsheet is fantastic at simple one-off reports or maybe manually updated, databases are when you need data that is updated nightly, hourly, secondly, or maybe it is updated by multiple processes owned by many people.

Image Credits: https://memegenerator.net/img/instances/47689283.jpg

But just so you know, and to reiterate, the rule of thumb is if you need more complex reporting, go with the database over a spreadsheet. To learn more about Databases and storage, refer to one of my previous blogs:

  1. https://programmerprodigy.code.blog/2021/03/02/introduction-to-cloud-databases/
  2. https://programmerprodigy.code.blog/2021/02/24/introduction-to-cloud-storage/
  3. https://programmerprodigy.code.blog/2021/06/28/intro-to-4-types-of-nosql-databases/
  4. https://programmerprodigy.code.blog/2020/12/16/intro-into-database-management/

Top comments (2)

Collapse
 
bradtaniguchi profile image
Brad

I remember one time I heard about a project that had nearly 100 different excel sheets talking to each other to create a system that handled 10s of millions rows to manage complex logistics. Not only did it work great for a few years, but it was done by a single madman person.

The issue came up that things started to slow down/glitch out and the original person who created it left the company, leaving this magical organization of sheets and voodoo to be migrated to a "real database" and app all the while data integrity was degrading by the scone.

We eventually re-built the whole thing as a real app and migrated over the data, but it always impressed me that a bunch of sheets were able to work that way, but it was also disturbing that much data semi-critical was being handled that way.

Collapse
 
hridyeshbisht profile image
hridyesh bisht

Exactly my point, displaying relationships in excel sheets can be a drag. Thank you for this real life use case, Brad.