As data becomes a focal parameter to consider before developing your solution. One must ask questions like
- When and why do we use a spreadsheet
- When and why do we shift to a database over a spreadsheet.
This blog will help you answer these questions.
Q. What is data?
Data is raw information. For example, your daily consumption of coffee. It is raw information about the amount of coffee you have consumed, but if you analyse it and gain insights from it. A few example would be,
- Types of coffee beans or coffee flavour
- How much sugar do you put into the coffee
Now that we have differentiated between information and data.
There are many formats to store and transfer data, these formats depend on the type of data. For example, one might write coffee ingredients directly on a piece of paper i.e unstructured or write it in a .csv file i.e structured or a combination of both i.e semi-structured.
When to use a spreadsheet and when to shift?
So when deciding if a spreadsheet still makes sense, the first thing to talk about is the structure of the data. Are we dealing with a simple list? Are there nested values? So we use spreadsheets for flat data or structured or data represented using rows and columns.
Another parameter is the size of the data.
Imagine having to pass that giant spreadsheet to your colleagues and multiple people are gonna have to be working on it, maybe at the same time, then you start to have a lot more trouble. The set of following questions are going to arise
- Who is going to own the data?
- How do you make sure that there are not two edits going through at the same time? And maybe you have to merge them at the end or account for discrepancies between multiple copies.
Maybe your company works with a solution that supports multi-tenancy, a big shout out to Google Cloud or Office 365 here for supporting multi-tenancy. But managing conflicts, managing traceability of multiple people managing the data starts to grow out of control. When you introduce different permission sets to edit different fields or different sections of the spreadsheet, that's when, in my opinion, spreadsheets fall over the fastest.
When thinking about if a spreadsheet makes sense, you need to think of how the data is gonna be used. Spreadsheets are at their strongest when you can see a lot of columns all at once to get a good feel for the data.
When to shift away from spreadsheet
When multiple people in multiple systems are accessing and editing the same data, the integrity of the data becomes questionable. And if you're in a company that handles health care or financial information in particular, who added it and why is very important for when an auditor knocks on your door. So potentially understanding who is editing the data, what is that in the data, where did data come from, that is known as data integrity on a more complex level that's known as data lineage.
Databases vary much than that spreadsheets enable you to monitor, trace, control, and manage your data quality and integrity, especially because many of the databases come with built-in audit logs on a per-field basis. Typically what a database supports are complex reporting requirements.
Whereas a spreadsheet is fantastic at simple one-off reports or maybe manually updated, databases are when you need data that is updated nightly, hourly, secondly, or maybe it is updated by multiple processes owned by many people.
But just so you know, and to reiterate, the rule of thumb is if you need more complex reporting, go with the database over a spreadsheet. To learn more about Databases and storage, refer to one of my previous blogs: