Why do We Need Databases and SQL?

SQL has a long and proven history. It has survived the fuss around NoSQL. And even if not perfect, it has demonstrated to be the best available language for data. This is no surprise! The story began in the 1960s with the development of databases—an era marked by the introduction of Integrated Data Store (IDS) at General Electric. However, it was Edgar Codd's relational model that revolutionized data handling. His model, which turned data into a series of tables (or more strictly, relations), has influenced database systems ever since. This era also saw the birth of SQL (Structured Query Language), which became the standard language for interacting with relational databases, including MariaDB and other.

The utility of relational database systems

So, why do we need all this database stuff? Let's imagine you're building an app, maybe a simple to-do list to keep track of your daily tasks. Initially, you might think, "Why not just save each task directly to a file?" After all, my programming language has constructs and libraries to save and read data from disk. Also, implementing this seems straightforward: create a task, write it to a file; delete a task, remove it from the file. These are good points, however as your app gains traction, users start to aggregate, and suddenly, you have thousands of users trying to add, delete, and modify tasks simultaneously. At this point, the simplicity of files becomes fragile. Imagine one user is updating a task at the exact moment another tries to delete it. Or maybe two users are editing the same task at the same time. With a simple file system, you're likely to end up with corrupted or lost data because there's no inherent mechanism to handle such conflicts.

Databases handle these situations gracefully through the ACID properties. Essentially, a set of principles ensures that even if your app crashes midway through an update, the data remains consistent and no half-completed tasks are left hanging. Back to the to-do app example, imagine trying to move your task "Buy groceries" from pending to completed which requires also changing the last_updated property, but your app crashes right in the middle. With a relational database, it's all or nothing—either the task is marked complete and the last_updated property reflects the new time value, or it's like you never tried to update it in the first place, avoiding those incorrect half-states.

Now, let's consider data relationships. In your app, tasks might belong to different categories or users. In a file system, maintaining these relationships is cumbersome. You might end up with a separate file for each category or user, but then how do you quickly find all tasks across categories or ensure two users don't end up with the same task ID? Databases have the ability to manage complex relationships, making it easy to query all tasks for a specific user or category, or even more complex queries like "show me the number of completed tasks for user U grouped by category C during the last month."

Security is another biggie. In a file system, if someone gains access to your files, they have your data. Databases offer robust security features, like access controls and encryption, safeguarding your data from unauthorized eyes.

And then there's the issue of growth. Your simple to-do app might evolve into a complex enterprise project management tool over time. With a file system, every change can feel like renovating a building with people still inside. Databases are built to be flexible and scalable, meaning they're designed to grow with your needs, whether you're adding new features or handling more users.

In the end, choosing a database over a simple file system is about preparing for success while standing on solid ground. It's about ensuring that as your app grows, your data remains secure, consistent, and manageable, and your users happy. After all, no one likes losing their to-do list to a random crash or waiting forever for their tasks to load because the system is bogged down handling conflicts and searches!

A bit of history

It was Edgar Codd who proposed the Relational Model for databases and, since he was a mathematician, formalized the concepts creating what is called Relational Algebra and Relational Calculus. All this was theoretical, until IBM and others started to implement the concepts in academic and research projects. They also wanted to come up with a standard language for querying data in relational databases. At first they invented QUEL (Querying Using the English Language) at the University of California, Berkeley. At IBM, researchers wanted to come up with their own language and started a project which I perceive more as a game between colleagues called SQUARE (Specifying Queries Using a Relational Environment). This led to a query language that had a scientific-like notation with subindexes and super-indexes which was hard to type on computer keyboards. To solve this, they redefined the language to only use standard characters and in an ingenious and probably friendly mockery way called it SEQUEL. This name however, was a trademark in the UK which prevented them from using it. They removed the vowels in SEQUEL and boom! SQL was born. By 1986, SQL would become an ISO and ANSI standard.

As a curious historical remark, although their inventors had to rename SEQUEL to SQL, they continued to call it "sequel". Even today many software developers and IT professionals continue to pronounce it "sequel". The name Structured Query Language (SQL) would appear later.

The utility of SQL

SQL is a declarative language, meaning that you specify what you want to get and not how to get it. The database is in charge of doing whatever needs to be done to get the data requested. SQL isolates database complexity. A database is a complex piece of software with tons of algorithms implemented in it. This algorithms deal with different ways to get data stored in disk or memory. Different algorithms are more efficient in different circumstances which includes different queries and different datasets.

For example, in MariaDB, a component called the query optimizer is in charge of deciding what algorithms to use given a SQL query and stats gathered on the actual data. The query optimizer analyzes the SQL query, the data structures, the database schema, and the statistical distribution of the data. It then decides whether to use an index, which joining algorithm is the best, and how to sequence the operations. This process involves a remarkable amount of complexity and mathematical precision, all of which the database abstractly manages for you. As a developer you only need to worry about constructing the query to get the data you need and let the database figure out whether to use or not an index (with some datasets, not using an index could be faster), B-trees, hash tables, and even whether to add the data to an in-memory cache, as well as many other things.

SQL also allows you to handle writes, that is, creating and updating data. It also allows you to define the schema of the database, or in short and over-simplifying, the tables and their column structure. In fact there's much more that SQL allows you to do and its functionality can be divided in four categories:

Data definition language (DDL): Creating and manipulating the schema.
Data manipulation language (DML): Inserting, updating, and deleting data from the database.
Data query language (DQL): Retrieving data from the database.
Data control language (DCL): Dealing with rights and permissions over the database and its objects.

In my more than 15 years of experience in the industry, I have rarely seen the previous categories used in a work environment, with the exception of DDL to refer to activities related to handling database schema updates. These categories are useful mostly in academic circles or in teams implementing relational database management software. However, it's good to know that these terms exist and are used by others as it helps in discussions around database technology. With this in mind, let me briefly touch on one of such discussions.

Some would say that developers have to deal only with DML and DQL while DDL and DCL are a concern of DBAs. In practice, this division is not so easy to make. Developers need to understand how database objects (like tables and columns) are created and how access to this objects is managed. However, it is true that developers spend most of their time writing SQL statements to modify and query data. You'll see that this book focuses on DML and DQL while explaining other categories as they are needed. On the other hand, DBA's are experts on everything database—from infrastructure and general database management to SQL query optimization and migration, a DBA is always a valuable brain to have in your team.

Conclusion

So in conclusion, databases solve real problems that application developers face, thanks to their ability to ensure data integrity through ACID properties, manage complex relationships, and provide robust security features. I only scratched the surface here, but this should be enough to give the novice IT practitioners a quick refresh on the importance of relational databases and SQL.