Is SQL a declarative language

#sql #opensource #devops #esproc

When learning SQL, we often hear the saying that SQL is a declarative language. You just need to tell it what to do, without telling it how to do it, it will find its own implementation method. That is to say, you need to use it only to describe the task objectives without explaining the computational process, which is fundamentally different from conventional procedural languages. Obviously, this programming language sounds much easier to learn and use.

Is it really that good?

Let’s take an example. We use SQL to query the number of female employees in the sales department. This is the written SQL:

It seems like this, we don’t need to care about the specific calculation process (traversing every record in the employee table, adding 1 to the count if it meets the conditions, skipping if it doesn’t, and finally looking at the count), just state the target to be queried.

Let’s take another example, find the average salary of employees aged above 30 by department:

It looks good too, here we really don’t have to care about how to group and calculate the average.

Although SQL is still a language with strict syntax, we can only write correct statements after a certain amount of learning. However, if we don’t care about the calculation process, it will still save a lot of efforts.

Let’s take another example: identify the major customers who contribute the top half of sales. If we design the calculation process, it would be like this:

Calculate the total sales revenue of all customers.
Sort customers in reverse order of sales revenue, with the large ones ranking first and the small ones ranking last.
Accumulate the sales revenue of each customer from 0 in this order, and stop when it exceeds half of the total sales revenue. Then the customers that have been traversed are the target customers.

Then, what does it look like to write it in SQL? This is the simplest way I can think of:

Take a closer look at this SQL statement, which basically describes the above process. There is a sub query that calculates total sales, followed by a sub query that sorts sales in reverse. The window function is used to calculate the cumulative sales of each row in the sorted list. The main query then filters out customers whose cumulative sales are less than half of the total sales. The only difference from the above process is the writing order. SQL starts calculating the total sales revenue later. There is also a slight logical difference, as SQL’s ordered calculations and step-by-step calculations are not good, it is necessary to calculate all accumulated sales before finding the top ones.

This SQL statement is accurately describing such a process. What about describing the task objective without worrying about the computational process?

Let’s take a simpler example: find the top 10 customers with the highest sales revenue.

Some SQL statements are written as follows:

If using a well-known database, you also need to use subqueries:

Both of these SQL statements clearly tell us the calculation process: after sorting by sales revenue in reverse, get the top 10. Moreover, in the syntax of this famous database, it is necessary to artificially create a serial number, which means that the database needs to be more clearly told how to calculate.

If we look at a few hundred rows of SQL (stored procedures), we can see more clearly that SQL is still describing the calculation process honestly, and different calculation processes can bring vastly different computational performance and even results.

In fact, any programming language can be said to be a declarative language to some extent: that is, it only needs to care about the goal and not the process.

If you write a program in Java, you only need to care about how variables change, without worrying about the actions of registers in the CPU, but in assembly language, you need to care; Similarly, when using assembly language, you need to care about the values of registers, but you don’t have to worry about how the NAND gate operates in the CPU;

When writing code in SQL, there is generally no need to worry about the specific actions of variables and loops. It does not have the concept of variables, but it has the concepts of tables, fields, and related calculation methods, and you also need to pay attention to the process at this level. In this sense, SQL and other programming languages only have different levels of abstraction in describing problems, and there is no essential difference in explaining processes.

Why do so many people think that SQL is a so-called declarative language? This is because the design of SQL deliberately weakens the “procedural” feature, and in order to make the statement more like English, it puts basic operations into each clause of a statement. When all the steps involved in the calculation task are basic operations within the abstract level of SQL, it can be written as one statement, seemingly describing the task goal to SQL.

However, conventional programming languages usually do not design multiple basic operations into clauses or function parameters of one statement, and advocate for programmers to combine them. This way, people will feel the need to describe the process and do not have the characteristic of “being declarative”.

However, the clause structure of a statement is limited even if it is complex. When the task is beyond the scope of this structure and requires the use of nested subqueries or intermediate results to describe it, the so-called “declarative” illusion of SQL will be exposed. It is still necessary to honestly describe the process and approach. Looking back at the previous example, it is clear to see this point.

SQL is indeed easier to learn and use compared to high-level languages like Java when facing some basic query tasks, but it is not because it has stronger “declarative” features than Java, but because it has a higher level of abstraction in structured data computation than Java.

SQL deliberately weakens the “procedural” characteristics, such as the absence of intermediate variables, which makes its ability to describe processes very weak. When faced with complex tasks, the difficulty increases sharply, and even some tasks cannot be implemented using SQL alone, because SQL without procedural capabilities is not Turing complete. So, database vendors later have to supplement and invent stored procedures and CTE syntax.

If we invent a language that has a high level of abstraction for data computation while retaining its procedural features, you will find it easier to learn and use than SQL, especially when facing complex business logic. Well, this is esProc SPL.