Judy

Posted on

# The Syntax of Discreteness - Using SQL as an Example

Discreteness refers to the feature that members are allowed to be broken off from a set to exist independently or to be computed; and that the discrete members can be reorganized to form a new set. This tells that the discreteness is a feature related to the sets. Without the concept of the set, it makes no sense to talk about the discreteness.

The concept of discreteness is simple. Almost all high-level languages that support the structures (i.e. objects) naturally support the feature. In Java, we can get members of an array to compute; or generate a new array with them to perform set-oriented operations (The problem is Java supplies a very poor class library for performing set-oriented operations).

Unfortunately, SQL has poor discreteness.

A SQL system has the concept of the record, but doesn’t define the explicit record data type. SQL treats a single record as a temporary table containing one record – which is a set consisting of one member. Besides, when SQL retrieves a record from a table (a set), it copies it to generate a new record, which has nothing to do with the record in the original table. The set (table) is an immutable object. This immutability character assists in producing correct and simple code while it deprives the language of discreteness.

The lack of discreteness leads to tedious code and low efficiency.

To calculate the age difference and salary difference between Michael and James, for example, SQL needs two lines of code:

SELECT (SELECT age FROM employee WHERE name=‘Michael’) - (SELECT age FROM employee WHERE name=‘James’) FROM dual

SELECT (SELECT salary FROM employee WHERE name=‘Michael’) - (SELECT salary FROM employee WHERE name=‘James’) FROM dual

Not only this goes against the spirit of simplicity but it causes repeated queries.

With good discreteness, the code can be like this:

a = employee.select@1(name=“Michael”)

b = employee.select@1(name=“James”)

agediff=a.age-b.age

salarydiff=a.salary-b.salary

The query result can exist independently without having to be attached to a set, and can be reused.

The immutability character requires that data be copied for each operation. For a read-only operation, it reduces efficiency because it is both time-consuming and space-consuming. It causes more trouble when data is modified.

One example is to offer an extra 5% of reward to salespeople whose performance is in the top 10%. A natural algorithm is to find the eligible salespeople to generate an intermediate set, over which the action of adding the 5% of reward is performed. Due to the lack of discreteness and the immutability character in SQL, a set formed by the eligible records and the original records are completely independent of one another; it’s no use to make any modification over an intermediate result set. There’s no choice but to write all actions into one statement where the eligible records are found from the original table for modification. Unfortunately, it’s not easy to phrase the top 10% criterion under the WHERE clause. Thus, complicated subquery is needed. This is just a simple example. In real-world environment, we can find a large number of much more complicated scenarios that can’t be handled even with the subqueries. Generally we’ll first compute the primary key values of the eligible records, and then use them to find these records in the original table and modify them. The code is tediously long and extremely inefficient.

A discreteness-based language, however, is able to deal with the computation using the above natural algorithm:

a=sales.sort@z(amount).to(sales.len()*0.1) // Get the records where the performance is in the top 10% to form a new set

a.run(amount=amount*1.05) // Add the 5% of reward to each record in the set

The two examples show that the absence of discreteness makes it more difficult to perform a step-by-step computation and that the immutability character worsens performance and increases space consumption. Yet the lack of discreteness has more consequences.

SQL isn’t a thoroughly set-oriented language. It can’t generate a new set using members of a set. It can’t help performing an aggregate operation over each of the subsets only to lose them after executing a group operation. SQL doesn’t have an intuitive referencing mechanism to describe the association relationship because of the lack of discreteness. The language doesn’t define a method to support the discrete records and a set of them. It uses the conventional way of foreign keys to express the association relationship, producing tedious, difficult-to-understand and low-performance code. The order-based computations are the embodiment of the set-orientation and discreteness. Without discreteness, it’s hard to describe order-based computations in SQL. Without the good set-oriented ability, it’s meaningless to discuss the order of the members of a set; without the support of discreteness, it’s impossible to separate each member of a set from its neighbors when the order-based computation is required.

To reform SQL (the relational algebra, more precisely) theoretically, we need to introduce the discreteness as well as maintaining the good set-oriented ability. A new-generation language needs to possess the merits of both SQL and Java to address the previously-mentioned SQL issues.