DEV Community

Patrick Wendo
Patrick Wendo

Posted on

Databases, Relation Algebra and Set Theory.

This will be a post covering the common ground between database systems, set theory and relational algebra. It is intended as a basic refresher or an intro depending on the reader. I do assume some basic set theory notation knowledge on the part of the reader. Questions and clarifications are welcome.

Terminologies used

  1. Set : These are collections that are completely characterized by their elements. Two sets are equal only if they contain the same elements.

    A=[1,2,3]B=[3,2,1]A = [1,2,3] B = [3,2,1]
    The sets A and B above are equal since they contain the same elements.
  2. Relation (AKA table or entity set) : Given sets D1,D2,...,DnD_1, D_2, ..., D_n A relation is a subset of D1×D2×...×DnD_1 \times D_2 \times ... \times D_n . It is set of n-tuples (a1,a2,...,an) (a_1, a_2, ..., a_n) where aiDia_i \in D_i
    Each relation has a fixed number of columns that are explicitly names where each attribute name within a relation is unique. Rows/tuples in a relation have no ordering associated with them. A database has multiple relations.

  3. Tuple (AKA row or entity instance): It is an element in a relation.

  4. Attribute (AKA column): It describes a tuple in a relation.

  5. Query Language : It is a language in which a user requests information from a database.

Relation, Attribute and Tuple


We will cover the 6 basic operators used in relational algebra. These operators can later be used in compositions to form higher order operations.

  1. Select ( σ\sigma )
  2. Project ( Π\Pi )
  3. Union ( \cup )
  4. Set Difference ( - )
  5. Cartesian Product ( ×\times )
  6. Rename ( ρ\rho )

These operations take one or two relations as inputs and produce a new relation as a result. This is a property known as closure.

Select ( σ\sigma )


in this example p is called the selection predicate. R is the relation upon which we perform the select operation. This operation will always return a subset of the initial relation. (remember that a set is also a subset of itself, ( RRR \in R ))

Formally, select is defined as

σp(R)=ttR and p(t)\sigma_p(R) = { t | t \in R \ and \ p(t) }

Relational Algebra: σfirstName="John"(Employees)\sigma_{firstName="John"}({Employees})


Will return all employees who's first name is John

Project ( Π\Pi )

ΠA1,A2,...,Ak(R)\Pi_{A_1, A_2,...,A_k}(R)

A1,A2,...,AkA_1, A_2,...,A_k are attribute names and R is the relation.
The result is defined as the relation of K columns obtained by erasing the columns that are not listed. Duplicate rows are removed since relations are sets.

Relational Algebra: ΠfirstName,lastName(Employees)\Pi_{firstName, lastName}({Employees})


Will return all employees, but only with their firstname and lastname

Union Operator

RS=ttR or tS{R} \cup {S} = t | t \in R \ or \ t \in S

This requires two relations R and S and returns a relation in which every element is either a member of R or S.

For RS{R} \cup {S} to be valid the following conditions have to be met.
(i) R and S must have the same [arity] number of columns)
(ii) The attribute domains must be the same. That is, the corresponding columns must have the same data type.

If we have two relations, Sciences and Humanities and we want to find the courses offered in both we could run this query
Relational Algebra:

ΠcourseName(Sciences)ΠcourseName(Humanities)\Pi_{courseName}(Sciences) \cup \Pi_{courseName}(Humanities)

SQL : SELECT courseName FROM Sciences UNION SELECT courseName FROM Humanities;
It will return a relation with only the course names from both Sciences and Humanities

Set Difference

RS=ttR and tSR - S = t | t \in R\ and\ t \notin S

This takes in two relations R and S and will return all the rows in R but not in S.

Set Difference illustration
The shaded areas represent RS and SRR - S\ and \ S - R respectively

ifA=[1,2,3] B=[2,4,5] then AB=[1,3] if A = [ 1, 2, 3] \ B = [2, 4, 5]\ then \ A - B = [1, 3]

Similar to Unions, set differences must be taken between relations with the same arity.

Cartesian Product

R×S=tqtR and qSR \times S = tq | t \in R\ and\ q \in S

In this situation we have to assume that the attributes of R and S are disjoint, that is RS=R \cap S = \emptyset . If they are not disjoint then renaming must be used.

The cartesian product will concatenate each row of R with each row of S. The resultant relation will have #rows_in_R x #rows_in_S. (If R has 2 rows, and S has 3 rows then the result will have 3x2=6 rows.)

Cartesian Product illustration

Rename Operator

It allows us to rename and therefore refer to the result of a relational algebra expression.


This will return the expression E under the name X.
Similarly if an expression E has an arity n, then
ρA1,A2,...,An(E)\rho_{A_1, A_2,...,A_n}(E)

And this will return the result of expression E under the name X with the attributes renamed to A1,A2,...,AnA_1, A_2,...,A_n

Note There are additional operators that do not add to the power of relational algebra, but help to simplify queries. These operators are compositions of the 6 basic operators. They are:

  1. Set Intersection
  2. Natural Join
  3. Division
  4. Assignment. I will explain these in a later post and link it here once I do so.

That's all folks

I'm trying to sharpen my database skills and as such, I figured I will start from the basics of the basics. More to come. Stay tuned.

Discussion (0)