## DEV Community is a community of 905,285 amazing developers

We're a place where coders share, stay up-to-date and grow their careers.

Patrick Wendo

Posted on

# Databases, Relation Algebra and Set Theory.

This will be a post covering the common ground between database systems, set theory and relational algebra. It is intended as a basic refresher or an intro depending on the reader. I do assume some basic set theory notation knowledge on the part of the reader. Questions and clarifications are welcome.

## Terminologies used

1. Set : These are collections that are completely characterized by their elements. Two sets are equal only if they contain the same elements.

$A = [1,2,3] B = [3,2,1]$
The sets A and B above are equal since they contain the same elements.
2. Relation (AKA table or entity set) : Given sets $D_1, D_2, ..., D_n$ A relation is a subset of $D_1 \times D_2 \times ... \times D_n$ . It is set of n-tuples $(a_1, a_2, ..., a_n)$ where $a_i \in D_i$
Each relation has a fixed number of columns that are explicitly names where each attribute name within a relation is unique. Rows/tuples in a relation have no ordering associated with them. A database has multiple relations.

3. Tuple (AKA row or entity instance): It is an element in a relation.

4. Attribute (AKA column): It describes a tuple in a relation.

5. Query Language : It is a language in which a user requests information from a database.

## Operators

We will cover the 6 basic operators used in relational algebra. These operators can later be used in compositions to form higher order operations.

1. Select ( $\sigma$ )
2. Project ( $\Pi$ )
3. Union ( $\cup$ )
4. Set Difference ( $-$ )
5. Cartesian Product ( $\times$ )
6. Rename ( $\rho$ )

These operations take one or two relations as inputs and produce a new relation as a result. This is a property known as closure.

### Select ( $\sigma$ )

$\sigma_p(R)$

in this example p is called the selection predicate. R is the relation upon which we perform the select operation. This operation will always return a subset of the initial relation. (remember that a set is also a subset of itself, ( $R \in R$ ))

Formally, select is defined as

$\sigma_p(R) = { t | t \in R \ and \ p(t) }$

Example
Relational Algebra: $\sigma_{firstName="John"}({Employees})$

SQL : SELECT * EMPLOYEES WHERE firstName='John';

Will return all employees who's first name is John

### Project ( $\Pi$ )

$\Pi_{A_1, A_2,...,A_k}(R)$

$A_1, A_2,...,A_k$ are attribute names and R is the relation.
The result is defined as the relation of K columns obtained by erasing the columns that are not listed. Duplicate rows are removed since relations are sets.

Example
Relational Algebra: $\Pi_{firstName, lastName}({Employees})$

SQL: SELECT firstName, lastName FROM EMPLOYEES;

Will return all employees, but only with their firstname and lastname

### Union Operator

${R} \cup {S} = t | t \in R \ or \ t \in S$

This requires two relations R and S and returns a relation in which every element is either a member of R or S.

For ${R} \cup {S}$ to be valid the following conditions have to be met.
(i) R and S must have the same [arity]https://en.wikipedia.org/wiki/Arity)(same number of columns)
(ii) The attribute domains must be the same. That is, the corresponding columns must have the same data type.

Example
If we have two relations, Sciences and Humanities and we want to find the courses offered in both we could run this query
Relational Algebra:

$\Pi_{courseName}(Sciences) \cup \Pi_{courseName}(Humanities)$

SQL : SELECT courseName FROM Sciences UNION SELECT courseName FROM Humanities;
It will return a relation with only the course names from both Sciences and Humanities

### Set Difference

$R - S = t | t \in R\ and\ t \notin S$

This takes in two relations R and S and will return all the rows in R but not in S.

The shaded areas represent $R - S\ and \ S - R$ respectively

$if A = [ 1, 2, 3] \ B = [2, 4, 5]\ then \ A - B = [1, 3]$

Similar to Unions, set differences must be taken between relations with the same arity.

### Cartesian Product

$R \times S = tq | t \in R\ and\ q \in S$

In this situation we have to assume that the attributes of R and S are disjoint, that is $R \cap S = \emptyset$ . If they are not disjoint then renaming must be used.

The cartesian product will concatenate each row of R with each row of S. The resultant relation will have #rows_in_R x #rows_in_S. (If R has 2 rows, and S has 3 rows then the result will have 3x2=6 rows.)

### Rename Operator

It allows us to rename and therefore refer to the result of a relational algebra expression.

$\rho_{x}(E)$

This will return the expression E under the name X.
Similarly if an expression E has an arity n, then
$\rho_{A_1, A_2,...,A_n}(E)$

And this will return the result of expression E under the name X with the attributes renamed to $A_1, A_2,...,A_n$

Note There are additional operators that do not add to the power of relational algebra, but help to simplify queries. These operators are compositions of the 6 basic operators. They are:

1. Set Intersection
2. Natural Join
3. Division
4. Assignment. I will explain these in a later post and link it here once I do so.

## That's all folks

I'm trying to sharpen my database skills and as such, I figured I will start from the basics of the basics. More to come. Stay tuned.
Cheers,
W3NDO