NightBird07

Posted on

# Nulls are equal in distinct but inequal in unique.

While developing ETL pipelines for a new data warehouse, I encountered unexpected query results related to NULL values. This prompted me to delve into the underlying SQL behavior to better understand the discrepancy. In this article, I aim to elucidate the contrasting treatment of NULLs by the DISTINCT and UNIQUE keywords. I'll present examples based on my analysis of the data warehouse.

## DISTINCT: Treating Null as Equivalence

When I sought to explore the categories or ranges of values within a specific column—let's say the "jobs" column—the initial result appeared as follows:

``````[doctor, engineer, ..., null]
``````

This outcome raised a question: Does this mean that there is only a single NULL value? My previous understanding of PostgreSQL led me to believe that NULL values are treated as distinct. However, when I meticulously counted the occurrences of NULL values in the "jobs" column using the query:

``````SELECT COUNT(*) as nulljobs FROM jobs WHERE jobs IS NULL;
``````

The result unveiled a surprising 140 rows containing NULL values. It seemed that DISTINCT treated NULL values as equivalent. There could be a reasonable explanation for this behavior. The DISTINCT operation is typically employed for visualization or categorization purposes. Hence, treating NULLs as distinct might not make practical sense. However, this raises the question of why NULLs are treated as distinct values in the first place.

This treatment is rooted in the concept of Three-Valued Logic (3VL), which consists of true, false, and unknown. Crucially, "unknown" is not equal to "unknown" in this context. Keep in mind that the comparison "unknown != unknown" doesn't hold. To shed light on this, consider that infinity + infinity equals infinity. This implies that unknown values can be aggregated into a single value or category.

## UNIQUE Constraints: Ensuring Uniqueness Among Columns

The behavior of NULL values under the UNIQUE constraint in PostgreSQL introduces interesting nuances. UNIQUE constraints are particularly tailored for use with partial indexes. In this specific context, the interpretation of NULL values differs significantly. Rather than serving as indicators of mean or central tendencies, NULL values under UNIQUE constraints embody the distinctiveness of other non-null values within a row. As a result, UNIQUE constraints do not revolve around the treatment of NULL values in isolation. Instead, they focus on preserving distinctions among values contained within individual rows.

To illustrate this point further, let's consider a practical scenario involving a PostgreSQL database:

Suppose we have a table named "employees" with columns "employee_id" and "email." We want to ensure that each email address in the "email" column is unique. However, NULL values should not interfere with this uniqueness requirement.

``````CREATE TABLE employees (
employee_id serial PRIMARY KEY,
email VARCHAR(255) UNIQUE
);
-- Create a partial index to enforce uniqueness among non-null ids
CREATE UNIQUE INDEX unique_email
ON employees (email)
WHERE email IS NOT NULL;
``````

In this case, PostgreSQL will enforce the UNIQUE constraint on the "email" column, allowing multiple NULL values. This aligns with the concept that the UNIQUE constraint is chiefly concerned with maintaining the distinctiveness of non-null values.

For instance, the following insertions are valid:

``````-- Valid insertions
INSERT INTO employees (email) VALUES ('john@example.com');
INSERT INTO employees (email) VALUES ('mary@example.com');
INSERT INTO employees (email) VALUES (NULL);
``````

However, attempting to insert duplicate non-null values would result in a violation of the UNIQUE constraint:

``````-- Invalid insertion due to violation of UNIQUE constraint
INSERT INTO employees (email) VALUES ('john@example.com');
``````

when you SELECT some info using the partial index already made with EXPLAIN it is performing an index scan.

``````EXPLAIN SELECT * FROM employees WHERE email IS NOT NULL;
``````