5 Powerful Thoughts on SQL NULLs

USA IT JOBS
SQL NULL values

NULL BASICS:

  • An arithmetic operation involving a NULL returns NULL. For example, NULL minus NULL yields NULL, not zero.
  • A boolean comparison between two values involving a NULL returns neither true nor false, but unknown in SQL’s three-valued logic. For example, neither NULL equals NULL nor NULL not-equals NULL is true. Testing whether a value is NULL requires an expression such as IS NULL or IS NOT NULL.
  • An SQL query selects only values whose WHERE expression evaluates to true, and groups whose HAVING clause evaluates to true.
  • The aggregate COUNT(*) counts all NULL and non-NULL tuples; COUNT(attribute) counts all tuples whose attribute value is not NULL. Other SQL aggregate functions ignore NULL values in their computation.

Behavior of NULL in Functions:

MAX()

This is the obvious query:

SELECT MAX(a) FROM R

If R is empty, the query must return NULL, not empty.
If R is a one-row table holding NULL, Date and Darwen don’t specifically declare what MAX() should return if its argument consists of exclusively NULL values.
If R is a table holding NULL and non-NULL integers, NULLs are ignored, and MAX() returns the maximum integer.

a >= ALL()

This expression of the maximum seems consistent with mathematical logic, but fails completely in SQL:

SELECT DISTINCT a
 FROM R
 WHERE a >= ALL (SELECT * FROM R)

If R is empty, the query returns empty. The >= ALL test is vacuously true with an empty subquery , but there is no value of a to exploit the test.
If R holds a NULL value, the query returns empty, because the test a >= ALL(…) returns unknown (not false!) for any NULL or maximum non-NULL integer value of a if the subquery includes a NULL value.

EXCEPT

This expression is one derivation of maximum as computed in relational algebra: subtract all the non-maximum values from the table R, leaving the maximal ones:

(SELECT DISTINCT * FROM R)
 EXCEPT
 (SELECT R.a
 FROM R, R AS S
 WHERE R.a < S.a)

If R is empty, the query returns empty.
If R holds a NULL value, the query returns NULL, in addition to whatever maximal integer is present (if any). The lower subquery never includes NULL, so NULL is never subtracted from R.

NOT IN

This expression is another writing of maximum as computed in relational algebra: find values not in the non-maximum values of R:

SELECT DISTINCT *
 FROM R
 WHERE a NOT IN (SELECT R.a
 FROM R, R AS S
 WHERE R.a < S.a)

This writing turns out to be subtly different from the last one.
If R is empty, the query returns empty.
If R holds one integer, and at least one NULL value, the query returns NULL, in addition to whatever maximal integer is present (if any). In this case, the subquery is always empty; the one available integer is compared only to NULL values, so do not participate in the subquery’s result. NULL NOT IN (empty) is vacuously true as it is in mathematics , so NULL is selected as part of the result.
If R holds more than one integer, the query returns the maximal integer. In this case, the subquery includes at least one value. Now that the subquery is not empty, NULL NOT IN (nonempty result) evaluates to unknown (not false!), and is no longer selected as part of the result. As an aside, NULL NOT IN (nonempty result) returns unknown even if the nonempty result includes NULL.

EXCEPT NULL

Because it is somewhat awkward to have an expression for MAX return two rows whose values do not equal, the following expression adjusts the EXCEPT expression to exclude NULL from the answer:

(SELECT DISTINCT * FROM R)
 EXCEPT
 (SELECT R.a
 FROM R, R AS S
 WHERE R.a < S.a OR R.a IS NULL)

If R is empty, the query returns empty.
If R holds NULL, the query returns the maximal integer, or empty if R has no integers. EXCEPT will remove NULL from the result if NULL appears in the bottom subquery, even though NULL is not equal to NULL. [10] Similarly, DISTINCT, UNION, and INTERSECT always returns at most one NULL.

Advertisements

Author: Srini

Experienced software developer. Skills in Development, Coding, Testing and Debugging. Good Data analytic skills (Data Warehousing and BI). Also skills in Mainframe.