Thursday, March 07, 2013

Watch out NULL values when comparing data

It is a very common task to compare data values. For example, I was involved in project where we  upgraded the scoring engine. We wanted to make sure the old and new scoring engines produce the same outputs given the same inputs. I use the following table to illustrate the problem. We want to make sure value_old and value_new are the same. (The blanks are NULL values.)

        ID  VALUE_OLD  VALUE_NEW
---------- ---------- ----------
         1        234
         2                   567
         3        789        789

If we simply use the following query to count the number of discrepancies, the result will return zero. This is not what we expect.
select count(*) from tbl_data_a where VALUE_OLD < > VALUE_NEW;

  COUNT(*)
----------
         0
This is because rows with NULL values appearing in the comparison are ignored.

A better approach is to write a query considering all of the following five situations:
In the following cases, VALUE_OLD and VALUE_NEW are the same.
1. VALUE_OLD is null, and VALUE_NEW is null.
2.VALUE_OLD is not null, VALUE_NEW is not null and VALUE_OLD=VALUE_NEW.

In the following cases, VALUE_OLD and VALUE_NEW are the different.

3.VALUE_OLD is null, and VALUE_NEW is not null.
4.VALUE_OLD is not null, and VALUE_NEW is null.
5.VALUE_OLD is not null, VALUE_NEW is not null and VALUE_OLD < > VALUE_NEW.

No comments: