Deep Data Mining Blog: March 2013

Thursday, March 07, 2013

Watch out NULL values when comparing data

It is a very common task to compare data values. For example, I was involved in project where we upgraded the scoring engine. We wanted to make sure the old and new scoring engines produce the same outputs given the same inputs. I use the following table to illustrate the problem. We want to make sure value_old and value_new are the same. (The blanks are NULL values.)

ID VALUE_OLD VALUE_NEW
---------- ---------- ----------
1 234
2 567
3 789 789

If we simply use the following query to count the number of discrepancies, the result will return zero. This is not what we expect.
select count(*) from tbl_data_a where VALUE_OLD < > VALUE_NEW;

COUNT(*)
----------
0
This is because rows with NULL values appearing in the comparison are ignored.

A better approach is to write a query considering all of the following five situations:
In the following cases, VALUE_OLD and VALUE_NEW are the same.
1. VALUE_OLD is null, and VALUE_NEW is null.
2.VALUE_OLD is not null, VALUE_NEW is not null and VALUE_OLD=VALUE_NEW.

In the following cases, VALUE_OLD and VALUE_NEW are the different.
3.VALUE_OLD is null, and VALUE_NEW is not null.
4.VALUE_OLD is not null, and VALUE_NEW is null.
5.VALUE_OLD is not null, VALUE_NEW is not null and VALUE_OLD < > VALUE_NEW.

Monday, March 04, 2013

Custom model score vs credit bureau score

In previous post, we compare the performance of different types of the models applied to the same data set. In reality, a more frequently encountered issue is to compare the performance of credit bureau scores and custom model scores. By custom model scores, we mean the model that is built based on client's own historical data. I have done numerous predictive modeling projects in the area across industries. My conclusion is that custom model scores are almost always better than generic credit bureau scores by large margins. The following gain charts are from a real project. For example, if we reject worst 20% customers based on their bureau scores, we can stop 26% of the loan default. If we reject 20% customers based on custom model scores, almost 40% of the loan default can be stopped. The can easily translate into big savings for a company with large number of customers.

The patterns are common. Thus, it is worthwhile to build a custom model that will almost always outperform generic credit bureau scores as long as the client has enough historical data.

10 Most Influential People	Text Files and Oracle DB	Predictive Model vs Rule	Build Predictive Model	About Predictive Model Variable	Logistic Regression
Recency Frequency Monetary Analysis	Unique Identifier in Oracle	Materialized View	Database Link	Calculate Percentage Using SQL	Handle NULL Value
Calculate Cumulative Perentage	Find Score Cutoff Value	Remove Duplicates	Calculate Correlation Coefficients	Oracle vs SQL Server	Random Sampling
Table Insert	Read Only Table	Clustering	Ranking	Find Most Frequent	Median Value
Oracle Source Code	Debug PL/SQL	Hide PL/SQL Scripts	Repair Views	Dump Schema	Move Big Files to Amazon

Popular Topics

Popular Topics

Thursday, March 07, 2013

Watch out NULL values when comparing data

Monday, March 04, 2013

Custom model score vs credit bureau score