It is a very common task to compare data values. For example, I was involved in project where we upgraded the scoring engine. We wanted to make sure the old and new scoring engines produce the same outputs given the same inputs. I use the following table to illustrate the problem. We want to make sure value_old and value_new are the same. (The blanks are NULL values.)
ID VALUE_OLD VALUE_NEW
---------- ---------- ----------
1 234
2 567
3 789 789
If we simply use the following query to count the number of discrepancies, the result will return zero. This is not what we expect.
select count(*) from tbl_data_a where VALUE_OLD < > VALUE_NEW;
COUNT(*)
----------
0
This is because rows with NULL values appearing in the comparison are ignored.
A better approach is to write a query considering all of the following five situations:
In the following cases, VALUE_OLD and VALUE_NEW are the same.
1. VALUE_OLD is null, and VALUE_NEW is null.
2.VALUE_OLD is not null, VALUE_NEW is not null and VALUE_OLD=VALUE_NEW.
In the following cases, VALUE_OLD and VALUE_NEW are the different.
3.VALUE_OLD is null, and VALUE_NEW is not null.
4.VALUE_OLD is not null, and VALUE_NEW is null.
5.VALUE_OLD is not null, VALUE_NEW is not null and VALUE_OLD < > VALUE_NEW.
Popular Topics
Popular Topics
Thursday, March 07, 2013
Monday, March 04, 2013
Custom model score vs credit bureau score
In previous post, we compare the performance of different types of the models applied to the same data set. In reality, a more frequently encountered issue is to compare the performance of credit bureau scores and custom model scores. By custom model scores, we mean the model that is built based on client's own historical data. I have done numerous predictive modeling projects in the area across industries. My conclusion is that custom model scores are almost always better than generic credit bureau scores by large margins. The following gain charts are from a real project. For example, if we reject worst 20% customers based on their bureau scores, we can stop 26% of the loan default. If we reject 20% customers based on custom model scores, almost 40% of the loan default can be stopped. The can easily translate into big savings for a company with large number of customers.
The patterns are common. Thus, it is worthwhile to build a custom model that will almost always outperform generic credit bureau scores as long as the client has enough historical data.
The patterns are common. Thus, it is worthwhile to build a custom model that will almost always outperform generic credit bureau scores as long as the client has enough historical data.
Subscribe to:
Posts (Atom)