Thursday, November 22, 2012

A Real World Case Study: Business Rule vs Predictive Model

The following is a true story that complements two earlier posts, "Comparison of Business Rules and Predictive Models" and "Predictive Modeling vs Intuitive Business Rules".

A few years ago, we built a new customer acquisition model for a cell phone service provider based on its historical application and payment data. The model calculated a risk score for each cell phone service applicant using information found in his/her credit reports. The higher the score, the higher the risk that a customer will not pay his/her bill.

A few weeks after the model went into production, we received an angry email from a manager at the client company. In the email, the manager listed applicants who had several bankruptcies. According to the manager, they should have been high-risk customers, yet our model gave them average risk scores. He questioned the validity of the model.

We explained that the model score was based on 20 or so variables, not bankruptcies alone. We also analyzed people with bankruptcies in the data that we used to build the model and found that they paid their bills on time. It might be that people with bankruptcies are more mobile and thus depend more on cell phones for communication. They may not be good candidates for a mortgage, but from a cell phone service provider's perspective, they are good customers.
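The check described above amounts to comparing non-payment rates between applicants with and without bankruptcies in the model-building data. Here is a minimal sketch in Python. The field names (`has_bankruptcy`, `defaulted`) are hypothetical, and the handful of records is made up purely to show the mechanics; it is not the client's data and proves nothing by itself.

```python
from collections import defaultdict

# Made-up records for illustration only; field names are hypothetical.
records = [
    {"has_bankruptcy": True,  "defaulted": False},
    {"has_bankruptcy": True,  "defaulted": False},
    {"has_bankruptcy": True,  "defaulted": True},
    {"has_bankruptcy": True,  "defaulted": False},
    {"has_bankruptcy": False, "defaulted": True},
    {"has_bankruptcy": False, "defaulted": False},
    {"has_bankruptcy": False, "defaulted": True},
    {"has_bankruptcy": False, "defaulted": False},
]


def default_rate_by_group(rows, group_key, outcome_key):
    """Return {group value: share of rows in that group where the outcome is True}."""
    totals = defaultdict(int)
    bad = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += 1
        if row[outcome_key]:
            bad[row[group_key]] += 1
    return {group: bad[group] / totals[group] for group in totals}


rates = default_rate_by_group(records, "has_bankruptcy", "defaulted")
for group, rate in sorted(rates.items()):
    print(f"has_bankruptcy={group}: non-payment rate {rate:.0%}")
```

On real data, the same comparison is what tells us whether an intuitive rule such as "bankruptcy means high risk" actually holds for this product.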

This is the bottom line: data-driven predictive models are more trustworthy than intuition-driven business rules.

Wednesday, November 07, 2012

From Oracle SQL Developer to Data Miner

As mentioned in an earlier post, "How easy is it for a SQL developer/analyst to become a data miner?", if we consider data mining as just more sophisticated SQL querying, then data mining is simply a natural extension of a SQL programmer/analyst's daily job, i.e., querying the database. For example, a SQL programmer can write a query to answer a business question: find people living in Boston, MA with an annual income of at least $50K and a credit score of 750 or higher. One simply puts the three conditions in the "where clause" as follows:

select person_id, person_first_name, person_last_name
from tbl_data_about_people
where
home_city='Boston' and home_state='MA' and
annual_income>=50000 and
credit_score >=750;


Similarly, we can put a predictive function in the "where clause" to answer a more useful question: find people living in Boston, MA who have at least a 25% probability of purchasing a new car within 2 months, as follows:

select person_id, person_first_name, person_last_name
from tbl_data_about_people
where
home_city='Boston' and home_state='MA' and
prediction_probability(model_likely_to_buy, 1 using *)>=0.25;
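For readers without an Oracle instance at hand, the idea of calling a scoring function inside the "where clause" can be tried out with Python's built-in `sqlite3` module. This is only a sketch under stated assumptions: `predict_buy_probability` is a hypothetical, hand-written stand-in for a trained model (it is not Oracle's `prediction_probability`), and the table rows and the extra columns are made up for illustration.

```python
import sqlite3


def predict_buy_probability(annual_income, credit_score):
    """Hypothetical stand-in for a trained model; NOT Oracle's prediction_probability."""
    probability = 0.05
    if annual_income >= 50000:
        probability += 0.15
    if credit_score >= 750:
        probability += 0.15
    return probability


conn = sqlite3.connect(":memory:")
# Register the Python function so that SQL statements can call it by name.
conn.create_function("predict_buy_probability", 2, predict_buy_probability)

conn.execute(
    """create table tbl_data_about_people (
           person_id integer, home_city text, home_state text,
           annual_income real, credit_score integer)"""
)
# Made-up rows for illustration only.
conn.executemany(
    "insert into tbl_data_about_people values (?, ?, ?, ?, ?)",
    [
        (1, "Boston", "MA", 80000, 780),
        (2, "Boston", "MA", 30000, 600),
        (3, "Austin", "TX", 90000, 800),
        (4, "Boston", "MA", 60000, 700),
    ],
)

rows = conn.execute(
    """select person_id
       from tbl_data_about_people
       where home_city = 'Boston' and home_state = 'MA'
         and predict_buy_probability(annual_income, credit_score) >= 0.25
       order by person_id"""
).fetchall()
print(rows)
conn.close()
```

The shape of the query is the same as in the Oracle example: an ordinary filter on city and state combined with a threshold on a model score. The difference is that in Oracle the score comes from a model stored in the database rather than from a hand-written function.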


We can see how easy and powerful it is to use predictive models in SQL. A traditional Oracle SQL developer can greatly enhance his/her value by picking up these data mining functions. A number of good documents and code samples are available at Oracle's site, e.g.,
The Data Mining Sample Programs
Oracle® Data Mining Concepts

Tuesday, November 06, 2012

Major Data Mining Steps

Normally, we need to go through the following steps to build a predictive modeling solution.

  1. Data Gathering 
  2. Data Validation 
  3. Data Preparation 
  4. Feature Variable Calculation (creating more salient variables that are more predictive of the target)
  5. Predictive Model Building and Testing 
  6. Model Deployment

In our opinion, the main challenges are Data Preparation, Feature Variable Calculation, and Model Deployment. Probably 90% of the time is spent on these three areas.