Wednesday, April 02, 2014

Calculate Variables That Predict Customer Churn

Oracle analytics functions such as lag() are very useful in building interesting variables for predictive models. For example, to predict if a bank account will be closed by a customer we may want to look at the account balance history. If the account balance drop consecutively for three months, it may indicate that the customer will leave the bank. Using the following table as an example, we will describe how to build a variable indicating balance dropping for three consecutive months.

SQL> select * from tbl_account1 order by dt desc;

  ACCT_NUM    BALANCE DT
---------- ---------- ---------
     12345        200 31-MAY-13
     12345       1000 30-APR-13
     12345       5000 31-MAR-13
     12345       8550 28-FEB-13
     12345       8500 31-JAN-13
One approach is to use lag() function. In the following query, for each record we calculate the previous 1, 2 and 3 months balances. lag(BALANCE,1), lag(BALANCE,2) and lag(BALANCE,3) indicate the balance in prevous 1, 2 and 3 records based on the order of "dt", respectively. "partition by acct_num" means the calculation is done independently by acct_num.
with
tbl as(
select a.* ,
lag(BALANCE,1) over(partition by acct_num order by dt) last_1,
lag(BALANCE,2) over(partition by acct_num order by dt) last_2,
lag(BALANCE,3) over(partition by acct_num order by dt) last_3
from tbl_account1 a)
select * from tbl order by dt;

  ACCT_NUM    BALANCE DT            LAST_1     LAST_2     LAST_3
---------- ---------- --------- ---------- ---------- ----------
     12345       8500 31-JAN-13
     12345       8550 28-FEB-13       8500
     12345       5000 31-MAR-13       8550       8500
     12345       1000 30-APR-13       5000       8550       8500
     12345        200 31-MAY-13       1000       5000       8550
Once we understand how to calculate the privous month balance using lag() function, we can derive the indictor showing consecutively three months drop in balance using the following query.
with
tbl as(
select a.* ,
lag(BALANCE,1) over(partition by acct_num order by dt) last_1,
lag(BALANCE,2) over(partition by acct_num order by dt) last_2,
lag(BALANCE,3) over(partition by acct_num order by dt) last_3
from tbl_account1 a)
select ACCT_NUM, balance, dt,
    case when balance-last_1 <0 
          and last_1-last_2<0 
          and last_2-last_3<0 then 1
          else 0 end as bal_drop_in_3m
 from tbl order by dt;

  ACCT_NUM    BALANCE DT        BAL_DROP_IN_3M
---------- ---------- --------- --------------
     12345       8500 31-JAN-13              0
     12345       8550 28-FEB-13              0
     12345       5000 31-MAR-13              0
     12345       1000 30-APR-13              0
     12345        200 31-MAY-13              1
We can also use analytic function sum() over() to calcluate the variable as shown below.
with
tbl as(
select a.* ,
decode(sign(balance-lag(BALANCE,1) over(partition by acct_num order by dt)),
        0,0,
        1,0,
        -1) is_drop
from tbl_account1 a
)
select ACCT_NUM, balance, dt,
case when sum(is_drop) over(partition by acct_num 
            order by dt rows 2 preceding) = -3 then 1
     else 0 end BAL_DROP_IN_3M
 from tbl a order by dt;

  ACCT_NUM    BALANCE DT        BAL_DROP_IN_3M
---------- ---------- --------- --------------
     12345       8500 31-JAN-13              0
     12345       8550 28-FEB-13              0
     12345       5000 31-MAR-13              0
     12345       1000 30-APR-13              0
     12345        200 31-MAY-13              1
The above variable BAL_DROP_IN_3M can be used as one of the inputs into a predictive model. We can also use the same method to calculate variables such as stock price going up for five consecutive days, etc.

No comments: