Sunday, April 06, 2014

More on Calculating Variables That Predict Customer Churn

In the post Calculate Variables That Predict Customer Churn, we use Oracle analytics function lag() to caculate those accounts with account balances dropping for three consecutive months which may indicate that the customer may leave the bank. This calculation can be done using pattern matching, MATCH_RECOGNIZE, available on Oracle 12c. We use the following table as an example.

SQL> select *from tbl_account1 order by acct_num, dt;

  ACCT_NUM    BALANCE DT
---------- ---------- ---------
     12345       8500 31-JAN-13
     12345       8550 28-FEB-13
     12345       5000 31-MAR-13
     12345       1000 30-APR-13
     12345        200 31-MAY-13
     37688       8800 31-JAN-13
     37688       7000 28-FEB-13
     37688       5300 31-MAR-13
     37688       1300 30-APR-13
     37688        500 31-MAY-13
To calculate those accounts with balances dropping for 3 consecutive months, we use the following MATCH_RECOGNIZE query.
SELECT *
FROM tbl_account1 MATCH_RECOGNIZE (
     PARTITION BY acct_num
     ORDER BY dt
     MEASURES
      last(down.balance) bal,
      last(DOWN.dt) AS dt
     ONE ROW PER MATCH
     AFTER MATCH SKIP TO NEXT ROW
     PATTERN (DOWN{3})
     DEFINE
        DOWN AS DOWN.balance < PREV(DOWN.balance)
     ) MR
ORDER BY acct_num, dt;

  ACCT_NUM        BAL DT
---------- ---------- ---------
     12345        200 31-MAY-13
     37688       1300 30-APR-13
     37688        500 31-MAY-13
In the above query, "PARTITION BY acct_num" specifies the pattern matching will be performed indepdently for each acct_num. "DEFINE DOWN AS DOWN.balance < PREV(DOWN.balance)" defines pattern variable DOWN as a result of comparing the balance from current and previous row (based on "order by dt"). PATTERN (DOWN{3})means finding 3 consecutive downs. The syntax for pattern definition is similar to regular expressions. Last(down.balance) and last(down.dt) keep the last balance and dt in a successful match.

Pattern matching queries can be used to extract useful patterns from time series data. These patterns can then be used as input variables for predictive models. Thus, in Oracle 12c SQL, the combination of pattern matching and predictive models is very powerful.

No comments: