The predictive models that we produced won head-to-head competitions and were chosen by clients. For example, our model predicting the risk that a new customer will not pay his or her phone bill was used by a top-3 cell phone company. Our debit card fraud detection model is being used by a top-15 bank. Our competitors included one of the three credit bureaus, which had hundreds of statisticians working for it.
We have found that by following a number of principles, we can produce good predictive models and quality data analytics work in general.
The first principle is to avoid making mistakes. We have seen many cases where mistakes damaged the reputation and credibility of data analysts. For example, statisticians once analyzed data and concluded that compromised credit cards were less likely to show fraudulent activity than normal cards. It was a mistake, of course. One of our competitors loaded a customer's data incorrectly and produced reports in which the numbers did not make sense at all. Mistakes like those cause immediate rejection by clients and permanent damage to the analysts' reputation.
We need to realize that avoiding mistakes should be an inherent part of our processes. In a large project, which runs from data gathering, loading, validating, and summarizing through model building and report generation to model deployment, we may have to take many steps and produce hundreds of data sets. To avoid making mistakes, we need to double-check our results. We have found that it is actually much harder to verify that the results are correct than to simply execute the steps that produce them.
Thus, we always perform two tasks: 1. produce the result; 2. verify the result. It is better to spend more time producing correct results than to quickly deliver something wrong that causes irrecoverable damage to our credibility. We will talk more about avoiding mistakes in the post "The first principle of data analytics is to avoid making mistakes".
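To make the produce-then-verify habit concrete, here is a minimal Python sketch for a single data-loading step. It is only an illustration, not the process we actually used: the function names, the column names (`card_id`, `amount`), and the idea of checking against control figures (a row count and a total supplied independently by the data provider) are all assumptions made for this example.

```python
import csv
import io


def load_transactions(csv_text):
    """Task 1 (produce): parse raw CSV text into a list of records.

    The column names are hypothetical, chosen only for this example.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {"card_id": row["card_id"], "amount": float(row["amount"])}
        for row in reader
    ]


def verify_load(records, expected_rows, expected_total, tolerance=0.01):
    """Task 2 (verify): compare the loaded data with control figures.

    Returns a list of problem descriptions; an empty list means every
    check passed. The control figures are assumed to come from an
    independent source, such as the data provider.
    """
    problems = []
    if len(records) != expected_rows:
        problems.append(
            f"row count {len(records)} != expected {expected_rows}"
        )
    total = sum(r["amount"] for r in records)
    if abs(total - expected_total) > tolerance:
        problems.append(f"total {total} != expected {expected_total}")
    if any(not r["card_id"] for r in records):
        problems.append("record with empty card_id")
    return problems


# Made-up sample data, standing in for a client's file.
raw = "card_id,amount\nA1,10.50\nA2,20.00\nA3,5.25\n"
records = load_transactions(raw)
problems = verify_load(records, expected_rows=3, expected_total=35.75)
if problems:
    raise ValueError("load failed verification: " + "; ".join(problems))
```

The point of the sketch is that the verification is a separate step with its own inputs. If the checks only reused the output of the loading code, a loading error would pass unnoticed.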