Saturday, February 16, 2013
The Comparison of Different Models
In previous posts, we mentioned that a great deal of time should be spent on understanding the data and building feature variables that are truly relevant to the target variable. The next question is: which predictive models should we use? There are many types of models to choose from. For example, for classification problems, the models we can use include CART, logistic regression, SVM, neural nets, k-nearest neighbors, Bayesian classifiers, ensemble models, etc. Most of my PhD dissertation is dedicated to the empirical comparison of different models. In commercial work, I have sometimes applied different models to the same problem. The following lift charts are the actual results for models that predict direct mail response. The models I tested include gradient boosting trees, CART, logistic regression, and a simple cell (or cube) model. The cell (or cube) model divides the training data into many cubes and calculates the response rate for each cube. The predicted response rate for a new data point is that of the cube in which it falls.
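To make the cell (or cube) model concrete, here is a minimal Python sketch of the idea. It is an illustration only, not the code behind the results in this post: the class name `CellModel`, the `bin_edges` cut points, the fallback to the overall response rate for cubes that have no training data, and the toy data (two made-up features, age and number of prior purchases) are all assumptions.

```python
from bisect import bisect_right
from collections import defaultdict


class CellModel:
    """Hypothetical sketch of a cell (cube) model for a 0/1 response."""

    def __init__(self, bin_edges):
        # bin_edges[j] is a sorted list of cut points for feature j.
        # The cut points are chosen by the user (an assumption of this sketch).
        self.bin_edges = bin_edges
        self.cell_rates = {}
        self.overall_rate = 0.0

    def _cell(self, row):
        # A cube is identified by the interval index of each feature value.
        return tuple(bisect_right(edges, x) for edges, x in zip(self.bin_edges, row))

    def fit(self, X, y):
        counts = defaultdict(int)
        responders = defaultdict(int)
        for row, label in zip(X, y):
            cell = self._cell(row)
            counts[cell] += 1
            responders[cell] += label
        # Response rate of each cube = responders / records in that cube.
        self.cell_rates = {c: responders[c] / counts[c] for c in counts}
        self.overall_rate = sum(y) / len(y)
        return self

    def predict(self, row):
        # Assumption: a cube with no training records gets the overall rate.
        return self.cell_rates.get(self._cell(row), self.overall_rate)


# Toy data (made up): (age, prior purchases) and whether the person responded.
X = [(25, 1), (30, 0), (45, 3), (50, 4), (28, 1), (62, 5)]
y = [0, 1, 1, 1, 0, 0]

# One cut point per feature gives a 2 x 2 grid of cubes.
model = CellModel(bin_edges=[[40], [2]]).fit(X, y)
rate_seen_cube = model.predict((35, 1))    # falls in a cube seen in training
rate_empty_cube = model.predict((35, 5))   # falls in a cube with no training data
```

In practice the number of cubes grows quickly with the number of features and cut points, so many cubes end up with few or no records. How to handle those cubes (here, by falling back to the overall rate) is a design choice rather than part of the basic definition.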