Tuesday, June 07, 2005

Beware of Being Fooled with Model Performance

Interpreting model performance is a minefield. If one wants model performance to be as good as possible, it is critical to define exactly what "good" means. How does one measure "goodness"? The easiest way to communicate performance is with a single-valued score, such as percent correct classification or R-squared. However, it is precisely this compression of everything the model does into a single number that can cause one to be fooled. A simple example follows.
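For concreteness, here is a minimal sketch of the two single-valued scores mentioned above, using scikit-learn as one common toolkit; the data is invented purely for illustration:

```python
from sklearn.metrics import accuracy_score, r2_score

# Classification: percent correct collapses every per-case outcome
# into one number (here, 8 of 10 correct = 80%).
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]
print(f"Percent correct: {100 * accuracy_score(y_true, y_pred):.0f}%")

# Regression: R-squared does the same for continuous targets
# (about 0.96 for these made-up values).
y_true_r = [10.0, 25.0, 40.0, 55.0]
y_pred_r = [12.0, 22.0, 43.0, 50.0]
print(f"R-squared: {r2_score(y_true_r, y_pred_r):.2f}")
```

Either number is a convenient summary, but neither says anything about *where* in the input space the model succeeds or fails, which is the gap the example below exposes.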

Let's assume that a non-profit organization wants a model built that predicts the propensity of individuals to send donations, and that this model has 80+% classification accuracy, even on a test set. Furthermore, assume that the two indicators "Recent Donation Amount" (X1) and "Average Donation Amount" (X2) are two of the top predictors in the model. The figure on the left shows what a Support Vector Machine model did with this data. Even with the good accuracy, there is something disturbing about the model that isn't clear unless one sees a picture: the model isn't finding ranges of average and recent donation amounts that are associated with donors, but rather it is finding islands of donors. The second model (on the right) applies corrective measures that smooth the decision surface, and it is much more pleasing. It says (roughly) that when someone donates between about $10 and $50 on average (X2), they are more likely to respond. It is smooth and there are no pockets of isolated donation amounts, making this model much more believable, even though some accuracy was lost in the process.
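The original figures aren't reproduced here, but the islands-versus-smooth contrast is easy to recreate. The sketch below is one hedged way to do it with scikit-learn's SVC: the post doesn't say how its SVM was configured or smoothed, so the RBF kernel width (gamma) and the cost parameter (C) stand in for the "corrective measures," and the donation data and the 15% label noise are entirely synthetic, invented for illustration:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 400
X1 = rng.uniform(0, 100, n)   # "Recent Donation Amount" (synthetic)
X2 = rng.uniform(0, 100, n)   # "Average Donation Amount" (synthetic)
X = np.column_stack([X1, X2])

# True pattern: donors mostly average $10-$50. The 15% label noise
# gives a flexible model something to build islands around.
y = ((X2 > 10) & (X2 < 50)).astype(int)
flip = rng.random(n) < 0.15
y[flip] = 1 - y[flip]

# A flexible SVM (narrow kernel, high cost) vs. a constrained one.
island_model = SVC(kernel="rbf", gamma=0.05, C=100.0).fit(X, y)
smooth_model = SVC(kernel="rbf", gamma=0.002, C=1.0).fit(X, y)

# Sweep X2 with X1 held at 50 and count disjoint "donor" pockets:
# the flexible model tends to flip class in isolated pockets, while
# the constrained one predicts "donor" across one contiguous band.
x2_grid = np.linspace(0, 100, 201)
probe = np.column_stack([np.full_like(x2_grid, 50.0), x2_grid])
for name, model in [("island", island_model), ("smooth", smooth_model)]:
    pred = model.predict(probe)
    pockets = int(pred[0]) + int(((pred[1:] == 1) & (pred[:-1] == 0)).sum())
    print(f"{name}: train accuracy={model.score(X, y):.2f}, "
          f"donor pockets={pockets}")
```

In runs of a sketch like this, the flexible model typically posts the higher training accuracy, but its extra points of accuracy come from memorizing islands of noise, exactly the trap that a single-valued score hides and a picture of the decision surface reveals.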