Applied Data Science and Machine Learning: 06/01/2005

Tuesday, June 07, 2005

Beware of Being Fooled with Model Performance

Interpreting model performance is a minefield. If one wants model performance to be as good as possible, it is critical to define exactly what "good" means. How does one measure "goodness"? The easiest way to communicate performance is with a single-valued score, such as percent correct classification or R-squared. However, it is precisely this reduction of a model's complex behavior to a single number that can cause one to be fooled. A simple example follows.

Let's assume that a non-profit organization wants a model built that predicts the propensity of individuals to send donations, and that this model has 80+% classification accuracy, even on a test set. Furthermore, assume that the two indicators "Recent Donation Amount" (X1) and "Average Donation Amount" (X2) are two of the top predictors in the model. The figure at the left shows what a Support Vector Machine model did with this data. Even with the good accuracy, there is something disturbing about the model that isn't clear unless one sees a picture: the model isn't finding ranges of average and recent donation amounts that are associated with donors, but rather it is finding islands of donors. The second model (on the right) applied corrective measures to smooth the model, and it is much more pleasing. It is saying (roughly) that when someone donates between about $10 and $50 on average (X2), they are more likely to respond. It is smooth and there are no pockets of isolated donation amounts, making this model much more believable, even though some accuracy was lost in the process.
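The "islands versus ranges" contrast can be reproduced in miniature. The sketch below is not the Support Vector Machine from the figures; it swaps in a simple Gaussian kernel vote on a single input (average donation amount), where a hypothetical `bandwidth` parameter plays the smoothing role. The data points, function names, and bandwidth values are all made up for illustration, and accuracy is measured on the same toy points used to fit, not on a test set.

```python
import math

# Hypothetical toy data: (average donation in dollars, 1 = donor, 0 = non-donor).
# Donors sit mostly in the $10-$50 range, with a few noisy labels mixed in.
DATA = [
    (2, 0), (5, 0), (8, 0), (12, 1), (15, 1), (18, 0), (22, 1),
    (25, 1), (28, 1), (32, 0), (35, 1), (38, 1), (42, 1), (45, 0),
    (48, 1), (55, 0), (60, 0), (65, 1), (70, 0), (80, 0),
]


def kernel_vote(x, bandwidth, data=DATA):
    """Return 1 if Gaussian-weighted donor votes outweigh non-donor votes at x."""
    donor = 0.0
    other = 0.0
    for xi, yi in data:
        weight = math.exp(-((x - xi) ** 2) / (2 * bandwidth ** 2))
        if yi == 1:
            donor += weight
        else:
            other += weight
    return 1 if donor > other else 0


def accuracy(bandwidth, data=DATA):
    """Fraction of the toy points classified correctly (resubstitution accuracy)."""
    hits = sum(kernel_vote(xi, bandwidth, data) == yi for xi, yi in data)
    return hits / len(data)


def count_islands(bandwidth, lo=0.0, hi=90.0, step=0.5):
    """Count separate intervals on [lo, hi] where the model predicts 'donor'."""
    islands = 0
    previous = 0
    steps = int(round((hi - lo) / step))
    for i in range(steps + 1):
        current = kernel_vote(lo + i * step, bandwidth)
        if current == 1 and previous == 0:
            islands += 1  # a new donor region starts here
        previous = current
    return islands


if __name__ == "__main__":
    # A narrow bandwidth chases individual points; a wide one smooths them out.
    for name, bw in [("narrow", 0.5), ("wide", 10.0)]:
        print(name, "accuracy:", accuracy(bw), "donor islands:", count_islands(bw))
```

On this toy data the narrow setting fits every point yet carves the donation axis into several disconnected donor pockets, while the wide setting gives up some accuracy in exchange for one contiguous donor range. The accuracy figure alone would favor the first model; the island count is what exposes the problem.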
