Saturday, February 14, 2009

Could these be great days for data miners?

In a recent article, "Data Mining in the Meltdown: the Last, Best Hope?", the author describes how data quality is the key to the future success of businesses. But data quality by itself is not enough:
Of course, data quality matters little if a company is focusing on the wrong measures. The best companies adopt a customer-oriented definition of data quality and recognize that all items of data are not created equal...
In other words, the business objective phase (in the CRISP-DM way of viewing things) is critical. I would add that building models that are assessed in a manner commensurate with the business objective is every bit as important. If you build a series of regression models and take the one with the best R^2, you have very little idea from that metric whether or not the model will do anything productive. One must score and assess the model to reflect the business objective.
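To make the R^2 point concrete, here is a minimal sketch with invented numbers (nothing here comes from the article). It assumes a hypothetical business objective: contact only the two customers with the highest predicted revenue. The two "models" are just hand-written lists of predictions, and the helper names are my own.

```python
# All numbers are made up for illustration.
actual = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # realized revenue per customer

# Hypothetical model A: close fit overall, but it misranks the best customers.
model_a = [1, 2, 3, 4, 5, 6, 9, 8, 7, 8.5]

# Hypothetical model B: predictions shrunk toward the mean, ranking intact.
mean_actual = sum(actual) / len(actual)
model_b = [mean_actual + 0.2 * (a - mean_actual) for a in actual]


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def top_k_revenue(actual, predicted, k):
    """Revenue realized from the k customers with the highest predictions."""
    ranked = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    return sum(actual[i] for i in ranked[:k])


for name, preds in [("A", model_a), ("B", model_b)]:
    print(name, round(r_squared(actual, preds), 3), top_k_revenue(actual, preds, k=2))
```

With these made-up numbers, model A wins on R^2 while model B brings in more revenue from the two customers it selects. Which model is "best" depends on which of those two questions the business is actually asking.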

The author gets at this idea indirectly with this comment:
For every key performance indicator (KPI), for example, companies should be tracking a key risk indicator (KRI), Friend says. "You plan not just for results, but for contingencies. What happens if sales are down 20 percent?"
In other words, there may well be significant asymmetric costs to incorporate into the scoring of models. I'll be bringing this up at Predictive Analytics World this week; ignoring such costs is arguably one of the biggest mistakes made by modelers.
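A rough sketch of what asymmetric scoring could look like, again with invented figures. It assumes, purely for illustration, that over-forecasting sales by one unit costs five times as much as under-forecasting by one unit; the cost weights and both sets of forecasts are hypothetical.

```python
def mean_squared_error(actual, predicted):
    """Symmetric score: over- and under-forecasts are penalized alike."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def asymmetric_cost(actual, predicted, over_cost=5.0, under_cost=1.0):
    """Average cost when over-forecasting is pricier than under-forecasting.

    The default weights are assumptions chosen for this example only.
    """
    total = 0.0
    for a, p in zip(actual, predicted):
        error = p - a
        if error > 0:
            total += over_cost * error      # forecast too high
        else:
            total += under_cost * (-error)  # forecast too low (or exact)
    return total / len(actual)


# Made-up sales figures and forecasts.
actual = [100, 120, 80, 90]
model_a = [105, 125, 85, 95]  # always 5 units too high
model_b = [92, 112, 72, 82]   # always 8 units too low

for name, preds in [("A", model_a), ("B", model_b)]:
    print(name, mean_squared_error(actual, preds), asymmetric_cost(actual, preds))
```

Here model A has the lower squared error, yet model B is the cheaper one once the assumed cost of over-forecasting is taken into account. The ranking flips only because of the cost weights, which is exactly why they need to come from the business and not from the modeler's defaults.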
