Thursday, October 19, 2006

How to doom data mining solutions before even beginning to build models

I was reminded today, while speaking with an email marketing expert, of the reason many data mining projects fail. Usually there is a disconnect between the data mining approach and the business objective it is meant to solve. When data mining algorithms look at data, they are "thinking" in terms like "minimum squared error", "R-squared", or "Percent Correct Classification".

These measures are usually of little importance to the business objective, which may be to find a population of customers who will purchase at least $100 of goods, or who will respond to a campaign at a rate greater than 8%. In these cases, a model that performs "well" in the algorithm's view may not be particularly good at identifying the top-tier responders. Therefore, the problem should be set up with the business objective in mind, not the data mining algorithm's objective, and the models should be assessed using a metric that matches the business objective as closely as possible.
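To make the gap concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the ten customers and both sets of model scores are invented toy data, and the function names `percent_correct` and `top_tier_response_rate` are hypothetical helpers, not part of any library. It compares two models on percent correct classification and on the response rate among the top 20% of scored customers.

```python
def percent_correct(scores, responded, threshold=0.5):
    """Share of customers whose predicted class (score >= threshold) matches the outcome."""
    hits = sum((s >= threshold) == bool(r) for s, r in zip(scores, responded))
    return hits / len(responded)


def top_tier_response_rate(scores, responded, fraction=0.2):
    """Response rate among the highest-scored `fraction` of customers."""
    n_top = max(1, int(round(len(scores) * fraction)))
    # Rank customers from highest to lowest score and keep the top slice.
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    return sum(r for _, r in ranked[:n_top]) / n_top


# Invented toy data: 1 = customer responded, 0 = did not.
responded = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# Model A scores everyone below 0.5, so it predicts "no response" for all customers.
model_a = [0.10, 0.12, 0.40, 0.35, 0.05, 0.08, 0.03, 0.06, 0.02, 0.04]
# Model B over-predicts responders but ranks the true responders first.
model_b = [0.90, 0.80, 0.70, 0.60, 0.55, 0.20, 0.10, 0.15, 0.05, 0.12]

for name, scores in [("A", model_a), ("B", model_b)]:
    print(
        name,
        "percent correct:", percent_correct(scores, responded),
        "top-20% response rate:", top_tier_response_rate(scores, responded),
    )
```

On this toy data, model A wins on percent correct (0.8 versus 0.7) yet finds no responders in its top 20%, while model B puts both responders there. If the campaign will only mail the top-scored customers, the second number is the one that reflects the business objective.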
