Tuesday, February 17, 2009

Maybe these will be great days for data miners!

While perusing the NC State Institute for Advanced Analytics site (to follow up on the previous post on data mining education), I noticed a link to the U.S. News & World Report career guide, one entry of which describes data mining as an "ahead of the curve" career for 2009. Although the example it mentions is quite limited, it is interesting that data mining is getting such national recognition. Maybe we're in the right industry after all!

Saturday, February 14, 2009

Could these be great days for data miners?

In a recent article on cfo.com, "Data Mining in the Meltdown: The Last, Best Hope?", the author describes how data quality is the key to the future success of businesses. But data quality by itself is not enough:
Of course, data quality matters little if a company is focusing on the wrong measures. The best companies adopt a customer-oriented definition of data quality and recognize that all items of data are not created equal...
In other words, the business objective phase (in the CRISP-DM way of viewing things) is critical. I would add that assessing models in a manner commensurate with the business objective is every bit as important. If you build a series of regression models and keep the one with the best R^2, that metric tells you very little about whether the model will do anything productive. One must score and assess the model in a way that reflects the business objective.
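To make the point concrete, here is a minimal sketch in Python with made-up numbers. It assumes a hypothetical business objective: act on (say, mail) only the k customers with the highest predicted spend, and measure the actual spend captured. The function top_k_value and the data are illustrative assumptions, not a standard metric from any particular tool.

```python
from typing import List


def r_squared(actual: List[float], predicted: List[float]) -> float:
    """Coefficient of determination: 1 - SSE/SST."""
    mean = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean) ** 2 for a in actual)
    return 1.0 - sse / sst


def top_k_value(actual: List[float], predicted: List[float], k: int) -> float:
    """Hypothetical business metric: total actual value captured if we act
    only on the k records with the highest predicted values."""
    ranked = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    return sum(actual[i] for i in ranked[:k])


# Made-up actual spend for five customers and two models' predictions.
actual = [10, 9, 8, 1, 0]
model_a = [8, 9, 10, 1, 0]  # small errors, but the top of the ranking is shuffled
model_b = [10, 9, 3, 3, 3]  # larger errors, but the top two customers are right

for name, pred in [("A", model_a), ("B", model_b)]:
    print(name, round(r_squared(actual, pred), 3), top_k_value(actual, pred, k=2))
# A 0.91 17
# B 0.574 19
```

Model A wins on R^2 (about 0.91 vs. 0.57), yet model B captures more value among the two customers we would actually contact (19 vs. 17), which is what the hypothetical business objective cares about.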

The author gets at this idea indirectly with this comment:
For every key performance indicator (KPI), for example, companies should be tracking a key risk indicator (KRI), Friend says. "You plan not just for results, but for contingencies. What happens if sales are down 20 percent?"
In other words, there may well be significant asymmetric costs that need to be incorporated when scoring models. I'll be bringing this up at Predictive Analytics World this week; ignoring such costs is arguably one of the biggest mistakes modelers make.
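As an illustration of asymmetric costs, here is a minimal sketch in Python. It assumes a binary classifier whose scores are turned into decisions with a cutoff, and hypothetical costs in which a missed positive (false negative) costs 10 times as much as a false alarm (false positive). The scores, labels, and cost values are made up; the sketch picks one cutoff that maximizes accuracy and another that minimizes total cost.

```python
from typing import List, Tuple

# Hypothetical costs: missing a true positive (e.g., a churner we never
# contact) is assumed to cost 10x as much as a wasted contact.
COST_FN = 10.0
COST_FP = 1.0


def confusion(scores: List[float], labels: List[int], t: float) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) when predicting positive for score >= t."""
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        pred = 1 if s >= t else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(scores: List[float], labels: List[int], t: float) -> float:
    tp, fp, fn, tn = confusion(scores, labels, t)
    return (tp + tn) / len(labels)


def total_cost(scores: List[float], labels: List[int], t: float,
               cost_fn: float = COST_FN, cost_fp: float = COST_FP) -> float:
    """Total misclassification cost, weighting each error type separately."""
    _, fp, fn, _ = confusion(scores, labels, t)
    return fn * cost_fn + fp * cost_fp


def best_thresholds(scores: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (accuracy-optimal cutoff, cost-optimal cutoff) among observed scores."""
    candidates = sorted(set(scores), reverse=True)
    by_accuracy = max(candidates, key=lambda t: accuracy(scores, labels, t))
    by_cost = min(candidates, key=lambda t: total_cost(scores, labels, t))
    return by_accuracy, by_cost


# Toy, made-up model scores and true labels (1 = positive class).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]

t_acc, t_cost = best_thresholds(scores, labels)
print(t_acc, accuracy(scores, labels, t_acc), total_cost(scores, labels, t_acc))
# 0.7 0.9 10.0
print(t_cost, accuracy(scores, labels, t_cost), total_cost(scores, labels, t_cost))
# 0.3 0.7 3.0
```

With these toy numbers, the accuracy-optimal cutoff (0.7) is right 90% of the time but costs 10, while the cost-optimal cutoff (0.3) is less accurate yet costs only 3, because it trades three cheap false alarms for catching the expensive missed positive.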

Tuesday, February 10, 2009

Can you learn data mining in undergraduate or graduate school?

I was recently asked by a former student from one of my data mining courses whether a particular program was a good place to learn data mining (it happened to be the one at NC State). It raises an interesting question: how much of data mining can be learned from a book or a course?

Some of the best data miners I have met had never taken a statistics course, and some had no higher-level mathematics either. For my part, I majored in computational mathematics as an undergraduate and in applied mathematics for my master's, but never took a statistics course either (though I did take, and TA, a probability course). That said, I always recommend in my courses that people become familiar with basic statistics; one book I have recommended, The Cartoon Guide to Statistics, is linked in the book recommendations section. Since I have never taken a college or graduate data mining course, I can't comment directly. My concern is that such courses may be too theoretical (how the algorithms work) rather than practical (how to handle data problems, how to pose the right questions for data mining to address, etc.).

I'm willing to be persuaded, though, so if you have experience with good, practical data mining curricula, please let me know.