Wednesday, May 28, 2008

What data mining software to buy?

This post (http://www.dmreview.com/issues/2007_46/10001040-1.html?portal=analytics) is an interesting example of the assessment of analytics software. The key paragraph is the conclusion, where Mr. Raab states:
Instead of a horserace between product features, this approach puts the focus where it should be: on value to your business. It recognizes that the value of a new tool depends on the other tools already available, and it forces evaluation teams to explicitly study the impact of different tools on different users. By creating a clearer picture of how each new tool will impact the way work actually gets done within the company, it leads to more realistic product assessments and ultimately to more productive selection choices.


I couldn't agree more. For the past 10 years, since the Elder and Abbott review of data mining software presented at KDD-98 (on my web site), I've tried to think of ways to summarize data mining software. The obvious way is by features, such as which algorithms a product has. The usability of a tool is another characteristic to add, as John, Philip Matkovsky and I wrote in "An Evaluation of High-End Data Mining Tools for Fraud Detection". I've also described the different packages by the kind of interface (wizard, menu-driven, block-diagram, command line, etc.).

It's not easy to provide a summary in this multi-dimensional view of data mining tools. Sounds like an opportunity for predictive modeling!

Monday, May 26, 2008

What Makes a Data Mining Skeptic?

I just found this post expressing skepticism about data mining (I'll let go of the comment about predictive analytics being the holy grail of data mining--I'm not sure what this means).

The fascinating part for me was this paragraph:

Anyway. Lindy and I were a bit squirmy through the whole discussion. It seemed like so many hopes and dreams were being placed at the altar of the goddess Clementine... but I had to ask myself, could you REALLY get any more analysis out of it then you could get simply by asking your members what events they attend, plan to attend, ever attended, or might attend in the future, and why? Since when did we stop talking to our members about this stuff? A good internal marketing manager could give you all the answers you seek about which of your various audiences are likely to respond to which of your messages, who's going to engage with you, why and when, who's going to participate in which of your events, etcetera, and they would know these answers not through stats and charts (even if you ask for them) but through experience and listening.


It is interesting on several fronts. First, there is a strong emphasis on personal expertise and experience. But at the heart of the critique is apparently a belief that the data cannot reveal insights, or in other words, that a data-driven approach doesn't give you any "analysis". Why would one believe this? (I do not doubt the sincerity of the comment--I take it at face value.)

One reason may be that this individual has never seen or experienced a predictive analytics solution. While this may be true, it also misses what I think is at the heart of the critique. There is a false dichotomy set up here between data analysis and individual expertise. Anyone who has built predictive models successfully knows that one usually must have both expert knowledge and representative data.

There are undoubtedly some individuals who can "give you all the answers you seek about which of your various audiences are likely to respond to which of your messages". But usually, relying on expertise alone falls short for two reasons:
1) most individuals who have to deal with large quantities of data don't know as much as they think they know, and related to this
2) it is difficult to impossible for anyone to sort through all the data with all of the permutations that exist.
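
To give a rough sense of the second point, here is a small sketch that counts how many distinct member segments arise from just a handful of attributes. The attribute names and the number of values for each are made up for illustration; they do not come from the post being discussed or from any real membership database.

```python
from math import prod

# Hypothetical member attributes and how many distinct values each can take.
# Names and counts are illustrative assumptions, not real data.
attribute_levels = {
    "membership_type": 4,
    "region": 6,
    "years_as_member": 5,
    "events_attended_last_year": 5,
    "preferred_channel": 3,
    "job_role": 8,
}


def count_segments(levels):
    """Return the number of distinct combinations of attribute values."""
    return prod(levels.values())


segments = count_segments(attribute_levels)
print(f"{len(attribute_levels)} attributes -> {segments:,} possible segments")
```

Even with only six coarse attributes, the count runs to many thousands of combinations, and every added attribute multiplies it again. Nobody keeps that many cases in their head, however experienced they are.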

Data mining usually doesn't tell us things that experts scratch their heads at in amazement. The results usually confirm what one suspects (or one of many possible conclusions one may have suspected), but with a few unexpected twists.

So how can we persuade others that there is value in data mining? The first step is realizing there is value in the data.