Monday, May 26, 2008

What Makes a Data Mining Skeptic?

I just found this post expressing skepticism about data mining (I'll let go the comment about predictive analytics being the holy grail of data mining; I'm not sure what that means).

The fascinating part for me was this paragraph:

Anyway. Lindy and I were a bit squirmy through the whole discussion. It seemed like so many hopes and dreams were being placed at the altar of the goddess Clementine... but I had to ask myself, could you REALLY get any more analysis out of it than you could get simply by asking your members what events they attend, plan to attend, ever attended, or might attend in the future, and why? Since when did we stop talking to our members about this stuff? A good internal marketing manager could give you all the answers you seek about which of your various audiences are likely to respond to which of your messages, who's going to engage with you, why and when, who's going to participate in which of your events, etcetera, and they would know these answers not through stats and charts (even if you ask for them) but through experience and listening.


It is interesting on several fronts. First, there is a strong emphasis on personal expertise and experience. But at the heart of the critique is apparently a belief that the data cannot reveal insights, or in other words, that a data-driven approach doesn't give you any "analysis". Why would one believe this? (And I do not doubt the sincerity of the comment; I take it at face value.)

One reason may be that this individual has never seen or experienced a predictive analytics solution. While this may be true, it also misses what I think is at the heart of the critique: there is a false dichotomy set up here between data analysis and individual expertise. Anyone who has built predictive models successfully knows that one usually must have both expert knowledge and representative data.

There are undoubtedly some individuals who can "give you all the answers you seek about which of your various audiences are likely to respond to which of your messages". But usually this falls short for two reasons:
1) most individuals who have to deal with large quantities of data don't know as much as they think they know, and, related to this,
2) it is difficult, if not impossible, for anyone to sort through all the data and all of the permutations that exist (the sketch below illustrates how quickly those permutations multiply).
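To make the second point concrete, here is a minimal sketch; the member records and field names are invented, but it shows how a data-driven pass can tally every audience segment, while an expert can only weigh a handful by intuition:

```python
# A minimal sketch with invented member records and field names: even a few
# attributes produce more segment combinations than anyone can track by hand.
from collections import defaultdict

# Hypothetical records: (age_band, region, responded_to_mailing)
members = [
    ("under_40", "east", False),
    ("under_40", "west", True),
    ("over_40",  "east", True),
    ("over_40",  "west", False),
    ("under_40", "east", True),
    # ... in practice, thousands of rows and many more attributes
]

# Tally responses within every observed combination of attributes.
counts = defaultdict(lambda: [0, 0])  # segment -> [responders, total]
for age, region, responded in members:
    seg = (age, region)
    counts[seg][1] += 1
    if responded:
        counts[seg][0] += 1

# Rank every segment by response rate; with more attributes, the number of
# segments grows multiplicatively, far past what intuition can cover.
for seg, (resp, total) in sorted(counts.items()):
    print(f"{seg}: {resp}/{total} responded ({resp / total:.0%})")
```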

Data mining usually doesn't tell us things that make experts scratch their heads in amazement. It usually confirms what one suspects (or one of many possible conclusions one may have suspected), but with a few unexpected twists.

So how can we persuade others that there is value in data mining? The first step is realizing there is value in the data.

7 comments:

Maddie Grant said...

Hi Dean - thank you for your awesome comment on my post. I have a response/further question for you; it should be up by tomorrow morning. ; )

Lindy Dreyer said...

Hey Dean,
I'm so happy you've jumped into this conversation. I think part of what got me going is my experience of seeing this done poorly. (Not to mention the price tag, which includes staff resources along with software and data warehousing.)

I can certainly see how predictive analysis would help a group make sense out of an extraordinarily large data set. I'm just not convinced that more than a handful of associations fit the bill.

What do you think? How does size impact the effectiveness?

Dean Abbott said...

Just getting back from Chicago, so will comment more soon. I'm very glad you both have chimed in here.

Will Dwinnell said...

Wow, this is interesting. In some ways, the passage above could not be more antithetical to so much of what I know to be true: I have seen quantitative analysis pay very real returns many times; that this is possible is beyond question.

On the other hand, I also know that there is no good reason to rely solely on quantitative analysis. After all, quantitative analysis is based squarely on whatever measurements have been made, and nothing more. But why those measurements? Why not others? Which others? These last questions are often answered by non-quantitative analysis.

Further, I know that not everyone who practices data analysis is very good at it. To be truly successful, one needs technical competence, organizational competence and a willing and interested audience. Many practitioners whom I've encountered do not even have the first of those things. Yet others are technically proficient, but are unable to apply the science to substantive problems which the organization or client needs solved.

There are plenty of cases where quantitative analysis projects have failed miserably, often after having been heavily marketed with unrealistic promises.

In summary: quantitative analysis can work, though it will not work in all cases. How well one will do without it depends very much on circumstances. I have certainly encountered my share of "old hands" whose "industry experience" turned out to be dead wrong.

Maddie Grant said...

Will - from our "skeptical" point of view, we have heard a lot of the same thing, where it seems hard to know whether the value of the exercise has been proven. And if one agrees that we need experienced people or companies to run those surveys, collect the data, or analyze the data, then it's hard to know how to find the "right" expertise (whether internally or externally). It just seems like so many variables and a lot of uncertainty, which is a paradox at the heart of the science of data mining, which you would think would be dealing in cold hard facts! I do, however, see an inherent value in learning from unexpected results, perhaps, rather than from results which confirm what we think we know already.

Anonymous said...

Many discussions of this nature (and of many other comparisons) seem to be of an “either or” nature, or of an “it’s perfect or it’s not” nature. I think it comes down to what you are trying to accomplish: are you trying to find “the one right answer”, or are you trying to improve upon the current results?

If the goal is truly “better” results, i.e. an improvement over the current state, then perhaps subjective inputs are best, perhaps more systematic inputs are best, or perhaps some combination of both. Something doesn’t need to be perfect to offer an improvement over the current state. Instead of looking for the holes in one or both approaches, perhaps looking at the tradeoffs and gains simultaneously is a better approach. You also never truly know unless you test it out, and even then your results are only valid in the context in which they were obtained.

One guarantee is that individuals are different, and thus the individuals’ ability will be important no matter what approach is taken. Each person will be better or worse than others at any respective task.

This is in no way a conclusive answer to the question, but within the scope of my personal experience I’ve found that the more systematic approach tends to outperform the more subjective approach, simply because the systematic approach can be externalized and its properties more rigorously tested (a small sketch of this point follows below). Perhaps there are experts out there who know more “intuitively”, but any talented person will demonstrate some of what is usually called intuition simply because they have a deeper understanding of the subject or methodologies at hand.
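To illustrate, here is a toy sketch (the membership data and decision rule are entirely invented): once a rule is written down, its accuracy can be measured on held-out cases it has never seen, a check that unstated intuition cannot offer.

```python
# A toy sketch with invented data: an externalized ("systematic") rule can be
# scored on held-out cases; private intuition cannot be audited this way.
import random

random.seed(1)

# Hypothetical cases: (attended_last_event, will_renew). Assume renewal is
# driven mostly by event attendance, plus some noise.
cases = []
for _ in range(200):
    attended = random.random() < 0.5
    will_renew = attended or random.random() < 0.2
    cases.append((attended, will_renew))

def externalized_rule(attended):
    # A made-up rule derived from past data: predict renewal iff attended.
    return attended

# Because the rule is explicit, we can test it on cases it has never seen.
holdout = cases[100:]
hits = sum(externalized_rule(a) == renew for a, renew in holdout)
print(f"Held-out accuracy: {hits}/{len(holdout)} ({hits / len(holdout):.0%})")
```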

The path taken and the results obtained are both important. Without both, the results cannot be reliably reproduced, nor can we be sure they answer the problem at hand.

My 2-cents,

Jay

Anonymous said...

A big part of the problem is that many people expect science to provide right and proven answers. Yet any statistical analysis, when focused on individual cases, is sometimes going to be wrong. So if you look at it from the point of view of the consumer, you say: Wait a minute, you're going to plow through all of this data of mine, including a great deal I didn't know you had, and then, if you don't focus on the right data and ask the right questions, you might get it wrong?! And I'm supposed to go along with this?...

I think the whole industry has some public educating to do.