Monday, August 02, 2010

Is there too much data?

I was reading back over some old blog posts and came across this quote from Moneyball: The Art of Winning an Unfair Game.

Intelligence about baseball statistics had become equated in the public mind with the ability to recite arcane baseball stats. What [Bill] James's wider audience had failed to understand was that the statistics were beside the point. The point was understanding; the point was to make life on earth just a bit more intelligible; and that point, somehow, had been lost. 'I wonder,' James wrote, 'if we haven't become so numbed by all these numbers that we are no longer capable of truly assimilating any knowledge which might result from them.' [italics mine]

I see this phenomenon often these days: we have so much data that we build models without thinking, hoping that the sheer volume of data and sophisticated algorithms will be enough to find the solution. But even with mounds of data, insight still often occurs at the micro level, with individual cases or customers. The data must tell a story.

The quote is a good reminder that no matter the size of the data, we are in the business of decisions, knowledge, and insight. Connecting the big picture (lots of data) to decisions takes more than analytics.


Todd Nevins - icrunchdata said...

Excellent point. Finding that rare person who can make informed, strategic business decisions out of mountains of data is becoming more difficult. This ability is very hard to assess in a few interviews.

Dean Abbott said...

One caveat, and I thought about this after seeing Bob Grossman's talk at the last Predictive Analytics World. While decisions are made in the micro, the more specific one needs to be, the more data one needs to make reasonable inferences.

Let's say you are building a fraud detection model. If you assume behavior over the entire U.S. is consistent, then the size of the data is X. However, if you determine that behavior changes conditioned on ZIP code, sex, age, ..., then you need a boatload of data to populate the permutations properly.
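
To put rough numbers on that, here is a minimal sketch of the arithmetic. Every figure in it (the ZIP code count, the number of age bands, the floor of 100 records per cell) is an assumption made up for illustration, not a value from any real fraud model.

```python
from math import prod

# Hypothetical segment counts; none of these figures come from the discussion.
segment_levels = {
    "zip_code": 40_000,  # assumed round figure for U.S. ZIP codes
    "sex": 2,
    "age_band": 6,       # assumed number of age buckets
}
min_rows_per_cell = 100  # assumed minimum records for a stable estimate per cell


def cells(levels):
    """Number of distinct combinations across the conditioning variables."""
    return prod(levels.values())


def rows_needed(levels, per_cell):
    """Lower bound on records if every combination must hold per_cell rows."""
    return cells(levels) * per_cell


# One national model: no conditioning variables, so a single cell.
national = rows_needed({}, min_rows_per_cell)
# Fully segmented model: one cell per ZIP code x sex x age band.
segmented = rows_needed(segment_levels, min_rows_per_cell)
print(national, segmented)
```

With these assumed figures, the single national model needs 100 records while the fully segmented one needs 48 million, and that is before adding any further conditioning variables.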

The conclusion, then, is that we need mountains of data to make micro decisions better. My main point is that because actions are taken at the micro level, lots of data is needed, but it is not a surrogate for intelligent use of the data.

So Todd, that said, I agree that it is difficult to identify insights by plowing through data; it takes careful examination of the micro (hopefully mostly automatic, through algorithms, but undoubtedly also through manual examination and validation).
