Monday, August 02, 2010

Is there too much data?

I was reading back over some old blog posts and came across this quote from Moneyball: The Art of Winning an Unfair Game:

Intelligence about baseball statistics had become equated in the public mind with the ability to recite arcane baseball stats. What [Bill] James's wider audience had failed to understand was that the statistics were beside the point. The point was understanding; the point was to make life on earth just a bit more intelligible; and that point, somehow, had been lost. 'I wonder,' James wrote, 'if we haven't become so numbed by all these numbers that we are no longer capable of truly assimilating any knowledge which might result from them.' [italics mine]


I see this phenomenon often these days: we have so much data that we build models without thinking, hoping that the sheer volume of data and sophisticated algorithms will be enough to find the solution. But even with mounds of data, insight still often occurs at the micro level, with individual cases or customers. The data must tell a story.


The quote is a good reminder that no matter the size of the data, we are in the business of decisions, knowledge, and insight. Connecting the big picture (lots of data) to decisions takes more than analytics.

Comments:

  1. Excellent point. Finding that rare person who can make informed, strategic business decisions out of mountains of data is becoming more difficult. This ability is also very hard to assess in a few interviews.

  2. One caveat, and I thought about this after seeing Bob Grossman's talk at the last Predictive Analytics World. While decisions are made in the micro, the more specific one needs to be, the more data one needs to make reasonable inferences.

    Let's say you are building a fraud detection model. If you assume behavior is consistent over the entire U.S., then the size of the data is X. However, if you determine that behavior changes conditioned on ZIP code, sex, age, ..., then you need a boatload of data to populate the permutations properly.

    The conclusion, then, is that we need mountains of data to make micro decisions better. My main point is that because actions are taken in the micro, lots of data is needed, but it is not a surrogate for intelligent use of the data.

    So Todd, that stated, I agree that it is difficult to identify insights by plowing through data--it takes careful examination of the micro (hopefully mostly automatic, through algorithms, but undoubtedly also through manual examination and validation).

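The fraud detection example in the second comment can be made concrete with a little arithmetic. The sketch below is only an illustration: the segment names, the number of distinct values per segment, and the minimum records per cell are all hypothetical, and it assumes records spread evenly across cells, which real data rarely does.

```python
from math import prod


def cell_count(cardinalities):
    """Number of distinct combinations of the conditioning variables."""
    return prod(cardinalities.values())


def records_needed(cardinalities, min_per_cell):
    """Records required so every combination has at least min_per_cell rows,
    assuming (optimistically) that rows are spread evenly across cells."""
    return cell_count(cardinalities) * min_per_cell


# Hypothetical segmentation: none of these figures come from the post.
segments = {"zip3_region": 900, "sex": 2, "age_band": 6}

print(cell_count({}))                 # no conditioning: one cell for the whole U.S.
print(cell_count(segments))           # cells after conditioning on all three
print(records_needed(segments, 100))  # rows needed for 100 per cell
```

Each added conditioning variable multiplies the number of cells, so the data requirement grows with the product of the segment sizes rather than their sum.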