Tuesday, December 19, 2006

The Best Data Mining Book of 2005

A bit late, but better late than never! Actually, I just heard Steven Levitt speak at SPSS Directions in November and was reminded, of course, of his book Freakonomics. In 2005, I recommended the book to my data mining course attendees as my favorite data mining book of the year, despite the term "data mining" never appearing (to the best of my knowledge) in the book at all. I think a quote in the preface summarizes why I liked it:

What interested Levitt were the stuff and riddles of everyday life... 'He (Levitt) is an intuitionist. He sifts through a pile of data to find a story no one else has found. He figures a way to measure an effect that veteran economists had declared unmeasurable.'

It was the idea of "sifting", a prominent term in the Gartner Group definition of data mining (and one that I like in particular), that struck me. All the examples Levitt gives in his book uncover patterns in data that are not the most obvious answers but that, in his opinion, fit the data better. I like the book because he approaches data with a forensic mindset.

3 comments:

  1. The best thing to read? I'd need more information before recommending. I'll post on data mining books I like this week.

    Evolution in data mining? I think this is also an interesting question. Data mining is too broad to paint with a single color, but I think the recent application of ensembles has been more than evolutionary; it has been closer to revolutionary for model performance. For example, decision trees (which typically require less data preparation than numeric algorithms) can now, thanks to ensembles, achieve accuracy similar to that of neural networks or support vector machines. This is good news for a practitioner.

  2. Anonymous, 11:38 AM

    Thanks a lot for what you are doing! The information that I managed to find here
    is extremely useful and essential for me! With best regards,
    David

  3. Anonymous, 10:23 AM

    Here you can find many data mining books, as well as related literature about Python, NLTK, and more:
    http://data-mining-books.com
