Tuesday, February 10, 2009

Can you learn data mining in undergraduate or graduate school?

I was recently asked by a former student from one of my data mining courses if a particular program was a good one to learn data mining (it happened to be this one, from NC State). It raises an interesting question: how much can data mining be learned from a book or a course?

Some of the best data miners I have met never took a statistics course, nor (for some) any higher-level mathematics. For my part, I was a computational mathematics major as an undergraduate and studied applied math for my master's, but I never took a stats course either (though I did take and TA a probability course). That said, I always recommend in my courses that folks become familiar with basic statistics; one book I have recommended is linked in the book recommendations section--The Cartoon Guide to Statistics. Since I have never taken a college or graduate data mining course, I can't comment directly. My concern is that such courses are too theoretical (how the algorithms work) rather than practical (how to handle data problems, how to pose proper questions to be addressed by data mining, etc.).

I'm willing to be persuaded though, so if you have experience with good, practical data mining curricula, please let me know.

9 comments:

  1. At RWTH Aachen University in Aachen, Germany, we had a "Data Mining Lab Course" which third- to fifth-year students could attend in order to participate in the Data Mining Cup.

    The course was a big success. Check out the course web site.

  2. I'm posting this on behalf of Neil:

    I think it's like any college course. There are those that are on the ramp to graduate school and a Ph.D. and those that are designed for students who aren't. My math classes in college were hopelessly theoretical and of no use to, say, an engineer or a geophysicist (or an actuary, for that matter, which is where I wound up for a while). Clearly, someone who is going to move on to algorithm research and development needs to be in a "theoretical" data mining class. Someone working toward a degree in communications does not need a lit class to deconstruct Proust. Data mining classes for business-oriented students should lighten up on the theory and "proofs" of the algorithms and give the students what they need: a solid understanding of what to use when and how to build useful descriptive and predictive models. -Neil Raden
    --
    Neil Raden, Founder of Hired Brains Inc.

  3. I wholeheartedly agree with you that there must be both theory and practice. Because of this, should the University support both, or should they teach the theoretical and leave the practice to professional development or extended study / night courses? I hope not!

  4. On behalf of Neil:

    I think the university should teach both, the same way they teach statistics and mathematical statistics. Data mining is going to be an increasingly important thing that people do now that we have petabytes of data and lots of processors to crunch it. There will be non-academic courses too, and that's great.

  5. Tools and industry move so quickly that, for example, the expense and development put into a corporate data warehouse might not be easy for a University to replicate. Many of the tools are available for students, but the cost can be prohibitive (and then there's the problem of which tools to use).

    For these reasons I think it might be a losing battle for a University to provide a complete industry-focused data mining course. The basics are necessary, but I don't expect training in toolsets and hands-on experience of data mining. Understanding the problems that will be faced is most of the way to the solution. I think that's all a graduate needs.

    - Tim

  6. When I was at EPFL, I took some machine learning courses (with a strong focus on theory). I learned mainly through my PhD and by reading data mining books.

    The biggest difficulty when looking for a job in industry was the tools issue: companies don't care about Matlab and Java data miners. They want SAS, SPSS or Cognos programmers. That was a real drawback in my resume.

  7. In reference to best preparing individuals to do the actual technical work, I suppose that it would be easier to describe what the data miner should know, rather than how such knowledge should be imparted.

    Even within this context, there will be important variations. Many analysts do perfectly well with data mining shells. For my part, I have found it beneficial to build many of my own tools, but this requires additional expertise.

    Obviously, a corresponding skill which is emerging at this time is the organizational management of data mining, which would require a different type and level of knowledge.

  8. Your posting was very helpful for me to move on. I am a graduate student who wants to become the best data miner in the future. Whether to study a high-level statistics course was one of the concerns I needed to consider. As you said, I think I have to be familiar with at least basic statistics for data mining and my research.

  9. Hello sir,
    I am a student in computer science.
    I have read some books on data mining and have taken it as a course,
    but I want to do some practical work on it. Can you guide me as to where to start? Can I build any tools, etc.?
