Thursday, September 25, 2008

KDD 2008

It's hard to believe that KDD 2008 was the first KDD I've attended in seven years. It was striking how much has changed in that time, and that was one of the primary reasons I attended this past year--to see for myself whether the reports I've heard are true. Sure enough, they are.

These reports, primarily from colleagues in industry, were that KDD didn't have anything they could "take home and use". Many of these folks are analysts who are decidedly not academic, so I thought I had a sense for what they meant.

I found their reports hit the mark. Seven years ago I was able to find (1) significant numbers of industry personnel at the conference and (2) many talks that were accessible enough for non-academics to understand. This time around, few of the industry practitioners I met were not PhDs. That's not to say there weren't interesting talks. Two that I didn't see in person but read later were the Elkan paper on learning from positive and unlabelled examples and the Grossman paper on Data Clouds. Thought-provoking, both. The lunch talk by Trevor Hastie on regularization was very interesting, but it was geared toward those who can digest his textbook (which is among the finest data mining / statistical learning texts out there).

Social networking was a key theme, and such a dominant force at the conference that it deserves a separate post.

Lastly, the decline in participation by the business community was nowhere more evident than in the vendors room. Only a few data mining software vendors were there, which indicates to me that KDD isn't viewed as a place to increase sales. If I remember correctly, only Microsoft, Oracle, StatSoft, Salford Systems, and SAS were there. A quick look at the KDnuggets software survey shows who wasn't there.

So it seems that KDD has wandered from a business/academic mix to a more academic conference, which is, of course, the prerogative of the organizers. I'm still searching for a great conference for the data mining practitioner who understands the field well enough to read and absorb a book like the Witten/Frank machine learning book but wants a more practical approach to the subject.