Monday, October 20, 2008

What topics would you like to see covered at a KDD conference?

This is your chance to voice your opinion!

What topics, sessions, or tutorials would be most useful for you at a conference like KDD? Would a full industrial track be of interest, or are industries so diverse that we really need tracks to be narrowed to specific industries?

Please--practitioners only. I'm defining practitioners as those who get paid to develop models that are actually used in industry.

I'll kick it off with one idea:

Tutorials (1/2 day) geared toward the practitioner. This means that if techniques are described (such as social networking), there must be implementations of the algorithmic ideas available in competitive commercial software. As great as R and Matlab are, for example, relatively few practitioners are programmers that can take advantage of these kinds of frameworks.

I know there are tutorials at KDD every year. This year I didn't go because they were all on Sunday and I wasn't able to attend then. I would have liked to go to the Text Mining tutorial, as that topic has become a significant part of my business over the past couple of years.

One last thought: I think one thing that may happen (understandably) is that topics that have been covered in years past are not revisited. For those of us who live in the data mining world, it is far more interesting to continue to explore new ideas, especially those that build on ideas we have already explored in depth. However, as data mining increases in its use, we are bringing folks in who have not had that same benefit. For many, a tutorial on decision trees would be very useful and interesting (like the KDD 2001 tutorial--trees, to my knowledge, have not been revisited since except in the framework of ensembles in 2007).


Shane said...

I'm particularly interested in innovative applications of data mining in the following areas:
- customer retention/cross sell/segmentation
- risk (insurance, credit scoring, etc)
- fraud
- spatial and locational aspects of modelling applications
- also, how are practitioners scaling data mining to the next level? (eg. mapreduce/hadoop?)

Tutorials are a great idea... however limiting to commercial software does not really solve the problem of practitioners not being skilled in the particular software in the tutorial...

James Taylor said...

You should check out Predictive Analytics World (a new show in 2009), as it is aimed at practitioners, not academics, in the predictive analytics space.

Dean Abbott said...

James: that's a nice plug for Predictive Analytics World. I'll be there (and presenting) and hope it does in fact appeal to practitioners.

What about it in particular appealed to you?

Will Dwinnell said...

My experience at KDD-2006, mirroring yours, was that there was certainly too much academic material. I would like to see more "my experience has been that modeling algorithm X works well under these conditions...", "as a novel way to solve problem Y, I did this..."- even "to pass the time this summer, I built a model which does Z...".

I think that I am typical of most practitioners in being less concerned with elaborate proofs of optimality than with a clever way to represent data, handle missing values, etc.

Tim Manns said...

I agree with both Shane's and Will's comments.

I too would like to see some practical implementations and the devil is in the detail!

I reckon a lot of these proposed presentations would probably take an hour, and by their nature would be pretty grey-matter intensive. I don't think we can easily get away from some industry specific elements, but the core concepts used in good practical examples can always be generalised.

One major problem is intellectual property and competitive advantage concerns. I have to go through lots of red tape and several stages of corporate and legal approval before I present any of my work. I've rarely encountered presentations that deliver detail or practical examples of any benefit, so although I'd love to see this type of stuff I am very cynical about it... :)

Will Dwinnell said...

Re-reading Dean's post, the following question provoked a response: "...are industries so diverse that we really need tracks to be narrowed to specific industries?"

While industries tend to face particular types of problems (credit scoring comes to mind: what modern bank could survive without some sort of good loan / bad loan probability prediction?), I think the techniques of data mining are broad enough that one shouldn't get caught in the trap of "Well, that data mining stuff may work in the oil industry, but here, we sell newspapers."

As I tell people: To me a number is a number is a number... For the purposes of my analysis, it doesn't matter whether that number represents a blood sugar level, the temperature inside an engine or the exchange rate between US dollars and Euros. Not that I am discounting "industry knowledge", but I do believe that our tools are easily shared across a wide spectrum of organizations and applications.

In attending a conference, I am much less concerned with what industry the presenter's solution serves than I am in the likelihood that his or her solution will serve me.

Ralph Winters said...

Well, in the aftermath of the recent financial crisis, I think topics regarding credit scoring and risk assessment are in order.

Eric Siegel, Ph.D. said...

Following up on James Taylor and Dean's comments, indeed, Predictive Analytics World (February 18-19, 2009 in San Francisco) is replete with named case studies describing commercially deployed predictive analytics. The goal of PAW is to serve as the go-to event, covering today's commercial deployment of predictive analytics, across industries and across software vendors.

The leading enterprises have responded, signing up to tell their stories. PAW-09 will have 25 sessions across two tracks, so you can witness how predictive analytics is applied at 3M, Acxiom, Affiliated Computer Services, Charles Schwab, Chase, Click Forensics, Google, Linden Lab (Second Life), The National Rifle Association, Netflix, Pinnacol Assurance, Reed Elsevier, San Diego Supercomputer Center, Sun, Wells Fargo Credit Card Services, Wells Fargo Internet Services Group, and others.

The focus is on solutions, taking you inside the "how" to achieve results - not just talking about the opportunity. These case study sessions provide real world insight on cutting-edge predictive analytics practices.

For more information, see