Saturday, August 11, 2007

Rexer Analytics Data Miner Survey, Aug-2007

Rexer Analytics recently distributed a report summarizing the findings of its survey of data miners (observation count = 214, after removal of tool vendor employees).

Not surprisingly, the top two types of analysis were: 1. predictive modeling (89%) and 2. segmentation/clustering (77%). Other methods trail off sharply from there.

The top three types of algorithms used were: 1. decision trees (79%), 2. regression (77%) and 3. cluster analysis (72%). It would be interesting to know more about the specifics (which tree-induction algorithms, for instance), but I'd be especially interested in which forms of "regression" are being used, since that term covers a lot of ground.

Responses regarding tool usage were divided into "never", "occasionally" and "frequently". The report's authors ranked tools in decreasing order of popularity (occasional plus frequent use). Interestingly, "your own code" took second place with 45%, which makes me wonder which languages are being used. (If you must know, SPSS came in first, with 48%.)

When asked about challenges faced by data miners, the top three answers were: 1. dirty data (76%), 2. unavailability of, or difficult access to, data (51%) and 3. explaining data mining to others (51%). So much for quitting my job in search of something better!
