Wednesday, May 28, 2008

What data mining software to buy?

This post (http://www.dmreview.com/issues/2007_46/10001040-1.html?portal=analytics) is an interesting example of the assessment of analytics software. The key paragraph is the conclusion, where Mr. Raab states:
Instead of a horserace between product features, this approach puts the focus where it should be: on value to your business. It recognizes that the value of a new tool depends on the other tools already available, and it forces evaluation teams to explicitly study the impact of different tools on different users. By creating a clearer picture of how each new tool will impact the way work actually gets done within the company, it leads to more realistic product assessments and ultimately to more productive selection choices.


I couldn't agree more. For the past 10 years, since the Elder and Abbott review of data mining software presented at KDD-98 (on my web site), I've tried to think of ways to summarize data mining software. The obvious way is by features, such as which algorithms a product has. The usability of a tool is another characteristic to add, as John, Philip Matkovsky and I wrote about in "An Evaluation of High-End Data Mining Tools for Fraud Detection". I've also described the different packages by the kind of interface (wizard, menu-driven, block-diagram, command line, etc.).

It's not easy to provide a summary in this multi-dimensional view of data mining tools. Sounds like an opportunity for predictive modeling!

2 comments:

Will Dwinnell said...

This is an interesting question, and it reminds me of the endless analysis and discussion (and fracas) that attend the selection of programming languages. One thing I've noticed is that many opinions (especially but not exclusively those of vendors) center on features.

I've noticed, though, that one feature which has a very real impact on tool selection (whether data mining tools or programming tools) is cost. Many of the convenient tables assembled by analysts, consultants, journals and the like neglect the issue of how much the darned thing costs. Yet many people (data miners or programmers) are faced with very real budget limitations. I suppose the one nice thing about price is that it is a simple either/or affair: either your budget can accommodate the price or it can't.

Another issue, which can be extremely difficult to assess without extensive use of the actual tool, is convenience. Ultimately, what the consumer would like to know is: end-to-end, how much time and energy will the typical project take if I use this tool? This isn't simply a question of whether certain user-interface check boxes have been filled. Some tools have weak interfaces which seem to get in the way as much as they help.

Datalligence said...

for many managers, especially CFOs, the first & most important thing is cost. and sadly this gets in the way of bias-free evaluation of the DM software.