Wednesday, June 02, 2010

Embedded Analytics and Business Rules: The Holy Grail?

Tomorrow (Thursday) at 3pm EDT I'll be on DM Radio for the broadcast "Embedded Analytics and Business Rules: The Holy Grail?". I'm not sure what the other guests are going to talk about, but my comments will resemble the talk I gave at Predictive Analytics World in February 2010, Rules Rule: Inductive Business-Rule Discovery in Text Mining. In that help-desk case study, we used decision trees to cherry-pick interesting rules, converted them to SQL, and deployed them in a rule system that was applied transactionally, online. I emphasized the text mining portion at PAW, but the methodology was independent of that. In 2002-2003, researchers at the IRS and I applied the same kind of approach to rule discovery in selecting returns for audit: use trees to find interesting rules.
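
To make the "trees to SQL" step concrete, here is a minimal Python sketch of how each root-to-leaf path of a tree can be rendered as a SQL statement. It is only an illustration, not the code from the case study: the tree is hand-built, and the table name (tickets), the column names (num_prior_calls, days_open), and the record counts are all hypothetical.

```python
# Hypothetical hand-built tree; in practice it would come from a tree learner.
# Internal nodes test "column <= threshold"; leaves carry record counts.
TREE = {
    "column": "num_prior_calls", "threshold": 2,
    "left": {"leaf": True, "n": 400, "n_target": 40},
    "right": {
        "column": "days_open", "threshold": 5,
        "left": {"leaf": True, "n": 150, "n_target": 45},
        "right": {"leaf": True, "n": 50, "n_target": 46},
    },
}


def leaf_rules(node, path=()):
    """Yield (conditions, leaf) for every terminal node of the tree."""
    if node.get("leaf"):
        yield list(path), node
        return
    col, thr = node["column"], node["threshold"]
    yield from leaf_rules(node["left"], path + (f"{col} <= {thr}",))
    yield from leaf_rules(node["right"], path + (f"{col} > {thr}",))


def rule_to_sql(conditions, table="tickets"):
    """Render one root-to-leaf path as a SELECT statement."""
    where = " AND ".join(conditions) if conditions else "1 = 1"
    return f"SELECT * FROM {table} WHERE {where}"


# Pair each rule's SQL with the share of target records in its leaf.
rules = [
    (rule_to_sql(conds), leaf["n_target"] / leaf["n"])
    for conds, leaf in leaf_rules(TREE)
]

for sql, accuracy in rules:
    print(f"{accuracy:.2f}  {sql}")
```

Each terminal node becomes one self-contained WHERE clause, which is what makes a rule easy to deploy one transaction at a time.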

The reason we liked the approach was that it was a fast way to overcome two problems. First, a decision tree finds only one solution to a problem: the best one according to its measure of "good". To obtain a richer set of terminal nodes, one can build ensembles of trees, but then one loses the interpretation. Second, one can build association rules instead, but then one is left with perhaps thousands to tens of thousands of rules that have to be pruned back to get the gist of the key ideas. Many of the rules will be redundant (some completely identical in which records are "hit" by the rule), and it's easy to become lost in the sheer number of rules.
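
The redundancy point can be illustrated with a small Python sketch that groups rules by the exact set of records they hit and keeps one representative per group. The records, field names, and rules below are made up for the example, and this is only one simple way to prune, assuming the rules can be evaluated as predicates over the data.

```python
# Hypothetical help-desk records and candidate rules, for illustration only.
records = [
    {"product": "router", "calls": 3, "escalated": True},
    {"product": "router", "calls": 1, "escalated": False},
    {"product": "modem", "calls": 4, "escalated": True},
    {"product": "modem", "calls": 0, "escalated": False},
]

rules = {
    "calls > 2": lambda r: r["calls"] > 2,
    "escalated": lambda r: r["escalated"],
    "product = router": lambda r: r["product"] == "router",
    "calls >= 3": lambda r: r["calls"] >= 3,
}


def group_by_coverage(rules, records):
    """Map each distinct set of hit record indices to the rules producing it."""
    groups = {}
    for name, predicate in rules.items():
        hits = frozenset(i for i, r in enumerate(records) if predicate(r))
        groups.setdefault(hits, []).append(name)
    return groups


groups = group_by_coverage(rules, records)

# Keep the first rule seen for each distinct hit set; the rest are redundant.
kept = [names[0] for names in groups.values()]

for hits, names in groups.items():
    print(sorted(hits), "kept:", names[0], "dropped:", names[1:])
```

Here three of the four rules hit exactly the same records, so only two rules survive.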

For the Fortune 500 company, we used CART with the battery option to generate a sequence of trees (we iterated on "priors" and misclassification costs, and I think some more options as well to generate variety), and took only those terminal nodes that had sufficiently high classification accuracy. I think we could have used the CART software's hotspot analysis for this too, but I wasn't sufficiently well-versed in it at that time.
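
The idea of building a variety of trees and harvesting only the good terminal nodes can be sketched in a few lines of Python. This is a toy stand-in, not CART's battery feature: it grows one-split trees, uses a weight on the target class as a rough proxy for varying priors or misclassification costs, and the data, column names, and thresholds (MIN_ACCURACY, MIN_SUPPORT) are all assumptions made for the example.

```python
# Toy data: (features, label), where label 1 marks the target class.
rows = [
    ({"calls": 0, "days": 1}, 0), ({"calls": 1, "days": 2}, 0),
    ({"calls": 1, "days": 8}, 1), ({"calls": 2, "days": 1}, 0),
    ({"calls": 3, "days": 9}, 1), ({"calls": 4, "days": 2}, 1),
    ({"calls": 5, "days": 9}, 1), ({"calls": 5, "days": 1}, 1),
]

MIN_ACCURACY = 0.9  # hypothetical cutoff for "sufficiently high" accuracy
MIN_SUPPORT = 3     # hypothetical minimum number of records in a leaf


def weighted_gini(pos, neg, w):
    """Gini impurity when each target record counts w times."""
    total = w * pos + neg
    if total == 0:
        return 0.0
    p = w * pos / total
    return 2 * p * (1 - p)


def best_stump(rows, w):
    """Return the leaves (rule, n, n_target) of the best one-split tree."""
    best = None
    for col in rows[0][0]:
        # Skip the largest value so that neither side of the split is empty.
        for thr in sorted({x[col] for x, _ in rows})[:-1]:
            sides = {
                "<=": [y for x, y in rows if x[col] <= thr],
                ">": [y for x, y in rows if x[col] > thr],
            }
            score = 0.0
            for ys in sides.values():
                pos = sum(ys)
                neg = len(ys) - pos
                score += (w * pos + neg) * weighted_gini(pos, neg, w)
            if best is None or score < best[0]:
                leaves = [(f"{col} {op} {thr}", len(ys), sum(ys))
                          for op, ys in sides.items()]
                best = (score, leaves)
    return best[1]


# "Battery": rebuild the tree under several target-class weights and keep
# only the terminal nodes that pass both thresholds.
candidates = {}
for w in (1, 2, 5):
    for rule, n, n_target in best_stump(rows, w):
        if n >= MIN_SUPPORT and n_target / n >= MIN_ACCURACY:
            candidates[rule] = (n, n_target)

for rule, (n, n_target) in candidates.items():
    print(f"{rule}: {n_target}/{n} target records")
```

Keying the candidates by rule text means a terminal node that shows up in several trees is kept only once.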

If you can't join in on the radio broadcast, you can always download the mp3 later.
