Tuesday, June 22, 2010
Salford to Launch New Integrated Data Mining Suite
Tomorrow night is the launch of SPM (Salford Predictive Miner). If you are in San Diego, give them a holler to let them know you are coming. See you there!
A/B Testing and the Need for Clear Business Objectives
The website http://videolectures.net/ contains a wealth of interesting lectures on a wide variety of topics, including data mining. I was reminded today of one by Ronny Kohavi entitled "Practical Guide to Controlled Experiments on the Web: Listen to Your Customers not to the HiPPO". It's short (only 23 minutes) and filled with some very good common-sense principles.
First, it is a talk about the importance of A/B testing: in other words, constructing experiments to learn customer behavior rather than having the experts make a best guess at how people will behave. He gives some good examples from Microsoft of the sometimes non-intuitive results that come out of actual testing. A book he recommends is Breakthrough Business Results With MVT: A Fast, Cost-Free, Secret Weapon for Boosting Sales, Cutting Expenses, and Improving Any Business Process.
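To make the idea of letting the experiment decide concrete, here is a minimal sketch of one common way to compare two page variants: a two-proportion z-test on conversion counts. This is my illustration, not something taken from the lecture, and the visitor and conversion counts are made up.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates B - A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally well.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 10,000 visitors per variant.
z, p = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=250, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value says the observed difference would be unlikely if the two variants really performed the same. It says nothing about whether conversion is the right thing to measure, which is where the second part of the lecture comes in.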
The second part of the lecture, which I found particularly interesting, covers what Kohavi calls the Overall Evaluation Criterion (OEC), or what I usually call business objectives. He included the great Lewis Carroll quote, "If you don't know where you are going, any road will take you there." I find this a common problem as well: if we don't define a business objective that truly measures the impact of the predictive models we build, we have no way of determining whether they are effective. This objective must be tied to the business itself. For example, Kohavi argues for using Customer Lifetime Value (CLV) rather than click-through rate because it is more closely tied to the bottom line.
I would add that it can be useful to have two measurable objectives, especially if the two together measure the value better than either does alone. For example, in collections risk models, the age of the debt and the amount of the debt are both important components of risk. These are difficult to combine into a single number in general, so a two-dimensional risk score can be helpful operationally.
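As a rough sketch of what a two-dimensional score might look like operationally, the snippet below bands each dimension separately and uses the pair of bands as the score. The band edges and the action grid are hypothetical placeholders, not values from any real collections model.

```python
from bisect import bisect_right

# Hypothetical band edges; a real model would derive these from data.
AGE_EDGES_DAYS = [30, 90, 180]   # 4 age bands: 0..3
AMOUNT_EDGES = [500, 5_000]      # 3 amount bands: 0..2

def risk_cell(age_days, amount):
    """Return (age_band, amount_band) instead of collapsing to one number."""
    return (bisect_right(AGE_EDGES_DAYS, age_days),
            bisect_right(AMOUNT_EDGES, amount))

# Hypothetical action grid, indexed as ACTIONS[age_band][amount_band].
ACTIONS = [
    ["reminder", "reminder", "call"],
    ["reminder", "call",     "call"],
    ["call",     "call",     "escalate"],
    ["write-off review", "escalate", "escalate"],
]

age_band, amount_band = risk_cell(age_days=45, amount=1_200)
print((age_band, amount_band), ACTIONS[age_band][amount_band])
```

Keeping the two bands separate lets the business set policy per cell rather than arguing over how to weight age against amount.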
Wednesday, June 02, 2010
Embedded Analytics and Business Rules: The Holy Grail?
Tomorrow (Thursday) at 3pm EDT I'll be on DM Radio for the broadcast "Embedded Analytics and Business Rules: The Holy Grail?". I'm not sure what the other guests are going to talk about, but my comments will resemble the talk I gave at Predictive Analytics World in February 2010, Rules Rule: Inductive Business-Rule Discovery in Text Mining. In this help-desk case study, we used decision trees to cherry-pick interesting rules, converted them to SQL, and deployed them in a rule system that was applied transactionally, online. I emphasized the text mining portion at PAW, but the methodology was independent of that. In 2002-2003, researchers at the IRS and I applied the same kind of approach to rule discovery in selecting returns for audit: use trees to find interesting rules.
The reason we liked the approach was that it was a fast way to overcome two problems. First, a decision tree finds only the single best solution to a problem (according to its measure of "good"). To obtain a richer set of terminal nodes, one can build ensembles of trees, but then one loses the interpretation. Second, one can go to the other extreme and build association rules, but then one is left with perhaps thousands to tens of thousands of rules that have to be pruned back to get the gist of the key ideas. Many of the rules will be redundant (some completely identical in which records are "hit" by the rule), and it's easy to become lost in the sheer number of rules.
For the Fortune 500 company, we used CART with the battery option to generate a sequence of trees (we iterated on "priors" and misclassification costs, and I think some more options as well to generate variety), and took only those terminal nodes that had sufficiently high classification accuracy. I think we could have used their hotspot analysis for this too, but I wasn't sufficiently well-versed in it at that time.
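The harvesting step can be sketched roughly as follows. This is not CART's output format or our actual code: the Leaf structure, the column names, and the thresholds are all hypothetical, and the sketch assumes the terminal nodes from every tree in the sequence have already been collected into one list.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Leaf:
    """A terminal node; a hypothetical structure, not CART's own format."""
    conditions: tuple   # ((column, operator, value), ...) along the path to the leaf
    predicted: str
    n_records: int
    n_correct: int

    @property
    def accuracy(self):
        return self.n_correct / self.n_records

def sql_literal(value):
    """Quote strings for SQL; numbers pass through unchanged."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)

def leaf_to_sql(leaf, table):
    where = " AND ".join(f"{col} {op} {sql_literal(val)}"
                         for col, op, val in leaf.conditions)
    return f"SELECT * FROM {table} WHERE {where}"

def select_rules(leaves, min_accuracy, min_records):
    """Keep accurate, well-supported leaves; drop repeats of the same conditions."""
    kept = {}
    for leaf in leaves:
        if leaf.accuracy >= min_accuracy and leaf.n_records >= min_records:
            kept.setdefault(frozenset(leaf.conditions), leaf)
    return sorted(kept.values(), key=lambda l: l.accuracy, reverse=True)

# Hypothetical leaves pooled from several trees.
leaves = [
    Leaf((("mentions_password", "=", 1), ("prior_contacts", ">", 2)), "escalate", 120, 111),
    Leaf((("prior_contacts", ">", 2), ("mentions_password", "=", 1)), "escalate", 120, 111),
    Leaf((("product", "=", "router"),), "escalate", 400, 240),
]
rules = select_rules(leaves, min_accuracy=0.9, min_records=50)
for rule in rules:
    print(f"{rule.accuracy:.3f}", leaf_to_sql(rule, "tickets"))
```

The second leaf repeats the first with its conditions in a different order, as can happen when two trees find the same segment, so only one copy survives. The third fails the accuracy cutoff and is dropped.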
If you can't join in on the radio broadcast, you can always download the mp3 later.