Working with customers over the past couple of weeks has reminded me that in many applications of data mining and predictive analytics, unless the stakeholders understand what a predictive model is doing, the model is utterly useless. When rules from a decision tree, no matter how statistically significant, don't resonate with domain experts, they won't be believed. The argument that "the model wouldn't have picked this rule if it weren't really there in the data" makes no difference when the rule doesn't make sense to them.
There is always a tradeoff in these cases between the "best" model (i.e., the most accurate by some measure) and the "best understood" model (i.e., the one that gets the "ahhhs" from the domain experts). We can coerce models toward transparency rather than raw statistical performance by removing fields that predict well but don't contribute to the story the models tell about the data.
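To make the idea concrete, here is a minimal sketch in Python with scikit-learn: fit the same shallow decision tree with and without the fields that predict well but resist explanation, then compare both the accuracy and the printed rules that the domain experts would actually see. The file name, target column, and "opaque" field names are hypothetical stand-ins, and the sketch assumes the remaining fields are numeric.

```python
# Sketch: compare a tree built on all fields vs. one built only on
# fields that fit the story the experts can follow. Names below are
# hypothetical placeholders, not a real dataset.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df = pd.read_csv("customers.csv")       # hypothetical file
target = "responded"                    # hypothetical target column
opaque_fields = ["zip3_score", "composite_index_7"]  # predictive but hard to explain

X_full = df.drop(columns=[target])
X_story = X_full.drop(columns=opaque_fields)
y = df[target]

for label, X in [("full feature set", X_full), ("story-friendly fields", X_story)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    tree = DecisionTreeClassifier(max_depth=4, random_state=0)  # shallow = readable rules
    tree.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, tree.predict(X_te))
    print(f"{label}: holdout accuracy = {acc:.3f}")
    print(export_text(tree, feature_names=list(X.columns)))  # rules to show the experts
```

The point of the comparison is not that the second model will win on accuracy; it usually gives up a little. The point is to see how much accuracy is being paid for rules the stakeholders will actually believe.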
I know what some of you are thinking: if the rule or pattern found by the model is that good, we should try to find the reason for its inclusion, make the case for it, find a surrogate meaning, or just demand it be included because it is so good! I trust the algorithms, and I trust our ability to assess whether the algorithms are finding something "real" rather than a happenstance occurrence. But not all stakeholders share that trust, and it is our job to translate the message for them so that their confidence in the models approaches our own.
Tuesday, August 24, 2010