We all know
that given reasonable data, a good predictive modeler can build a model that
works well and helps make better decisions than what is currently used in
your organization (at least in our own minds). Newer data, sophisticated
algorithms, and a seasoned analyst are all working in our favor when we build
these models, and if success were measured by accuracy (as it is in most
data mining competitions), we're in great shape. Yes, there are always gotchas
and glitches along the way. But when my deliverable is only slideware, even if
the modeling is hard, I'm confident of being able to declare victory at the
end.
However,
the reality is that there is much more to the transition from cool model to
actual deployment than a nice slide deck and paper accepted at one's favorite
predictive analytics, data mining or big data conference. In these venues, the
winning models are those that are "accurate" (more on that later) and
have used creative analysis techniques to find the solution; we won't submit a
paper when we only had to press the "go" button and have the data
mining software give us a great solution!
For me, the
gold standard is deployment. If the model gets used and improves the decisions
an organization makes, I've succeeded. Three ways to increase the likelihood
your models are deployed are: 
1) Make sure the model stakeholder designs deployment into the project from the beginning
The model
stakeholder is the individual, usually a manager, who is the advocate of
predictive models to decision-makers. It is possible that a senior-level
modeler can do this task, but that person must be able to switch hit: he or she
must be able to speak the language of management and be able to talk technical
detail with the analysts. This may require more than one trusted person: the
manager, who is responsible and makes the ultimate decisions about the models,
and the lead modeler, who is responsible for the technical aspects of the
model. It is more than "talking the talk" and knowing buzz-words in both
realms; the person or persons must truly be "one of" both groups. 
For those
who have followed my blog posts and conference talks, you know I am a big
advocate of the CRISP-DM process model (or equivalent methodologies, which seem
to be endless). I've referred to CRISP-DM often, including on topics related to
what data miners need to learn and Defining the Target Variable, just as
two examples. 
The
stakeholder must not only understand the business objectives of the model
(Business Understanding in CRISP-DM), but must also be present when discussions
take place about which models will be built. It is essential that
reasonable expectations are put into place from the beginning, including what a
good model will "look like" (accuracy and interpretability) and how
the final model will be deployed. 
I've seen
far too many projects die or become inconsequential because either the wrong
objectives were used in building the models, meaning the models were
operationally useless, or because the deployment of the models was not
considered, meaning again that the models were operationally useless. As an
example, on one project, it was assumed that the model would run within a
rules engine, but the models that were built were not rules at all; they were
complex non-linear models that could not be translated into rules. The problem
obviously could have been avoided had this disconnect been verbalized early in
the modeling process. 
2) Make sure modelers understand the purpose of the models
The
modelers must know how the models will be used and what metrics should be used
to judge model performance. A good summary of typical error metrics used by
modelers is found here. However, for most of the models I have
deployed in customer acquisition, retention, and risk modeling, the treatment
based on the model is never applied to the entire population (we don't
mail everyone, just a subset). So the metrics that make the most sense are
often ones like "lift after the top decile", maximum cumulative net
revenue, top 1000 scores to be investigated, etc. I've actually seen negative
correlations between the ranking of models based on global metrics (like
classification error or R^2) vs. the ranking based on subset selection ranking,
such as top 1000 scores; very different models may be deployed depending on the
metric one uses to assess them. If modelers aren't aware of the metric to be
used, the wrong model can be selected, even one that does worse than the
current approach.
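To make the gap between global and subset metrics concrete, here is a minimal Python sketch. The ten records and both sets of scores are toy numbers invented purely for illustration, and the helper names (`classification_error`, `top_k_hits`, `top_fraction_lift`) are hypothetical, not taken from any library. With so few records, "top 2" stands in for something like "top 1000 scores".

```python
def classification_error(scores, labels, threshold=0.5):
    """Global metric: fraction of ALL records misclassified at a fixed threshold."""
    wrong = sum((s >= threshold) != bool(y) for s, y in zip(scores, labels))
    return wrong / len(labels)


def top_k_hits(scores, labels, k):
    """Subset metric: number of actual responders among the k highest scores."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(y for _, y in ranked[:k])


def top_fraction_lift(scores, labels, fraction=0.1):
    """Response rate in the top `fraction` of scores divided by the overall rate."""
    k = max(1, int(round(len(labels) * fraction)))
    overall_rate = sum(labels) / len(labels)
    return (top_k_hits(scores, labels, k) / k) / overall_rate


# Toy data (invented): 1 = responded, 0 = did not respond.
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

# Model A classifies most records correctly, but its single highest
# score goes to a non-responder.
model_a = [0.70, 0.65, 0.60, 0.90, 0.10, 0.10, 0.10, 0.10, 0.10, 0.10]

# Model B makes more threshold errors overall, but its two highest
# scores are both responders.
model_b = [0.90, 0.80, 0.40, 0.60, 0.60, 0.60, 0.10, 0.10, 0.10, 0.10]

for name, scores in (("A", model_a), ("B", model_b)):
    print(
        name,
        "error:", classification_error(scores, labels),
        "top-2 hits:", top_k_hits(scores, labels, 2),
        "top-20% lift:", round(top_fraction_lift(scores, labels, 0.2), 2),
    )
```

On this toy data Model A wins on classification error (0.1 vs. 0.4) while Model B wins on the top-2 selection (2 hits vs. 1), so the two metrics rank the models in opposite order. If only the top of the list will ever be treated, Model B is the one to deploy.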
Second, if
the modelers don't understand how the models will be deployed operationally,
they may find a fantastic model, one that maximizes the right metric, but is
useless. The Netflix Prize is a great
example: the final winning model was
accurate but far too complex to be used. Netflix extracted key pieces of the
models to operationalize instead. I've had customers stipulate to me that
"no more than 10 variables can be included in the final model". If
modelers aren't aware of specific timelines or implementation constraints, a
great but useless model can be the result.
3) Make sure the model stakeholder understands what the models can and can't do
In the
effort to get models deployed, I've seen models elevated to a status they don't
deserve, most often by exaggerating their accuracy and expected performance
once in operation. I understand why modelers may do this: they have a direct
stake in what they did. But the manager must be more skeptical and
conservative.
One of the
most successful colleagues I've ever worked with used to assess model
performance on held-out data using the metric we had been given (maximum depth
one could mail to and still achieve the pre-determined response rate). But then
he always backed off what was reported to his managers by about 10% to give
some wiggle room. Why? Because even in our best efforts, there is still a
danger that the data environment after the model is deployed will differ from
that used in building the models, thus reducing the effectiveness of the
models.
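As a rough illustration of that habit, the sketch below finds the maximum mailing depth on held-out data that still meets a target response rate, then trims it before reporting. Everything here is an assumption made for the example: the scores and responses are invented, the function names are hypothetical, and the 10% haircut is applied to the depth itself (the reduction could equally be applied to the promised response rate).

```python
import math


def max_depth_at_response_rate(scores, responded, target_rate):
    """Deepest cut of the score-ranked list whose cumulative response
    rate is still at or above target_rate (0 if no depth qualifies)."""
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    best_depth = 0
    responders = 0
    for depth, (_, y) in enumerate(ranked, start=1):
        responders += y
        if responders / depth >= target_rate:
            best_depth = depth
    return best_depth


def reported_depth(measured_depth, haircut=0.10):
    """Back off the measured depth by `haircut` to leave some wiggle room."""
    return math.floor(measured_depth * (1 - haircut))


# Held-out toy data (invented): model scores and whether each record responded.
scores = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50]
responded = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

measured = max_depth_at_response_rate(scores, responded, target_rate=0.5)
print("measured depth:", measured)                  # 8
print("reported depth:", reported_depth(measured))  # 7
```

The held-out measurement says we could mail 8 records deep and still hit a 50% response rate, but the number passed up the chain is 7. That gap is the cushion against the deployed data drifting away from the data used to build the model.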
A second
problem for the model stakeholder is communicating an interpretation of the
models to decision-makers. I've had to do this exercise several times in the
past few months and it is always eye-opening when I try to explain the patterns
a model is finding when the model is itself complex. We can describe overall
trends ("on average", more of X increases the model score) and we can
also describe specific patterns (when observable fields X and Y are both high,
the model score is high). Both are needed to communicate what the models do,
but both have to connect with what a decision-maker understands about the problem.
If it doesn't make sense, the model won't be used. If it is too obvious, the
model isn't worth being used. 
The ideal
model for me is one where the decision-maker nods knowingly at the "on
average" effects (these should usually be obvious). Then, once you throw
in some specific patterns, he or she should scrunch his/her eyes, think a bit,
then smile as the implications of the pattern dawn on them: the pattern
really does make sense (but was previously not considered). 
As
predictive modelers, we know that absolutes are hard to come by, so even if
these three principles are adhered to, other factors can sabotage the
deployment of a model. Nevertheless, in general, these steps will increase the
likelihood that models are deployed. In all three steps, communication is the
key to ensuring the model built addresses the right business objective, is judged by the
right scoring metric, and can be deployed operationally.
NOTE: this post was originally posted for the Predictive Analytics Times at http://www.predictiveanalyticsworld.com/patimes/january13/ 
Thanks for the excellent post Dean. I'm an academic who works in the field of computational biology and this is an issue I've given quite a bit of thought to. I am currently in the process of building models to predict whether a DNA sequencing read was sampled from nuclear, mitochondrial, or chloroplast DNA. It is clear to me at this point that these models are going to work excellently and that they have the potential to improve the state-of-the-art in the field of genome assembly. However, at this point, the question that is even more important than whether or not the models will make accurate predictions is how to get researchers to use the technology. In this case, instead of having a central stake-holder who knows what he/she wants to accomplish using the model, there are a number of stake-holders currently unaware that the model even exists. Still, several of the general principles you mention apply. Using your criteria, it would mean I need to clearly communicate to these stake holders that the model can help them achieve measurable improvements in their application of interest (primarily genome assembly). I need to be able to show them quantitatively that the "lift" in their goal application is significantly above the baseline they could achieve without employing the model. I also need to make sure that their use of the model is not too cumbersome (maybe I'll make a web-form they can submit their data to). It also sounds like it would be a good idea to under-promise and over-deliver. Thanks for a great post!