Monday, April 25, 2016

Tracking Model Performance Over Time


Most introductory data mining texts include substantial coverage of model testing. Various methods of assessing true model performance (holdout testing, k-fold cross validation, etc.) are usually explained, perhaps with some important variants, such as stratification of the testing samples.

Generally, all of this exposition is aimed at in-time analysis. Model development data may span multiple time periods, but the testing is more or less blind to this: all periods are treated as fair game and mixed together. This is fine for model development. Once predictive models are deployed, however, it is desirable to continue testing to track model performance over time. Models that degrade over time need to be adjusted or replaced.

Subtleties of Testing Over Time

Nearly all production model evaluation is performed with new out-of-time data. As new periods of observed outcomes become available, they are used to calculate running performance measures. As far as it goes, focusing on the actual performance metric makes sense. In my experience, though, some clients become distracted by movement in the independent variables, or in the predicted or actual outcome distributions, considered in isolation. It is important to understand the dynamic of these changes to fully understand model performance over time.

For the sake of a thought experiment, consider a very simple problem with one independent variable and one target variable, both real numbers. Historically, the distribution of each of these variables has been confined to specific ranges. A predictive model has been constructed as a linear regression which attempts to anticipate the target variable, using only the input of the single independent variable (and a constant). Assume that errors observed in the development data have been small and otherwise unremarkable (they are distributed normally, their magnitude is relatively constant across the range of the independent variable, there is no obvious pattern to them, and so forth).
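To make the thought experiment concrete, here is a minimal sketch in Python using only the standard library. The data are synthetic, and everything specific in it is an assumption made for illustration: the historical range of 0 to 10, the true relationship y = 2 + 3x, the noise level, and the helper name fit_line.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Assumed development data: x confined to [0, 10],
# target = 2 + 3x plus small, normally distributed noise.
dev_x = [rng.uniform(0.0, 10.0) for _ in range(1000)]
dev_y = [2.0 + 3.0 * x + rng.gauss(0.0, 0.5) for x in dev_x]

intercept, slope = fit_line(dev_x, dev_y)
residuals = [y - (intercept + slope * x) for x, y in zip(dev_x, dev_y)]

# "Unremarkable" errors: centered on zero, with similar spread in the
# lower and upper halves of the independent variable's range.
low_spread = statistics.pstdev([r for x, r in zip(dev_x, residuals) if x < 5.0])
high_spread = statistics.pstdev([r for x, r in zip(dev_x, residuals) if x >= 5.0])

print(f"intercept={intercept:.2f} slope={slope:.2f}")
print(f"mean residual={statistics.fmean(residuals):.4f}")
print(f"residual spread: low x={low_spread:.2f}, high x={high_spread:.2f}")
```

Comparing the residual spread in two halves of the range is only a crude stand-in for a proper residual analysis, but it captures the idea that the errors show no obvious pattern.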

Once this model is deployed, it is executed on all future cases drawn from the relevant statistical universe, and predictions are saved for further analysis. Likewise, actual outcomes are recorded as they become available. At the conclusion of each future time period, model performance within that period is examined.
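The bookkeeping described above might look something like the following sketch. The record layout (period, predicted, actual), the function name rmse_by_period, and the choice of root mean squared error as the performance measure are all assumptions for illustration; any metric suited to the model could be substituted.

```python
import math
from collections import defaultdict

def rmse_by_period(records):
    """Return {period: RMSE} from (period, predicted, actual) tuples."""
    squared_errors = defaultdict(list)
    for period, predicted, actual in records:
        squared_errors[period].append((actual - predicted) ** 2)
    return {
        period: math.sqrt(sum(errs) / len(errs))
        for period, errs in sorted(squared_errors.items())
    }

# Hypothetical saved predictions joined to outcomes as they arrived.
scored = [
    ("2016-01", 10.0, 10.5), ("2016-01", 12.0, 11.5),
    ("2016-02", 10.0, 11.0), ("2016-02", 12.0, 11.0),
    ("2016-03", 10.0, 13.0), ("2016-03", 12.0, 16.0),
]

performance = rmse_by_period(scored)
for period, rmse in performance.items():
    print(period, round(rmse, 3))
```

In this made-up data the error grows from one period to the next, which is the kind of trend that should prompt a closer look at the model.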

Consider the simplest change to a well-developed model: the distribution of the independent variable remains the same, but the actual outcomes begin to depart from the regression line. Any number of changes could be taking place in the output distribution, but the predicted distribution (the regression line) cannot move since it is entirely defined by the independent variable, which in this case is stable. By definition, model performance is degrading. This circumstance is easy to diagnose: the dynamic linking the target and independent variables is changing, hence a new model is necessary to restore performance.
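A small simulation can illustrate this first scenario. The sketch below assumes a deployed model of y = 2 + 3x and lets the true slope creep upward in later periods while the independent variable keeps its historical distribution. The coefficients, noise level, slope schedule, and the name run_period are all hypothetical.

```python
import math
import random
import statistics

INTERCEPT, SLOPE = 2.0, 3.0  # assumed coefficients of the deployed model

def run_period(rng, true_slope, n=2000):
    """Simulate one period: x is stable, but the real x-to-y link may not be."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    predicted = [INTERCEPT + SLOPE * x for x in xs]
    actual = [2.0 + true_slope * x + rng.gauss(0.0, 0.5) for x in xs]
    rmse = math.sqrt(
        statistics.fmean([(a - p) ** 2 for a, p in zip(actual, predicted)])
    )
    return statistics.fmean(predicted), statistics.fmean(actual), rmse

rng = random.Random(1)
true_slopes = [3.0, 3.0, 3.3, 3.6]  # the relationship drifts in periods 3 and 4
results = [run_period(rng, s) for s in true_slopes]

for period, (mean_pred, mean_actual, rmse) in enumerate(results, start=1):
    print(f"period {period}: mean predicted={mean_pred:.2f} "
          f"mean actual={mean_actual:.2f} RMSE={rmse:.2f}")
```

The mean prediction stays roughly where it was in every period, because the input distribution has not moved, while the error climbs once the actual outcomes leave the line.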

What happens, though, when the independent variable begins to migrate? There are two possible effects (in reality, some combination of these extremes is likely): 1. the distribution of actual outcomes shifts to match the change ("the dots march along the regression line"), or 2. the distribution of actual outcomes does not shift to match the change. In the first case, the model continues to correctly identify the relationship between the target and the independent variable, and model performance will more or less endure. In the second case, reality begins to wander from the model and performance deteriorates. Notice that, in the second case, the actual outcome distribution may or may not change noticeably. Either way, the model no longer correctly anticipates reality and needs to be updated.
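The two extremes can be sketched side by side. In the simulation below, the independent variable migrates from its assumed historical range of 0 to 10 up to a range of 8 to 18. In the first case the outcomes keep following the line; in the second, the outcomes are assumed to level off once x passes 10. That leveling-off is just one hypothetical way for reality to stop tracking the model, and the coefficients and noise level are likewise made up.

```python
import math
import random
import statistics

INTERCEPT, SLOPE = 2.0, 3.0  # assumed deployed model, fit on x in [0, 10]

def rmse(actual, predicted):
    """Root mean squared error between two equal-length lists."""
    return math.sqrt(
        statistics.fmean([(a - p) ** 2 for a, p in zip(actual, predicted)])
    )

rng = random.Random(2)

# The independent variable migrates: most new cases lie beyond the old range.
new_x = [rng.uniform(8.0, 18.0) for _ in range(2000)]
predicted = [INTERCEPT + SLOPE * x for x in new_x]

# Case 1: outcomes follow the same line ("the dots march along the line").
follows = [2.0 + 3.0 * x + rng.gauss(0.0, 0.5) for x in new_x]

# Case 2 (assumed form): outcomes stop rising once x passes 10.
stalls = [2.0 + 3.0 * min(x, 10.0) + rng.gauss(0.0, 0.5) for x in new_x]

rmse_follows = rmse(follows, predicted)
rmse_stalls = rmse(stalls, predicted)

print(f"case 1 RMSE={rmse_follows:.2f}")
print(f"case 2 RMSE={rmse_stalls:.2f}")
print(f"case 2 mean actual={statistics.fmean(stalls):.2f} "
      f"vs mean predicted={statistics.fmean(predicted):.2f}")
```

The same shift in the input produces very different performance in the two cases, which is why movement in the independent variable, taken alone, says little about whether the model still works.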


The example used here was deliberately chosen to be simple, for illustration's sake. Qualitatively, though, the same basic behaviors are exhibited by much more complex models. Models featuring multiple independent variables or employing complex transformations (neural networks, decision trees, etc.) obey the same fundamental dynamic. Given the sensitivity of nonlinear models to each of their independent variables, a migration in even one of them may provoke the changes described above. Consideration of the components of this interplay in isolation only serves to confuse: changes over time can only be understood as part of the larger whole.


Anonymous said...

I think it makes perfect sense. Model training should be an ongoing process in a rapidly changing world. Just like how humans learn, we learn new things every day and that's how we progress. What's it gonna be like if we decide to learn all the things we need for maybe 22 years, and once we graduate from college, we refuse to learn anything new and always make decisions based on the things we learned in the first 22 years? We'll probably fail in life miserably!
However, it's a non-trivial task for models to continue learning over time. Training excessively could make the model overfit and increase its variance. I think a good approach could be that once we detect a performance drop and a change of variables in the real world, we could create a new model which takes the old model's parameters and learned attributes into account.
