Comments on Applied Data Science and Machine Learning: Why Overfitting is More Dangerous than Just Poor Accuracy, Part I

Dean Abbott (2016-04-18T07:03:54.536-07:00)

Thanks for your comments. You are correct that there is a bit of hyperbole going on with the title. The "dangerous" label would only be the case if the model is used, of course.

What I'm most uncomfortable with in this post is how to detect the problems. Yes, there are obvious visual cues and yes we can examine training/testing accuracy metrics (for consistency...but

pickettbd (2016-04-18T06:37:25.969-07:00)

Your title drew me in. I certainly agree that overfitting is more dangerous than poor accuracy. I would also suggest that poor accuracy isn't very dangerous, making your assertion not terribly surprising (not to say it isn't a valid point, of course). If you create a model (or your learning algorithm does it for you) and the model performs poorly, you know it performs poorly up-front.

Dean Abbott (2014-05-27T09:10:20.692-07:00)

I agree with you that the models are only applicable to where the data was during training. Finding the gaps/empty areas in the decision space can be difficult though. It's easy to test model inputs, and if all the inputs exceed their max value, you know the model has to extrapolate.

But if some of the inputs exceed their max value and others don't, the data could still be in a good location.

Anonymous (2014-05-27T02:14:30.548-07:00)

I would suggest that if you wish to classify a record that appears in the top left of the first figure you cannot use either of the two models shown. The model is only relevant to the data on which it has been built. Once the data you wish to classify is out of this range, the model is no longer valid.
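
For readers who want to try the input range check mentioned in the 2014 reply above, here is a minimal sketch in plain Python. It is not code from the post: the function names (`fit_ranges`, `out_of_range_flags`, `summarize`) and the toy data are hypothetical, and it assumes numeric inputs stored as rows of equal length. It records each input's training minimum and maximum, then reports whether a new record falls outside those bounds on all, some, or none of its inputs.

```python
from typing import List, Sequence, Tuple

Range = Tuple[float, float]


def fit_ranges(rows: Sequence[Sequence[float]]) -> List[Range]:
    """Record the (min, max) of each input column seen during training."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def out_of_range_flags(record: Sequence[float], ranges: Sequence[Range]) -> List[bool]:
    """Return one flag per input: True if the value lies outside the training range."""
    return [not (lo <= x <= hi) for x, (lo, hi) in zip(record, ranges)]


def summarize(record: Sequence[float], ranges: Sequence[Range]) -> str:
    """Classify a record by how many of its inputs fall outside the training ranges."""
    flags = out_of_range_flags(record, ranges)
    if all(flags):
        return "all_outside"   # the model clearly has to extrapolate
    if any(flags):
        return "some_outside"  # ambiguous: may or may not be near training data
    return "all_inside"        # inside the bounding box, but gaps are still possible


# Toy training data (hypothetical): two inputs per record.
train = [(1.0, 10.0), (2.0, 20.0), (3.0, 15.0)]
ranges = fit_ranges(train)

print(summarize((5.0, 25.0), ranges))  # both inputs beyond the training range
print(summarize((2.0, 25.0), ranges))  # only the second input is out of range
print(summarize((2.0, 12.0), ranges))  # every input is within range
```

A per-input check like this only tests the bounding box of the training data. As the reply notes, it cannot find gaps or empty areas inside that box, so an "all_inside" result does not by itself mean the record sits near data the model was trained on.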