tag:blogger.com,1999:blog-5652924.post2222187176824878944..comments2024-03-28T23:07:52.202-07:00Comments on Applied Data Science and Machine Learning: To Graph Or Not To GraphDean Abbotthttp://www.blogger.com/profile/16818000233889520746noreply@blogger.comBlogger10125tag:blogger.com,1999:blog-5652924.post-84616679633733913342007-06-29T10:44:00.000-07:002007-06-29T10:44:00.000-07:00I don't know about you, but the most common techni...<I>I don't know about you, but the most common technique I use for examining residuals is to graph them. Getting a report on the average error or maximum error will help, but won't tell me enough to understand how to fix the problem. Do you have a favorite test here?</I><BR/><BR/>Let me clarify: this idea about graphs was suggested to me. I do not wholly endorse it, though I think it has some Will Dwinnellhttps://www.blogger.com/profile/03379859054257561952noreply@blogger.comtag:blogger.com,1999:blog-5652924.post-6503457890255582332007-06-28T11:31:00.000-07:002007-06-28T11:31:00.000-07:00Hi,On the question of "to draw conclusions based o...Hi,<BR/><BR/>On the question of "to draw conclusions based on numeric measures (which may or may not, strictly speaking, be statistical "tests"), rather than on visual interpretation of graphs" I would avoid (possibly) subjective interpretations where they can be replaced with a more precise interpretation.<BR/><BR/>I don't think it’s really about whether graphics are useful but more so about theAnonymousnoreply@blogger.comtag:blogger.com,1999:blog-5652924.post-7580159671165493692007-06-28T06:51:00.000-07:002007-06-28T06:51:00.000-07:00"If the objective was to model the data, then I do..."If the objective was to model the data, then I don't understand, for instance, why the analyst would choose a linear model for the data set with the obviously quadratic relationship (note that this would have been obvious from regression diagnostics)."<BR/><BR/>That is exactly what I'm driving at. 
Before you model, you have no idea what form of the model is appropriate. Afterward, yes examiningDean Abbotthttps://www.blogger.com/profile/16818000233889520746noreply@blogger.comtag:blogger.com,1999:blog-5652924.post-21798516400124541312007-06-28T05:48:00.000-07:002007-06-28T05:48:00.000-07:00All four data sets in the quartet have the identic...<I>All four data sets in the quartet have the identical summary statistics--same y vs. x correlation, same R^2 in the linear fit, same MSE, etc.</I><BR/><BR/>First, we must admit the extremely artificial nature of this data set, given its small exemplar count and very small variable count.<BR/><BR/>If the objective is to distinguish the four data sets, then I'd say this selection of summary Will Dwinnellhttps://www.blogger.com/profile/03379859054257561952noreply@blogger.comtag:blogger.com,1999:blog-5652924.post-33044719836489345292007-06-27T08:12:00.000-07:002007-06-27T08:12:00.000-07:00Good points, especially about the number of dimens...Good points, especially about the number of dimensions that can be sensibly graphed. Graphing multivariate data sets is a visual challenge.<BR/><BR/>The work that I'm doing now (in a business environment) is a combination of exploration (like EDA) and statistical tests. When I find something that appears germane to the business problem, I usually provide the graph and the statistic.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-5652924.post-40453550363719252392007-06-27T06:43:00.000-07:002007-06-27T06:43:00.000-07:00anonymous, I think you make a good point, but it's...<I>anonymous</I>, I think you make a good point, but it's worth noting the difference between <I>presenting</I> data, and making a decision based on it. 
I absolutely agree that graphs are useful when dealing with managers and other non-technical folks.<BR/><BR/>However, I think the fellow I mentioned makes a good point, too, regarding the use of measurements instead of graphs as a basis for Will Dwinnellhttps://www.blogger.com/profile/03379859054257561952noreply@blogger.comtag:blogger.com,1999:blog-5652924.post-52649904477783486132007-06-27T04:07:00.000-07:002007-06-27T04:07:00.000-07:00Hi I just found your site the other day, and here'...Hi I just found your site the other day, and here's my two cents on graphs: <BR/><BR/>A la Tufte, graphs do reveal data in a very concrete way. They allow you to present a lot of data at the level of the observation, as opposed to category or entire data set. Boxplots, histograms, or scatterplots, of course, could show outliers in the IRS data. <BR/><BR/>I think graphs are especially Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-5652924.post-80060323014967093142007-06-25T18:09:00.000-07:002007-06-25T18:09:00.000-07:00I need to post the original data and the Anscombe's...I need to post the original data and the Anscombe's quartet....<BR/><BR/>But Tufte's point is this:<BR/>All four data sets in the quartet have the identical summary statistics--same y vs. x correlation, same R^2 in the linear fit, same MSE, etc. These are the kind of statistics you might use to examine different variables in your data. 
Moreover, you could view this as data having a common "X", andDean Abbotthttps://www.blogger.com/profile/16818000233889520746noreply@blogger.comtag:blogger.com,1999:blog-5652924.post-75596741996619554092007-06-25T17:50:00.000-07:002007-06-25T17:50:00.000-07:00I think that the idea wasn't so much to replace a ...I think that the idea wasn't so much to replace a graph with a single statistical test, but rather to draw conclusions based on numeric measures (which may or may not, strictly speaking, be statistical "tests"), rather than on visual interpretation of graphs.<BR/><BR/>I agree with Tufte that a table which merely dumps the raw data of Anscombe's quartet is not terribly meaningful. The question, Will Dwinnellhttps://www.blogger.com/profile/03379859054257561952noreply@blogger.comtag:blogger.com,1999:blog-5652924.post-75017295379908498522007-06-25T10:47:00.000-07:002007-06-25T10:47:00.000-07:00I'm not sure which statistical test your colleague...I'm not sure which statistical test your colleague has in mind. A good example of the limitations of statistical tests to convey information is the famous Anscombe's Quartet data set. In his excellent book <A HREF="http://www.edwardtufte.com/tufte/books_vdqi" REL="nofollow"> The Visual Display of Quantitative Information</A>, Edward Tufte states the following:<BR/><BR/>"Graphics <I>reveal</I> Dean Abbotthttps://www.blogger.com/profile/16818000233889520746noreply@blogger.com
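
For readers who want to check the claim about Anscombe's quartet made in the comments above, here is a minimal sketch, not something posted by any of the commenters. It assumes the standard published values of the quartet (Anscombe, 1973) and uses only the Python standard library; the names `QUARTET`, `summarize`, and `SUMMARIES` are made up for this illustration. It fits an ordinary least-squares line to each set and reports the summary numbers mentioned in the thread (mean of y, slope, intercept, correlation, R^2, and the mean squared error of the fit).

```python
from math import sqrt

# Anscombe's quartet, standard published values (assumed here, not taken
# from the comment thread). Sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return summary statistics of a simple least-squares fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Centered sums of squares and cross-products.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    # Residual sum of squares of the fitted line, averaged over n points.
    mse = (syy - slope * sxy) / n
    return {
        "mean_y": mean_y,
        "slope": slope,
        "intercept": intercept,
        "r": r,
        "r_squared": r * r,
        "mse": mse,
    }


SUMMARIES = {name: summarize(x, y) for name, (x, y) in QUARTET.items()}

for name, s in SUMMARIES.items():
    print(
        f"{name:>3}: mean_y={s['mean_y']:.2f} slope={s['slope']:.3f} "
        f"intercept={s['intercept']:.2f} r={s['r']:.3f} "
        f"R^2={s['r_squared']:.3f} MSE={s['mse']:.3f}"
    )
```

The four rows agree to roughly two decimal places, so these particular numbers cannot tell the sets apart. Whether other numeric measures (residual diagnostics, for example) or a plot is the better way to separate them is the question the thread is debating.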