Comments on Data Mining and Predictive Analytics: Comparison of Algorithms at PAKDD2007
Dean Abbott (http://www.blogger.com/profile/16818000233889520746)
Feed updated 2019-07-23T05:05:07.118-07:00. 11 comments, newest first. Comments ending in "..." are truncated in the feed.

phil (www.tiberius.biz), 2007-07-02T23:28:00.000-07:00

I've finally gotten around to documenting some further findings on this data:

http://www.tiberius.biz/pakdd07.html

Anonymous, 2007-05-21T01:15:00.000-07:00

Dear Will,
Thank you for your comment on my remarks.
I agree with your thoughts.
For me, though, it is interesting to analyze data that contain complex patterns.
For example, I often meet such situations when processing medical data.
Data of that kind highlight the strengths and weaknesses of various algorithms very clearly, as here we deal with full-scale combinatory ...

Will Dwinnell (https://www.blogger.com/profile/03379859054257561952), 2007-05-20T15:44:00.000-07:00

The last commenter makes some interesting points. In reflecting on them, I have two further thoughts:

1. It is absolutely amazing how many competitors continue to focus on apparent ("in-sample") model performance, only to see their scores crash when assessed on test data. One would have thought we'd be past this by now, but I suppose there is still a large fraction of analysts who "...

Anonymous, 2007-05-18T07:26:00.000-07:00

I think the secret of the competition data is simple.
These data contain no strong patterns characteristic of the different classes.
That is why rather "rough" methods win the competition.
The participants who won were those who understood that the emphasis should be placed not on the accuracy of the solution to the competition problem, but on the stability of that solution.

Anonymous, 2007-05-09T14:36:00.000-07:00

For this data set I don't think the algorithm is that important, as no really complicated equation is needed to get a good solution. In our paper 083 we give some very simple rules for whom to target and whom not to target. I think the experience of the model builder in knowing some tricks of the trade is just as important.

What we did see was that the final results of most submissions ...

Will Dwinnell (https://www.blogger.com/profile/03379859054257561952), 2007-05-09T10:55:00.000-07:00

"Phil's comments have prompted me to think about doing a meta analysis of the results myself."

Yes, please do! I, for one, would be very interested.

Dean Abbott (https://www.blogger.com/profile/16818000233889520746), 2007-05-09T08:03:00.000-07:00

I think that one of the most interesting treatments of the "algorithm shootout" was done in the 90s as part of the statlog project and captured in a book entitled "Machine Learning, Neural and Statistical Classification", ed. by Michie, Spiegelhalter and Taylor. It's available online in PDF format here: http://www.amsta.leeds.ac.uk/~charles/statlog/. Not only did they ...

Will Dwinnell (https://www.blogger.com/profile/03379859054257561952), 2007-05-09T05:42:00.000-07:00

It's always interesting to see the results of this kind of "shoot-out". I find Sandro's comment astute, and I suppose there are a variety of biases in this sort of contest, such as who uses which tools versus who has the time or incentive to compete.

A few years ago, Tjen-Sien Lim and colleagues undertook some fairly broad empirical analyses of machine learning algorithms:

http:/...

Phil (www.tiberius.biz), 2007-05-08T19:52:00.000-07:00

I've also been doing some analysis of the results and have come to a similar conclusion: how the ensembling is done is more important than the algorithm.

Findings to date:

1. An average ranking of the top 27 submissions would have won the competition.

2. An average ranking of the top 36 submissions would have come runner-up.

3. Our submission would have jumped ...

Dean Abbott (https://www.blogger.com/profile/16818000233889520746), 2007-05-08T09:19:00.000-07:00

You're right that the size of the sample here is too small to infer too much about particular algorithms (like SVMs).

What I find most interesting is that, regardless of the algorithm used, an ensemble combination of models built with that algorithm (or algorithms) was used, whether the base algorithm was a tree, probit, neural net, ...

Therefore, my conclusion is that the ...

Sandro Saitta (https://www.blogger.com/profile/17682082649770875583), 2007-05-08T03:34:00.000-07:00

The results are interesting, although it is difficult to know whether the good results of some algorithms reflect their efficiency or their popularity. I'm also surprised not to see SVM in the top 20.
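
Editor's note on the "average ranking" that Phil mentions in his findings: the thread does not say how he computed it, so the following is only a minimal sketch of one common reading, rank averaging. It assumes each submission is a list of scores for the same cases, in the same order. The function names `ranks` and `rank_average` and the example scores are hypothetical and do not come from the competition.

```python
from statistics import mean


def ranks(scores):
    """Return 1-based ranks (1 = lowest score); tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        # Extend 'end' over the run of cases tied with the one at 'pos'.
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = avg_rank
        pos = end + 1
    return result


def rank_average(submissions):
    """Convert each submission to ranks, then average the ranks case by case."""
    per_model = [ranks(s) for s in submissions]
    return [mean(case_ranks) for case_ranks in zip(*per_model)]


# Hypothetical scores from three models for the same five cases.
submissions = [
    [0.10, 0.40, 0.35, 0.80, 0.70],
    [0.20, 0.30, 0.60, 0.90, 0.50],
    [0.05, 0.55, 0.45, 0.65, 0.75],
]
ensemble = rank_average(submissions)
```

The averaged ranks in `ensemble` can then be scored like any single submission. Working with ranks rather than raw scores means the models do not need to share a score scale, which may be part of why such a simple combination can hold up well on test data.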