
At the link in the title are the results from the 11th Pacific-Asia Knowledge Discovery and Data Mining conference (PAKDD 2007). The dataset for the competition was a cross-seller dataset, but that is not of interest to me here. The interesting this is these: which algorithms did the best, and were they significantly different in their performance?
A note about the image in the post: I took the results, sorted by area under the ROC curve (AUC). The results already had color coded the groups of results (into winners, top 10 and top 20)--I changed the colors to make them more legible. I also added red-bold text to the algorithm implementations that included an ensemble (note that after I did this, I discovered that the winning Probit model was also an ensemble).
And, for those who don't want to look at the image, the top four models were as follows:
AUC.......Rank...Modeling Technique
70.01%.....1.....TreeNet + Logistic Regression
69.99%.....2.....Probit Regression
69.62%.....3.....MLP + n-Tuple Classifier
69.61%.....4.....TreeNet
First, note that all four winners used ensembles, but ensembles of 3 different algorithms: Trees (Treenet), neural networks, and probits. The differences between these results are quite small (arguably not significant, but more testing would have to take place to show this). The conclusion I draw from this then is that the ensemble is more important than the algorithm; so long as there are good predictors, variation in data used to build the models, and sufficient diversity in predictions issued by the individual models.
I have not yet looked at the individual results to see how much preprocessing was necessary for each of the techniques, however I suspect that less was needed for the TreeNet models just because of the inherent characteristics of CART-styled trees in handling missing data, outliers, and categorical/numeric data.
Second, and related to the first, is this: while I still argue that generally speaking, trees are less accurate than neural networks or SVMs, ensembles level the playing field. What surprised me the most was that logistic regression or Probit ensembles performed as well as they did. This wasn't because of the algorithm, but rather because I haven't yet been convinced that Probits or Logits consistently work well in ensembles. This is more evidence that they do (though I need to read further how they were constructed to be able to comment on why they did as well as they did).