Tuesday, November 14, 2006

Ensembles everywhere

After reading the article on cluster ensembles recently, I just saw another article in IEEE PAMI entitled "Rotation Forest: A New Classifier Ensemble Method". The approach is interesting: much like Random Forests (where the diversity of the trees in the ensemble comes from both bootstrap sampling and random variable selection), there is a random selection of variables to use in the trees. But the twist here (and the "rotation") is applying PCA to the random subsets of candidate variables.
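To make the rotation step concrete, here is a minimal sketch (my own illustration, assuming numpy; the subset count and data are made up, and this omits the bootstrap sampling and class-subsampling details of the actual paper). Features are split into random subsets, PCA is run on each subset, and the principal axes are assembled into a block-diagonal rotation of the full feature space:

```python
import numpy as np

def rotation_matrix(X, n_subsets=2, rng=None):
    """Rotation-Forest-style rotation: split the features into random
    subsets, run PCA on each subset (via SVD), and assemble the principal
    axes into a block-diagonal rotation of the full feature space."""
    if rng is None:
        rng = np.random.default_rng(0)
    n_features = X.shape[1]
    perm = rng.permutation(n_features)
    subsets = np.array_split(perm, n_subsets)
    R = np.zeros((n_features, n_features))
    for idx in subsets:
        Xs = X[:, idx] - X[:, idx].mean(axis=0)   # center the subset
        # rows of Vt are the principal axes of this feature subset
        _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
        R[np.ix_(idx, idx)] = Vt.T
    return R

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 6))
R = rotation_matrix(X, n_subsets=3, rng=rng)
X_rot = X @ R   # each tree in the ensemble trains on its own rotation
```

Because each block is orthogonal, the whole matrix R is orthogonal: the rotated data carries exactly the same information, just along different axes, which is what gives each tree a different view.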

I'm sure there are nearly infinite ways to tweak the ideas of random record/variable selection, but once again the keys to the success of ensembles, here and always, are:
1) Diversity in the information (i.e., data) the modeling algorithm sees.
2) Base algorithms that are weak learners, which benefit most from ensembles.

Random Forests, it seems to me, works well because not only are trees coarse, blunt (and unstable) predictors, but they are also greedy searches that can be fooled into going down a sub-optimal path. By constraining each split to consider only some of the variables, the tree is forced out of its greedy perspective and made to consider other ways to reach a solution. This new algorithm does the same thing, with the twist of using PCA to develop linear projections of the original data (of subsets of the variables, to be more precise).
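You can see the "forced out of the greedy path" effect even with a single-split search. In this sketch (illustrative data; one feature deliberately dominates, and numpy is assumed), an unconstrained greedy search keeps picking the same dominant variable, while restricting each search to a random subset forces other splits into the ensemble:

```python
import numpy as np

def best_split_feature(X, y, features):
    """Greedy single-split search restricted to a subset of features:
    return the feature whose best threshold makes the fewest errors."""
    best = (None, X.shape[0] + 1)
    for j in features:
        for t in np.sort(X[:, j])[:-1]:          # candidate thresholds
            pred = (X[:, j] > t).astype(int)
            # count errors for either polarity of the split
            errors = min(np.sum(pred != y), np.sum(pred == y))
            if errors < best[1]:
                best = (j, errors)
    return best[0]

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = (X[:, 0] + 0.1 * X[:, 1] > 0).astype(int)    # feature 0 dominates

# Greedy search over all features latches onto the dominant feature;
# random subsets of 2 features make other variables get chosen too.
chosen = {best_split_feature(X, y, rng.choice(5, size=2, replace=False))
          for _ in range(10)}
```

The set of features chosen across the constrained searches is more varied than the single feature the unconstrained search returns, which is exactly the diversity the ensemble feeds on.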

I think we'll be seeing more and more variations on this theme in the coming years.


Will Dwinnell said...

Researchers certainly seem to get something from ensembles. I myself published an article on random rotations of the input variables a few years ago. I have wondered why this ensemble idea hasn't caught on more in practice. Perhaps it is the computational burden?

Dean Abbott said...

Do you have a link to where you published? I'd love to see the difference between the two approaches.

I think one thing that is becoming clearer from the proliferation of ensembles is that individual (single) models leave information "on the table". The combination of NP-complete search problems, highly correlated fields, mixes of nominal and interval variables, and mismatched cost functions makes the typical algorithm less than optimal.

Maybe one day I'll return to my early days when I developed nonlinear, guided random searches with custom cost functions. :)

Will Dwinnell said...

The article I mentioned is "Input Rotation for Enhancing Machine Learning Solutions," which was published in the Nov/Dec 1999 issue of PC AI magazine. The basic idea was to average together multiple models, each built on a random rotation of the training data set.
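A minimal sketch of that averaging idea (my own reconstruction, not the article's actual method: numpy is assumed, decision stumps stand in as the base learner, and the rotation count is arbitrary). Each model is fit on a randomly rotated copy of the inputs, and their predictions are averaged:

```python
import numpy as np

def fit_stump(X, y):
    """Axis-aligned decision stump: best (feature, threshold, polarity)."""
    best = (0, 0.0, 1, len(y) + 1)
    for j in range(X.shape[1]):
        for t in X[:, j]:
            pred = (X[:, j] > t).astype(int)
            for pol in (1, -1):
                p = pred if pol == 1 else 1 - pred
                err = np.sum(p != y)
                if err < best[3]:
                    best = (j, t, pol, err)
    return best[:3]

def predict_stump(stump, X):
    j, t, pol = stump
    pred = (X[:, j] > t).astype(int)
    return pred if pol == 1 else 1 - pred

def random_rotation(d, rng):
    # QR of a Gaussian matrix gives a random orthogonal transform
    Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
    return Q

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # diagonal boundary: hard for one stump

rotations = [random_rotation(2, rng) for _ in range(15)]
models = [fit_stump(X @ R, y) for R in rotations]
votes = np.mean([predict_stump(m, X @ R) for m, R in zip(models, rotations)],
                axis=0)
ensemble_acc = np.mean((votes > 0.5).astype(int) == y)
```

An axis-aligned stump can't represent a diagonal boundary, but under some rotations the boundary lines up with an axis, so averaging over rotations recovers much of what any single stump misses.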