I'll never forget the hubbub that was generated by a talk by Pedros Domingos, I think at the 1998 KDD conference in New York City. His talk, "Occam's Two Razors: The Sharp and the Blunt" turned the usually calm and measured data mining crowd into an agitated hoard! I mean there was even yelling there (and if you've never been to a conference of statisticians or data miner, you won't know how unbelievable this scene was. This is from the Abstract, also posted at Citeseer in the link above:
Occam's razor has been the subject of much controversy. This paper argues
that this is partly because it has been interpreted in two quite different ways,
the first of which (simplicity is a goal in itself) is essentially correct,
while the second (simplicity leads to greater accuracy) is not. The paper
reviews the large variety of theoretical arguments and empirical evidence for
and against the "second razor," and concludes that the balance is strongly
John Elder provided a very useful and reasoned explanation for the apparent difficulties with Occam's Razor here (in a summary of KDD '98 presented at my favorite conference, the Symposium on the Interface). To quote,
Jianming Ye presented an intriguing alternative metric, Generalized
Degrees of Freedom, which finds the sensitivity of the fitted values to
perturbations in the outputs . This allows complexity to be measured
and compared in an entirely experimental manner [Ye, J. (March 1998). On Measuring and Correcting the Effects of Data Mining and Model Selection. Journal of the American Statistical Association 93, no. 441, pp. 120-131.]
The key idea is this: just because a model has lots of weights or splits, doesn't mean it is acting in a complex way. John has shown subsequently (at the most recent Salford Systems conference in San Diego, March 2006) that using the GDoF idea, ensembles of trees can actually behave less complex than individual trees within the ensemble.
So I agree with the idea that simpler models are usually better, and for that reason, I always try linear models first. However, sometimes the best models in terms of accuracy are complex looking, but in reality behave in relatively simple ways.