I was listening to Colin Cowherd of ESPN radio this morning and he made a very interesting observation that we data miners know, or at least should know and make good use of. The context was evaluating teams and programs: are they dynasties or built off of one great player or coach. Lakers? dynasty. Celtics? dynasty. Bulls? without Jordan, they have been a mediocre franchise. The Lakers without Magic are still a dynasty. The Celtics without Bird are still a dynasty.
So his rule of thumb that he applied to college football programs was this: remove the best coach and the worst coach, and then assess the program. If they are still a great program, they are truly a dynasty.
This is the trimmed (truncated) mean idea that he was applying intuitively but is quite valuable in practice. When we assess customer lifetime value, if a small percentage of the customers generate 95% of the profits, examining those outliers or the long tail while valuable does not get at the general trend. When I was analyzing IRS corporate tax returns, the correlation between two line items (that I won't identify here!) was more than 90% over the 30K+ returns. But when we removed the largest 50 corporations, the correlation between these line items dropped to under 30%. Why? Because the tail drove the relationship; the overall trend didn't apply to the entire population. It is easy to be fooled by summary statistics for this reason: they assume characteristics about the data that may not be true.
This all gets back to nonlinearity in the data: if outliers behave differently than the general population, assess them based on the truncated populations. If outliers exist in your data, get the gist from the trimmed mean or median to reduce the bias from the outliers. We know this intuitively, but sometimes we forget to do it and make misleading inferences.
[UPDATE] I neglected to reference a former post that shows the problem of outliers in computing correlation coefficients: Beware of Outliers in Computing Correlations.