Sunday, March 06, 2011

Statistics: The Need for Integration

I'd like to revisit an issue we covered here way back in 2007, in "Statistics: Why Do So Many Hate It?" Recent comments made to me, both in private conversation ("Statistics? I hated that class in college!") and in print, prompt me to reconsider this issue.

One thing which occurs to me is that many people have a tendency to think of statistics in an isolated way. This world view keeps statistics at bay, as something which is done separately from other business activities, and, importantly, which is done and understood only by the statisticians. This is very far from the ideal which I suggest, in which statistics (including data mining) are much more integrated with the business processes of which they are a part.

In my opinion, this is a strange way to frame statistics. As an analogy, imagine if, when asked to produce a report, a business team turned to their "English guy", with the expectation that he would do all the writing. I am not suggesting that everyone needs to do the heavy lifting that data miners do, but I am suggesting that people should accept some responsibility for data mining's contribution to the business process. Managers, for example, who throw up their hands with the excuse that they are "not numbers people" forfeit control over an important part of their business function. It is healthier for everyone involved, I submit, if statistics moves away from being a black art, and statisticians become less of an arcane priesthood.


Once Upon A Time said...

One reason that data mining is not gaining importance in business management might be a lack of technical advances. In pharmaceutical companies, biostatisticians are fully responsible for designing and writing clinical trial protocols, and for running the trials and monitoring their progress. In research institutes, statisticians who are involved in a research project are always the first author of the published paper. They are also responsible for drafting grant proposals, covering everything from how the data will be gathered to the statistical analysis. They sometimes even write their own code if a method does not yet exist in the major statistical software packages. Quants in investment banks or hedge funds obviously hold highly technical and important positions, working on fancy mathematics and stochastic processes. However, data miners in some industries are merely business analysts, using software to generate results, which is no different from doing simple calculations in Excel. Data mining is a process; it may not directly impact the company's business, and it may not be a mechanism for earning money for the company.

Dean Abbott said...

Once, thanks for your comments and insights. I'd like to offer a different perspective, though.

I am a data miner and not a statistician, so I'm coming from that camp, but I would say that, if anything, data miners have more of a business focus than statisticians. I have had many consulting engagements that originated because companies looking for solutions got sick of waiting for in-house statisticians to complete the models. These statisticians, it seemed, were more interested in the analysis than in the business value (there was a steep opportunity cost associated with time, so even if their models weren't "perfect", delivering them sooner would still have been valuable).

No doubt there are many data miners who are mere analysts, though I think the same can be said of entry-level statisticians who spend their time building and rebuilding the same regression models.

I don't doubt your insights about the pharma world at all; in part because of regulatory concerns, there has to be a theoretical basis for findings, which is why data is squeezed into the box where these tests apply. That's perfectly fine. But in the commercial business world (retail and telco, for example), I don't think statisticians are winning the day. Data miners are doing the leading-edge work there and not publishing results, because companies are loath to give competitors any inkling of what they are up to. (I had that experience in financial modeling: to this day, I still cannot reveal the customer or what we were doing on a project from the '90s.)

Once Upon A Time said...

It is a difference between theoretical statisticians and applied statisticians. It could also be a difference between applied statisticians and data analysts.

Theoretical statisticians are interested in mathematical theory and in inventing new methods to cope with different datasets and different tasks. Indeed, more than half of the methodologies we use in statistical practice and data mining were invented by statisticians.

Data -> Predictive model is one area of applied statistics; another is Model <- Data, in which mathematicians or statisticians first build a theoretical model (say, a compartment model or a Markov jump process), then use data to calibrate the model's parameters, so that the model can be used to project the future. This style of modelling starts away from the data.
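To make the Model <- Data workflow a little more concrete, here is a minimal sketch in Python, assuming a hypothetical one-compartment model in which a quantity decays exponentially, C(t) = C0 * exp(-k * t). The model's form is fixed first; data are used only to calibrate its two parameters, C0 and k (here by ordinary least squares on log-transformed observations), and the calibrated model is then used to project forward. The function names, parameter values, and noise-free synthetic observations are illustrative assumptions, not drawn from any real study.

```python
import math

def one_compartment(t, c0, k):
    """Hypothetical one-compartment model: exponential decay from level c0 at rate k."""
    return c0 * math.exp(-k * t)

def calibrate(times, observations):
    """Estimate (c0, k) by ordinary least squares on log(observation) versus time.

    Taking logs turns the model into a straight line, log C = log c0 - k * t,
    so the fitted intercept gives log c0 and the fitted slope gives -k.
    """
    n = len(times)
    ys = [math.log(c) for c in observations]
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    sxx = sum((t - t_mean) ** 2 for t in times)
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return math.exp(intercept), -slope

# Synthetic, noise-free observations generated from assumed "true" parameters.
times = [0.0, 1.0, 2.0, 4.0, 8.0]
observed = [one_compartment(t, 10.0, 0.3) for t in times]

# Step 2: calibrate the pre-specified model to the data.
c0_hat, k_hat = calibrate(times, observed)

# Step 3: use the calibrated model to project beyond the observed range.
forecast = one_compartment(12.0, c0_hat, k_hat)
print(f"c0={c0_hat:.3f}, k={k_hat:.3f}, C(12)={forecast:.3f}")
```

The log-linear fit is used here only for simplicity; with noisy real data, a nonlinear least-squares or likelihood-based fit of the original model would often be a better choice.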

Since these are different areas of practice, I appreciate the work done by both data miners and statisticians.
