Saturday, November 22, 2008

What is Predictive Analytics?

I just saw this link about the difference between BI and Predictive Analytics. This comes on the heels of a meeting I had with UCSD Extension folks, talking about predictive analytics and data mining in the context of teaching courses for professionals, and this topic came up: how is predictive analytics different from BI?

First, I'd like to applaud the author, Vladimir Stojanovski, for concluding there are differences, and for trying to get at what those differences are.

The article states this:

To tie this all back to the question of BI vs. Predictive Analytics (PA), a metaphor I've heard used to describe the difference goes something like this: if BI is a look in the rearview mirror, predictive analytics is the view out the windshield.


In my experience, this is a common definition. Predictive Analytics and Data Mining are seen as predicting future events, whereas OLAP looks at past data.

While I'd love to jump on this bandwagon because it makes for a simple and compelling story, I cannot ride this one. And that's because both BI and PA look at historic data. PA isn't magic in coming up with predictions of the future. In fact, both BI and PA ultimately look at and use the same data (or variations of the same historic data). Both can predict the future, so long as the future is consistent with the past, either in a static sense, or in a dynamic sense (by extrapolating past data into the future).

I think it is better to describe the difference in this way: BI reports on historical data based upon an analyst's perspective on which fields and statistics are interesting, whereas PA induces which fields, statistics and relationships are interesting from the data itself. I think it is the combinatorial, sifting, iterative nature of PA that gives it better predictive accuracy of the future (coupled with using business metrics to assess whether the fields found truly are predictive or not).
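To make that contrast concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the toy customer records, the field names, and the helper functions `rate_by` and `rank_fields` are hypothetical, and the scoring rule (the spread in churn rate across a field's values) is a deliberately crude stand-in for what a real modeling tool would do. The first function is the BI-style report on a field the analyst chose; the second sifts through every field and lets the data say which one matters.

```python
from collections import defaultdict

# Hypothetical toy records; field names and values are made up for illustration.
CUSTOMERS = [
    {"region": "west", "plan": "basic",   "support_calls": "high", "churned": 1},
    {"region": "west", "plan": "premium", "support_calls": "low",  "churned": 0},
    {"region": "east", "plan": "basic",   "support_calls": "high", "churned": 1},
    {"region": "east", "plan": "premium", "support_calls": "low",  "churned": 0},
    {"region": "west", "plan": "basic",   "support_calls": "low",  "churned": 0},
    {"region": "east", "plan": "premium", "support_calls": "high", "churned": 1},
    {"region": "west", "plan": "premium", "support_calls": "high", "churned": 1},
    {"region": "east", "plan": "basic",   "support_calls": "low",  "churned": 0},
]


def rate_by(records, field, target="churned"):
    """BI-style report: average of `target` for each value of one chosen field."""
    totals = defaultdict(lambda: [0, 0])  # value -> [sum of target, row count]
    for row in records:
        totals[row[field]][0] += row[target]
        totals[row[field]][1] += 1
    return {value: s / n for value, (s, n) in totals.items()}


def rank_fields(records, target="churned"):
    """PA-style sift: score every field by the spread in target rate across its values."""
    fields = [f for f in records[0] if f != target]
    scores = {}
    for f in fields:
        rates = rate_by(records, f, target).values()
        scores[f] = max(rates) - min(rates)
    # Highest spread first; ties keep their original field order.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # The analyst's hunch: churn differs by region.
    print(rate_by(CUSTOMERS, "region"))
    # Letting the data rank the fields instead.
    print(rank_fields(CUSTOMERS))
```

Both functions read exactly the same historical records, which is the point: in this toy data the analyst's chosen field shows no difference in churn, while the sift surfaces `support_calls` as the field that separates churners from non-churners. A real project would still need to check any such finding against held-out data and business metrics before trusting it.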

So let's not oversell--what PA does is reason enough for it to be an integral part of any analytics or BI group.

11 comments:

Tim Manns said...

I recently read an article from The Data Warehousing Institute (TDWI), written by Wayne Eckerson approximately 18 months ago.

A few things made me laugh, but I can see some truth in it, and I quite liked the paper on the whole. It says that predictive analytics *is* data mining re-badged because too many people were claiming to do data mining and weren't. Here's a quote:
"Predictive analytics has been around for a long time but has been known by other names. For much of the past 10 years, most people in commercial industry have used the term "data mining" to describe the techniques and processes involved in creating predictive models."

The paper also suggests that BI tools are best referred to as *deductive* in nature (users already have an idea or data structure imposed on them), whilst predictive analytics is *inductive* (explore all the data freely).

The analogy used in the paper, which I quite liked, was that BI tools present users with a de facto set of hypotheses in the form of metrics and KPIs that users examine in various levels of depth. Conversely, "predictive analytics is like an intelligent robot that rummages through all your data until it finds something interesting to show you."

Personally I like to refer to simple reports and packaged OLAP cubes as 'BI tools'. Anything more complicated than this is usually called 'data mining'. There is an element of skills, time and effort required in this definition. PA is usually harder.
That's a rather simplistic view, but most marketing non-analyst colleagues I work with couldn't give a damn :) They outnumber me and just want reports, better response rates, or lower churn...

- Tim

Dean Abbott said...

I'd love to read the article; from your quotes, I agree wholeheartedly. First, I use Predictive Analytics and data mining interchangeably, though PA more with CRM-types. I think it is true that data mining has been misused sometimes by BI folks ("drilling down into data"). But I think there is another side to it as well--privacy. Data mining has been associated with all kinds of privacy abuses (I've posted on this topic before). PA is not (yet) tainted in this way.

I especially like the deductive vs. inductive distinction, as opposed to retrospective vs. future.

Thanks for the post!

Anonymous said...

I've always thought of BI as purely an IT function; as an implementation of a data mining problem defined by a business stakeholder. So, the difference may be due not to what you call it, but to who will be doing the actual implementation.

This is similar to the debate on the difference between data mining and statistics. Same purpose, different implementations.

As to some actual definitions of "Business Intelligence", there are some references to it being coined by a former analyst from the Gartner Group.

Tim Manns said...

Here's the article on Teradata's site:

http://www.teradata.com/t/page/163384/index.html

Cheers

Tim

Sandro Saitta said...

Very interesting discussion.

I find the definition of Vladimir Stojanovski to be incorrect. As Dean mentioned, it is difficult to make the comparison based on the "past" and "future" aspect.

Also, to my knowledge, some business intelligence specialists consider data mining as being a sub-domain of BI.

@Ralph: I definitely agree with your comment.

@Tim: Thanks for the link!

Eric Siegel said...

This comment has been removed by the author.

Eric Siegel said...

I define predictive analytics as business intelligence technology that produces a predictive score for each customer. This distinguishes it from forecasting.

However, this doesn't distinguish it from a hand-written model (such as certain credit scoring methods). A more complete/strict definition would have to include that predictive modeling is applied over historical data.

The "predictive power" comes in the discovery of a model (or even just a segmentation scheme, in the case of a decision tree) that is optimized specifically for the predictive goal at hand.

Thoughts?

Will Dwinnell said...

Some interesting reading related to this post is at:

Statistics vs. Machine Learning, fight!

Anonymous said...

Really nice blog and a good collection of information on data mining. It helped me a lot in my project.

Please visit my blog and leave your valuable suggestions and comments.

My Blog:

techmk22.blogspot.com

Anonymous said...

I really liked the discussion you guys had. I am in a dilemma which I think you might be able to help me clear up. I have a Master's in Operations Research, with basic knowledge of both BI and PA, and I have completed the Advanced SAS certification. I am now unsure which one I should pursue: BI or PA.

Looking forward to your reply.

Dean Abbott said...

Anonymous:
I see BI as primarily reporting in nature, so if you enjoy summarizing and presenting information, perhaps BI is the better fit.

PA is inherently forensic in nature; you are trying to identify interesting patterns in data, but these are often not obvious, or can be confounded by the data you are building the models from.

I would expect that many jobs actually contain elements of both, so you don't necessarily have to "choose" between the two.