Tuesday, September 11, 2012

What do we call what we do?

I've called myself a data miner for about 15 years, and referred to the field I was a part of as Data Mining (DM). Before then, I referred to what I did as "Pattern Recognition", "Machine Learning", "Statistical Modeling", or "Statistical Learning". In recent years, I've called what I do Predictive Analytics (PA) more often and even co-titled my blog with both Data Mining and Predictive Analytics. That stated, I don't have a good noun to go along with PA. A "predictive analytist" (as if I myself were a "predictor")? A "predictive analyzer"? I often call someone who does PA a Predictive Analytics Professional. But according to Google Trends, the trend for data mining is down. Pattern recognition? Down. Machine Learning? Flat or slightly up. Only Predictive Analytics and its closely related sibling, Business Analytics, are up. Even the much-touted Data Science has been relatively flat, though it has spiked in Q4 of the past few years.

[Google Trends charts: Data Mining, Pattern Recognition, Machine Learning, Predictive Analytics, Business Analytics]

The big winner? Big Data of course! It has exploded this year. Will that trend continue? It's hard to believe it will continue, but this wave has grown and it seems that every conference related to analytics or databases is touting "big data".

[Google Trends charts: Big Data, Data Science]

I have no plans of calling what I do "big data" or "data science". The former term will pass when data gets bigger than big data. The latter may or may not stick, but seems to resonate more with theoreticians and leading-edge types than with practitioners. For now, I'll continue to call myself a data miner and what I do predictive analytics or data mining.


Will Dwinnell said...

Unfortunately, most of these terms somewhat suggest specific applications: "data mining" suggests the use of large data sets, while "pattern recognition" is used more in engineering applications, such as signal or image classification tasks.

Naturally, there's also a marketing impetus to come up with new terms. I notice that "big data" and "data science" have recently emerged.

Perhaps, Dean, you and I should market a new term to put ourselves on the cutting edge? I suggest "extreme analytics" (XA).

By the way, I might have been the person who added "pattern recognition" to some of those Wikipedia entries.

Dean Abbott said...

I love XA. Would we be required to type Matlab code while bungee jumping?

Will Dwinnell said...

I was thinking of something like a "Wargames"-style room with visualizations on those giant screens.

Gregory Piatetsky said...

Dean, agree that BigData term will pass, but Data Mining term has negative "invasion-of-privacy" connotation in popular press, so I am OK with data science. "Data science" is not a science now, but it can be.

Dean Abbott said...

Gregory: you are right about data mining and the negative connotations. To many people, I describe what I do in terms like "Predictive Analytics", though there is no good noun form yet. Data Mining evokes a nice image to me, but has the baggage.

You've heard John Elder's jokes about Data Science and Computer Science, I'm sure (the "Science" part is lacking); I tend to agree. There sometimes is science in data science, but not always, because we're more interested in solutions than in the scientific method. Do you call yourself a data scientist colloquially?

Meta Brown said...

The terminology for what we do is ever-changing, in part because whatever terms we use are quickly adopted by vendors and others who see a market opportunity and want in on the action (real or perceived). As long as there are no meaningful professional credentials for data analysts, this will continue.

We could stand to take a lesson from actuaries, whose profession is much like ours, in terms of the technical skills required and the type of information they provide. They have meaningful professional certification, defined by practitioners, rather than vendors. Those certifications have real value to employers, and qualified members of the profession enjoy good job stability and compensation.

Sandro Saitta said...

Very interesting discussion!

I use the term data mining, although I agree about the negative connotation it can have for non-experts.

What I don't like about Big Data is that there is no action (unlike data mining) or no goal (unlike predictive analytics). It's only about what we mine: the data.

The only problem I have with predictive analytics is that it doesn't include descriptive approaches. When we explain data using correlation or clustering, there is no prediction.

This is why, if I have to choose something other than data mining, it would be business analytics. I like this term since it somehow includes the "from analytics to action" part, which is critical for a successful data mining project.

Will Dwinnell said...

The current (Oct-2012) issue of "Harvard Business Review" features three articles on "big data".

Dean Abbott said...

I saw that too--very interesting. The article "Big Data: The Management Revolution" has an interesting conclusion: http://hbr.org/2012/10/big-data-the-management-revolution/ar/1

"The evidence is clear: Data-driven decisions tend to be better decisions. Leaders will either embrace this fact or be replaced by others who do. In sector after sector, companies that figure out how to combine domain expertise with data science will pull away from their rivals. We can’t say that all the winners will be harnessing big data to transform decision making. But the data tell us that’s the surest bet."

Will Dwinnell said...

I was tempted to quantify the changes in some of these values (as by the exponent of a power curve fit), but if one restricts one's attention to the last 12 months, Google Trends for many of these terms ("data mining", "pattern recognition" and "machine learning") are relatively flat, or wander enough that a clear trend is not apparent. For the same time-frame, newer terms ("big data", "data science" and "data scientist") are clearly on the rise.
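
For readers curious what "the exponent of a power curve fit" might look like in practice, here is a minimal sketch. It assumes the trend values are positive and equally spaced in time, and it fits y = a * t^b by ordinary least squares on the logs. The function name `fit_power_curve` and both series are made up for illustration; they are not actual Google Trends numbers, and this is not necessarily the fit the commenter had in mind.

```python
import math


def fit_power_curve(values):
    """Fit y = a * t**b by least squares on log(t) versus log(y).

    t is the 1-based position of each observation (an assumption that the
    observations are equally spaced). Returns (a, b), where b is the exponent.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive to take logs")

    xs = [math.log(t) for t in range(1, len(values) + 1)]
    ys = [math.log(v) for v in values]
    n = len(values)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Slope of the log-log regression line is the power-curve exponent.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic 12-month series, invented purely for illustration.
rising = [5.0 * t ** 1.5 for t in range(1, 13)]  # steadily growing term
flat = [40.0] * 12                               # term with no trend

a_rising, b_rising = fit_power_curve(rising)
a_flat, b_flat = fit_power_curve(flat)
print(f"rising: a={a_rising:.2f}, b={b_rising:.2f}")
print(f"flat:   a={a_flat:.2f}, b={b_flat:.2f}")
```

An exponent near zero corresponds to a flat series, while a clearly positive exponent corresponds to a rising one. On noisy data that "wanders", the fitted exponent alone would not say whether the trend is real, which matches the caution in the comment above.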