Tuesday, December 29, 2009

2009 Retrospective

I was thinking about top data mining trends in 2009, and searched for what others thought about it. I'll combine a few 2009 "top 3" lists here, including top trends (as described at Enterprise Regulars here), and posts here that generated the most buzz.

First, the top data mining news story was IBM's purchase of SPSS. It will be very interesting to see if this continues the trend toward integration of Business Intelligence and Predictive Analytics that one sees with SAS, Tibco and now IBM/SPSS.

The Enterprise Regulars post included a few interesting 2010 trends (fitting, since data mining is all about using historical data to predict future behavior, assuming past behavior will continue). In particular, four of them were of interest to me:
  1. The holy grail of the predictive, real-time enterprise (his #2)
  2. SaaS / Cloud BI Tools will steal significant revenue from on-premise vendors but also fight for limited oxygen amongst themselves. (his #5)
  3. Advanced Visualization will continue to increase in depth and relevance to broader audiences. (his #7)
  4. Open Source offerings will continue to make in-roads against on-premise offerings. (his #8)
I agree with his #2 and #7 (integration of BI/PA and visualization). Several customers I work with are trying to integrate predictive analytics into the database to make better decisions. The difference now is that there is also interest in integrating this process with other data-centric (BI) operations to provide the right information to decision-makers with the right level of granularity (detail). This is typically a combination of creating the ability to perform ad hoc queries along with examining the results (rankings and projections) from predictive analytics.

However, I have not seen Cloud computing and Open source take off from the perspective of customers I work with. These two certainly have generated buzz, and in the courses I teach, there is considerable interest in open source computing (R in particular), but it has still been interest rather than action. I expect, though, that as the allure of data mining and predictive analytics extends its reach deeper into organizations, the need for inexpensive tools (in dollars) will result in increased use of open source and free tools, such as R, RapidMiner, Weka, Tanagra, Orange, Knime, and others.

Lastly, from this blog, the top posts of 2009 were:
  1. Why normalization matters with K-Means
  2. How many software packages are too much?
  3. Data Mining: Does it get any better than this?
  4. Text Mining and Regular Expressions

Happy New Year!


Sandro Saitta said...

Thanks for this retrospective Dean! I'm looking forward to reading your next posts!

Dean Abbott said...

As of Jan 4, a reposting of this on the Smart Data Collective (see link icon on this page to get there) is the #1 post with 732 views and 6 tweets (the next-highest view count is 557).

I'm not sure what to think of this except that there is considerable cross-over in interests across the BI/PA worlds.