Friday, April 04, 2008

Data modeling infrastructure in data mining

I've had two inquiries in the last day about building data infrastructure between the database and the predictive modeling tool, which I find to be an interesting coincidence. I hadn't even thought about a need here before (perhaps because I wasn't aware of the vendors that address it), but I'm curious whether others have thought this problem through.

I have seen situations where the analyst and DBA need to coordinate but, due to the politics or personalities in an organization, do not. In these cases, a data miner may need tables that actually exist, but the miner doesn't have permission to access them, or perhaps doesn't have the expertise to know how to join all the requisite tables. Here I can imagine that this middleware, if you will, could be quite useful if it were more user-friendly. However, I'm not yet convinced this is a real issue for most organizations.

Any thoughts?

2 comments:

Shane said...

For large organisations I have seen the analytics data mart approach work quite well. What I mean here is a centralised database that regularly pulls all the relevant data together to one place where the analysts can build models and play. When the models are ready to be deployed in production they can then be converted to SQL and deployed accordingly.
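As a rough illustration of the "converted to SQL" step, here is a minimal Python sketch that renders a logistic regression as a SQL scoring expression and runs it inside the database. Everything specific in it is an assumption made up for the example: the `customers` table, the `tenure` and `complaints` columns, the coefficient values, and the helper names `logistic_to_sql` and `demo`. SQLite is used only because it ships with Python; a real deployment would target whatever database the organisation runs.

```python
import math
import sqlite3


def logistic_to_sql(intercept, coefficients):
    """Render a logistic regression as a SQL expression (hypothetical helper).

    `coefficients` maps column names to weights. Column names are pasted
    into the SQL text, so they must come from a trusted source.
    """
    terms = [repr(float(intercept))]
    terms += [f"({float(w)!r} * {col})" for col, w in coefficients.items()]
    linear = " + ".join(terms)
    return f"1.0 / (1.0 + exp(-({linear})))"


def demo():
    """Score an illustrative table in the database using the generated SQL."""
    conn = sqlite3.connect(":memory:")
    # Not every SQLite build ships exp(), so register Python's version.
    conn.create_function("exp", 1, math.exp)
    conn.execute(
        "CREATE TABLE customers (id INTEGER, tenure REAL, complaints REAL)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, 2.0, 0.0), (2, 0.5, 3.0)],
    )
    # Made-up coefficients standing in for a model fitted in the data mart.
    expr = logistic_to_sql(-1.0, {"tenure": -0.5, "complaints": 0.8})
    rows = conn.execute(
        f"SELECT id, {expr} AS score FROM customers ORDER BY id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for customer_id, score in demo():
        print(customer_id, round(score, 4))
```

The point of the sketch is only that the scoring logic ends up as plain SQL, so production scoring does not depend on the modeling tool being available.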

Will Dwinnell said...

In my experience, most large organizations have significant communication or cooperation gaps between various entities, including between I.T. and users.

In situations where an established community of experts exists outside of the database team (report coders or business analysts with query programming skills, for example), I think things are much easier for the data miner, since similar code and structures can likely be used to feed the data mining process.

All organizations that store data presumably have someone who extracts data from that storage and makes some use of it. The more sophisticated this extraction and consumption process is, the more likely the data miner is to find assistance.