Wednesday, December 19, 2012

6 Reasons You Hired the Wrong Data Miner

As in any discipline, talent within the data mining community varies greatly.  Generally, business people and others who hire and manage technical specialists like data miners are not themselves technical experts.  This makes it difficult for them to evaluate the performance of data miners, so this posting offers a short list of possible deficiencies in a data miner's performance.  Hopefully, this will spare some heartache in the coming year.  Merry Christmas!


1. The data miner has little or no programming skill.

Most work environments require someone to extract and prepare the data.  The more of this process the data miner can accomplish herself, the less she depends on others.  Even in ideal situations with prepared analytical data tables, the data miner who can program can wring more from the data than her counterpart who cannot (think: data transformations, re-coding, etc.).  Likewise, when her predictive model is to be deployed in a production system, it helps if the data miner can provide code as near to finished as possible.
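To make "data transformations, re-coding, etc." a little more concrete, here is a minimal sketch in Python.  The records, field names and state-to-region mapping are all invented for illustration, and a real project would likely use a database or a data-frame library instead.

```python
import math

# Hypothetical raw records; field names and values are invented for illustration.
raw_records = [
    {"customer_id": 1, "income": 52000.0, "state": "NY"},
    {"customer_id": 2, "income": 0.0, "state": "ca"},
    {"customer_id": 3, "income": 1250000.0, "state": " TX "},
]

# Hypothetical mapping from state code to sales region.
REGION_BY_STATE = {"NY": "East", "CA": "West", "TX": "South"}


def prepare(record):
    """Return a new record with a log-transformed income and a re-coded region."""
    # Clean up inconsistent spacing and capitalization before looking up the region.
    state = record["state"].strip().upper()
    return {
        "customer_id": record["customer_id"],
        # Transformation: log1p compresses the long right tail of income
        # and is defined for an income of zero.
        "log_income": math.log1p(record["income"]),
        # Re-coding: collapse many state codes into a few regions;
        # states missing from the mapping fall into "Other".
        "region": REGION_BY_STATE.get(state, "Other"),
    }


prepared = [prepare(r) for r in raw_records]

for row in prepared:
    print(row)
```

Small steps like these are often what stand between a raw extract and a table that a modeling tool can use.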


2. The data miner is unable to communicate effectively with non-data miners.

Life is not all statistics: Data mining results must be communicated to colleagues with little or no background in math.  If other people do not understand the analysis, they will not appreciate its significance and are unlikely to act on it.  The data miner who can express himself clearly to a variety of audiences (internal customers, management, regulators, the press, etc.) is of greater value to the organization than his counterpart who cannot.  The data miner should receive questions eagerly.


3. The data miner never does anything new.

If the data miner always approaches new problems with the same solution, something is wrong.  She should be, at least occasionally, suggesting new techniques or ways of looking at problems.  This does not require that new ideas be fancy: Much useful work can be done with basic summary statistics.  It is the way they are applied that matters.


4. The data miner cannot explain what he has done.

Data mining is a subtle craft: there are many pitfalls, and important aspects of statistics and probability are counter-intuitive.  Nonetheless, the data miner who cannot provide at least a glimpse into the specifics of what he has done and why is not doing all he might for the organization.  Managers want to understand why so many observations are needed for analysis (after all, they pay for those observations), and the data miner should be able to provide some justification for his decisions.


5. The data miner does not establish the practical benefit of her work.

A data miner who cannot connect the numbers to reality is working in a vacuum and is not helping her manager (team, company, etc.) to assess or utilize her work product.  Likewise, there's a good chance that she is pursuing technical targets rather than practical ones.  Improving p-values, accuracy, AUC, etc. may or may not improve profit (retention, market share, etc.).
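A toy illustration of the gap between a technical target and a practical one, sketched in Python.  Everything here is made up for the example: ten customers, two hypothetical models, a contact cost of 10 and a value of 100 per response.  The point is only that the more accurate model need not be the more profitable one.

```python
def accuracy(actual, predicted):
    """Fraction of cases where the prediction matches the actual outcome."""
    matches = sum(1 for a, p in zip(actual, predicted) if a == p)
    return matches / len(actual)


def campaign_profit(actual, predicted, value_per_response, cost_per_contact):
    """Profit from contacting every customer the model flags (prediction of 1)."""
    profit = 0.0
    for a, p in zip(actual, predicted):
        if p == 1:
            profit -= cost_per_contact        # every contact costs money
            if a == 1:
                profit += value_per_response  # only actual responders pay off
    return profit


# Invented data: 1 = customer responds, 0 = customer does not respond.
actual  = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
model_a = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # cautious: flags very few customers
model_b = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # aggressive: flags more, with false alarms

for name, predicted in [("A", model_a), ("B", model_b)]:
    acc = accuracy(actual, predicted)
    profit = campaign_profit(actual, predicted,
                             value_per_response=100.0, cost_per_contact=10.0)
    print(f"Model {name}: accuracy={acc:.2f}, profit={profit:.2f}")
```

Under these assumed costs, model A is right more often (8 of 10 cases versus 7 of 10), yet model B earns more because it finds all three responders.  With different costs the ranking could easily flip, which is why the business figures belong in the evaluation.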


6. The data miner never challenges you.

The data miner has a unique view of the organization and its environment.  The data miner works on a landscape of data which few of his coworkers ever see, and he is less likely to be blinded by industry prejudices.  It is improbable that he will agree with his colleagues 100% of the time.  If the data miner never challenges assumptions (business practices, conclusions, etc.), then something is wrong.