Monday, March 19, 2007

Document, Document, Document!

I recently came across a cautionary list of "worst practices", penned by Dorian Pyle, titled This Way Failure Lies. No one likes filling out paperwork, but Dorian's rule 6 for disaster makes a good point:

Rule 6. Rely on memory. Most data mining projects are simple enough that you can hold most important details in your head. There's no need to waste time in documenting the steps you take. By far, the best approach is to keep pressing the investigation forward as fast as possible. Should it be necessary to duplicate the investigation or, in the unlikely event that it's necessary to justify the results at some future time, duplicating the original investigation and recreating the line of reasoning you used will be easy and straightforward.


Unlike purely point-and-click tools, data mining tools that offer "visual programming" interfaces (Insightful Miner, KNIME, Orange) or programming languages (Fortran, C++, MATLAB) allow a certain amount of self-documentation. Unless the commenting is extremely thorough, though, it is probably worth producing at least a short summary document that explains the purpose and basic structure of the models. When the analysis leads you to change course, update this document accordingly.
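One low-effort way to keep such a summary current is to append a dated note every time the analysis changes direction. The following is only a minimal sketch in Python of what that might look like. The function names `format_entry` and `append_entry`, the entry layout, and the file name in the usage note are all hypothetical choices made for illustration, not part of any particular tool.

```python
from datetime import date


def format_entry(step, rationale, when):
    """Return one dated log entry as plain text (hypothetical layout)."""
    return f"{when.isoformat()}  {step}\n    Why: {rationale}\n"


def append_entry(path, step, rationale, when=None):
    """Append a dated entry to the summary file at `path` and return it.

    `when` defaults to today's date; pass an explicit date to backfill
    notes for work done earlier.
    """
    entry = format_entry(step, rationale, when or date.today())
    # Open in append mode so earlier notes are never overwritten.
    with open(path, "a", encoding="utf-8") as f:
        f.write(entry)
    return entry
```

Calling something like `append_entry("summary.txt", "Dropped the income variable", "too many missing values")` after each decision leaves a plain-text trail that records not just what was done but why, which is the part memory tends to lose first.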
