Sunday, November 05, 2006

Keep Your Eye On The Problem

An occupational hazard of statistical modeling is to view the model as the solution to the problem. Direct application of modeling in an effort to leap directly from the available data to a final result is not always feasible or efficient. Models are always part of a solution, and guaging how large a role they play in the solution is important.

Consider the solution described in the paper, Fuzzy Decision System for Threshold Selection to Cluster Cauliflower Plant Blobs From Field Visual Images, by L. García-Pérez, M. C. García-Alegre, J. Marchant and T. Hague, which involves an agricultural machine vision problem. One fundamental step in the solution of this particular problem is segmentation of the digitized image into two classes: plants and background. In this case, the authors have elected to employ a single threshold of brightness to be applied to the entire image, above which pixels will be regarded as plants, and below which pixels will be regarded as background.

A complete solution to this problem accepts image data as input and produces a suitable threshold value as output. Having gotten in the habit of viewing the model as the solution, it is easy for the modeler to prematurely conclude that his or her model should, by itself, bridge this entire gap.

In the paper mentioned above, however, the authors took a different perspective. They constructed a model (a "policy", really- in this case composed of fuzzy logic rules) which indicates only by how much a given threshold should be incremented or decremented to improve performance. The threshold is initialized and updated by iteratively firing the model to modify the threshold. This is essentially a search across candidate threshold values. When the threshold value stabilizes (when the model indicates zero suggested change in the threshold), the process is finished and the final threshold is output. Note that the model does not directly determine the threshold. This model accepts image data and a candidate threshold as input and produces a suggested change for the threshold value as output. The total solution, which does accept image data as input and produces a suitable threshold value as output is composed of the fuzzy logic model as well as a simple looping search mechanism.

I suggest that it is worth trying to find the best way to leverage the power of modeling, whether as a complete solution unto itself or otherwise. Re-arranging what, exactly, is treated as model input and model output is one way to accomplish this.


MineThatData said...

Good luck as you fire-up this blog, I hope things go well for you!

Kevin Hillstrom

Dean Abbott said...

Didn't Einstein say that a theory should be as simple as possible, but no simpler? I heartily agree, and think it applies to analytics as well.