Sunday, January 13, 2008

Data Mining: Interesting Ethical Questions

Data mining permits useful extrapolation from sometimes obscure clues. Information that human experts have dismissed as irrelevant has been eagerly snapped up by data mining software. This raises interesting ethical questions.

Consider the risk of selling an individual automobile insurance for one year. Many factors are related to this risk. Some are obvious, such as previous accidents, traffic violations or the average number of miles driven per year. Other risk factors may not be so obvious, but are nonetheless real. Suppose it could be shown statistically that a history of late utility bill payments, when added to the information already in use, incrementally improved prediction.

One might take the perspective that this is a business of prediction, not explanation, so, whatever the connection, this information should be added to the insurance risk model. This perspective reasons: if the connection is statistically significant, however strange it may seem, we should conclude that it is real and exploit it for business purposes.

Obviously, there is a countervailing perspective, which has the customer asking, "What the... ? What do my utility bills have to do with my car insurance?" Even extremely laissez-faire governments may intervene in markets and forsake economic efficiency in favor of other priorities. In the United States, for example, certain types of discrimination in lending are illegal.

Another thing to consider (again, granting that the connection between utility bills and automobile risk is real) is that prohibiting the use of utility bill payment history in auto insurance risk prediction implies that less risky customers will be paying for riskier ones.

Thoughts?