Wednesday, January 06, 2010

Data Mining and Terrorism... Counterpoint

In a recent posting to this Web log (Data Mining and Privacy...again, Jan-04-2010), Dean Abbott made several points regarding the use of data mining to counter terrorism, and related privacy issues. I'd like to address the question of the usefulness of data mining in this application.

Dean quoted Bruce Schneier's argument against data mining's use in anti-terrorism programs. The specific technical argument that Schneier has made (and he is not alone in this) is: Automatic classification systems are unlikely to be effective at identifying individual terrorists, since terrorists are so rare. Schneier concludes that the rate of "false positives" could never be made low enough for such a system to work effectively.
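To make the base-rate argument concrete, here is a small Python calculation of the positive predictive value (the fraction of flagged people who really are terrorists) using Bayes' rule. All of the figures are hypothetical assumptions chosen for illustration: a population of 300 million, 1,000 actual terrorists, 99% sensitivity and 99.9% specificity. They are not numbers from Schneier or from this post.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Fraction of flagged people who are true positives, by Bayes' rule."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)

# Hypothetical figures, for illustration only.
population = 300_000_000
terrorists = 1_000
prevalence = terrorists / population
sensitivity = 0.99   # share of real terrorists the system flags
specificity = 0.999  # share of innocent people the system correctly clears

ppv = positive_predictive_value(prevalence, sensitivity, specificity)
true_hits = terrorists * sensitivity
expected_false_alarms = (population - terrorists) * (1.0 - specificity)

print(f"True hits: {true_hits:,.0f}")
print(f"False alarms: {expected_false_alarms:,.0f}")
print(f"Share of flags that are correct: {ppv:.2%}")
```

Under these assumed figures, roughly 990 true hits would be buried among about 300,000 false alarms, so well under 1% of flags would be correct. Even a classifier far better than anything realistic drowns in false positives when the target is this rare.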

As far as this specific technical line of thought goes, I agree absolutely, and doubt that any competent data analyst would disagree. What I take issue with is the extension of this argument to the much broader conclusion that data mining is not a fruitful line of inquiry for those seeking to oppose terrorists.

Many (most?) computerized classification systems in practice output probabilities, as opposed to simple class predictions. Owners of such systems use them to prioritize their efforts (think of database marketers who sort a name list by score to find the prospects most likely to respond to an offer). Classifiers need not be perfect to be useful, and portraying them as useless unless they are perfect is what I call the "Minority Report strawman".
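The prioritization use described above can be sketched in a few lines of Python. The record IDs, scores and labels are hypothetical, and no particular model is assumed. The point is that a classifier that makes mistakes (here it ranks a non-responder, "b", second, and scores responder "f" low) can still concentrate the positives near the top of the list. Lift compares the hit rate in the top k to the overall base rate.

```python
# Hypothetical scored records: (record_id, model_score, actually_responded)
records = [
    ("a", 0.91, 1), ("b", 0.85, 0), ("c", 0.78, 1), ("d", 0.40, 0),
    ("e", 0.33, 0), ("f", 0.30, 1), ("g", 0.12, 0), ("h", 0.10, 0),
    ("i", 0.05, 0), ("j", 0.02, 0),
]

def top_k_by_score(rows, k):
    """Return the k rows with the highest model score, best first."""
    return sorted(rows, key=lambda row: row[1], reverse=True)[:k]

def hit_rate(rows):
    """Fraction of rows whose true label is positive."""
    return sum(label for _, _, label in rows) / len(rows)

k = 3
top = top_k_by_score(records, k)
lift = hit_rate(top) / hit_rate(records)

print("Top", k, [rec_id for rec_id, _, _ in top])
print(f"Base rate {hit_rate(records):.0%}, top-{k} rate {hit_rate(top):.0%}, lift {lift:.2f}")
```

With three positives among ten records the base rate is 30%, while the top three contain two positives, a lift of about 2.2. Working only the top of the list yields more than twice the hit rate of working it at random, without the model ever being "right" about every individual.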

Beyond this, data mining has been used to great effect in rooting out other criminal behaviors associated with terrorism, such as money laundering. While those who practice our art against terrorism are unlikely to be forthcoming about their work, it is not difficult to imagine data mining systems other than classifiers being used in this struggle, such as the analysis of terrorists' networks of associates.
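As one hypothetical illustration of a non-classifier technique, the sketch below runs a multi-source breadth-first search over a contact graph and returns everyone within k hops of a set of already-known individuals ("seeds"), together with their distance from the nearest seed. The graph, the node names and the choice of k are invented for illustration; real link analysis would involve far richer data and methods, and nothing here reflects any actual system.

```python
from collections import deque

def within_k_hops(graph, seeds, k):
    """Multi-source BFS: map each node within k hops of any seed to its hop distance."""
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond k hops
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:  # first visit is the shortest distance
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

# Hypothetical undirected contact graph, stored as adjacency lists.
contacts = {
    "seed1": ["p1", "p2"],
    "p1": ["seed1", "p3"],
    "p2": ["seed1"],
    "p3": ["p1", "p4"],
    "p4": ["p3"],
}

reachable = within_k_hops(contacts, ["seed1"], 2)
print(reachable)
```

Here "p4" sits three hops out and is excluded. The output is a prioritized lead list rather than a verdict, in the same spirit as the prioritization discussed above.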

It would take considerable naïveté to believe that present computer systems could be trained to throw up red flags, with any serious degree of reliability, on a small number of individuals not previously known to be terrorists. Given the other chores that data mining systems may perform in this fight, I think it is equally naïve to abandon the promise of data mining on the strength of an overextended technical argument.

2 comments:

Tim Manns said...

Hi guys,

I read through the blog post quoted in the previous "Data Mining and Privacy...again" post.
See here: http://enlightenedlayperson.blogspot.com/2010/01/data-mining-needle-in-haystack-problem.html

To quote one part of it: "The Inspector General's Report was unable to quantify its usefulness to any degree, other than to say that Hayden vouched for its usefulness and said that it would have captured two of the 9-11 hijackers. But has it thwarted any actual terrorist attacks?"

- this made me laugh! I deal with exactly the same issue all the time. I can identify and predict telecom churn (customers leaving) really well, but just how many of those customers we can prevent from leaving depends on far more than data mining. For example, it depends on current market conditions, whether they can afford the product, competitor products, the attractiveness of our customer retention offer, etc.

In the discussion there seems to be a bit of a disconnect between the activity of data mining (succeeding in *finding* rare occurrences and insight) and the outcome of any *action* based upon the derived insight.

I completely believe that data mining is valuable in counter-terrorism and (for the most part) the benefits (saving lives etc) outweigh the privacy issues.

Personally, I am far more concerned with the privacy issues involved in commercial analysis of 'public' sources of data (for example Facebook and Twitter), where a common reason for the analysis is simply to spam us or sell us more junk :)

Anonymous said...

Great insights and opinions. Experienced folks in this field understand what it means to be data rich but information poor. Mining the growing body of metadata may go a long way toward fixing that data rich/info poor problem. Let's break it down a bit. Someone has already, and correctly, pointed out that mining won't produce a yes/no answer, especially against an infrequent target set. Perhaps "behavior" is the coin of the realm when it comes to data mining; behavioral analysis has been used to great effect in rooting out other criminal behavior with a nexus to terrorism. So here's my question to the group: I'm looking for resources to perform just this type of mining. Anyone got any ideas?