In a recent posting to this Web log (Data Mining and Privacy...again, Jan-04-2010), Dean Abbott made several points regarding the use of data mining to counter terrorism, and related privacy issues. I'd like to address the question of the usefulness of data mining in this application.
Dean quoted Bruce Schneier's argument against data mining's use in anti-terrorism programs. The specific technical argument that Schneier has made (and he is not alone in this) is: Automatic classification systems are unlikely to be effective at identifying individual terrorists, since terrorists are so rare. Schneier concludes that the rate of "false positives" could never be made low enough for such a system to work effectively.
As far as this specific technical line of thought goes, I agree absolutely, and doubt that any competent data analyst would disagree. It is the extension of this argument to the much broader conclusion that data mining is not a fruitful line of inquiry for those seeking to oppose terrorists that I take issue with.
Many (most?) computerized classification systems in practice output probabilities, as opposed to simple class predictions. Owners of such systems use them to prioritize their efforts (think of database marketers who sort name lists to find the so many who are most likely to respond to an offer). Classifiers need not be perfect to be useful, and portraying them as such is what I call the "Minority Report strawman".
Beyond this, data mining has been used to great effect in rooting out other criminal behaviors, such as money laundering, which are associated with terrorism. While those who practice our art against terrorism are unlikely to be forthcoming about their work, it is not difficult to imagine data mining systems other than classifiers being used in this struggle, such as analysis on networks of associates of terrorists.
It would take considerable naivety to believe that present computer systems could be trained to throw up red flags on a small number of individuals, previously unknown to be terrorists, with any serious degree of reliability. Given the other chores which data mining systems may perform in this fight, I think it is equally naive to abandon that promise for an overextended technical argument.