Recently, much has been written specifically about data mining's likely usefulness as a defense against terrorism. This posting takes "data mining" to mean sophisticated and rigorous statistical analysis, and excludes data-gathering functions. Privacy issues aside, claims have been made regarding data mining's technical capabilities as a tool in combating terrorism.
Very specific technical assertions have been made by other experts in this field, to the effect that predictive modeling is unlikely to provide useful identification of individuals who are about to carry out physical attacks. The general reasoning has been that, despite the magnitude of their tragic handiwork, there have been too few positive instances (known attackers) for accurate model construction. As far as this specific assertion goes, I concur.
Unfortunately, this notion has somehow been expanded in the press, and in the on-line writings of authors who are not expert in this field. The much broader claim has been made that "data mining cannot help in the fight against terrorism because it does not work". Such overly general statements are demonstrably false. For example, a known significant component of international terrorism is its financing, notably through its use of money laundering, tax evasion and simple fraud. These financial crimes have been under attack by data mining for over 10 years.
Further, terrorist organizations, like other human organizations, involve human infrastructure. Behind the man actually conducting the attack stands a network of support personnel: handlers, trainers, planners and the like. I submit that data mining might be useful in identifying these individuals, given their much larger number. Whether this would work in practice can only be known by actually trying.
Last, the issues surrounding data mining's ability to tackle the problem of terrorism have frequently been dressed up in technical language by reference to the concepts of "false positives" and "false negatives", which I believe to be a straw-man argument. That framing assumes a hard yes/no decision for every individual, whereas solutions to classification problems frequently involve the assessment of probabilities, rather than simple "terrorist" / "non-terrorist" outputs. The output of data mining in this case should not be used as a replacement for the judicial branch, but as a guide: estimated probabilities can be used to prioritize, rather than condemn, individuals under scrutiny.
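
To make the distinction concrete, here is a minimal Python sketch of the two ways a model's output might be consumed. Everything in it is an assumption made for illustration: the `ScoredCase` type, the `hard_labels` and `review_queue` functions, the 0.5 threshold, the review capacity, and the probabilities themselves are invented and do not come from any real system or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredCase:
    case_id: str        # hypothetical identifier for a case under scrutiny
    probability: float  # model's estimated probability, between 0.0 and 1.0


def hard_labels(cases, threshold=0.5):
    """Binary use of the model: return the ids at or above the threshold."""
    return [c.case_id for c in cases if c.probability >= threshold]


def review_queue(cases, capacity):
    """Prioritizing use: rank by estimated probability, highest first,
    and return only as many cases as investigators can actually review."""
    ranked = sorted(cases, key=lambda c: c.probability, reverse=True)
    return ranked[:capacity]


# Invented scores, for illustration only.
cases = [
    ScoredCase("A", 0.02),
    ScoredCase("B", 0.41),
    ScoredCase("C", 0.07),
    ScoredCase("D", 0.18),
]

print(hard_labels(cases))                                    # []
print([c.case_id for c in review_queue(cases, capacity=2)])  # ['B', 'D']
```

With these made-up numbers the binary rule flags nobody, since no score reaches 0.5, while the ranked queue still tells investigators which two cases to look at first. Nothing in the queue is a verdict; it only orders the work.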