Wednesday, December 27, 2006

Data mining and Terrorism

Just came across an article by Jim Harper entitled Data Mining Can't Improve Our Security that argues persuasively against the use of data mining in such matters. He writes

Data mining for terrorism prediction has two fundamental flaws:

— First, terrorist acts and their precursors are too rare in our society for there to be patterns to find. There simply is no nugget of information to mine.

— Second, the lack of suitable patterns means that any algorithm used to turn up supposedly suspicious behavior or suspicious people will yield so many false positives as to make it useless. A list of potential terror suspects generated from pattern analysis would not be sufficiently targeted to justify investigating people on the list.

and concludes

Unfortunately, there is no magic bullet that solves the security conundrums created by terrorism. Data mining is a useful technique in many areas, but not this one.

I must say that in general I agree, and have argued the same privately to course attendees who were interested in this type of analysis. [A disclamor: I have never worked with intellgences data, so these opinions are not based on direct experience with terrorist-related data.] However, I also believe that while the data is particularly difficult, it is not useless, and therefore disagree with the final conclusion. The question I think should be asked is this: can data mining improve the ability of law enforcement to identify suspected terrorists. Now it may be that any improvement in information provided by data mining may not worth the effort (as he argues)--this I don't know. But if it can improve the odds of finding a dangerous individual 10-fold or 100-fold over what is currently done, then is it not helpful? (I won't comment on the privacy issues here--that is another important issue, but unrelated to his premise that data mining is not useful here).

For those who have worked on difficult problems, the exploratory data analysis that takes place during the data mining project nearly always yields useful information even if the no final models are produced. Therefore, the conclusion to not do any data mining at all seems to me to be an overreaction.


Unknown said...
This comment has been removed by a blog administrator.
Anonymous said...
This comment has been removed by a blog administrator.
Anonymous said...
This comment has been removed by a blog administrator.