18 October, 2006

data-mining "sentiment analysis" & academia

Chris Floyd has the lowdown on the government's latest data-mining venture:

Why is the United States government spending millions of dollars to track down critics of George W. Bush in the press? And why have major American universities agreed to put this technology of tyranny into the state's hands?

. . . a modest $2.4 million Department of Homeland Security grant to develop "sentiment analysis" software that will allow the government's "security organs" to sift millions of articles for "negative opinions of the United States or its leaders in newspapers and other publications overseas," as the New York Times reported earlier this month. Such negative opinions must be caught and catalogued because they could pose "potential threats to the nation," security apparatchiks told the Times.

This hydra-headed snooping program is based on "information extraction," which, as a chipper PR piece from Cornell tells us, is a process by which "computers scan text to find meaning in natural language," rather than the rigid literalism ordinarily demanded by silicon cogitators. Under the gentle tutelage of Homeland Security, the universities "will use machine-learning algorithms to give computers examples of text expressing both fact and opinion and teach them to tell the difference," says the Cornell blurb.
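The machine-learning approach Cornell describes — feeding a classifier labeled examples of factual and opinionated text so it learns to tell the two apart — can be illustrated with a toy sketch. This is purely an illustration of the general technique; the tiny hand-labeled dataset, the TF-IDF features, and the logistic-regression model below are my own assumptions, not the Cornell group's actual software or data.

```python
# Toy fact-vs-opinion ("subjectivity") classifier sketch.
# Labels, examples, and model choice are illustrative assumptions only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hand-labeled examples: 0 = fact, 1 = opinion (assumed for illustration)
texts = [
    "The unemployment rate fell to 4.4 percent in October.",
    "The president signed the bill into law on Tuesday.",
    "The committee met for three hours behind closed doors.",
    "Exports rose by 2 percent over the previous quarter.",
    "This disastrous policy will ruin the country.",
    "The president's speech was a shameful embarrassment.",
    "Their reckless spending is an outrage.",
    "The new law is a wonderful step forward.",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Learn word weights that separate factual from opinionated language,
# then score unseen sentences.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["Exports rose by 2 percent."])[0])
print(clf.predict(["This shameful, disastrous policy is an outrage."])[0])
```

At real scale the same idea is simply run over far larger labeled corpora and vocabularies, which is what makes "millions of articles" tractable for such a system.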
. . .
For those with concerns about civil liberties, Cornell assures us that SAP will be limited strictly to foreign publications. Oh, really? Hands up out there, everyone who believes that this technology will not be used to ferret out "potential threats to the nation" arising in the Homeland press as well. After all, the Unitary Executive Decider-in-Chief has already decided that the nation's iron-clad laws against warrantless surveillance of American citizens can be swept aside by his "inherent powers" if he decides it's necessary. Why should he bother with any petty restrictions on a press-monitoring program? And wouldn't dissension within the ranks of the volk itself actually be more threatening to government policy than the grumbling of malcontents overseas?

. . . we must ask: who is the "Sentiment Analysis" program aimed at? It can't be the major news and opinion drivers in the international and national media; these are already being monitored. And it hardly requires a deus ex machina to determine the political sentiment behind news stories and opinion pieces. Why then would you need multimillion-dollar computer whizbangery to tell you whether a story casts a favorable or critical light on Bush and his policies? And how could critical "sentiment" in the kinds of stories that Cornell, Pitt and Utah are examining in their tests pose any kind of "potential threat" to the nation? Again, there must be something else behind the program because, as with warrantless surveillance, it is clearly redundant on its face.

The key to this conundrum most likely lies in the envisioned scope of the program: "millions of articles" to be processed for "sentiment analysis." This denotes a fishing expedition that goes far beyond the "publicly available material, primarily news reports and editorials from English-language newspapers worldwide" that Claire Cardie, Cornell's lead researcher on SAP, says that her team will be using in developing the software. The target of such a scope cannot be simply the English-language foreign press, or the foreign press as a whole, or indeed, every newspaper in the world, from Pyongyang to Peoria. It must also be aimed at other modes of textual communication, in print and online.
. . .
It is part of a larger Homeland Security push "to conduct research on advanced methods for information analysis and to develop computational technologies that contribute to securing the homeland," as a DHS press release puts it, in announcing the formation of yet another university consortium. This group - led by Rutgers, and including the University of Southern California, the University of Illinois and, once again, Pitt - has pulled down a whopping $10.2 million to "identify common patterns from numerous sources of information" that "may be indicative of" - what else? - "potential threats to the nation."

This research program will draw on such areas as "knowledge representation, uncertainty quantification, high-performance computing architectures" - and our old friends, information extraction and natural language processing. It is in fact closely associated with the "sentiment analysis" work being done by the Cornell group - and note that the Rutgers consortium is designing its info-gobbling software to deal with "numerous sources" of information. Do we sense some synergy going on here?

The Cornell and Rutgers groups are two of four "University Affiliate Centers" thus far established by Homeland Security. All of the consortiums are geared toward the amassing, storing and analysis of unimaginably vast amounts of information, gathered relentlessly from a multitude of sources and formats. They are in turn just part of a still-larger panorama of "data mining" programs being developed - or already in use - by the security organs.

These include the "Analysis, Dissemination, Visualization, Insight and Semantic Enhancement" (ADVISE) program, which can rip and read mountains of open source data - such as web sites and databases, as analyst Michael Hampton reports. Two Democratic Congressmen, David Obey of Wisconsin and Martin Sabo of Minnesota, have asked the Government Accountability Office to investigate the program for possible intrusions on privacy rights, Hampton notes.

