Governments’ capacity to store transactional data and the content of communications poses a unique threat to journalism in the digital age.
By Geoffrey King
In fall 2013, the U.S. National Security Agency quietly began booting up its Utah Data Center, a sprawling 1.5 million-square-foot facility designed to store and analyze the vast amounts of electronic data the spy agency gathers from around the globe. Consisting of four low-slung data halls and a constellation of supporting structures, the facility includes at least 100,000 square feet of the most advanced data reservoirs in the world. The project represents a massive expansion of the NSA’s capabilities and a profound threat to press freedom worldwide.
The data center is but the most obvious example of a future in which governments may not only collect and parse enormous quantities of data, but also store it for increasingly long periods of time. The retention of surveillance data poses a unique threat to journalism in the digital age, particularly as technological advances allow the NSA and other intelligence agencies to store indefinitely not only the transactional details of all communications–as many experts believe is already the case–but also huge amounts of the content of phone calls, texts, and emails. By keeping a record of all communications transactions swept up in its dragnet, and then linking those transactions to content, the U.S. government could recreate a reporter’s research, retrace a source’s movements, and even retroactively listen in on communications that would otherwise have evaporated forever. It could soon be possible to uncover sources with such ease as to render meaningless any promise of confidentiality a journalist may attempt to provide–and if an interaction escapes scrutiny in the first instance, it could be reconstructed later.
Surveillance and persistent data storage have the potential to disrupt the free flow of information even in nations such as the U.S., which boasts strong protections for the freedom of the press. Journalists and sources alike will know that any story that draws official ire could be as likely to lead to exposure as to provoke public debate and reform. As long-term storage rapidly becomes less expensive, it will fall within the grasp of authoritarian regimes whose track records on press freedom afford little hope for restraint.
In addition to amplifying the harms caused by pervasive surveillance, the storage of data creates another, unique potential: it provides a deep breeding ground for artificial intelligence systems, which may in the future lead to more efficient, even predictive, spying machines. As these capabilities evolve, governments will be able to spot patterns of terrorist activity, or journalistic activity, long before either becomes a challenge to their power. If left unchecked, surveillance systems may fail to draw such distinctions.
The NSA is infamously opaque, and many claims about it are impossible to confirm. Even the two main buildings at the agency’s Fort Meade, Md., headquarters–named OPS2A and OPS2B–are literal black boxes of darkened, one-way glass. More significant than any physical obfuscation is the NSA’s reticence about its operations: it is a well-known joke among agency insiders that the NSA’s initials stand for “Never Say Anything” and “No Such Agency.” Given such secretiveness, reporting on the agency often becomes an exercise in careful conjecture. Leaked documents from former NSA contractor Edward Snowden have shed some light on the agency’s activities. For this report, CPJ interviewed veteran national security reporters, a lawyer challenging the constitutionality of the NSA’s surveillance, and William Binney, who was considered one of the best mathematicians and code breakers at the NSA during his 30 years with the agency.
The experts with whom CPJ spoke all said they believe that the NSA targets journalists for surveillance. They disagreed about whether certain journalists are under more threat than others.
Binney, who resigned from the NSA in 2001 in protest of the mass privacy violations he alleges the agency committed after the 9/11 attacks, believes that the government keeps tabs on all reporters. He told CPJ, “They have a record of all of them, so they can investigate, so they can look at who they’re calling–who are the potential sources that they’re involved in, what probable stories they’re working on, and things like that.” Journalists, Binney noted, are “a much easier, smaller target set” to spy on than the wider population, and in his view, the NSA most likely takes advantage of this.
In contrast, national security journalist James Bamford, whom The New Yorker dubbed “The NSA’s Chief Chronicler,” told CPJ that he believes certain journalists get extra scrutiny. “If you’re writing about national security or the NSA itself,” he said, “they consider you–a journalist–a national security danger, and so they feel justified in doing whatever they’re doing.”
Alex Abdo, an American Civil Liberties Union (ACLU) attorney, is part of a team of lawyers who have litigated against the NSA for violating the privacy and free speech rights enshrined in the U.S. Constitution. He told CPJ that he believes that “all reporters should be worried,” though perhaps for different reasons. “Reporters who work for the largest media organizations should be worried probably primarily because their sources will dry up as those sources recognize that there is no way to cover their trail” when they talk to journalists at The New York Times, The Washington Post, or The Wall Street Journal. For independent journalists, by contrast, the primary concern is that “they themselves will be swept up in the course of their reporting, because they don’t enjoy some of the institutional protections that journalists get when they work at the bigger organizations.”
Asked about surveillance of journalists, the NSA asserted that the primary function of its data collection is to protect the U.S. from foreign threats. Spokeswoman Vanee’ Vines, herself a former investigative journalist, told CPJ, “NSA is focused on discovering and developing intelligence about valid foreign intelligence targets in order to protect the nation and its interests from threats such as terrorism and the proliferation of weapons of mass destruction.” Vines also pointed to a statement on the NSA’s Tumblr site that reads, “NSA conducts all of its activities in accordance with applicable laws, regulations, and policies–and assertions to the contrary do a grave disservice to the nation, its allies and partners, and the men and women who make up the National Security Agency.” (Intelligence officials have in the past misled the public about the NSA’s activities. At a March 2013 Senate Intelligence Committee hearing, Sen. Ron Wyden asked Director of National Intelligence James Clapper, “Does the NSA collect any type of data at all on millions or hundreds of millions of Americans?” Clapper said, “No sir . . . not wittingly.” After Snowden’s revelations about the mass collection of Americans’ phone call records, and facing accusations of perjury from members of Congress, Clapper sent a letter to the committee chairwoman, Sen. Dianne Feinstein, apologizing for his “clearly erroneous” remarks under oath.)
Russ Tice, who spent nearly 20 years working in various government agencies, claims to have firsthand knowledge of the targeting of journalists for surveillance. Speaking to Keith Olbermann in 2009, Tice alleged that, while an analyst at the NSA, he witnessed an agency program that gathered information on U.S. news organizations and journalists. He did not elaborate. And the NSA may not be the only U.S. intelligence agency monitoring journalists. In 2008, retired Army Sgt. Adrienne Kinne told Democracy Now!’s Amy Goodman and other journalists that, while in military intelligence, she listened to telephone conversations between journalists in Iraq and their spouses and editors, even though, as their identities became clear, their numbers could have been excluded from interception.
In addition to these allegations, in August 2013 the German magazine Der Spiegel reported that it had reviewed NSA documents, provided by Snowden, showing that the agency hacked into a “specially protected” internal communication system at the Qatar-based broadcaster Al-Jazeera. According to Der Spiegel, the NSA documents listed the operation as “a notable success.” The NSA has not publicly commented on the report.
One journalist for whom surveillance apparently has had direct and recent consequences is the award-winning documentary filmmaker Laura Poitras, whose films showcase American policy in the post-9/11 era and who, with Glenn Greenwald, documented Snowden’s revelations about the NSA in the Guardian. Poitras says she was detained for questioning at U.S. border crossings more than 40 times between 2006 and 2012; Snowden told Peter Maass for The New York Times Magazine that, because of her previous reporting, Poitras was “specifically becoming targeted by the very programs involved in the recent disclosures.”
Comments by the head of the NSA, Gen. Keith Alexander, in October 2013, suggested that the agency has little patience for journalists who dig into its activities. “I think it’s wrong that newspaper reporters have all these documents, 50,000 or whatever they have, and are selling them and giving them out as if these–you know it just doesn’t make sense,” he told the Defense Department’s “Armed With Science” blog, as reported by Politico. “We ought to come up with a way of stopping it. I don’t know how to do that, that’s more of the courts and the policy makers, but from my perspective it’s wrong, and to allow this to go on is wrong.”
Most journalists will probably not end up in the NSA’s crosshairs. But all journalists need to recognize that the agency is collecting immense amounts of information, that it will continue to develop this capacity, and that once collected, this information can be retained and put to broad use. Thus, while a small-town reporter who writes about the state fair may not be as likely to be surveilled as a big-city national security reporter who writes about the affairs of nation-states, both are vulnerable–especially when surveillance data is indexed and stored.
Behind the Utah Data Center’s battleship-gray walls sit the devices that make up what NSA Chief Information Officer Lonny Anderson has described as the NSA’s “cloud.” Although expert opinions regarding the facility’s storage capacity vary widely, even the most conservative estimates are astounding. On the low end, it is thought that the Utah Data Center can store between 3 and 12 exabytes of data. (An exabyte is the equivalent of a billion gigabytes.) To put this in perspective, in 2003 researchers at the University of California, Berkeley, estimated that the amount of information generated by all conversations since the dawn of humanity would total about 5 exabytes. Bolder theorists such as the former NSA analyst Binney say the Utah facility will have a gross storage capacity of about one zettabyte, or 1,024 exabytes.
Binney told CPJ that the NSA is mapping individuals’ lives, particularly their social and business connections, via the trail of digital “metadata” attendant on day-to-day existence. Though generally considered to exclude the contents of communications, and often transactional or descriptive in nature, metadata can be exquisitely detailed, as illustrated by a top secret order from the secretive U.S. Foreign Intelligence Surveillance Court (known as the FISA court) leaked to the Guardian by Snowden. According to the order, the NSA collects the numbers, location data, unique identifying information, and the time and duration of phone calls, for all parties. As reported by The New York Times, the FISA court has also authorized and re-authorized the collection and analysis of all Americans’ call records, regardless of any connection to a foreign agent.
Although it is impossible to know everything the NSA is collecting, courts have ruled in other contexts that information often considered to be metadata is not limited to phone calls and can include banking, Internet, email, and other records. Though judicial attitudes toward the privacy implications of metadata surveillance may slowly be shifting, as judges have begun to recognize its power to open up the lives of individuals to scrutiny, at present such data remains largely unprotected by the U.S. Fourth Amendment–meaning that even American journalists lack a so-called “reasonable expectation of privacy” for large amounts of their information.
The information gleaned from the aggregation of metadata records can build a remarkably intimate picture of one’s life. As computer security expert Bruce Schneier wrote on his blog in September 2013, metadata analysis is the equivalent of hiring a private detective to keep tabs on a person’s activities and associations. “The result would be details of what he did: where he went, who he talked to, what he looked at, what he purchased–how he spent his day,” Schneier wrote. “That’s all metadata.”
Metadata surveillance is particularly dangerous to journalists because it means the government can quickly pinpoint their sources. Bamford said, “It’s always dangerous when the government has access to journalists’ communication because what journalists guarantee sources is confidentiality, and if there’s no such thing as confidentiality from the government, it would inhibit the future cooperation from sources.” This has a “very serious effect” on investigative journalism, he told CPJ. “If they’re able to see all the numbers you’re calling, they’re able to tell pretty much what kind of story you’re working on, even without getting the content of it. They’re able to tell what the nature of the story is, who the sources are you’re dealing with.”
Metadata has an exceptionally small digital footprint that belies its intrusiveness. According to Binney, “You could build a graph for phones and emails and banking and carry the aggregate metadata graph of those domains and keep all that information in the size of a room 12 foot by 20 foot. And do it for the world, and keep it indexed for as many years as you want.” (A January 2011 NSA memorandum obtained by The New York Times confirms the existence of such large-scale graph analysis.)
Given this technical reality, Binney said, it is clear that the NSA did not build its Utah facility for transactional data alone. When asked why the agency might need the kind of space it had constructed, Binney said: “It means content of communications, not just metadata. They are building more and more storage because they’re collecting more and more.” He said the NSA will “take everything” off communication lines “and store it” for perhaps half a million to a million targeted individuals. According to Binney, the content information will then be indexed to the graph of lives and social networks. The agency can then query a timeline of an individual’s relationships over a period of time and “go straight into the content” indexed to each event. Binney’s estimate is that the NSA has both content and metadata going back a dozen years, and that this will only grow over time.
The extent to which the NSA may lawfully gather, store, and disseminate the contents of communications about U.S. persons is more closely constrained by the Fourth Amendment, as well as by statutes such as the 2008 FISA Amendments Act and other regulations, than is metadata. Nonetheless, there are numerous ways for the NSA to harvest the contents of communications of American journalists. The New York Times reported in August 2013 that the NSA is copying and searching the contents of large amounts of Americans’ cross-border communications, for the purpose of uncovering even mentions of small details–an email address, for example, or a nickname–about a foreigner under surveillance.
Additionally, under current regulations, incidentally-acquired communications of Americans can be retained for up to six years to analyze whether they contain foreign intelligence information and/or evidence of a crime, according to recently declassified documents and reporting by The Washington Post and The Guardian. This is true even for communications that turn out to have been purely domestic in nature. (According to The New York Times, a 2010 internal briefing paper from the NSA Office of Legal Counsel indicated that the agency was allowed to collect and store raw communications traffic from U.S. citizens and residents, including both metadata and content, for up to five years online and for 10 years for “historical searches.” Encrypted communications may be kept indefinitely, documents leaked to the Guardian reveal.) According to a report by the Brennan Center for Justice at NYU School of Law, the NSA can also share reports based on Americans’ incidentally-acquired foreign communications–and under certain circumstances, the “unminimized communications” themselves–with the Central Intelligence Agency, the Federal Bureau of Investigation, and even foreign governments. Additionally, information about Americans’ incidentally-acquired domestic communications may be shared with the FBI.
Though the amount of content stored subsequent to such collection cannot be established with certainty, some information came to light in the wake of the April 2013 Boston Marathon bombings that could prove instructive regarding the scope of content-based surveillance of U.S. citizens and residents. In an interview with CNN, former FBI counterterrorism agent Tim Clemente implied that even domestic telephone calls are being recorded in bulk and can be reproduced as needed. “All of that stuff is being captured as we speak whether we know it or like it or not,” he said, noting later that “there’s a way to look at digital communications in the past” and that “no digital communication is secure.”
Internet Archive founder Brewster Kahle, a proponent of one of the more cautious estimates about the Utah Data Center’s storage capacity, posted a spreadsheet in June 2013 estimating that if the NSA did record and store all U.S. phone calls, both foreign and domestic, it would cost only $27 million per year to do so.
As the government stores more and more data, it will become next to impossible for journalists to keep sources confidential. “The problem is, the more data you get, the more capacity you have to see into somebody’s life,” Binney said. “And it gets a much finer grain of picture of your electronic life. So capturing that and being able to collect all that data and correlate it then makes that picture of you much clearer. And that’s only getting better with storage.”
These advances illustrate why the Utah Data Center’s present capabilities are far from the end of the story for journalists either in the U.S. or abroad. As the NSA’s director for Installations and Logistics, Harvey Davis, put it to The Salt Lake Tribune, “I always build everything expandable.” And in addition to the Utah facility, the NSA stores data at facilities in Hawaii, Colorado, Texas, Georgia, and Maryland. It is also possible that the agency has developed secret custom hardware, proprietary data compression algorithms, or other efficiency-enhancing techniques that would expand the amount of raw data that can be saved.
The dangers are further compounded for non-U.S. journalists. “Anyone who’s not an American citizen or not somebody within the United States–there are no protections at all,” Bamford said. If a British, French, or German journalist were to undertake “an investigative story on something involving the U.S., some war crime committed by somebody in the U.S.,” the NSA, he said, “can do whatever they want in terms of finding out who their sources are.”
The ACLU’s Abdo agrees. “Even a mainstream reporter abroad has a different type of worry than a mainstream reporter in the United States,” he said. “I would be surprised, for example, if the U.K. office of the Guardian were not the subject of significant NSA surveillance.” Abdo’s comments follow the revelation from Guardian Editor Alan Rusbridger in August that security agents from Government Communications Headquarters–Britain’s version of the NSA–oversaw the destruction of computer hard drives at the Guardian in a bid to prevent the newspaper from reporting further on Snowden’s documents. (The Guardian rendered GCHQ’s efforts futile by forging a partnership with The New York Times and the nonprofit news group ProPublica, which as American organizations enjoy significant legal protections from prior restraint under the U.S. Constitution.)
Citing the Constitution, veteran reporter Peter Maass expressed defiance. “Does it worry me to know that the government can store stuff and recreate stuff, whereas in the past, it would need to have specific court orders in order to listen to and store my phone calls? Yeah,” he told CPJ. “This is the reason why we’re all writing these stories–and that we find problems in what the government is doing.” Ultimately, Maass sees the NSA’s activities as an opportunity for educating the public. “The more the government does this, the more they are creating a problem for themselves, and journalists like myself are going to go at it 110 percent, because they are such core constitutional challenges,” he said. “Part of me says bring it on.”
Maass recognizes, however, that just as the U.S. takes the gloves off when dealing with foreign journalists, other actors are likely to handle Americans the same way. “The NSA and the U.S. government are not the only threat” to the work of American journalists, he notes. “The Russian government is interested in it, and the British government is interested in it, private interests are interested in it,” he said. “So we have to be aware of that.”
Advances in technology could soon allow nearly any government to engage in unprecedented levels of surveillance and storage. According to a 2011 report by the Brookings Institution, data storage costs have declined by a factor of 10 roughly every four years over the past three decades. In 1984, a gigabyte of storage cost $85,000 in 2011 dollars; in 2011, a gigabyte cost 5 cents, according to the study. Based on these numbers, in 2011 it would have cost Syria–which in 2012-13 was the fourth-highest exporter of journalists fleeing for their lives–only about $2.5 million to record all calls made by its citizens. By 2016 that number could drop to $250,000, and by 2020, to $25,000.
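The arithmetic behind such projections can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the Brookings study's own model: it treats the decline as a smooth exponential, takes the $2.5 million figure for 2011 as given, and uses hypothetical function names.

```python
import math


def years_per_tenfold_drop(cost_start, cost_end, years_elapsed):
    """Years needed for cost to fall by a factor of 10, assuming a
    smooth exponential decline between two observed price points."""
    return years_elapsed / math.log10(cost_start / cost_end)


def projected_cost(base_cost, base_year, target_year, tenfold_years=4.0):
    """Project a cost forward, assuming it keeps falling tenfold
    every `tenfold_years` years (an assumption, not a forecast)."""
    elapsed = target_year - base_year
    return base_cost * 10 ** (-elapsed / tenfold_years)


# Price points quoted in the text: $85,000 per gigabyte in 1984
# (in 2011 dollars) and 5 cents per gigabyte in 2011.
implied_rate = years_per_tenfold_drop(85_000, 0.05, 2011 - 1984)
print(f"Implied years per tenfold drop: {implied_rate:.2f}")

# Cost of recording a country's calls, starting from the quoted
# $2.5 million estimate for 2011.
for year in (2011, 2015, 2019):
    print(year, round(projected_cost(2_500_000, 2011, year)))
```

The two quoted price points imply a tenfold drop a little more slowly than every four years, which is consistent with the "roughly" in the report's summary. Under a strict four-year rate, the $250,000 and $25,000 levels would be reached about a year earlier than the dates given above, so those dates are, if anything, cautious.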
While the experts debate the fine points, working journalists are forced to examine their own practices. Ali Winston, an award-winning freelance investigative reporter based in the San Francisco Bay Area, told CPJ that the dual threat of pervasive surveillance and data storage “has made me rethink my own privacy. It has made me conscious of how I treat my sources, and it has made me conscious that I don’t want things to fall back on my sources.”
Winston, who graduated from the UC Berkeley Graduate School of Journalism in 2010, said he has tried to mitigate the exploitation of his electronic communications for years, including by using the anonymizing software Tor, and has taken new security steps given recent revelations about surveillance. He cites as a turning point the 2005 revelation by James Risen and Eric Lichtblau in The New York Times of the NSA’s initial warrantless wiretapping program. “I began to educate myself by reading James Bamford’s books, by reading the newspaper, by reading the back stories,” of surveillance initiatives, Winston told CPJ. “There is an innate logic in surveillance systems towards expansions,” he said. “There are no natural checks on surveillance. They will continue to gather information until a block is put in front of it.”
As dangerous as the NSA’s expanding storage capabilities are to journalism, the trend carries an even darker prospect. The growth of data collection and storage provides a training ground for artificial intelligence systems designed to fish information efficiently from a vast sea of data. “It’s basically a gold mine for those kind of processes,” Binney told CPJ of the databases. “They need an automated algorithm to go through and figure out what is important.” According to Binney, the ultimate goal is to be predictive. “They want to get to the point where they can be doing intentions and capabilities of potential threats,” he said.
Bamford wrote about earlier data initiatives in his 2008 book, The Shadow Factory. These initiatives included a 2004 pilot project that used information taken from news articles to build a computerized brain capable of predicting events. As Bamford wrote, “Once up and running, the database of old newspapers could quickly be expanded to include an inland sea of personal information scooped up by the agency’s warrantless data suction hoses. … Unregulated, they could ask it to determine which Americans might likely pose a security risk–or have sympathies toward a particular cause.” As Bamford reported, the project, which was still going at least as late as 2009, was sufficiently troublesome that an unnamed researcher resigned over moral concerns.
If the NSA manages to develop a system that could automatically assign a threat index to members of the public, the agency would almost certainly use it to give journalists extra attention. Even journalists who do not find themselves under scrutiny for their work are at risk. As Cynthia Wong of Human Rights Watch noted in an analysis posted on the organization’s website in August 2013, reporters are among the relatively few regular users of privacy-enhancing technologies. This alone, Bamford told CPJ, is enough for the government to target reporters. “I don’t use encryption,” he said. “No. 1, it flags you, and No. 2, it gives [the NSA] more of an incentive to try and break it.” (At least one expert disagrees: Snowden told The New York Times Magazine that “unencrypted journalist-source communication is unforgivably reckless.”)
By automating processes, the NSA is lowering the opportunity costs of surveillance. As profound as the possibilities may be, in the near term automation makes such processes both more intelligent and less. “You’ve got the NSA collecting everybody’s phone records, so whenever you pick up the phone, there’s a record with NSA,” Bamford said. “You have machines that are making these connections, and they may have no rational basis.” An individual who reaches out to a controversial source for information may end up paying for it later, he said. “All you’re seeing is that there’s a link between a target and a U.S. citizen, and now that U.S. citizen becomes a suspect.”
Binney agrees. “Just because you call the pizza guy and I call the same pizza guy doesn’t mean we have a relationship,” he said. “So there’s no reason to collapse us into that community, based on that one call to the pizza guy.”
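Binney's pizza example can be made concrete with a toy contact graph. The sketch below is purely illustrative and makes no claim about how any agency's software works; the call records, names, and the `max_hub_degree` cutoff are all hypothetical. It shows how naive linking through a shared contact ties strangers together, and how ignoring very widely called numbers removes those spurious links.

```python
from collections import defaultdict
from itertools import combinations


def build_contact_graph(call_records):
    """Build an undirected graph: each party maps to the set of
    parties it has exchanged a call with."""
    graph = defaultdict(set)
    for caller, callee in call_records:
        graph[caller].add(callee)
        graph[callee].add(caller)
    return graph


def linked_by_shared_contact(graph, max_hub_degree=None):
    """Return pairs of parties who never called each other but share
    a contact. If max_hub_degree is set, contacts with more neighbors
    than that (e.g. a pizza shop) are ignored as meaningless hubs."""
    pairs = set()
    for hub, neighbors in graph.items():
        if max_hub_degree is not None and len(neighbors) > max_hub_degree:
            continue
        for a, b in combinations(sorted(neighbors), 2):
            if b not in graph.get(a, set()):
                pairs.add((a, b))
    return pairs


# Hypothetical call records: three strangers order from the same
# pizza shop, and one of them also calls a source.
records = [
    ("alice", "pizza"),
    ("bob", "pizza"),
    ("carol", "pizza"),
    ("alice", "source"),
]
graph = build_contact_graph(records)

naive_links = linked_by_shared_contact(graph)
filtered_links = linked_by_shared_contact(graph, max_hub_degree=2)
print(sorted(naive_links))
print(sorted(filtered_links))
```

In the naive pass, the three customers are all linked to one another through the pizza shop. With the cutoff, only the link that runs through a low-traffic contact survives, which is the distinction Binney argues the analysis has to make.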
Said Bamford, “The NSA is gathering power and they’re gathering more capabilities and more eavesdropping, more invasive technologies.” He added, “At the same time, they’re deceiving the very weak organizations that are supposed to be the oversight mechanisms–the Congress and the FISA Court. I think it’s a very worrying situation, not just for journalists, but for anybody.”
The revelations about surveillance have changed the way journalists must think about the security of their work product, their sources, and themselves. Prudent journalists wishing to avoid scrutiny for themselves or their sources will have to adapt their behavior, whether by avoiding contact with sources or ceasing to use privacy-protective technologies such as encryption. Such changes impair journalists’ ability to freely gather and disseminate information.
Regardless of whether the NSA’s programs are as carefully targeted as it claims, the agency’s infamous secrecy and expansive capabilities have cast a deep shadow on press freedom worldwide. When even sophisticated digital self-help is merely an imperfect solution, the only true recourse is to force transparency through ever more incisive reporting, for as Supreme Court Justice Louis Brandeis wrote 100 years ago, “Sunlight is said to be the best of disinfectants.”
CPJ Internet Advocacy Coordinator Geoffrey King works to protect the digital rights of journalists worldwide. A constitutional lawyer by training, King, who is based in San Francisco, also teaches courses at UC Berkeley on digital privacy law and on the intersection of media and social change.