Nonprofit news website The Markup has argued that web scraping is a vital tool for investigative journalists. (The Markup/Sam Morris)

The Markup’s Nabiha Syed on how the Supreme Court could protect data journalism

At first glance, the connection between data journalism and a Georgia police officer accused of accessing a government database for an improper purpose might seem tenuous. However, journalists and legal experts have highlighted the press freedom implications of a pending Supreme Court decision in the case of the officer, Nathan Van Buren, who is appealing his conviction under the 1986 Computer Fraud and Abuse Act (CFAA) for accepting bribes to access a confidential law enforcement database he used at work.  

The CFAA has been a deterrent to reporters and researchers engaged in online data collection, or web scraping, according to Nabiha Syed, president and legal counsel of The Markup, a nonprofit news organization that filed an amicus brief in the Van Buren case. The mechanics of web scraping are fairly straightforward: with a few lines of code, journalists can gather data for stories like The Atlantic’s COVID Tracking Project or The Markup’s reporting on disinformation on Facebook.
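To give a rough sense of what those mechanics look like in practice, here is a minimal sketch in Python. It is not taken from The Markup or any project named above: the sample page, the county names, the figures, and the names `TableScraper` and `scrape_counts` are all hypothetical, and the page is embedded as a string so the example runs without touching any real website. It uses only Python's standard-library HTML parser.

```python
from html.parser import HTMLParser

# Hypothetical sample page, invented for illustration only.
# A real scraper would first download the page over HTTP.
SAMPLE_HTML = """
<table>
  <tr><th>County</th><th>Cases</th></tr>
  <tr><td>Adams</td><td>120</td></tr>
  <tr><td>Baker</td><td>45</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collects the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows, each a list of cell strings
        self._row = None       # cells of the row currently being read
        self._in_cell = False  # True while inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows contain no <td> cells, so skip them
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row.append(data.strip())


def scrape_counts(html):
    """Turn the two-column sample table into a {county: cases} dictionary."""
    parser = TableScraper()
    parser.feed(html)
    return {name: int(count) for name, count in parser.rows}


counts = scrape_counts(SAMPLE_HTML)
print(counts)
```

Running the same function in a loop over many pages is what turns a one-off lookup into data collection at scale.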

In its decision, the Supreme Court will have the opportunity to limit the scope of the CFAA, and protect data journalists and researchers who are currently vulnerable to prosecution under the law for exceeding “authorized access” to computer systems, Syed said. She told CPJ in a video call what she’s looking for in the decision, which is expected this spring, and what she hopes might change afterward. The interview has been edited for length and clarity.

Can you break down why reporters should really care about this case? 

The Van Buren case is the first time the Supreme Court has interpreted the Computer Fraud and Abuse Act, which is a federal statute that imposes civil and, terrifyingly, criminal liability for the unauthorized access of computers.

The question is this: Does this scary statute apply when an individual is authorized to obtain information from a computer for some purposes, and not others? If you use a computer [or] a database for a reason that your employer tells you is okay, that seems fine. If you decide to look up something for yourself – is that now a crime?

A lot of concern about the Computer Fraud and Abuse Act is the scope of its application. [Data journalists] go on to websites that are governed by terms of service [that] delineate the rules of engagement for that website. [They] can say, “Yeah, sure you’re allowed to read this page, you’re allowed to purchase things from this page, you’re allowed to plan a coup, or have a social network on this page, but God forbid you scrape something. If you scrape, that’s going to run afoul of the terms of service.”

If the terms of service [serves as] the architecture of access, [and] you violate that access, all of a sudden you have potential civil and criminal liability. That is placing a lot of power in the hands of exactly the institutions that might not want you to be investigating them.

Why is scraping important to journalists? 

What’s fascinating about scraping is that it lets you analyze data. Think about the time it would take to parse through 10,000 state-based COVID-19 webpages…[Scraping] allows you to automate the collection of that data at scale, so you can move your investigative journalism from anecdotes to data in a way that gives you a bigger picture of how a whole system actually works.

It lets you assess things [such as] conditions in Chicago prisons for COVID-19 tracking. The reporters at The Markup [have looked] at discriminatory pricing at Amazon, [and] Facebook disinformation to assess how systems at scale are behaving in ways that the public doesn’t necessarily understand, which you can’t get to through anecdotal work. 

How would it affect reporters working in the United States if the case were decided against Van Buren?

Data journalists will often say, “We’re going to build tools to scrape all of this [publicly available] information and then we’re going to analyze it for the purpose of our investigative journalism.”

But a number of reporters who don’t have legal resources opt out of that kind of analysis because of the fear of the ambiguity of this law. At any time, a powerful website that doesn’t like the conclusions you’re drawing from this type of newsgathering could come after you and jail [could be] on the table.

Civil liability is on the table in a way that many journalists without legal infrastructure or really robust resources can’t afford. If this goes the wrong way, I think you’ll see that kind of chilling effect, and [people] opting out of this type of data journalism. This is a time when we should all be interrogating these large systems that make decisions about our lives, not backing away from the challenge.

If the court decides in favor of Van Buren and strikes down the lower court’s decision, does this mean that scraping is sanctified? Will it take away some of the fear that journalists might have when engaging in this activity? 

It’s going to be really interesting to see what the court does when it comes to the contours of the word “access” and how access is decided. I think that’s the part to watch in this decision.

In the oral argument for Van Buren, you already saw the justices extremely skeptical of precisely the overbroad application of the law that has chilled so many journalists. I think the likeliest outcome – if I were a betting woman – is that you’ll see the justices provide language to constrain the broad interpretation of the CFAA.

If we see that constraining language, I think it is going to provide a lot of clarity for data journalists, so they don’t have to worry about the bogeyman of this very broad law from the 1980s that was not designed to touch on journalism at all. That kind of clarity is going to be immensely helpful.

Looking beyond this case, what other legal protections would need to be put in place to give data journalists more comfort when using scraping as a newsgathering technique?

I think for any cutting-edge newsgathering tool, there’s always a sense of, “Can we do this?” If we look historically to the use of hidden cameras or recording devices, there [was] always a bit of, “Is this considered a normal newsgathering activity?”

What is going to be helpful in terms of shoring up journalists is actually normalizing this practice [of web scraping] as a legitimate and commonplace form of newsgathering. And we see that happening, not only at The Markup [but in] data journalism courses at places like Columbia [Journalism School].

The next step is going to be talking about [it]. Because everyone is scared of the specter of this law, people don’t say, “This is how I collected it all.” Removing a bit of the chilling effect [so that] everyone can articulate that this is, in fact, routine – that’s going to embolden others.

The CFAA was not intended to cover reporters. The most we can hope for is that the justices make that clear, and say: “Reporters, this isn’t about you.”

Editor’s note: The first name of the petitioner in the Supreme Court case Van Buren v. United States has been corrected in the first paragraph.