Data Journalism and the Law | School of Journalism
Legal issues in data journalism

Data Journalism and the Law

The challenge for the next generation of journalists will be to adapt data and computational science to reporting and storytelling while upholding the profession’s core journalistic mission. To that end, this data journalism series focuses on the themes covered in the M.S. in Data Journalism program, which aims to provide future journalists with an understanding that goes well beyond data journalism fundamentals and offers an advanced graduate-level curriculum that includes data, computation and innovation classes.

As data has grown as a driver of reporting, so too have legal concerns regarding the accuracy of information, acceptable methods of information gathering and what is considered proprietary information. This report from the Columbia Journalism Review examines these emerging concerns as well as shifts and gray areas in the law.


In 1961, legal scholar Alexander Meiklejohn famously wrote that the rationale for the First Amendment depended on citizens’ ability to receive and use information relevant to democratic self-governance.(1) The crux of his statement was this: knowledge is power. Fifteen years later, scholar Thomas Emerson would rely on Meiklejohn’s work to highlight the “vital importance in a democratic society of the right to know.”(2) In his article, he explained how James Madison, the author of the First Amendment, asserted that “[a] popular government, without popular information or the means of acquiring it, is but a prologue to a farce or a tragedy; or perhaps both.”(3) From there, Emerson continued, “A people who mean to be their own governors, must arm themselves with the power that knowledge gives.”(4)

In view of this, asserting access to information seems paramount to self-governance. Every day reporters try to fulfill this duty through various mechanisms. But in our current environment, they are often competing with an unparalleled glut of information that readers absorb from the moment they wake up to the moment they power down their devices at night. One study by Northeastern University estimated that the size of the “digital universe” of data was 4.4 zettabytes in 2013 and projected that it would grow tenfold, to 44 zettabytes, by 2020.(5iii) According to a Forbes magazine piece in 2015, “More data ha[d] been created in the past two years than in the entire previous history of the human race.”(6iii)

This volume of information has driven massive shifts in the news industry for nearly a decade. Since about 2008, data journalism, defined as journalism that heightens the role numerical information plays in storytelling, has exploded into a driving force in newsrooms around the country. Journalists are quickly learning how to obtain troves of data through electronic leaks, drones, and cutting-edge computer programs that sometimes require little more than the click of a button to access information. In other instances, journalists confronted with processing large swaths of information must employ complicated algorithmic and programming skills. Many larger news organizations have even built internal digital programs and tools to sort through these data swells and leaks, as was done with the Panama Papers.(iv)

As the Global Investigative Journalism Network reported in 2015, “After nearly 50 years of journalists using data, it is clear that data is not only a routine part of journalism, but also a driving force for stories.”(7) A recent report by Google stated that 42 percent of reporters use data to tell stories regularly, and 51 percent of all news organizations in the United States and Europe now have a dedicated data journalist—a figure that rises to 60 percent for digital-only publications.(8)

While data has become integral to reporting the news, the quantity of data at large and the speed with which it can spread have raised many journalistic concerns: protecting sources, the accuracy of published information, the inability to provide meaningful redactions, and journalistic liability. Meanwhile, corporations and governments, which hold much of the information that journalists are responsible for reporting on, are beginning to exercise stricter controls over their data, in many cases by asserting that it is proprietary information. Federal and local governments are also guarding their information by expanding exemptions under the Freedom of Information Act, and increasingly asserting privacy exemptions on behalf of individuals and corporations alike. Similarly, private companies are asserting trade secret exemptions more aggressively, with governments upholding those claims in even the most dubious circumstances.

While several reports have covered in depth what data journalism is and how to implement it at various institutions, no recent studies have discussed how this change in storytelling is shaped by the legal landscape journalists must work within. Traditionally, media law concerns around newsgathering have been limited to questions about trespass, recording laws, access to illegally obtained material, and the potential for prosecution under the Espionage Act. These leading doctrines surely still apply, but there are shifts in case law and arguments that have yet to be fully explored in media law casebooks and conversations. This report is an attempt to tease out some of those new conversations and explore how case law is being affected by our data addiction.

Executive Summary

No comprehensive study before this one has examined how the changes in reported storytelling may create new legal considerations for journalists. This report aims to help journalists, lawyers, and academics understand the shifts taking place in media law as a result of both the growing volume of data in our information economy and the seismic shifts occurring within journalism and technology. By examining developments in newsgathering law, the Freedom of Information Act, and laws involved in leak investigations, this research underscores worrisome shifts in those laws, as well as gray areas where reform would strengthen the rights of a free press and of journalists.

In its first part, the report looks at emerging concerns over data journalism projects that could trigger the Computer Fraud and Abuse Act (CFAA)(i) by employing scraping, a data collection technique that usually relies on automation (bots, crawlers, or applications) to extract data from a website. As data collection becomes increasingly important for investigative journalists in particular, legal experts worry about the civil and criminal penalties that exist under the statute, which some First Amendment advocates have described as unconstitutionally vague. In reviewing the history and case law of the CFAA in relation to journalism, the research offers practical tips and various legal considerations on the issue.
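For readers unfamiliar with the technique, the following is a minimal sketch of what extracting data from a web page can look like in practice. It is an illustration only and is not drawn from the report: the class name `TableRowExtractor`, the helper `extract_rows`, and the sample page with its agencies and figures are all invented for this example. It parses an HTML string held in memory rather than contacting any website, and it says nothing about whether a given scraping project is lawful.

```python
from html.parser import HTMLParser


class TableRowExtractor(HTMLParser):
    """Collect the text of each table row as a list of cell strings.

    Hypothetical helper written for this illustration.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return every table row found in the given HTML string."""
    parser = TableRowExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Invented sample data standing in for a page a reporter might download.
SAMPLE_PAGE = """
<table>
  <tr><th>Agency</th><th>Requests</th></tr>
  <tr><td>Example Agency A</td><td>120</td></tr>
  <tr><td>Example Agency B</td><td>75</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in extract_rows(SAMPLE_PAGE):
        print(row)
```

A bot or crawler of the kind the report describes would typically repeat a step like this across many pages it requests automatically, and that automated access is what raises the CFAA questions discussed here.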

Next, the report discusses troubling trends arising under the Freedom of Information Act (FOIA) in the digital age, when the amount of government information held in databases and government logs is growing and the need for transparency is crucial. Lastly, it reviews data’s impact on laws affecting whistleblowers. In the past decade, we’ve seen more leak prosecutions in the United States than in all of the country’s prior history combined. This, of course, occurs at a time when there is more information than ever before for whistleblowers to share.

Key findings

  • No journalists to date have been sued or prosecuted under the Computer Fraud and Abuse Act, but there’s evidence that stories have been hindered or held from publication because of the threat of penalty. Under the statute, a person may be penalized for “unauthorized access” to data on a company’s website by scraping data through bots, crawlers, or applications. While journalists have developed techniques and tools to sidestep potential liability, including piecemeal data extraction that goes unnoticed or crowdsourcing the public’s help, the CFAA presents real obstacles to reporting a variety of important stories in the public interest.

  • In the past decade, as the volume of US government-controlled data has increased, government agencies have experienced a swell in the number of FOIA requests. At the same time, reporters often see delays in processing requests, insufficient searches conducted by the agencies, government data equated with proprietary information leading to denial of access, and developing case law that prohibits access to government databases. In a number of recent instances, courts have upheld determinations by agencies that searching a government database amounts to producing a new document—which is prohibited under FOIA.

  • While no journalist has been convicted under the Espionage Act, the statute includes provisions that could be used against journalists. Even more stifling than the policies and laws used to intimidate and silence whistleblowers in the digital age, though, is the degree to which government authorities seem preoccupied with journalists specifically. Now, possessing information obtained from confidential sources, a basic tenet of First Amendment doctrine, is potentially a prosecutable offense.

  • In many ways, journalists’ access to critical information is being restricted, whether by the passive or explicit threat of criminal penalties, by de-prioritization in favor of corporate secrecy, or by an inadequate legal understanding of technological advances.

While none of these shifts is totalizing or irreversible, together they indicate a new direction and acceleration in our information economy that may have consequences for journalists. As more information is created, there is a growing need for reporters to discern the importance of voluminous data dumps and to uncover stories hidden in their details. Unlike anti-secrecy sites such as WikiLeaks, journalists review, analyze, and edit information to help citizens navigate the evolving information landscape. But journalism today, with the press often intimidated by public officials and public figures, faces unique challenges in an oversaturated information economy where there are fewer resources and protections in place for journalists to discern the truth. It is time that we consider these subtle shifts as a hazard to the Fourth Estate itself.

Continue reading at Columbia Journalism Review »

