How to report on algorithms even if you’re not a data whiz
The challenge for the next generation of journalists will be to adapt data and computational science to reporting while upholding the profession's core journalistic mission. To that end, this data journalism series shares original features that align with themes covered in the M.S. in Data Journalism program. The program goes well beyond data journalism fundamentals, offering an advanced graduate-level curriculum that includes data, computation and innovation classes.
This article was originally published in the Columbia Journalism Review.
There’s a new beat in town: algorithms. From formulas that determine what you see on social media to equations that dictate government operations, algorithms are increasingly powerful and pervasive. As an important new field of influence, algorithms are ripe for journalistic investigation.
But investigating computer code can come across as dry and technical. Researchers often talk about “auditing” and “reverse engineering” algorithms—activities requiring heavy data analysis. But algorithmic accountability reporting projects don’t have to be this way. There are many possible approaches that draw on traditional reporting as well.
The complexity of this beat often requires a solid familiarity with how algorithms work, practice looking under the hood, and careful consideration of the right (sometimes technical) questions to ask experts. ProPublica is the flag-bearer of algorithmic accountability reporting. In May 2016, it published the Machine Bias project—a report on how crime prediction and risk assessment software is prejudiced against black people. Our own research at the University of Maryland Computational Journalism Lab has looked at Uber service quality and disparities across neighborhoods in Washington, DC. More recently, we’ve studied a range of media issues raised by Google search algorithms privileging certain content in the 2016 election.
But these examples represent just one possible path. There is a range of opportunities for journalists to investigate algorithms, depending on their degree of technical savviness, the focus of their story, and their reporting method.
To lower the barrier to entry for reporting on algorithms, we created algorithmtips.org, a database of government algorithms that currently provides more than 150 ledes, along with methodological resources for getting started. It may not be aimed at the ProPublicas of the world, but it is a gateway for a wider set of journalists to get into algorithmic accountability reporting.
Depending on the algorithm, there are (at least) three approaches that stories about algorithms can take:
Critiquing what they do and how
The government uses algorithms widely, from formulas that determine medical care to rankings that decide where food inspections take place.
Some algorithms are interesting simply due to their novelty, like Gang Graffiti Automatic Recognition and Interpretation (GARI), an application which helps law enforcement and gang task force officers identify gang graffiti or tattoo images.
But others might be controversial or have undesired effects. The Health Resources & Services Administration uses an algorithm to determine which locations are listed as Health Professional Shortage Areas (HPSAs). It doesn’t take a lot of technical skill to brainstorm potential drawbacks: What if the score misses the mark and benefits some areas over others? Can certain populations be harmed because some important criterion was not taken into consideration? Is the algorithm fair?
Additional coverage could educate the public on drawbacks to using automated decision-making for particular government functions. A critical approach may not ultimately carry the evidentiary weight of a data-heavy audit, but it can draw attention to algorithms that demand a closer look.
Looking at who owns the algorithm
Government doesn’t always write algorithms itself; often it licenses the code from outside contractors. This creates the need for good old-fashioned government accountability reporting: Who are the companies that are contracted by the government? Is the selection process transparent? Do some companies have an unfair advantage in this market?
Third-party algorithms also raise transparency issues. In past work we’ve tried to FOIA criminal justice algorithms from different states and have come up mostly empty-handed. When government agencies contract algorithms from a third party, they outsource a chunk of the decision-making process. That body of work, by definition, belongs to a private company. That means information about those algorithms may fall under Exemption 4 of FOIA, which protects trade secrets.
As the government contracts out more and more of its bureaucracy, the public is subject to two black boxes: the first, the algorithm itself; the second, the secrecy of the private companies. Part of the work of reporting on algorithms is to bring that encroachment to light.
Explaining how big they are
Automation promises the ability to do more, faster. As algorithms multiply, their scope widens, and they become more powerful. Take the case of Medicare’s Fraud Prevention System, which detects abnormal payments and helped recover $820 million in its first three years of operation. Or the Automated Underreporter, used by the Internal Revenue Service to inspect 3.5 million tax returns in 2016, in search of unreported income. Because such systems can affect so many people, a single error in code could create widespread harm.
When it comes to algorithms, volume is key, but the public may lack perspective on that scale. How big is the loss if there’s a mistake in those formulas? What does an individual or a community have to lose if something goes awry? Reporting on algorithms needs to hold them accountable, and make the scale and scope of algorithms more visible, and meaningful, to the public.
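For reporters who want to put a rough number on that scale, a back-of-the-envelope calculation can be a useful starting point. The sketch below takes the 3.5 million returns cited above and applies a few error rates. The rates are hypothetical, chosen only for illustration, and say nothing about the actual accuracy of any IRS system; the function name is likewise our own.

```python
def affected_cases(total_cases: int, error_rate: float) -> int:
    """Estimate how many cases a given error rate would touch."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    return round(total_cases * error_rate)


# From the article: returns inspected by the Automated Underreporter in 2016.
RETURNS_INSPECTED = 3_500_000

# Hypothetical error rates, for illustration only (not reported figures).
HYPOTHETICAL_RATES = (0.001, 0.01, 0.05)

for rate in HYPOTHETICAL_RATES:
    count = affected_cases(RETURNS_INSPECTED, rate)
    print(f"{rate:.1%} error rate -> about {count:,} returns")
```

Even the smallest rate here translates into thousands of returns, which is the kind of concrete figure that can make an algorithm's reach meaningful to readers. Any real story would need a sourced error rate in place of these placeholders.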