It’s been more than a year since The New York Times declared this The Age of Big Data, but for most Americans, the news really hit home on election night, 2012.
Nate Silver’s uncannily accurate predictions about how the presidential race would turn out made him one of the most talked about people of the campaign, even in media circles, where the journalistic merit of Silver’s statistically driven work was vigorously debated.
Yet Silver’s work is arguably less revolution than evolution, one facet of a journalistic practice that has actually been around for decades, even if, like Silver, it only recently made it into the mainstream.
“We started out with this a long time ago—before the Web, before even reasonably simple computers,” says Sarah Cohen, editor of the computer-assisted reporting (CAR) team at The New York Times. As early as the late 1960s, journalists like Philip Meyer and Elliott Jaspin were using social science methods and data analysis—sometimes with the help of mainframe computers—to generate and test their journalistic hypotheses. “That was how a generation of us learned what [computer-assisted reporting] was,” says Cohen.
CAR is a practice that, while producing powerful results (see the Pulitzers of Jaspin, Meyer, Dedman, and others), for many years existed only at the margins of most newsrooms, the domain of a few motivated reporters. For much of that time, the methods of CAR hewed closely to those described in Meyer’s seminal book, Precision Journalism, and the tools remained fairly constant: spreadsheets, database software, and, eventually, online resources. Likewise, the end product was the same as for any other news story: a printed text article.
In recent years, however, a slew of new terms have filtered into journalists’ vocabularies and job titles, like data journalism, computational journalism, news apps, and data visualization. To the uninitiated, what these descriptors mean—much less how they differ—may seem inscrutable. Yet even to insiders, their intersections and boundaries are often hard to resolve, and somewhere behind the semantics hovers a difficult question: Are these just new methods for executing the old jobs of journalism, or are they a fundamentally new philosophy of what journalism can be?
“In terms of terminology, I think it can be both misleading and enlightening,” says Troy Thibodeaux, the editor for newsroom innovation at the Associated Press. “It’s a very strange thing, because we’re all doing very closely related work.”
But what is it?
Perhaps the first step in discussing these practices is to distinguish between process and product. News apps and data visualization generally describe a class of publishing formats, usually a combination of graphics (interactive or otherwise) and reader-accessible databases. Because these end products are typically driven by relatively substantial data sets, their development often shares processes with CAR, data journalism, and computational journalism. In theory, at least, the latter group is format agnostic, more concerned with the mechanisms of reporting than the form of the output.
“CAR reporters are good at getting records,” says Reg Chua, data editor at Thomson Reuters. “A lot of CAR is data journalism; it’s interrogating data. Computational journalism represents a new step in what you can do—use of computers, and the processing power of computers and programming, to do types of reporting that were unimaginable even a few years ago.”
Harnessing that computational power, however, has meant bringing new practitioners into the field, and their ideas come from outside the typical CAR tradition.
“Now there’s this whole other path of people who were developers who have very different perspectives,” says Thibodeaux.
At this year’s National Institute for Computer-Assisted Reporting (NICAR) conference in early March, Thibodeaux created and moderated a panel called “From CAR to newsapps and back again,” composed of two-person teams that have collaborated to produce some of the most influential work in digital journalism.
While on the whole the tone of the panel was mutually complimentary, Sarah Cohen conceded that some journalists still tend to trivialize the visual aspects of journalism.
“There are still some editors, though they are fewer and fewer, who really just think of graphics and interactive as just the candy,” rather than a legitimate news format, she says.
For their part, however, the developers present seemed to welcome a move towards more story-driven news apps and visualizations.
“News apps are now edited, which is fairly new,” said Derek Willis, interactive developer at The New York Times. “I think we now hope to treat the editing process as seriously as you do with any story, including asking, ‘Does this work? Does it deserve to stay up?’ I think this has been the growth in the domain.”
Yet while many news app developers will agree that news apps need story, they also assert that journalism needs news apps, which Thibodeaux says do “the thing that a story can’t do, which is let you drill down.”
Rather than focusing only on individual, moment-in-time accounts, Chua says, journalistic publishing needs to include work that is both more focused and more incremental. “The real example of this is Homicide Watch: It updates in essentially real time, and you can drop in anytime and see what the trends are.” This sort of in-progress publishing, Chua believes, is essential, “if we want to get all the value of all the reporting we do every day, and also better serve these communities.”
Whether or not they agree on the need to diversify the way news is published, CAR reporters, data editors, and news app developers alike see new technologies changing the way that journalism is both conceptualized and executed.
That much was clear from the strong impression that Jeff Larson and Chase Davis’s NICAR presentation, “Practical machine learning: Tips, tricks and real-world examples for using machine learning in the newsroom,” made on many attendees.
“I’m pretty conservative on this stuff,” says Thibodeaux. “Source reporting leads to the best data reporting.” But after Larson and Davis’s presentation, he says, he can see how “the techniques start to act like sources. The tools let us ask questions that we couldn’t even conceive of before.”
Likewise, Cohen sees significant opportunities in algorithmic document analysis. “Our ability to make sense of messy original records has been revolutionized,” she says.
Whether the broader use of data science tools to do journalism will increase the acceptance of work like Silver’s remains to be seen, but his methods are more likely to be embraced than abandoned. If nothing else, the economic advantages of offloading more work to machines are hard to finesse:
“We don’t have the financial wherewithal to waste the kind of time we waste,” says Cohen. “If we spend a week doing document analysis that could be done by an algorithm, then we deserve to be replaced by machines.”
“We need to reserve the work for things that take human creativity and human insight.”