Data journalism and information visualization is a burgeoning field. Every week, Between the Spreadsheets will analyze, interrogate, and explore emerging work in this area. Between the Spreadsheets is brought to you by CJR and Columbia’s Tow Center for Digital Journalism.
Louise Story, a New York Times reporter of six years, used data-investigation techniques in her recent inquiry into business subsidies to find that state and federal governments gave companies over $80 billion a year to attract them to a given area. The result of her work is a three-part series that combines investigative reporting with technical finesse. The written version of this story could very well have been the sum total of Story’s exhaustive reporting, but by taking the lead on turning it into an online database, she presided over the perfect marriage of big data and public service journalism.
When Story started her investigation, she thought finding the data she needed wouldn’t be too hard. The numerous sources she searched yielded a trove of figures that gave her a general picture of the situation, but the information she was really interested in, how much this was costing taxpayers, wasn’t there. After talking to numerous researchers and experts, she realized that the resource simply didn’t exist. She recalled being told over and over again that such a database would be very useful, but almost impossible to create. Her solution: she compiled her own.
“I thought, ‘I’ll make the list myself and track them down and ask for the cost figures,’” said Story.
She looked at the business incentive programs each state offered, compiling her reporting into an Excel spreadsheet. She kept hard copies of her findings in one of the four files that now live on her desk, just to be certain a physical copy also existed.
“There were three to six agencies per state to go through,” said Story. In addition, Story discovered that in some states, business incentives were not solely handled by their economic development offices. So she had to check with any other agency or body involved in these types of programs. In all, it took Story nearly five months to compile the data on 1,874 state programs.
The hundreds of hours she spent reporting and interviewing returned the same message: she was creating a resource not otherwise available. So, from early on in the reporting process, Story wanted her findings to appear as a searchable dataset, open to the public.
“Journalists should bring out information that doesn’t exist,” she said.
The interactive team then took Story’s spreadsheets and turned them into the online database that accompanies the series. Tiffany Fehr, interface developer at the Times, worked closely with Story to get the data ready for its Web iteration and build the interactive.
“The challenge was to show the breadth of the data but also how flimsy it is,” Fehr said. Story’s reporting showed that there were huge gaps in the way states and government agencies recorded and monitored the information.
Fehr built the database in MySQL, an open-source relational database system. The team wanted the database to be searchable, so when designing it Fehr and her colleagues had to decide how to arrange the search functions. They settled on providing different entry points into the data, making it searchable by geographical location so that state-by-state comparison is possible.
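For readers curious about what a state-by-state entry point might look like in practice, here is a minimal sketch. It is not the Times’s code: it uses Python’s built-in sqlite3 module as a stand-in for MySQL, and the table name, column names, and sample rows are all hypothetical placeholders. A missing cost is stored as NULL, one possible way to represent the gaps in state record-keeping that Fehr describes.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical programs table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE incentive_programs (
            id INTEGER PRIMARY KEY,
            state TEXT NOT NULL,
            program_name TEXT NOT NULL,
            annual_cost_usd INTEGER  -- NULL when no cost figure was available
        )
        """
    )
    # An index on state keeps geographic lookups fast as the table grows.
    conn.execute("CREATE INDEX idx_programs_state ON incentive_programs (state)")
    # Placeholder rows for illustration only; these are not real figures.
    rows = [
        ("State A", "Example tax credit", 1000000),
        ("State A", "Example training grant", None),
        ("State B", "Example sales tax refund", 250000),
    ]
    conn.executemany(
        "INSERT INTO incentive_programs (state, program_name, annual_cost_usd) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def search_by_state(conn, state):
    """Return (program_name, annual_cost_usd) pairs for one state."""
    cur = conn.execute(
        "SELECT program_name, annual_cost_usd FROM incentive_programs "
        "WHERE state = ? ORDER BY program_name",
        (state,),  # parameterized to avoid SQL injection from search input
    )
    return cur.fetchall()


def state_totals(conn):
    """Return per-state program count, known cost total, and missing-cost count."""
    cur = conn.execute(
        """
        SELECT state,
               COUNT(*),
               COALESCE(SUM(annual_cost_usd), 0),
               SUM(annual_cost_usd IS NULL)
        FROM incentive_programs
        GROUP BY state
        ORDER BY state
        """
    )
    return cur.fetchall()
```

Reporting the number of missing cost figures alongside each total is one way to show both the breadth of the data and its gaps, rather than letting an incomplete sum pass for a complete one.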
Another element in the database is its “$100 Million Club” feature, a searchable list of the 48 companies that received more than $100 million in state grants since 2007.
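A list like this comes down to a grouped, filtered query. The sketch below shows one way it could be derived, again using Python’s sqlite3 module as a stand-in for MySQL. The `grants` table, its columns, and the sample rows are hypothetical, and the sketch assumes the threshold applies to a company’s cumulative total across states from 2007 onward; the article does not say exactly how the Times defined membership.

```python
import sqlite3

THRESHOLD_USD = 100_000_000  # "more than $100 million"
START_YEAR = 2007            # "since 2007"


def hundred_million_club(conn):
    """Return (company, total_usd) for companies whose grants exceed the threshold."""
    cur = conn.execute(
        """
        SELECT company, SUM(amount_usd) AS total
        FROM grants
        WHERE year >= ?
        GROUP BY company
        HAVING SUM(amount_usd) > ?
        ORDER BY total DESC
        """,
        (START_YEAR, THRESHOLD_USD),
    )
    return cur.fetchall()


# Placeholder data for illustration only; these are not real companies or amounts.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE grants (company TEXT, state TEXT, year INTEGER, amount_usd INTEGER)"
)
conn.executemany(
    "INSERT INTO grants VALUES (?, ?, ?, ?)",
    [
        ("Company X", "State A", 2008, 70_000_000),
        ("Company X", "State B", 2010, 45_000_000),
        ("Company Y", "State A", 2006, 200_000_000),  # before 2007, so excluded
        ("Company Y", "State A", 2009, 30_000_000),
        ("Company Z", "State B", 2011, 100_000_000),  # exactly $100M, not more
    ],
)
conn.commit()

club = hundred_million_club(conn)
```

With these placeholder rows, only Company X qualifies: its two grants in different states add up to $115 million, while Company Y’s large grant predates 2007 and Company Z lands exactly on the threshold.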
Fehr and the interactive team are used to working across the desks of the Times, but Fehr said this was the first project she had worked on in which a desk took such a hands-on role in checking and preparing the data.