Inside the NYTimes: Finding Order in a Thicket of Nonprofit Data

From a Times Insider column by Tiff Fehr headlined “Finding Order in a Thicket of Nonprofit Data”:

Data journalists such as myself struggle with one ever-present challenge: working with PDFs, the portable document formats unleashed by Adobe in the 1990s that are now ubiquitous barnacles on our digital lives.

Most of us have encountered PDFs when sending an application, checking a utility statement or perusing an online menu. And while converting a document into a PDF can preserve how it looks — resolving differences between devices, fonts or layouts — there are trade-offs to achieving portability.

In a “good” PDF, readers can select and copy text, click links or even fill out a form. But more often we deal with “bad” PDFs, those that are essentially photos of the original document. The text and data within the PDF are inaccessible to readers unless they use special extraction tools.

My nerdier peers on the Interactive News team at The New York Times and I often hear from reporters who are encountering “bad” PDFs while gathering source material. Frequently, their messages include variations on an ominous phrase: “Here’s the problem. …”

My colleague David A. Fahrenthold, an investigative reporter in the Washington bureau who covers nonprofits, sent me one notable here’s-the-problem email last fall. He was investigating the finances of a group of five tax-exempt political nonprofits called 527s, named after the section of the tax code that governs them, and was having trouble extracting data from a handful of PDFs from the Internal Revenue Service. The nonprofits in question were alleged to have hidden self-serving business dealings by dicing their I.R.S. filings into thousands of individual expenses — far too many for one donor, tax auditor or journalist to easily evaluate.

We needed to dig into the nonprofits’ I.R.S. filings to assess the patterns. But stuck within 15,851 cumulative pages of “bad” PDFs (we counted) was data we could use to reconstruct their financial claims.

We recently published an investigation into these five groups. Our analysis of their public information revealed patterns. Using robocalls, the group of nonprofits raised $89 million in small donations from unsuspecting donors who thought they were contributing to political funds supporting veterans and police officers. But most of the small donations were funneled into yet more fund-raising, and 3 percent — $2.8 million — was paid to companies owned by the three political consultants behind the nonprofits.

To reach those conclusions, David and I first had to download all of the available PDFs for these nonprofits from the I.R.S. and double- and triple-check them. I wrote code to analyze each PDF and extract every reported donation and expense — more than 136,000 transactions over nine years.

After untangling the transactions, the task was relatively simple. The key to unwinding the 527 groups’ questionable practices came down to pivot tables, a spreadsheet feature used by amateurs and professional aficionados alike. A pivot table is a common tool for organizing a table of information by a theme, like all expenditures summed by year or all donations grouped by the donor’s home state.

For this investigation, we organized the expenses by recipient, to see which companies received the most from the nonprofits. Separately, David researched the companies directly connected to one or more of the nonprofit founders. (In statements, the four nonprofits still operating denied wrongdoing and said they were helping candidates indirectly by raising grass-roots issues with voters.)
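For readers curious what that pivot-table step looks like outside a spreadsheet, here is a minimal Python sketch of grouping expenses by recipient and by year. It is only an illustration, not the code used in the investigation: the transactions, company names and the `pivot_sum` helper are all invented for the example.

```python
from collections import defaultdict

# Hypothetical transactions: (year, recipient, amount in dollars).
# These rows are invented for illustration; they are not from the I.R.S. filings.
transactions = [
    (2019, "Call Center Co.", 1200.00),
    (2019, "Consulting LLC", 300.00),
    (2020, "Call Center Co.", 800.00),
    (2020, "Consulting LLC", 450.00),
    (2020, "Print Shop Inc.", 75.50),
]

def pivot_sum(rows, key_index, value_index=2):
    """Group rows by one column and sum another, like a one-dimensional pivot table."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key_index]] += row[value_index]
    return dict(totals)

# Expenses summed by recipient, and separately by year.
by_recipient = pivot_sum(transactions, key_index=1)
by_year = pivot_sum(transactions, key_index=0)

# Rank recipients from the largest total to the smallest.
ranked = sorted(by_recipient.items(), key=lambda item: item[1], reverse=True)
for recipient, total in ranked:
    print(f"{recipient}: ${total:,.2f}")
```

Sorting the summed totals is what surfaces the biggest recipients first; a spreadsheet pivot table does the same grouping and summing behind the scenes.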

These types of investigations are not the only instances when Times reporters run into challenging financial data.

From the multiyear investigation into the Trump family’s business (and taxes) to everyday stories about people who are caught fudging aid programs or tax credits, our reporting and data journalism practices continue to resemble forensic accounting, a specialty area of accounting that unravels financial crimes like fraud, embezzlement or Ponzi schemes.

Banking scandals, tax evasion, official inquiries and bankruptcies typically include financial details with mind-boggling figures. The newsroom’s financial and accounting acumen becomes more complex and intensive when we can obtain the raw data behind those numbers, and make sense of them as we did with the 527 filings.

The skill set that data journalists need to accurately report on opaque finances is expanding as the availability of public data evolves. The number of data journalists at The Times is growing as well, with dozens involved in data journalism across many desks: Graphics, Investigations, Climate and Elections, to name a few.

My colleagues on the Interactive News team and I continually assess where and when new techniques are a responsible fit for the newsroom’s needs, and lie in wait for the next chance to pair our growing accounting skills with investigative reporting.

Tiff Fehr is a staff engineer and project editor within the Interactive News Technology (INT) team, a group of technologists embedded in the newsroom of The New York Times. She focuses on custom software development for the newsroom, in addition to data journalism projects like The Times’s Covid-19 data collection and its public data set. In 2021, she was part of the data team that won the Pulitzer Prize for public service for The Times’s coverage of the Covid-19 pandemic.
