The state of data in newsrooms: findings and recommendations
The following is a condensed version of a report by Melody Kramer that was written for the Carolina Data Desk, which is part of the UNC School of Media and Journalism’s Reese News Lab, and funded by the Center for Innovation & Sustainability in Local Media. Names and organizations have been made anonymous for privacy reasons. Kramer interviewed journalists and civic hackers both inside and outside of North Carolina. Many of her recommendations are applicable to local newsrooms, as well as universities, foundations, and non-profits that work with journalists across the country, and are detailed here.
This discovery work gave insights into the challenges and opportunities for journalists and newsrooms who work in data-driven local news environments, and identified gaps and opportunities to better serve a series of diverse audiences whose needs are not currently being met.
Data-driven reporting is time-consuming and labor-intensive. New products and initiatives should reduce the existing mental load for reporters and newsrooms.
Knowing that datasets exist, finding them, ensuring they are up-to-date, cleaning the data, and then using it are time-consuming and difficult tasks, particularly in smaller newsrooms where resources are at a premium. This mirrors the findings in a research report written by Wiebke Loosen, Julius Reimer, and Fenja de Silva-Schmidt analyzing data journalism trends across the world. Journalists expressed a need to automate finding out about new, relevant datasets that would help them report and tell stories.
Newsrooms often chase and clean the same data sets, replicating work that could be consolidated.
Newsrooms often file public records requests for the same data sets, replicating work that could be consolidated or pooled, which could increase capacity and allow reporters to use data more effectively. Sometimes organizational barriers rooted in fear of competition prevent people from collaborating. Shared resources that could be developed include:
- A calendar of when data comes out at the state and local level
- How to contact counties or municipalities across the state to request data
- Protocols for talking with state or county or town level clerks to request data
- What datasets have been acquired or published by newsrooms across the state
- What stories have been written about state or local-level datasets
- Shared coding resources that would make cleaning or using data much easier
“We are ___ (company name). They are ____ (other company name). There’s a competitive framework. It’s hard to collaborate. No, it’s next to impossible for me to collaborate with newspapers. I have to go to other arenas (like radio, data aggregators).”
There is a hunger across local newsrooms for more and better audience data. But a knowledge gap may exist between the people who think about audience data and data-driven journalists in the newsroom.
When data is collected from audience members, that data is not necessarily analyzed or shared with business development. There are opportunities for newsrooms to cultivate audiences and learn more about them through data-driven reporting (and collecting that data).
“We feel strongly that we have resources that haven’t reached as broad an audience that we would like. …We’ve invested a fair amount of time creating content, but haven’t found a broad audience for it. We’re thinking: What we should do? Who is this for?”
Training needs vary across newsrooms, as does the appetite for training.
Participants all expressed a need to learn more about their audience, and a need to incorporate more audience data in their own work. But appetite for training varied, and the idea of going somewhere for training was a turn-off for the majority of participants.
The number of data journalists in newsrooms across the state remains small, and they don’t necessarily communicate with each other. If there is someone in a newsroom with the title “data reporter” or “data journalist,” that person is most likely the sole data reporter in the newsroom.
Newsrooms and journalists are targeting audiences broadly — but not necessarily creating feedback loops or analyzing audience-driven data they collect.
When asked what they knew about their audiences and whether they were reaching them, participants almost unanimously answered that they didn’t know enough (or said that another internal team handled that), and almost all said that learning more about their current and potential audiences would help their newsroom. If they’re collecting data from their audience, which could be valuable to advertisers, it’s not shared with their audience insight team, and it’s not necessarily used for creating future stories.
“We know it’s small but loyal – we have audience engagement people, but I’m not part of that team. I’d love to know a lot more about that. We are trying to create a following of people who care.”
Every newsroom has data-driven stories that they’d like to reach a wider audience — both the general public and their peers.
“We are always trying to find out more about audience. We regularly send out reader surveys. We generally get several thousand responses. But we don’t do anything super scientific with them. It’s more general. We’ve done surveys on dating life, restaurants, we did one on salaries and job stuff. We use that and write stories based on it, with caveats that it’s not scientific.”
Though many organizations collect audience data, only a few feed that data back into future stories.
A gap currently exists between the data that organizations collect and their ability to use that data to shape future stories. Newsrooms cited limited capacity and time for doing anything more with the data they collect, but by not creating more feedback loops for their readers, they are leaving potential relationships with their audience on the table.
Newsrooms are looking for ways to automate some of their more tedious tasks, so that they can focus on more innovative work.
Many of the reporters I spoke with spend part of their day doing tasks that could and should be automated so that they can focus on reporting. There are models for this at national news organizations: The Quartz Bot Lab creates tools that automate parts of reporters’ workflows so they can reduce the amount of manual work they do to check datasets and websites. BuzzFeed built a bot for Facebook Messenger that brings in tips and crowdsourced material that reporters can use. Reuters built a bot that scans Twitter for breaking news stories. The Washington Post built a tool called Heliograf that generated over 500 stories for the election across House, state, and gubernatorial races.
“You always have a balancing act of updating stories that you’ve reported, being on-site for developing stories, and having data on a specialized beat (prison, education, health). Increasing capacity is crucial and then have people in more specialized roles is crucial. My role tends to be this really mashed up weird thing where I’m sort of a data librarian, sort of a document librarian, sort of a public records request, interactive dev, and enterprise reporter. We should have one person that keeps data in one place and trains people.”
Civic Technology Findings
Civic technologists are advocates of open data and natural partners in this space.
In 2014, WAMU, the public radio station in Washington, D.C., partnered with the local civic hacking group Code for DC to create their election night map. When I chatted with people from both the station and the civic hacking group about this project, it was clear both sides benefited greatly from working with the other. WAMU added capacity that it otherwise wouldn’t have had, and had the news chops to deliver a package that resonated with its audiences on election night; the civic hackers were exposed to a newsroom and its reporters, and received a much bigger audience for their work than they typically did.
Civic technologists know how to sift through and clean data. They benefit from a journalist’s ability to know what story to tell, how to obtain an audience, and how to have the most impact.
Affordable housing, election data, crime data, and workforce programs data were subjects mentioned by both civic technologists and news organizations. Holding consortiums on these topics and bringing together civic technologists, open data advocates, local government officials, journalists, and data scientists would benefit the Carolina Data Desk.
Conclusion and next steps:
There are opportunities for non-profits, foundations and universities to better serve data journalists. They can:
- Become conveners and facilitators that help local newsrooms increase capacity. Universities can serve a number of roles but the most important one may be this: acting as a bridge between newsrooms across a geographic area and as the glue through which they can work together. Universities can house datasets, create shared resources, and keep track of what has been published and acquired.
- See part of their role as empowering and supporting journalists through trainings (for both them and others within their organizations), convenings, partnerships, and connections to others who are working on similar projects.
- Ensure that there is continuity beyond turnover in newsrooms. Keeping track of datasets and how to access those datasets, as well as projects that have used the data, would be an asset to many organizations working in this space.
- Create a membership program aimed at newsrooms that use their services. Newsrooms with a financial stake in the future of these collaborative efforts will be more likely to integrate them into their workflows.
- Avoid losing focus to goals unrelated to their original mission. It’s easy to let scope creep set in and want to do everything, which often leads to distraction and an inability to accomplish anything. Despite the plethora of ideas in this document, organizations should determine which ideas best fit their mission, undertake those, measure the success, and add and subtract as necessary.
- Have a project and newsrooms in mind to bring in beta testers. Similar to Hearken’s approach, organizations should start to test out some of their assumptions around data sharing and collaboration and then write these up as case studies that will act to strengthen the model, bring in new partners or customers, and act as a guide for other institutions that want to start similar programs.
- Create communication channels with others in this space to continually share best practices. Now that three states are creating data repositories, there is an opportunity to share and collect information that will benefit the three data projects and help others. For example, this sign-up sheet from The Bureau Local can certainly be localized and replicated.
- Create a roundtable discussion around a topic of interest to journalists, civic hackers, and industry that may lead to deeper engagement and collaboration. Ideas include: housing, the 2020 census or election cycle, and/or environmental reporting. There are opportunities for partnerships here.