When to choose data over stories

This is a follow-up to yesterday’s post about Matt Waite’s approach to data-driven journalism. After I posted, Matt sent me an e-mail with some additional information, and I asked him if I could publish it here. He said yes.

Back in “the Paleolithic era of our Web site,” he wrote (meaning 2004), the St. Petersburg (Fla.) Times ran a real estate package that had been a sales phenomenon for the print edition: Soaring Home Prices in the Bay Area (that’s Tampa Bay, fyi).

Producing this package involved writing more than 30 stories in two days across five counties, and Matt pointed out that “be it one person or a staff of people — [this] is a large expenditure of time and effort. A good amount of that can never be replicated by a program — I will never be able to write a Python script that talks to a homeowner and gets good quotes from them and then writes it up in a compelling fashion.”

This is one piece of what a news organization needs to consider (especially now, with reduced staffs): Are the stories worth the time and effort that will be required? Often the answer is yes — we WANT those interviews. But sometimes the answer is no.

“Week after week, month after month, for very small pieces of geography, that type of work isn’t necessary,” Matt wrote. “I’d rather take the core piece of news that people want to know, automate it, and then use my limited staff resources on writing stories of broad interest with nuance and humanity and storytelling and all the stuff you don’t get from a quick paragraph.”

I think this ties in perfectly to the post I wrote last Friday about articles, comments, stories, and conversations. There are stories that will move our hearts, change our thinking, transform us. These are stories with real characters in them, human drama, context, perspective, and explanations that help us understand the world we live in. These stories cannot be automated — just as they are rarely successful when told by inexperienced or time-crunched reporters.

Much of journalism is not like that. We call everything “a story,” but lots of journalism is nothing more than reports — small bundles of facts about a crime, a house sale, a tax increase. Many of these reports CAN be automated. (That’s the idea behind EveryBlock, after all.)
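To make "automated report" concrete, here is a minimal sketch in Python. It is not how EveryBlock or the Times actually builds its pages; the record layout, the field names, and the `sale_report` function are all assumptions made up for illustration. The idea is only that a small bundle of facts can be poured into a sentence template with no reporter involved.

```python
# Hypothetical record layout: these field names are assumptions for
# illustration, not taken from any real news site's database.
def sale_report(record):
    """Turn one house-sale record into a one-sentence report."""
    return (
        f"A home at {record['address']} in {record['neighborhood']} "
        f"sold for ${record['price']:,} on {record['date']}."
    )


# Invented sample data, not a real sale.
example = {
    "address": "123 Example St.",
    "neighborhood": "Sampleville",
    "price": 215000,
    "date": "March 3",
}

print(sale_report(example))
```

Run over every new row in a public-records feed, a template like this produces the "quick paragraph" kind of report. It cannot interview the homeowner, which is exactly the part worth saving for a reporter.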

Matt put the choice — story or data? — into practical business terms:

The online business model — where clicks equal cash — pretty much requires that if you’re going to pay a reporter to do a story, that story had better interest the broadest audience possible, or the return on investment is upside-down (obviously, this isn’t the sole determiner of what gets written, but you can’t ignore it completely). The Neighborhood Watch model allows us to cover teeny tiny pieces of geography without concern of cost because, well, it doesn’t cost anything. If I can get a small number of people interested in each of their pieces of geography, and I have lots of pieces of geography, then that will all add up to a large audience.

You need to understand this.

If you can aggregate enough small pieces, efficiently, then you attract a big audience. Big CPMs. Even though no single piece draws mega-traffic, the total can be very, very satisfying.

Matt’s not allowed to give out site numbers, but he was able to say that “most days, no one neighborhood accounts for more than 8 percent of the traffic to the site.”

And while you might be tempted to cherry-pick the neighborhoods you imagine would attract the most site visitors, your instincts are not going to be reliable on this. The payoff from aggregation is that you get everybody clicking in there, and it doesn’t matter that some pieces get very few clicks: the aggregate works only because it includes everything.
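The arithmetic behind that is easy to sketch. The Python below uses invented page-view numbers for 20 hypothetical neighborhoods (the site's real figures are not public), and the `audience_summary` function is my own illustration, not anything the Times runs. It shows how small the biggest neighborhood's share can be, and how much of the audience you would give up by keeping only the top few.

```python
def audience_summary(views_by_area, top_n):
    """Summarize how total traffic is spread across many small areas."""
    ranked = sorted(views_by_area.values(), reverse=True)
    total = sum(ranked)
    return {
        "total": total,
        # Share of all traffic that goes to the single busiest area.
        "largest_share": ranked[0] / total,
        # Share you would keep if you covered only the top_n areas.
        "cherry_picked_share": sum(ranked[:top_n]) / total,
    }


# Invented numbers: 20 hypothetical neighborhoods with 100 to 290 views each.
views = {f"neighborhood_{i}": 100 + 10 * i for i in range(20)}

summary = audience_summary(views, top_n=5)
print(f"Total views: {summary['total']}")
print(f"Busiest neighborhood's share: {summary['largest_share']:.1%}")
print(f"Share kept by covering only the top 5: {summary['cherry_picked_share']:.1%}")
```

With these made-up numbers, no single neighborhood reaches 8 percent of the total, and covering only the five busiest would keep roughly a third of the audience. And that assumes you could guess the right five in advance.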

If a news organization wants to remain viable, its people are going to need to make smart decisions about when a story format is the best approach — and when something else will do a better job for the readers.

6 Comments on “When to choose data over stories”

  1. Why a choice? Why couldn’t you build the database and then provide small stories that highlight interesting trends, hotspots, or context to the data? I think it’s a bit of a mistake to just let the data sit there and make no attempt to make sense of it for readers who don’t check the data frequently.

    As I see it, stories (small or large) can refocus attention on the data, and the data can drive more stories.

  2. Bryan;

    What part of this page is letting the data “sit there” and not highlighting trends, hotspots or context? The database IS providing those things. A human being isn’t needed to write a story, make a trend graphic, a map and several other lists of context for that very specific neighborhood. What a program can’t do is this, which is based on the same dataset and ran when it launched.

    So, you’re right: don’t choose. Play to strengths. Use your staff to do what they do well and use web apps, programming and data to handle things that don’t require a reporter. Could a reporter write a neighborhood summary? Sure. Could a graphic artist update a trend graphic and a map? Of course. But 189 times in one county alone? Every week? You’d kill yourself if you were assigned that.

    The simple brutal fact of the matter is that no one has the staff they used to, and most places don’t even have the staff they need. So we’re all going to have to get a lot smarter about how we use people — the largest expense in any organization — and how we don’t if we want to do anything other than cut back on the amount of information we’re providing to people. We need to be a lot more creative on how we use means other than assigning a reporter to get information and news to our readers. Those means have limits, and knowing where they are and where the balance between staff driven and data driven lies is critical.

  3. Pingback: News Reporting and Public Records » Blog Archive » Visit from one of the best

  4. Matt,

    I agree with you that the data is presented in an informative way, and provides a lot of information. Perhaps “sit there” wasn’t the term I was looking for. What I meant was a way to use the data in reporting, and vice versa, so that the traffic to both reporting and databases could be strengthened in the long run.

    I think the story that was produced when the site was launched is a great example, but somebody – database journalist or business reporter or whoever – should be checking back on the database numbers every so often (assuming the database is being updated).

    I would use Chicagocrime.org as an example. It was a great database and a great way to show what crime was happening in Chicago. But a few graphs to explain some of the data trends would have been a good way to add some journalistic value to the map data.

    Just my thoughts.

  5. Bryan: Not for nothing, but I think you just made Matt’s point precisely. No one said this is an either/or… I actually think you’re right about ChicagoCrime, and to a much greater extent EveryBlock: Both could use some actual reporting to complement the data.
