Posted on October 24, 2008
When to choose data over stories
This is a follow-up to yesterday’s post about Matt Waite’s approach to data-driven journalism. After I posted, Matt sent me an e-mail with some additional information, and I asked him if I could publish it here. He said yes.
Back in “the Paleolithic era of our Web site,” he wrote (meaning 2004), the St. Petersburg (Fla.) Times ran a real estate package that had been a sales phenomenon for the print edition: Soaring Home Prices in the Bay Area (that’s Tampa Bay, fyi).
Producing this package involved writing more than 30 stories in two days across five counties, and Matt pointed out that “be it one person or a staff of people — [this] is a large expenditure of time and effort. A good amount of that can never be replicated by a program — I will never be able to write a Python script that talks to a homeowner and gets good quotes from them and then writes it up in a compelling fashion.”
This is one piece of what a news organization needs to consider (especially now, with reduced staffs): Are the stories worth the time and effort that will be required? Often the answer is yes — we WANT those interviews. But sometimes the answer is no.
“Week after week, month after month, for very small pieces of geography, that type of work isn’t necessary,” Matt wrote. “I’d rather take the core piece of news that people want to know, automate it, and then use my limited staff resources on writing stories of broad interest with nuance and humanity and storytelling and all the stuff you don’t get from a quick paragraph.”
I think this ties in perfectly to the post I wrote last Friday about articles, comments, stories, and conversations. There are stories that will move our hearts, change our thinking, transform us. These are stories with real characters in them, human drama, context, perspective, and explanations that help us understand the world we live in. These stories cannot be automated — just as they are rarely successful when told by inexperienced or time-crunched reporters.
Much of journalism is not like that. We call everything “a story,” but lots of journalism is nothing more than reports — small bundles of facts about a crime, a house sale, a tax increase. Many of these reports CAN be automated. (That’s the idea behind EveryBlock, after all.)
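To make "a report that can be automated" concrete, here is a minimal sketch in Python. It is not how EveryBlock or the St. Petersburg Times actually builds anything. The record fields (`address`, `neighborhood`, `price`, `date`) and the sample values are hypothetical, chosen only to show a small bundle of facts turned into a sentence with no reporter involved.

```python
def sale_report(record):
    """Turn one structured home-sale record into a one-sentence report.

    The field names are assumptions made for this illustration; a real
    data feed would have its own schema.
    """
    return (
        f"A home at {record['address']} in {record['neighborhood']} "
        f"sold for ${record['price']:,} on {record['date']}."
    )


# Hypothetical record, invented for the example.
sample = {
    "address": "12 Example St.",
    "neighborhood": "Sample Heights",
    "price": 185000,
    "date": "2008-10-01",
}

print(sale_report(sample))
```

The same function can run over every record in every neighborhood, which is the part a staff of reporters could never afford to do by hand. What it cannot do is knock on the homeowner's door.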
Matt put the choice — story or data? — into practical business terms:
“The online business model — where clicks equal cash — pretty much requires that if you’re going to pay a reporter to do a story, that story had better interest the broadest audience possible, or the return on investment is upside-down (obviously, this isn’t the sole determiner of what gets written, but you can’t ignore it completely). The Neighborhood Watch model allows us to cover teeny tiny pieces of geography without concern of cost because, well, it doesn’t cost anything. If I can get a small number of people interested in each of their pieces of geography, and I have lots of pieces of geography, then that will all add up to a large audience.”
You need to understand this.
If you can aggregate enough small pieces efficiently, you attract a big audience, and a big audience means a lot of ad impressions to sell at your CPM. Even though no single piece draws mega-traffic, the total can be very, very satisfying.
Matt’s not allowed to give out site numbers, but he was able to say that “most days, no one neighborhood accounts for more than 8 percent of the traffic to the site.”
And while you might be tempted to cherry-pick the neighborhoods you imagine would attract the most site visitors, your instincts are not going to be reliable on this. The payoff from aggregation is that you get everybody clicking in there, and it doesn’t matter that some pieces get very few clicks — the aggregate works only because it includes everything.
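The arithmetic behind aggregation can be sketched in a few lines of Python. Every number here is made up for illustration. Matt did not share traffic figures, and the long-tail shape produced by `make_views` is simply an assumption about what neighborhood traffic might look like. The point is only to compare the whole set of neighborhoods against a cherry-picked handful.

```python
def make_views(n_neighborhoods=200, top=400):
    """Build a hypothetical long-tail list of daily page views.

    The k-th most popular neighborhood gets top // k views, with a
    floor of 5 views. These are illustrative values, not real data.
    """
    return [max(top // k, 5) for k in range(1, n_neighborhoods + 1)]


def largest_share(views):
    """Fraction of all traffic that goes to the single busiest neighborhood."""
    return max(views) / sum(views)


def share_captured(views, n_picked):
    """Fraction of all traffic you keep if you cover only the n_picked busiest."""
    ranked = sorted(views, reverse=True)
    return sum(ranked[:n_picked]) / sum(views)


views = make_views()
print(f"total daily views: {sum(views)}")
print(f"busiest single neighborhood: {largest_share(views):.1%} of traffic")
print(f"covering only the top 10: {share_captured(views, 10):.1%} of traffic")
```

Even this version flatters the cherry-picker, because it assumes you could guess the busiest neighborhoods in advance. If your picks are off, the share you capture is smaller still.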
If a news organization wants to remain viable, its people are going to need to make smart decisions about when a story format is the best approach — and when something else will do a better job for the readers.