Note: there are a bunch of relevant disclosures at the bottom of this post.
On Monday, "philanthropy wonk" (and PACS Visiting Scholar) Lucy Bernholz blogged about a proposal in the Knight News Challenge that aims to digitize and make public IRS 990 tax forms. These are essentially the only public records on charities, and they cover basics like, "How much do you spend on programs and fundraising? How much was the CEO paid?" (CharityNavigator, the organization everyone thinks I work for when I mention GiveWell, uses the data from these 990 forms to rate the financial performance of charities.) Lucy and I had a Twitter exchange about her post that got me thinking, and I wanted to expand on it a little bit.
A note about the proposal in question: Carl Malamud, who was apparently responsible for getting the SEC to provide corporate filings free online, proposes a $500,000 or $600,000 budget for a project to "Put 10 years of IRS Form 990... online in bulk, [and] extract 75 million fields of data." Lucy commented, "Look at the price tag - $600K - amazing to think what could be done for such little money."
What I initially said to Lucy was, "couldn't GuideStar just do this overnight?" GuideStar is the current hub for nonprofit financial information; they digitize hundreds of thousands of 990s a year and serve as the data backbone for a variety of different initiatives in the sector. Lucy responded by pointing out that (a) they haven't yet, and (b) it would be counter to their business model to do so.
Lucy's comment made me wonder what exactly GuideStar's business model is, so I went and found their audited financial statements (PDF; FY 2010 is the most recent available). The basic story, at least for 2010, is that they raised $7.4M from selling their data and products and $2.3M from charitable contributions of different stripes (while running an operating deficit of $1.6M). (It would be nice if we had more of a breakdown of where their earned revenue is coming from, but they don't have to be more public about their budget and revenue, so, like most charities, they aren't.) Of their total spending of $11.3M, $5.6M went to personnel, $1M went to digitizing 990s, $800K went to other tech, and the remaining ~$4M covered everything else. If you, as a normal individual, go to the GuideStar website, you can get copies of Form 990 for a given charity for free, but getting the actual data that GuideStar digitized (i.e., made machine-readable) will cost you. (On reflection, this is actually a bit weird. The core way individuals use GuideStar, to look at PDFs of 990s, doesn't even require the "digitization" that they do, which turns the PDFs into numbers in a database.)
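For anyone who wants to check my arithmetic, here is a minimal sketch that just adds up the figures quoted above. The variable names are my own, and the numbers are the rounded ones from this post rather than exact line items from the audited statements, so treat the results as approximate.

```python
# All figures in millions of USD, rounded as quoted in this post
# (GuideStar FY 2010). Variable names are illustrative only.
earned_revenue = 7.4   # selling data and products
contributions = 2.3    # charitable contributions
total_spending = 11.3

# The spending lines the post names explicitly.
named_spending = {
    "personnel": 5.6,
    "digitizing 990s": 1.0,
    "other tech": 0.8,
}

total_revenue = earned_revenue + contributions
operating_deficit = total_spending - total_revenue
everything_else = total_spending - sum(named_spending.values())
digitization_share = named_spending["digitizing 990s"] / total_spending

print(f"Total revenue:       ${total_revenue:.1f}M")
print(f"Operating deficit:   ${operating_deficit:.1f}M")
print(f"Everything else:     ${everything_else:.1f}M")
print(f"Digitization share:  {digitization_share:.1%} of spending")
```

The deficit and the "everything else" bucket come out to roughly the $1.6M and ~$4M quoted above, and digitization is a bit under a tenth of total spending.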
At this point I started thinking of the recent New Yorker profile of Clayton Christensen (it's great, you should read it, but it's gated). Christensen, who I had somehow never heard of before, basically invented the idea of "disrupting" an industry. I normally find these kinds of self-help business ideas inane, but Christensen seems to have a decent answer to the standard "if you know what's going to happen to businesses, why aren't you running the best hedge fund in the world?" objection. (That would be the "I'm getting companies to pay me a lot of money to tell them what to do instead" response.) Christensen's basic idea is that disruption happens as low-cost, scrappy competitors move upscale over time to take on the core profitable businesses of successful incumbent companies.
If GuideStar spends $1M a year digitizing 990 data and sells it for $7.4M, and an upstart can digitize the same data for less than half the cost, GuideStar looks ripe for disruption.
Actually, though, the case is both stronger and weaker than this statement makes it seem. The case is weaker in the sense that GuideStar is probably doing a better job than the upstart could, at least at first. GuideStar claims to enter over 100 million fields a year with 99.9% accuracy for their digitization (by using human double entry), whereas Malamud only plans to capture 75 million fields total from the past 10 years of 990s, and Captricity, the company he plans to use, only claims "over 99% accuracy." Capturing less data with less accuracy at half the cost doesn't seem like such an earth-shattering proposition.
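To make the comparison a little more concrete, here is a rough back-of-the-envelope sketch. It rests on assumptions that may well be wrong: I treat GuideStar's $1M of digitization spending as covering exactly 100 million fields, I treat the upper end of Malamud's proposed budget ($600K) as if it all went to digitizing his 75 million fields, and I take Captricity's "over 99%" as exactly 99%, which makes its error count an upper bound. The function names are mine.

```python
def expected_errors(fields: int, accuracy: float) -> int:
    """Approximate number of mis-entered fields at a claimed accuracy."""
    return round(fields * (1 - accuracy))


def cost_per_field(cost_usd: float, fields: int) -> float:
    """Dollars spent per digitized field."""
    return cost_usd / fields


# GuideStar: figures quoted in this post (per year).
guidestar_fields = 100_000_000   # "over 100 million fields a year"
guidestar_accuracy = 0.999       # claimed, using human double entry
guidestar_cost = 1_000_000       # FY 2010 digitization spending

# Malamud proposal: figures quoted in this post (10 years of 990s in total).
malamud_fields = 75_000_000
malamud_accuracy = 0.99          # ASSUMPTION: floor of "over 99% accuracy"
malamud_cost = 600_000           # ASSUMPTION: entire budget goes to digitization

guidestar_errors = expected_errors(guidestar_fields, guidestar_accuracy)
malamud_errors = expected_errors(malamud_fields, malamud_accuracy)

print(f"GuideStar: ~{guidestar_errors:,} bad fields, "
      f"${cost_per_field(guidestar_cost, guidestar_fields):.3f} per field")
print(f"Malamud:   up to ~{malamud_errors:,} bad fields, "
      f"${cost_per_field(malamud_cost, malamud_fields):.3f} per field")
```

Under these assumptions the upstart's data could contain several times as many bad fields as GuideStar's, which is the sense in which the case is weaker than it first looks.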
The case for disruption by the upstart is stronger than this makes it seem because this is exactly how disruption works: the upstart does a worse, but adequate, job at a lower price point and gradually expands to challenge the higher value-added products and services of the incumbent. GuideStar has built an array of products and services for their customers on top of the digitized data, and no doubt some of those customers would be perfectly willing to continue to pay their current rates to get that extra half a percent of accuracy, and to cover the value of those services. But some of them inevitably won't, especially when Malamud releases his data for free publicly (as he plans to), and the low-cost data digitization services he plans to use are only going to get more accurate over time. Plus, after doing the first batch of digitization, he plans to get the IRS to start doing it itself.
That brings me to a key point that I haven't mentioned: every single player in this story is a nonprofit. GuideStar, the Knight Foundation (which funds the Knight News Challenge), and Malamud's Public.Resource.Org are all nonprofits. Shouldn't they all be collaborating, rather than competing, since they all care about making data available to the public? (Their nonprofit status might not matter if Malamud can't get the funding he wants for the project, but, frankly, that doesn't seem very likely to me, and there seems to be plenty of room for a for-profit competitor to disrupt GuideStar if that happens anyway.)
In March, GuideStar announced that their President & CEO of 10 years will be stepping down by the end of the year. His successor faces an interesting challenge and a great opportunity. GuideStar's business model (from what I can see) has historically relied on bundling private digitization of 990 data with a bunch of value-added services for nonprofits, foundations, and tax professionals. That doesn't look like it's going to be possible any more.
So rather than play defense, GuideStar's new CEO should decouple the digitization business. The new CEO should start investigating Captricity and see if that's good enough for what GuideStar needs, but more importantly recognize that GuideStar will live or die on the value-adds. Selling the data doesn't look like it's going to be possible for much longer, so go ahead and make it free to the public. Some of the open data foundation-types will recognize the value of this and try to make up for the lost revenue, especially when GuideStar comes off looking like a hero of their movement. (Plus, there's the fact that this will actually be hugely beneficial for GuideStar's mission. It may look like no one wants the data right now, but the track record of open data seems to indicate that most of the value materializes after the data becomes public; cf. this awesome example from the discovery of cholera.)
My unsolicited advice here could be totally misguided. If GuideStar is making $5 million of their $7.4 million in earned revenue from selling their data to academic researchers, "focusing on monetizing the higher-value-add services" isn't really going to keep them going at their current budget (but maybe nothing is). Or, perhaps they've tried to "focus on the higher value-added" before, and couldn't get adequate philanthropic funding to cover the data collection. (This certainly seems possible; one can imagine a horror story where a new philanthropic funder comes along and kills the paying market every couple of years, but then refuses to continue to subsidize public production of the data.)
But my take is actually slightly more optimistic. I like to think that GuideStar's recent actions, especially their acquisitions last year of Philanthropedia and Social Actions, suggest that they're way ahead of me on this, and are hard at work on new products and strategies to continue to add value over and above the digitization of the 990 data. I'm looking forward to the day, hopefully not too far off, when they make the digitized 990 data free to the public.
Lots of disclosures on this one: GiveWell, my employer, participates in TakeAction@GuideStar. I also know and like a couple members of the GuideStar Board of Directors, and I've consulted for one of them on a project unrelated to GuideStar. As usual, this post represents me, not GiveWell (or anyone else).