Friday, June 29, 2012

Disruption in the nonprofit sector, or, why the next CEO of Guidestar has a great opportunity

Note: there are a bunch of relevant disclosures at the bottom of this post.

On Monday, "philanthropy wonk" (and PACS Visiting Scholar) Lucy Bernholz blogged about a proposal in the Knight News Challenge that aims to digitize and make public IRS 990 tax forms. These are essentially the only public records on charities, and they cover basics like, "How much do you spend on programs and fundraising? How much was the CEO paid?" (Charity Navigator, the organization everyone thinks I work for when I mention GiveWell, uses the data from these 990 forms to rate the financial performance of charities.) Lucy and I had a Twitter exchange about her post that got me thinking, and I wanted to expand on it a little bit.

A note about the proposal in question: Carl Malamud, who was apparently responsible for getting the SEC to provide corporate filings free online, proposes a $500,000 or $600,000 budget for a project to "Put 10 years of IRS Form 990... online in bulk, [and] extract 75 million fields of data." Lucy commented, "Look at the price tag - $600K - amazing to think what could be done for such little money."

What I initially said to Lucy was, "couldn't GuideStar just do this overnight?" GuideStar is the current hub for nonprofit financial information; they digitize hundreds of thousands of 990s a year and serve as the data backbone for a variety of different initiatives in the sector. Lucy responded by pointing out that (a) they haven't yet, and (b) it would be counter to their business model to do so.

Lucy's comment made me wonder what exactly GuideStar's business model is, so I went and found their audited financial statements (PDF; FY 2010 is the most recent available). The basic story, at least from 2010, is that they raised $7.4M from selling their data and products and $2.3M from charitable contributions of different stripes (while running an operating deficit of $1.6M). (It would be nice if we had more of a breakdown of where their earned revenue is coming from, but they don't have to be more public about their budget and revenue, so, like most charities, they aren't.) Of their total spending of $11.3M, $5.6M went to personnel, $1M went to digitizing 990s, $800K went to other tech, and the remaining ~$4M covered everything else. If you, as a normal individual, go to the GuideStar website, you can get copies of Form 990 for a given charity for free, but getting the actual data that GuideStar digitized (i.e. made machine readable) will cost you. (On reflection, this is actually a bit weird. The core way GuideStar is used by individuals, to look at PDFs of 990s, doesn't even require the "digitization" that they do, which turns the PDFs into numbers in a database.)
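For anyone who wants to check the arithmetic, here is a minimal sketch that uses only the FY 2010 figures quoted in the paragraph above. The variable names are mine, and the numbers are the rounded ones from this post, not line items copied from the audited statements.

```python
# All figures in millions of USD, rounded as quoted in this post (FY 2010).
earned_revenue = 7.4   # selling data and products
contributions = 2.3    # charitable contributions
total_spending = 11.3

personnel = 5.6
digitizing_990s = 1.0
other_tech = 0.8

# Revenue minus spending should reproduce the reported operating deficit.
total_revenue = earned_revenue + contributions
operating_result = total_revenue - total_spending

# Whatever is not personnel, digitization, or other tech is "everything else".
everything_else = total_spending - (personnel + digitizing_990s + other_tech)

# Share of total spending that goes to digitizing 990s.
digitizing_share = digitizing_990s / total_spending

print(f"Total revenue:    ${total_revenue:.1f}M")
print(f"Operating result: ${operating_result:+.1f}M")
print(f"Everything else:  ${everything_else:.1f}M")
print(f"Digitizing share: {digitizing_share:.0%}")
```

The pieces are consistent with each other: revenue falls about $1.6M short of spending, and the "everything else" bucket comes to a bit under $4M.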

At this point I started thinking of the recent New Yorker profile of Clayton Christensen (it's great, you should read it, but it's gated). Christensen, who I had somehow never heard of before, basically invented the idea of "disrupting" an industry. I normally find these kinds of self-help business ideas inane, but Christensen seems to have a decent answer to the standard "if you know what's going to happen to businesses, why aren't you running the best hedge fund in the world?" objection. (That would be the "I'm getting companies to pay me a lot of money to tell them what to do instead" response.) Christensen's basic idea is that disruption happens as low-cost, scrappy competitors move upscale over time to take on the core profitable businesses of successful incumbent companies.

If GuideStar spends $1M a year digitizing 990 data and sells it for $7.4M, and an upstart can digitize the same data for less than half the cost, GuideStar looks ripe for disruption.

Actually, though, the case is both stronger and weaker than this statement makes it seem. The case is weaker in the sense that GuideStar is probably doing a better job than the upstart could, at least at first. GuideStar claims to enter over 100 million fields a year with 99.9% accuracy for their digitization (by using human double entry), whereas Malamud only plans to capture 75 million fields total from the past 10 years of 990s, and Captricity, the company he plans to use, only claims "over 99% accuracy." Capturing less data with less accuracy at half the cost doesn't seem like such an earth-shattering proposition. 
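A rough back-of-the-envelope sketch of the per-field comparison, using only the figures quoted in this post. It assumes that GuideStar's roughly $1M digitization spend maps onto the 100 million fields they claim per year, and that Malamud's whole budget goes to his 75 million fields; neither assumption is confirmed by either party, and the function name is mine.

```python
# Figures quoted in this post. "Over 100 million" fields is treated as
# exactly 100 million, so the GuideStar number is an upper bound per field.
guidestar_annual_cost = 1_000_000        # USD spent digitizing 990s (FY 2010)
guidestar_fields_per_year = 100_000_000

malamud_budgets = (500_000, 600_000)     # USD, low and high end of the proposal
malamud_fields_total = 75_000_000        # fields across 10 years of 990s


def cents_per_field(cost_usd, fields):
    """Average cost of one digitized field, in US cents."""
    return 100 * cost_usd / fields


guidestar_cents = cents_per_field(guidestar_annual_cost, guidestar_fields_per_year)
malamud_cents = [cents_per_field(b, malamud_fields_total) for b in malamud_budgets]

print(f"GuideStar: at most {guidestar_cents:.2f} cents per field")
for budget, cents in zip(malamud_budgets, malamud_cents):
    print(f"Malamud at ${budget:,}: {cents:.2f} cents per field")
```

Under those assumptions the upstart's total budget is lower, but the cost per field is in the same ballpark as GuideStar's, which is another reason the headline price alone is not the earth-shattering part.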

The case for disruption by the upstart is stronger than this makes it seem because this is exactly how disruption works: the upstart does a worse, but adequate, job at a lower price point and gradually expands to challenge the higher value-added products and services of the incumbent. GuideStar has built an array of products and services for their customers on top of the digitized data, and no doubt some of those customers would be perfectly willing to continue to pay their current rates to get that extra half a percent of accuracy, and to cover the value of those services. But some of them inevitably won't, especially when Malamud releases his data for free publicly (as he plans to), and the low-cost data digitization services he plans to use are only going to get more accurate over time. Plus, after doing the first batch of digitization, he plans to get the IRS to start doing it itself.

That brings me to a key point that I haven't mentioned: every single player in this story is a non-profit. GuideStar, the Knight Foundation (which funds the Knight News Challenge), and Malamud's Public.Resource.Org are all non-profit. Shouldn't they all be collaborating, rather than competing, since they all care about making data available to the public? (Their nonprofit status might not matter if Malamud can't get the funding he wants for the project, but, frankly, that doesn't seem very likely to me, and there seems to be plenty of room for a for-profit competitor to disrupt GuideStar if that happens anyway.)

In March, GuideStar announced that their President & CEO of 10 years will be stepping down by the end of the year. His successor faces an interesting challenge and a great opportunity. GuideStar's business model (from what I can see) has historically relied on bundling private digitization of 990 data with a bunch of value-added services for nonprofits, foundations, and tax professionals. That doesn't look like it's going to be possible any more.

So rather than play defense, GuideStar's new CEO should decouple the digitization business. The new CEO should start investigating Captricity and see if that's good enough for what GuideStar needs, but more importantly recognize that GuideStar will live or die on the value-adds. Selling the data doesn't look like it's going to be possible for much longer, so go ahead and make it free to the public. Some of the open data foundation-types will recognize the value of this and try to make up for the lost revenue, especially when GuideStar comes off looking like a hero of their movement. (Plus, there's the fact that this will actually be hugely beneficial for GuideStar's mission. It may look like no one wants the data right now, but the track record of open data seems to indicate that most of the value materializes after the data becomes public. Cf. this awesome example from the discovery of cholera.)

My unsolicited advice here could be totally misguided. If GuideStar is making $5 million of their $7.4 million in earned revenue from selling their data to academic researchers, "focusing on monetizing the higher-value-add services" isn't really going to keep them going at their current budget (but maybe nothing is). Or, perhaps they've tried to "focus on the higher value-added" before, and couldn't get adequate philanthropic funding to cover the data collection. (This certainly seems possible; one can imagine a horror story where a new philanthropic funder comes along and kills the paying market every couple of years, but then refuses to continue to subsidize public production of the data.)

But my take is actually slightly more optimistic. I like to think that GuideStar's recent actions, especially their acquisition last year of Philanthropedia and Social Actions, suggest that they're way ahead of me on this, and are hard at work on new products and strategies to continue to add value over and above the digitization of the 990 data. I'm looking forward to the day, hopefully not too far off, when they make the digitized 990 data free to the public.


Lots of disclosures on this one: GiveWell, my employer, participates in TakeAction@GuideStar. I also know and like a couple members of the GuideStar Board of Directors, and I've consulted for one of them on a project unrelated to GuideStar. As usual, this post represents me, not GiveWell (or anyone else).


  1. Great post and I think you are right. I do think Guidestar is committed to innovating, but making their data public would be a tough trigger to pull. The problem with basing your business decisions on predicting the path of upstarts is that many of them fail, and sometimes by the time it's clear they will succeed it's too late to do much about it.

  2. Alexander
    Thanks for this - I agree. The challenge for every established organization (business or NPO) in the digital era is to even imagine what they would become if/when their "raw material" (in this case 990 data) were abundant, free, and easily re-used.

    Some will become the digitizers themselves, or possibly the sensemakers, the new product developers, the interface creators, or any number of other options.

    The business model disruption is visible in everything from the recording industry to makers of car GPS systems. NASA data becomes available, companies create one-trick tools (GPS systems for cars) and then all of a sudden a search company (Google) and what used to be a computer company-cum-music-company-cum-phone company (Apple) put "talking maps" on every phone and --- time for a new business model.

    But you point to a much more interesting question (the one @Robreich and I are examining at Stanford in the #recodegood project) - what is the real set of relationships between public sector agencies and public data (NASA data or 990s), nonprofits (publicly subsidized to do good), commercial businesses, and emerging hybrid businesses? Where is the public good in this equation? Since the raw material (digital data) operates as a fundamentally different economic object than analog objects (see Yochai Benkler and Paul Bremer for great wisdom on this) do we need to think about new incentives? New forms of governance? Or do we assume that the analog economics of sheep grazing space that helped define the commons which begot the nonprofit sector is going to work in this new economic frame?

    Good work Alexander. I completely agree on the opportunity in front of all of us.

  3. Dear Alexander,

    Thanks for your analysis and comments. Although your math is a little fuzzy, we share the common goal of improving the quality of and access to nonprofit data. Since we’re all making wishes, here are a few of mine:

    • I wish the IRS provided all nonprofit data electronically, but they don’t. We hope they will eventually.

    • I wish the IRS documents could be OCR’ed, but they aren’t very friendly to OCR efforts.

    • I wish the IRS followed the lead of the SEC and funded someone like Edgar Online to operate a free service like Edgar, but they don’t.

    • I wish the foundation community supported making all nonprofit data free electronically, but—even though they have been very generous to GuideStar—they prefer sustainable or commercial approaches.

    So GuideStar has been left to our own creative solutions to support the nonprofit sector’s need for quality data and pursuit of our mission. Currently, of our ten million users, 98 percent are able to access nonprofit data and services at no charge. Over 150 universities and colleges are using GuideStar data in their classrooms for thousands of students. Those who want to dig really deep into the data or use our sophisticated tools or services pay a fee; for the most part they tend to be professionals. Those fees are now paying for most of the costs of operating GuideStar, but as you mention, not enough to cover all our costs – or do everything we’d like to do.

    By the way, why the fixation on the 990? It isn’t timely, it isn’t always accurate, and it doesn’t tell us about impact and effectiveness. Increasingly we’re focused on the GuideStar Exchange, which is our effort at collecting information directly from nonprofits and collecting data that can help donors make better and more confident decisions.

    What would GuideStar do with $600,000? First, we’d invest it in something that is scalable, sustainable and accessible. Too much foundation money goes to one-offs. Second, 75 million fields of data isn’t very much data over a 10-year span, so I’d be looking for something with more impact. Instead of replicating efforts to digitize Forms 990 data, I would like to accelerate GuideStar’s APIs, not just for distribution of Forms 990 data, but also for nonprofit-supplied data collected through the GuideStar Exchange program and through our DonorEdge platform for community foundations, and also through our partnerships with Great Nonprofits, GiveWell, Philanthropedia, RootCause, and others. More than simply digitizing and providing Forms 990 data, we’d like to make the best, richest data set about the nonprofit sector available in a sustainable way, so that the resources we build today will last for years to come.

    Bob Ottenhoff, president and CEO, GuideStar

  4. [Disclosures: I'm on the board of GiveWell, GuideStar is a client, Lucy is a friend]

    Bob's critique of the value of digitizing 990s and making the data freely accessible could be more direct.

    It's not just that 990s don't contain data on impact and effectiveness (one of the reasons that GiveWell exists). It's not just that the information "isn't always accurate" as Bob says. In fact, there is substantial reason to believe that the data is meaningless, because standards for nonprofit accounting are so poor and nonprofits are almost never audited.

    You can easily find references to work that has shown that a large percentage of nonprofits claim to have no fundraising expenses. Another study from a few years ago found that roughly 50% of 990s surveyed were filled out incorrectly.

    In sum, the most likely path to social value from digitizing 990s is recognition that 990 data is useless and we need different standards for nonprofit reporting.

    The more likely scenario is that making this data set easily accessible and usable will hide this reality for a decade or more as people just start running with the only data set that is available. We've seen this show before in philanthropy.

    [Finally in Lucy's defense, I'm posting this while she is on vacation so she doesn't have a fair chance to respond]

  5. Thanks for giving me time to respond (smile)

    Here's where my thinking differs about the value of the 990 data. Both Tim and Bob argue that the 990 data are so deficient as to not warrant such attention. I actually believe opening up the 990 data will do two things - 1) shine light on their deficiency in such a way that real efforts to improve the 990 will develop and 2) feed the data ecosystem in such a way that more and better alternatives will be unleashed.

    It's not the 990 data themselves, in other words, but the process of opening up this raw material, that matters. Doing so will make it possible for more services that add real value, such as GiveWell and GuideStar, to be born. It will make it easier and cheaper to develop analyses and services and measurements higher up the data food chain than can be developed now, when so many resources must go into simply getting this core material out in a manageable and somewhat useful way. It may well open up enough interest from "outside the philanthropy data beltway" that pressure or market forces can be applied to improve the 990 itself. It will make clearer the need for an interoperable architecture, standard coding, shareable fields, etc. that facilitate field-wide and possibly industry-wide comparisons.

    Doing constant end-runs around the 990 is costly and limited in scope. Working off an open data set of 990 information will repurpose compliance data for analytic and innovative goals. It will also meet the first rung of most donors' "data engagement ladder" - which is about organizational legitimacy.

    See this new post from Brad Smith about "Starts and stops" for some other good ideas on this subject -