Suppose I were to tell you the following about Capital Good Fund clients: borrowers who pay off their loan on time are more likely to build their credit than those who default. I imagine you wouldn’t find that surprising; after all, by virtue of making payments you are improving your credit, and those who don’t pay us are likely falling behind on other debts as well.
So far so good. Now what about this: borrowers who complete Financial Coaching before taking out a loan are more likely to repay than those who did not receive Coaching. Again, not too surprising: the skills you learn in Coaching translate into better, more responsible financial habits.
Now let’s make it interesting and say that those who complete Financial Coaching AND take out a loan are twice as likely to build their savings as those who only do Coaching or only take out a loan. On the face of it, this makes sense: Coaching on its own can’t meet the need for capital, and a loan on its own can’t impart financial skills. But if we dig a little deeper, we come up against a fundamental concept in statistics: selection bias. Simply put, we don’t know whether the increase in savings is due to our products and services, or whether it’s simply that the people who take advantage of both services (Coaching and a loan) are more motivated and, in turn, more likely to build their savings anyway.
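To see how this plays out, here is a small, purely illustrative simulation (made-up numbers, not Capital Good Fund data): the services are assumed to have zero true effect on savings, yet the clients who use both Coaching and a loan still end up with higher average savings, simply because motivated people are more likely both to opt into both services and to save.

```python
# Illustrative simulation of selection bias; not real client data.
# The services have *zero* true effect on savings here. If more motivated
# people are both more likely to use both services and more likely to
# save, the two-service group will still show higher savings.
import random
from statistics import mean

rng = random.Random(1)

clients = []
for _ in range(10_000):
    motivation = rng.random()                             # hidden trait, 0..1
    uses_both = motivation > 0.7                          # motivated clients opt into both services
    savings = 100 + 300 * motivation + rng.gauss(0, 50)   # savings driven only by motivation
    clients.append((uses_both, savings))

both = [s for uses, s in clients if uses]
other = [s for uses, s in clients if not uses]
print(f"avg savings, Coaching + loan: ${mean(both):.0f}")
print(f"avg savings, one or neither:  ${mean(other):.0f}")
# The gap looks like program impact, but it is entirely selection.
```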
Proper Methods for Measuring Impact
In other words, the explanation above is not causal: we don’t know which variable is affecting which outcome. The only way to answer that question is to run a randomized controlled trial (RCT), in which you randomly assign people to different groups and track their progress. In this case, we might have one group receive neither Coaching nor a loan; another receive just Coaching; another receive just a loan; and a final group receive both Coaching and a loan. Provided the sample size is big enough, we can then compare how everyone performs and determine what’s really driving the change. Only then can we accurately claim that, if the goal is increased savings, it is better that a client use both of our services rather than one or the other (or neither).
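For the technically inclined, here is a rough sketch of what that assignment and comparison might look like in code. Everything here is hypothetical: the arm names, the sample size, and the outcome numbers are invented for illustration.

```python
# A minimal sketch of a 2x2 randomized design: participants are randomly
# assigned to one of four arms (neither, coaching only, loan only, both),
# and we compare the average change in savings across arms.
# All numbers are made up; this is not Capital Good Fund data.
import random
from statistics import mean

ARMS = ["neither", "coaching_only", "loan_only", "both"]

def assign_arms(participant_ids, seed=42):
    """Randomly assign each participant to one of the four arms."""
    rng = random.Random(seed)
    return {pid: rng.choice(ARMS) for pid in participant_ids}

def compare_outcomes(assignments, savings_change):
    """Average savings change per arm (savings_change maps id -> dollars)."""
    by_arm = {arm: [] for arm in ARMS}
    for pid, arm in assignments.items():
        by_arm[arm].append(savings_change[pid])
    return {arm: mean(values) for arm, values in by_arm.items() if values}

if __name__ == "__main__":
    participants = list(range(400))        # hypothetical sample size
    assignments = assign_arms(participants)
    # Fake outcome data, purely to show the mechanics of the comparison.
    rng = random.Random(0)
    fake_savings = {pid: rng.gauss(200, 75) for pid in participants}
    print(compare_outcomes(assignments, fake_savings))
```

In a real trial we would also test whether the differences between arms are statistically significant rather than just comparing averages, but the essential point is that random assignment is what lets us read those differences causally.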
Over the past few years I’ve become increasingly concerned by the ways in which nonprofits use data, the ways funders request it, and the way the public digests it. I’m not saying that anyone is being deceptive; every nonprofit leader and employee I’ve ever met has been devoted to the mission and passionate about his or her work. Rather, I’m saying that the use of sloppy data means that the nonprofit sector is unable to analyze its own performance and find ways to increase efficiency, impact, and sustainability.
Let’s go back to my earlier example. If we incorrectly believe that, from the point of view of our mission, it’s important that all of our clients take advantage of all of our services, we will end up wasting a lot of resources pursuing that aim. Not only that, but we’ll likely waste the time of a lot of clients who could have benefited just as much had they only taken out a loan (or only done Coaching).
I fear that the more “data driven” the nonprofit sector becomes in theory, the more misguided it will become in practice. I don’t know what’s worse: lacking data, or coming to faulty conclusions based on incomplete data. Either way, we need to acknowledge the issue.
It’s Complicated
Even seemingly obvious interpretations of data can be incorrect. For instance, at the start of this post I said that graduates of our Financial Coaching make for better borrowers. Well, although our data set is still too small to draw statistically significant conclusions, my analysis of the numbers has shown very little correlation between the two. Why might that be? Behavioral change is hard; not every person who gets Coached is going to suddenly become a perfect borrower. And let’s not forget that our clients are very low-income and vulnerable: life happens, and they fall behind on debt. That isn’t due to a lack of financial education; it’s due to being poor.
Moreover, the relationship between Coach and Client is very different from that between Borrower and Lender; the former is collaborative, while the latter can be adversarial (the lender’s interest in getting paid is not necessarily aligned with the borrower’s interests). And finally, human beings are just hard to predict. Someone might apply for a loan with plenty of income and decent credit and then default without making a single payment. Someone else might be tight on cash and have poor credit but end up paying without issue.
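For those curious what “very little correlation” means mechanically, here is a toy example (invented numbers, not our actual portfolio) of the kind of check involved: correlating a completed-Coaching flag with an on-time-repayment flag.

```python
# Illustrative only: correlation between a binary "completed coaching"
# flag and a binary "repaid on time" outcome (a phi coefficient, i.e.
# Pearson correlation on two 0/1 variables). The data below are invented.
from statistics import correlation  # Python 3.10+

completed_coaching = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]
repaid_on_time     = [1, 0, 1, 0, 1, 1, 0, 0, 1, 1]

r = correlation(completed_coaching, repaid_on_time)
print(f"correlation: {r:.2f}")  # values near 0 suggest little linear relationship
```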
I believe that everything we do must be based on data, but that’s not the same as saying all our decisions should rely on statistical certainty. I find it just as useful to manually look through 50 “good” loans and compare them to 50 “bad” loans as I do to run our entire portfolio through an algorithm. Both are great tools, and both are dangerous unless we understand their limitations.
I’ve argued in a previous post that funders need to provide funding to cover the cost of collecting, interpreting, and reporting data. To that I’d like to add that we, as a sector, need to be more honest about our data and the interpretations we make based on it. Only then can we gain the kind of useful insights that will enable us to accelerate social change and alleviate the suffering and injustice around us.