Empiricists Corner: What the First Social Impact Bond Won’t Tell Us


This article was first published in the Stanford Social Innovation Review.

Social impact bonds (SIBs) are a high-profile innovation in funding public services. The pilot SIB in Peterborough, UK, which aims to reduce recidivism, has been widely watched and — despite not yet producing results — already widely emulated.

Given the international interest in SIBs and similar payment by results (pay-for-success) schemes, it is important to determine whether the Peterborough SIB works. The Ministry of Justice describes the programme’s evaluation method as “the Rolls Royce of evaluation.” However, Professor Sheila Bird of Cambridge University and the UK Medical Research Council says: “[It] might well be a brilliant success; it might achieve little. But we aren’t going to know either way.”

This article examines three aspects of determining whether the SIB works.

The first is straightforward: whether the investors should be repaid. Determining this will be easy, because it depends solely on the re-offending rate and the contractual terms – both of which will be clear.

The second is whether the intervention itself works to reduce re-offending – a central question. Determining this will be more difficult, because this first SIB uses a variety of interventions, only some of which have been evaluated rigorously, and the combination has never been evaluated.

The issue is attribution: figuring out whether the re-offending rate amongst the Peterborough prisoners has anything to do with the charities’ work which the bond funds. Both sides agree that the way to see what the charities have achieved is to compare:

  1. The one-year re-offending rates of men with whom the charities work.
  2. The one-year re-offending rates of a group of similar men with whom the charities haven’t worked. This “control group” screens out effects of, say, changes in society, the law, or sentencing procedures.
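The comparison in the two points above can be sketched in a few lines of Python. This is only an illustration: the counts are invented, not data from the Peterborough evaluation, and the function names are hypothetical.

```python
def reoffending_rate(reoffended: int, released: int) -> float:
    """Share of a released cohort that re-offended within one year."""
    if released <= 0:
        raise ValueError("released must be positive")
    return reoffended / released


def rate_difference(treated_rate: float, control_rate: float) -> float:
    """Control rate minus treatment rate.

    A positive value means the treatment group re-offended less often.
    """
    return control_rate - treated_rate


# Hypothetical counts, invented purely for illustration.
treated = reoffending_rate(reoffended=450, released=1000)
control = reoffending_rate(reoffended=500, released=1000)
difference = rate_difference(treated, control)

print(f"treatment {treated:.1%}, control {control:.1%}, difference {difference:.1%}")
```

The difference can be credited to the programme only if the two groups were effectively identical beforehand.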

It is essential that the “treatment group” and control group be effectively identical beforehand; if they are, the sole difference between them is the programme, which alone must account for differences in re-offending rates between the groups. Bird would have liked the treatment group and control group to have been selected at random to ensure that the groups were effectively identical. But this is not what is happening.

Social Finance, the nonprofit company that invented social impact bonds and is running the Peterborough pilot, says random selection was impossible: within the prison, the programme is advertised and open to anybody whose sentence is a year or less. Prisoners are used to – and exasperated by – being apparently arbitrarily excluded from things, and neither Social Finance nor the prison governor wanted this programme to generate ill-will in that way. Social Finance says that its “investors wouldn’t tolerate excluding some people.” Bird’s view is that random selection inside prisons (as outside them) is not only possible, but also pretty common.

If randomising prisoners was not possible, the next best option would have been randomising prisons: in other words, several randomly selected prisons would get the programme while others would not, and the re-offending rates of their populations would be compared. Social Finance says that this was not possible either, because the Ministry of Justice would never have allowed a pilot in several prisons at once.
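Randomising at the level of prisons, as described above, could look roughly like the following Python sketch. The prison names, the number of prisons, and the function `randomise_prisons` are all hypothetical. Nothing like this was done in the Peterborough pilot.

```python
import random


def randomise_prisons(prisons, n_treated, seed=None):
    """Randomly split a list of prisons into a treatment arm and a control arm."""
    if not 0 < n_treated < len(prisons):
        raise ValueError("n_treated must leave at least one prison in each arm")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    treated = set(rng.sample(prisons, n_treated))
    treatment_arm = [p for p in prisons if p in treated]
    control_arm = [p for p in prisons if p not in treated]
    return treatment_arm, control_arm


# Hypothetical prison names, for illustration only.
prisons = ["Prison A", "Prison B", "Prison C", "Prison D", "Prison E", "Prison F"]
treatment_arm, control_arm = randomise_prisons(prisons, n_treated=3, seed=2013)

print("programme offered in:", treatment_arm)
print("comparison prisons:  ", control_arm)
```

Because chance alone decides which prisons get the programme, traits such as an unusually willing governor should, on average, be spread across both arms.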

Interestingly, Peterborough prison was not chosen at random, but rather because the prison governor was willing to engage. As Bird remarks, that may indicate an unusual trait in the governor, which itself may influence the results. It is not impossible that a prison governor willing to take on this innovative project is unusually progressive in other respects too: perhaps Peterborough prison offers other unique programmes that could skew the results.

To construct a control group, the bond evaluation uses Propensity Score Matching (PSM), a system often used when samples cannot be randomised. With PSM, you start by figuring out what indicators have historically correlated with eligibility for the treatment (propensity to be eligible). In this case, prisoners at institutions other than Peterborough who have the same “propensity scores” as the treatment group serve as a control group. Social Finance is doing an unusually elaborate PSM by having about ten “control” prisoners for each “treatment” prisoner.
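The matching step can be illustrated with a minimal Python sketch. It assumes the propensity scores have already been estimated from observable indicators (for example, with a logistic regression), and it matches with replacement inside a fixed "caliper" distance. The scores, identifiers, caliper value, and the function `match_controls` are assumptions made for illustration, not the method actually used in the Peterborough evaluation.

```python
def match_controls(treated_scores, control_scores, k=10, caliper=0.05):
    """For each treated person, pick up to k controls with the closest scores.

    treated_scores and control_scores map a person id to a propensity score
    between 0 and 1. Controls further than `caliper` away are never used.
    Matching is with replacement, so one control may serve several treated people.
    """
    matches = {}
    for t_id, t_score in treated_scores.items():
        candidates = [
            (abs(c_score - t_score), c_id)
            for c_id, c_score in control_scores.items()
            if abs(c_score - t_score) <= caliper
        ]
        candidates.sort()  # closest scores first
        matches[t_id] = [c_id for _, c_id in candidates[:k]]
    return matches


# Hypothetical scores, invented for illustration; k=2 keeps the example small.
treated_scores = {"T1": 0.30, "T2": 0.70}
control_scores = {"C1": 0.28, "C2": 0.33, "C3": 0.69, "C4": 0.95, "C5": 0.31}
matches = match_controls(treated_scores, control_scores, k=2)

print(matches)
```

The sketch matches only on the score, so anything that did not go into estimating the score plays no part in the match.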

Nonetheless, there are major objections to PSM as a way of attributing any effects observed. One is that PSM can only ever look at indicators that are observable, such as age, background, and criminal history. Yet it is often unobservable factors – such as attitude or resilience – that drive behaviour.

Another problem is that the only data available for the PSM are what’s stored in the Police National Computer, which is surprisingly basic. For instance, it cannot distinguish whether somebody has mental health problems or a history of heroin use, which obviously would influence their behaviour and the care they need.

Astonishingly, even the Ministry of Justice explicitly acknowledges that the control group may be pointless (see page 7 of this Ministry of Justice document about the evaluation: https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/162352/peterborough-social-impact-bond-assessment.pdf.pdf).

The third aspect is whether the bond structure itself works. Social Finance says that the mere existence of this first bond proves that such a bond is possible. It has defined performance criteria against which a public body agreed to repay, and found private donors willing to provide funding based on those criteria.

But when we eventually see the re-offending rates of the treatment and control groups, we will not know whether to attribute any differences to:

  • Social Finance’s particular mix of interventions
  • The money. The SIB brings in about £1,667 per prisoner. Bird thinks any prison governor could use that amount to dramatically reduce re-offending. It is possible that the prison governors could out-perform Social Finance’s programme.
  • The new financing mechanism itself. We will not know whether it produces better outcomes than if that money had been put into that intervention through, say, a grant programme.

The core problem might be that Social Finance is delivering on a contract: it is not doing social science research, to which distinguishing between possible causes is central. So does the difficulty of seeing the effect of the financing mechanism itself matter? Well, not for Social Finance or its donors in this first instance. Their proximate issue is delivering the contractual obligations such that they get paid. But surely it would have been helpful to Social Finance’s future work to see the effect of the SIB mechanism itself.

It certainly matters to the Ministry of Justice, which 1) may end up paying for a service that didn’t achieve anything beyond what that particular prison governor would have achieved without that money, and 2) will not therefore know what service they should roll out to other prisons if the Peterborough service does apparently succeed.

It matters even more to UK taxpayers who are funding all of this—as well as hoping not to be burgled or mugged. Yet they are unlikely to object because the intricacies of randomisation and PSM for determining attribution are a shade too complex.

“All these problems could have been averted,” says Bird. She says, for example, that this first SIB could have been tested against a known intervention with a conventional funding mechanism.

And yet, we should not let the best be the enemy of the good. Clearly, we are likely to get better public services when the interests of the provider and purchaser are better aligned, and SIBs are a step in the right direction. Despite the Peterborough SIB’s curious design choices, it has taught us many things – and will teach us many more.

