Evidence-based policy making – Part One: The problems

A stupid diagram – the kind of thing we can’t get enough of here at ClubTroppo. And remember “Reflect, Revise and Improve”. That’s RRI – capiche? In short, you can’t get enough RRI. In fact you should be doing it now! Reflect, Revise, Improve!

As I’ve suggested before, a key to the development of the modern world was the evolution of new public goods (the other key was the development of new private goods. Capiche? The world is an ecology of both at every level of society and the economy – about which I’ll write more). The grand, progressive project of the nineteenth century saw public sanitation and health, public education, public security (the police) and merit-based public service all come of age. All made a huge contribution – economically, socially, politically.

One core of governance is the flow of information. That’s why weights and measures are a key part of governance – appearing in the role of the sovereign in Magna Carta and (I think) right back to the age of Hammurabi. Likewise the need for integrity. The office of the Auditor General goes back into the dim dark past too. As the UK National Audit Office informs us “The earliest surviving mention of a public official charged with auditing government expenditure is a reference to the Auditor of the Exchequer in 1314.”

However, like so many other things, the role of Auditor General came into its own in the nineteenth century with the emergence of the Exchequer and Audit Department in 1866. As Wikipedia tells us:

The existence and work of the NAO are underpinned by three fundamental principles of public audit:

  • Independence of auditors from the audited (executives and Parliament)
  • Auditing for: regularity, propriety and value for money
  • Public reporting that enables democratic and managerial accountability

But if the professionalisation of audit was brought on by the increasing complexity and professionalisation of government functions itself, surely what’s developed since requires us to go beyond the basic idea of audit for integrity – we want to do more than catch the misappropriation of money or even rank incompetence. As the three principles set out above make clear, the aspirations of the Auditor General function have risen well above this.

But though an external body might make a reasonable fist of auditing ‘regularity and propriety’, it can only provide a pretty rudimentary review of value for money. If one goes to the trouble of trying to find out, it turns out that, notwithstanding various short-lived crazes for ‘evidence-based policy’ (rather like the current craze-ette for government ‘agility’) very little government activity is guided by evidence. According to one study of US government program expenditure “less than $1 of every $100 the federal government spends is backed by even the most basic evidence”. I doubt things would be much different here.

As ever, policy development and even a lot of policy thinking haven’t got far beyond these high-level slogans. Of course we want government activities to be evidence based. And yes, there really are lots of ways in which governments could take advantage of ‘big data’. But that’s the TED talk.

At least speaking from my own experience of delivering social policy via my involvement with The Australian Centre for Social Innovation (TACSI), as you get close to the subject of evidence-rich management, let alone evidence-rich policy making, it becomes progressively clearer that it’s no easy matter. It’s a whole agenda to itself – an unfolding of issues and strategies to try to deal with them. But working methodically down from objectives and taking account of issues on their merits all the way down to the coalface isn’t the way we do things. As I’ve documented elsewhere, the model is, rather, that those at the top of the tree issue slogans and instructions and dust off their hands – another problem solved.

It’s kind of remarkable that, given economics’ pedigree as a discipline focused principally on questions of public policy, monitoring and evaluation of the efficacy of policy isn’t a compulsory part of an economics degree. (After all, ‘evidence’ is a part of a law degree – though of course it doesn’t involve much theoretical reflection on the nature of evidence but rather learning the legal rules – both wise and foolish – made by the profession around evidence.)

In any event, any decent exposure to monitoring and evaluation exposes you to how imperfect monitoring and evaluation almost always is (and must be). If you’re trying to build social capital, or target government income support to those who need it most, how are you going to measure it? Apart from the intrinsic difficulty of discovering such things, there are more issues involved. Different parties will want different information. Those at the ‘coalface’ – if they’re any good – will want evidence that tells them how well they’re going, helps them diagnose how to improve and helps them measure those improvements. Those running the system may be interested in this information, but they’ll have their own objectives.

To take an example I’m familiar with, a major part of the Family by Family program is getting families to reflect and decide on some central objective they want to achieve during the program to improve their lives. That’s an important metric for us. Our ‘program logic’ tells us that achieving that objective, and the self-efficacy that comes from success in it, drives substantial improvements in all manner of things. So we track it. If Family by Family is being introduced to a new suburb, I’ll want to know how that metric tracks at the six-week mark of a linkup compared with performance in other suburbs.

I wouldn’t expect the Department of Family and Community Services (FACS), which commissions us to deliver Family by Family, to be directly interested in that metric. They’re interested in reducing the number of kids that end up on their notification registers and go through to out of home care. But both TACSI and FACS should be interested in building some kind of information bridge between what they’re interested in and what we manage for. In effect the information being generated by Family by Family should become their sensors at the coalface – or to change the metaphor, FACS’s fundamental data are the information arteries and the data of what’s happening at the coalface are the capillaries.

FACS should also be interested in the viability and desirability (or otherwise) of ‘Family by Family as a platform’ – which would involve considering what other services might be delivered by Family by Family. Might it be sensible to design, build and monitor modules in Family by Family in which the kind of family mentoring involved in the program was used to achieve educational, mental health or correctional services outcomes?

So here’s one major issue. The data collected by and available to the centre and the edge – the commissioners of programs (whether run in house or by third parties) and those at the coalface – should have an organic relationship. Each should be interested in the data of the other and in how the two relate. Right now that connection is broken. That’s because the commissioner becomes the boss. The commissioner sets out the data they require and the commissioned just does what it’s told – despite the opportunity costs.

Thus for instance in JobActive (which for those not paying attention is the new name for the Job Network), service providers may be required to determine how many job applications a job seeker has made. Yet those at the coalface may have good evidence that such reporting is counterproductive – and puts the whole interaction in a punitive frame which may vitiate the very objectives of the program. (Something similar happens when departments commission consulting work. Lateral Economics has a general policy of not bidding on open tenders. Not only is the process hopelessly capricious – because the project description could be delivered at any level of depth, winning the tender will often involve correctly guessing the budget and scaling the project to fit that guess – but, although you’re being hired for your unique expertise, the tender is written as if the commissioner of the work is the expert. As you read the tender – with its blizzard of instructions about how you must tender – you realise how they should have written the tender – but it’s too late.)

There’s another huge issue about the purpose of evidence and the knowledge of and incentives facing those collecting and evaluating the evidence. If an agency collects evidence to improve its own operations, that’s fine. But often the evidence is for the purposes of decision making about the program rather than for and within it. It’s for a ‘higher’ part of the system – for an artery to decide whether or how to fund a capillary. This unleashes strong incentives.

It’s commonsense that, while the agency should be closely involved in this evaluation, the agency shouldn’t control it. Yet agency control is standard fare. Thus for about fifteen years state services have been ‘benchmarked’ by the PC-produced “blue book” without the question of auditing the data or seeking to ensure it’s untainted by the bias of the collectors. Then there are agencies performing their own evaluations, or choosing ‘independent’ parties to provide such evaluation of agency programs. Obviously a crock from the start. If you’re one of those contractors, you can be – you’re encouraged to be – independent. But ‘within reason’. Guess how much repeat business you’ll get if you find that the agencies that commission evaluations aren’t delivering value for money? (Likewise in the role-play that is – ahem – Best Practice Regulation, agencies who see it as their job to ‘get up’ regulation for their ministers do the regulatory impact statements, the result being a travesty of the intention of transparency in the policy.)

Indeed, there are incentives at every level of the system for those generating information at a lower level to gild the lily to those above them. As I wrote in another context, government is performance. This generates incentives originally projected down from the political level but assiduously transmitted at each layer in the hierarchy, producing a generalised preference of the system for generality over specificity, euphemism over candour. Everyone wins a prize. There are no unsuccessful projects. (And the inability to fire anyone for being useless creates some pretty strong incentives for avoiding information systems from which one could diagnose uselessness.)

Ideally most of these issues are finessed if the information is generated from the capillaries to optimise their objectives and is then aggregated up through to the arteries in such a way that it does not unleash perverse incentives. This is what Toyota seems to have done within its production system, which was built around the idea of entrusting workers with managing their own information. When introduced, it involved Toyota spending literally ten times as much on training its workers as American firms were doing. They were trained in statistical control, had job security which minimised perverse incentives and were given the task of using the information to endlessly optimise their productivity. That’s one model of how we should be trying to set up the information capillaries and arteries of our programs, but it’s a difficult business.

There are also a bunch of ‘cultural’ and resource issues. Lots of the coalface workers delivering services feel uncomfortable in an evaluation culture – certainly one imposed from above. If they’re to come up with a regime that works well they may need additional resources from others more skilled in and comfortable with evaluation. An even more telling culture problem is in academia. It’s second nature to policy people that being ‘evidence based’ would involve a big, all-singing, all-dancing evaluation – including if possible a randomised controlled trial. And you’d get academics to do that – with all their skills. Now of course this is a generalisation, but academics are part of another world. Their incentives are to display their (academic) expertise and if possible to get some publications out of their work. But the thing is, most data for decision making in programs is like data for decision making in business. There should be virtually no consideration of arbitrary standards of statistical significance – a major preoccupation of academic evaluation – and the timeframes will often need to be days – when academics’ timeframes are months and years.

The standard of ‘evidence based’ decision making in a well run business is some considered (often just commonsensical) compromise between values such as timeliness, the probative force of evidence, cost, convenience and the absence of good reasons to the contrary. Thus various ‘nudge units’ have made A/B testing normal in government. But this is a skill that requires a low degree of academic prowess, and should be done in real time. Ditto lots of decision making in a program like Family by Family.
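To make the point concrete, here is a minimal sketch of what a business-style A/B comparison might look like. Everything in it is an assumption for illustration: the names `Arm`, `compare` and `min_lift` are hypothetical, and the numbers are invented rather than drawn from any program. It deliberately ignores statistical significance and simply asks whether the gap between two versions is big enough to matter in practice.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    """One version of a letter, script or service offer (hypothetical)."""
    name: str
    contacted: int   # people who received this version
    responded: int   # people who took the desired action

    @property
    def rate(self) -> float:
        # Response rate; zero if nobody has been contacted yet.
        return self.responded / self.contacted if self.contacted else 0.0


def compare(a: Arm, b: Arm, min_lift: float = 0.02) -> str:
    """Name the arm to prefer, or 'keep testing' if the gap is too small.

    min_lift is an assumed, practically chosen threshold (here two
    percentage points), not a test of statistical significance.
    """
    diff = b.rate - a.rate
    if abs(diff) < min_lift:
        return "keep testing"
    return b.name if diff > 0 else a.name


# Invented figures: version B draws 44 responses from 200, A draws 30.
print(compare(Arm("A", 200, 30), Arm("B", 200, 44)))
```

The threshold is a judgment call made by the people running the program, and it can be revisited daily as new figures come in.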

So I think we need to work towards a quite new model of evidence based decision making in both policy making and service delivery. … To be continued …

This entry was posted in Economics and public policy, Innovation, Philosophy, Political theory.

Peter McArdle
8 years ago

An excellent article. From my time in the public service and then on the other side of the table, I can cite numerous examples of the matters described.
Whenever I hear those words “Key Performance Indicators” aka KPIs, I choke on my coffee. It means that the provider is not seeking to deliver a high quality service or product. Rather the sole objective of a (high performing) provider is to maximise the KPI.
Examples abound, but a typical example from some years ago was a program to teach English to new migrants. The Commonwealth agency contracted particular providers to provide a certain number of courses. There was no other measure of success. It was noted that one provider was providing only 75% of the required number of courses and they were threatened with consequences. So the provider doubled the number of courses provided by simply graduating all students at the halfway point of the previous course and enrolling them in a new course the following day. The provider thus delivered 150% of the required number of courses and received a bonus and a contract extension.
The Commonwealth agency’s annual report noted the “remarkable” success of the program in exceeding benchmarks. The director of the program was promoted.

8 years ago
Reply to  Nicholas Gruen

There is KPI mania in university management (a good substitute for good management). I suspect this drives a lot of the bizarre stuff (like crazy KPIs given to academic staff), although like everything else there is also Stalinesque paranoia from management about letting people know what their KPIs are, so you only ever find out via potentially incorrect gossip.