Economists have long been challenged by the question: how does one decide if a particular social program is in the national interest (to use the Prime Minister’s favourite expression)?
We economists talk a great deal about cost-benefit evaluations but it is never clear what goes into these calculations. So here is my go at it.
For a social program to “pay off” and meet the national interest test, it needs to meet five conditions.
First, the mix of long term social goals that the program is seeking to achieve (e.g. greater equality, more income mobility, greater participation, self-help etc.) should be broadly in line with community values and priorities.
Secondly, there should be enough hard evidence to suggest that the program can be potentially effective in achieving its social goals i.e. that it can deliver the desired social outcomes in the longer term.
Thirdly, if the aims of the program are well defined up front, it should be possible for a good government to implement it efficiently and effectively (i.e. there are no horrendous administrative complexities or opportunities for surreptitious political abuse).
Fourthly, the cost of providing and delivering the program should not be such that it imposes an unacceptably high and sustained burden on taxpayers in the short/medium term (posing the risk of a strong electoral backlash).
Finally, after allowing for the positive economic effects of the program and the adverse secondary (third party) effects of higher taxation on the economy (even with the best non-distorting revenue instruments available), the net impact on national productivity and economic growth should be small or at least not too detrimental (in the most extreme case, it should not risk making everyone, rich and poor alike, worse off, thus defeating the whole purpose of the exercise).
The question I am raising for debate is a methodological one. I am not looking to get into an argument over the merits of this or that social program or even how one measures each of these five criteria.
I think you cannot avoid having as one of the criteria “that it is practical to identify and measure whether the program is delivering the objectives it was intended to” or possibly “that it is practical to identify and measure the unintended consequences of the program”.
One avenue is a long-term longitudinal study of a sample of disadvantaged people and families, to see how their position improves or at least changes over time and how specific initiatives influence their position, especially their incentives to make an effort to help themselves by saving, training, practicing birth control, getting jobs etc.
Software developers use unit tests on their code, typically run automatically every time the program is built. Unit tests are simple internal checks that each method or object returns the results it is expected to return.
This practice grew out of the software industry having to clean up its poor quality control, so developers now run repeatable tests constantly during the development cycle before handing the application over to human-level testing such as integration or acceptance testing.
The “pay off” in a social program is really a quality control issue.
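For readers who have not seen one, here is a minimal sketch of a unit test in Python. The eligibility rule, the function name `is_eligible` and the 30,000 threshold are all invented for illustration and do not describe any real program; the point is only that each expected result is written down as a repeatable check.

```python
import unittest


# Hypothetical rule, invented for illustration: a household qualifies
# for a payment when its income is at or below a threshold.
def is_eligible(income, threshold=30000):
    if income < 0:
        raise ValueError("income cannot be negative")
    return income <= threshold


class EligibilityTests(unittest.TestCase):
    """Each test states one result the rule is expected to return."""

    def test_income_below_threshold_is_eligible(self):
        self.assertTrue(is_eligible(20000))

    def test_income_at_threshold_is_eligible(self):
        self.assertTrue(is_eligible(30000))

    def test_income_above_threshold_is_not_eligible(self):
        self.assertFalse(is_eligible(30001))

    def test_negative_income_is_rejected(self):
        with self.assertRaises(ValueError):
            is_eligible(-1)
```

Saved in a file, these checks can be run with `python -m unittest`, and they are cheap enough to rerun after every change to the rule.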
Yes, I was going to mention Cam’s point. Ongoing measurement is very important and indeed if the program doesn’t provide reasonable self measurement that’s a fairly strong blow against it – but we have a pretty poor idea about a lot of programs in that department. (Take most industry policy programs for instance . . .)
Very pertinent comments, thanks. I would now reword test 3 as follows:
“Thirdly, the aims of the program should be fully and clearly defined and made transparent up front, so as to allow the Government to monitor progress closely and to periodically review its effectiveness, as well as to minimise the risk of surreptitious political abuse in implementation”.
I think the “administrative complexity” bit has gone as it is already implicitly captured in test no. 4. It does not belong in 3.
Fred,
Have you read Charles Lindblom on this? His stuff on ‘disjointed incrementalism’ in the 1960s and 70s was good stuff. He makes the point (and this is directly antagonistic to a lot of what he calls ‘synoptic’ reason of economists) that policy often cannot be completely defined in advance, that it evolves, and we may not know what the best way to do something is, so we start out by trying and then modifying what we do.
Of course that’s just the other side of what I said above because it’s also useful to have clear statements of purposes at the outset and then measurement against them. But Lindblom gives the example of a govt dental health program delivered (physically) through schools. As he says, if you took the schools’ official purpose to be providing education, then it’s a misuse of the schools to use them in a dental program. But in fact it may be the least cost and most effective way to deliver your dental program. That gives the flavour of his approach.
In the end we have to try to experiment as intelligently as we can. Then we have to have good information about what’s a success and what’s not. Then we need to preserve the good experiments and expunge the failures. Not always an easy thing.
In my Intro to Public Law class, I set this Evatt Foundation article on new managerialism by David Boyle as additional reading. It deals with the UK situation under the Blair government, but contains some observations that might be germane to some of the points made above:
“Over the past decade or so – boosted by added enthusiasm from new Labour – we have been plunged into what Professor Michael Power of the London School of Economics calls “the audit culture . . . a gigantic experiment in public management”. We can see the results everywhere. The government introduced about 8,000 targets or numerical indicators of success during its first term of office. We have NHS targets, school league tables, environmental indicators – 150 of them at last count – and measurements covering almost every area of professional life or government, all in the name of openness, accountability and democracy.
Nor is this just happening in the public services. The Japanese multinational Matsushita has developed a “smart” toilet that measures your weight, fat ratio, temperature, protein and glucose every time you give it something to work on. Then it sends these figures automatically to your doctor.
Accountancy firms cream off 10 per cent of British graduates to do all this counting. Whole armies of number-crunchers are out there, adding to the budgets of public transport, the NHS and social services.
We have been here before – especially in periods of great social hope such as the 1830s, when the followers of Jeremy Bentham rushed across the country in stagecoaches, armed with great bundles of tabular data and measuring everything they thought important: the number of cesspits (which they saw as an indicator of ill health), or pubs (an indicator of immorality), or the number of hymns that children could recite from memory.
Then as now, the problem is that what really needs measuring is not countable. “So-called efficiency,” says Richard Scase, professor of organisational behaviour at the University of Kent at Canterbury, “takes the place of effectiveness, quantity of quality. The means become an end in themselves.” As anyone in local government will tell you, these numerical indicators are about management at a distance, and they will always miss the point: school league tables make teachers concentrate on borderline pupils at the expense of their weaker classmates; waiting-list targets persuade NHS managers to treat those with the quick, simple problems at the expense of everyone else.
It is a dream from the world of management consultancy, encapsulated in the McKinsey slogan that “everything can be measured and what gets measured gets managed”. It is no accident that Nick Lovegrove, a partner at McKinsey & Co, is advising Gordon Brown on productivity and Tessa Jowell on IT strategy. Another McKinsey recruit has been appointed to advise No 10 on transport policy.
The problem is that people are now expected to do what the targets tell them, rather than what is actually necessary. Hospitals are ordering more expensive trolleys and reclassifying them as “mobile beds”, to sidestep the target that no patient should stay on a hospital trolley for more than four hours. I also know of at least one local authority that achieves government targets for separating waste – at great expense – but then simply mixes it all up again in landfill. Scotland Yard figures that showed it had recruited 218 people from ethnic minorities between April and September 2000 turned out to include Irish, New Zealanders and Australians. The useful figure was four.
The consequences of pinning down the wrong thing are severe. All your resources will be focused on achieving something you did not intend, as the Pentagon discovered in the Vietnam war, when it audited the success of military units by their body counts. Result: terrible loss of life among the Vietnamese, but no US victory.
The Blair government’s dilemma is that if ministers measure the things over which they have direct control, they simply measure the activity of bureaucrats. If they measure real effects – for instance, the looming and probably unreachable targets for school attainment in English, maths and truancy – they risk detonating a political time bomb when they fail to meet them. …
Accountability is important, and the auditing culture was in part a response to the crudity of measuring success by the financial bottom line. But measurement of this kind may be more about empire. It is about the idea that everything can be controlled from the centre, every job broken down into measurable parts – a Taylorist fantasy of time and motion – with every decision taken in full view of the auditors and the public.
It is hard to imagine a revolt spreading beyond French economics students unless the movement comes up with a coherent alternative, but also possible to glimpse what that might look like. It would be about decentralising power, giving more hands-on experience to teachers, managers and civil servants, and creating smaller, human-scale institutions. It would mean more face-to-face management, nurturing responsibility and creativity – in short, all the things that new Labour finds hardest.
A friend of mine with a hefty government grant, negotiating with civil servants over his annual targets, tells me he quoted the old Scottish proverb: “You don’t make sheep any fatter by weighing them.” They looked at him with complete incomprehension. There is clearly a long way to go.”
Yep,
That’s definitely the other side of the coin. Onora O’Neill (I think that was her name) gave the Reith Lectures a few years ago on the cult of transparency. Her basic point was a good one, though it wasn’t very balanced. That is, she didn’t really show us how to do things better, but did at least deftly point out the absurdity to which the assumption that more transparency is always a good thing can lead.
It is possible that the basic problem is the welfare state itself and the many ways that welfare programs, as they have evolved, actually disempower people from making provisions for themselves.
The two classic examples are the counterproductive results of the US New Deal and the Great Society programs of the 1960s.
Of course some people need help from others because they are chronically sick or handicapped. That could be provided without need for a comprehensive welfare state.
I think the problem of accountability has two features (both of which have been previously mentioned).
The first is the problem of measuring the important quality aspects, and the second is ensuring that the measurement is responded to in an appropriate way.
Currently, because neither the Minister nor the Department CEO wants to be associated with the discovery that the policy needs modification, the two people who could make a difference have an incentive to form a cartel of non-discovery.
Thus we have an organisational flaw that impedes improvement in policy outcomes in the light of experience.
My own feeling is that the problems caused by this organisational fault are a far more important bottleneck to the improvement in policy outcomes than policy analysis and development.
Re Cam’s point about unit tests – People who evaluate social programs sometimes use a ‘program logic’ or ‘program theory’ approach to guide their research.
Impact evaluations are good at telling you that your program has failed (or succeeded) to achieve its objectives (eg placing unemployed people in jobs and keeping them there) but often don’t help you understand why the program has failed (or succeeded).
With the program logic approach you try to bring the program designer’s assumptions to the surface (eg unemployed people lack vocational skills) and then map out how the various parts of the program are supposed to fit together.
Then researchers are able to check the assumptions, see whether the parts of the program were implemented as intended, and measure the outputs.
Sometimes programs fail because they’re based on faulty assumptions (eg a 4 week program can achieve what 10 years of schooling failed to do). Sometimes they fail because they’re not implemented (eg nobody attends), and sometimes they fail because they’re implemented poorly (eg the training fails to impart skills because the trainer is underqualified or because the participants already have them).
Knowing why programs succeed or fail can help policy makers to decide how to improve them.
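The three kinds of failure described above can be laid out as a rough sketch in Python. The class `ProgramCheck`, its field names and the order in which `diagnose` asks its questions are assumptions made for illustration, not a standard evaluation tool; a real program logic model would have many more links in the chain.

```python
from dataclasses import dataclass


@dataclass
class ProgramCheck:
    """Findings from an evaluation. All field names are hypothetical."""
    assumptions_hold: bool       # eg participants really do lack vocational skills
    was_delivered: bool          # eg people actually attended
    delivered_as_designed: bool  # eg the trainer was suitably qualified
    objective_met: bool          # eg participants found and kept jobs


def diagnose(check: ProgramCheck) -> str:
    """Walk the program logic from design to delivery to locate a failure."""
    if check.objective_met:
        return "succeeded"
    if not check.assumptions_hold:
        return "failed: faulty assumptions"
    if not check.was_delivered:
        return "failed: not implemented"
    if not check.delivered_as_designed:
        return "failed: implemented poorly"
    # Everything checked out but the objective was still missed.
    return "failed: cause not explained by this logic model"
```

For example, `diagnose(ProgramCheck(True, False, False, False))` returns `"failed: not implemented"`, which points to a different fix than a result of `"failed: faulty assumptions"` would.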
Welcome to a substantial portion of my working life. The hard bit, in my experience, is trying to find meaningful measures in between quantifiable output/activities (which, as shown above, often measure the wrong thing and can easily lead to worse outcomes as the service deliverers work around them) and the broad high-level outcomes “every child is healthy” (or similar) which are fairly easily expressed. The comment made earlier about trying to clearly describe the desired outcome at a project level is one way – we are testing that with some stuff I am currently involved in. If it’s a dental program, f’rinstance, maybe along the lines “every child between 4 and 12 has fewer than 2 cavities, when examined once a year” (this is just off the top of the head, may contain traps).
Of course at some stage it does need to be quantifiable (Treasuries are more comfortable!) but it also has to make sense.
Ritual assertion in defence of public servants: all the ones I deal with on these sorts of issues would have understood the sheep reference.
We are getting an interesting debate. Ken, Phil and others have warned that in the implementation process, targeting and monitoring have the potential to become a Frankenstein monster. I agree it is a real risk.
I think we need to distinguish between “intermediate”
I spend a lot of time worrying about test #2. My experience from dealing with politicians and policymakers is that what they think is hard evidence of impact would be regarded by most economists as soggier than a jam sandwich in a thunderstorm. Randomised policy trials don’t work in every context, but it’s surprising that we eschew them in Australia. As a result, we have no hard evidence on (for example) whether the multi-billion dollar Stronger Families and Communities Program actually has an impact.
Andrew – As far as I can tell most economists aren’t too keen on randomised control trials either. Instead they seem to favour things like regression modelling that few lay people can make any sense of. The public’s lack of statistical sophistication allows all kinds of fudging and as a result the debate turns into my economist versus your economist.
It’s social scientists like psychologists who seem to be the most enthusiastic about running experiments.
I should point out that I work in the area of economic development rather than social programs. I prefer to be careful about my identity for the obvious reasons, such as: in my field, over the longer term, we get lots of correlation and little that we can ascribe to causation, given a federal system. Privately, I would not try to claim the sorts of things that ‘we’ ritually do, but ‘they’ pay my salary and so I try to come up with the best things I can (following ‘broad consultation’, of course).
That was the basis for my earlier suggestion about trying to define intermediate outcomes that make sense: as has been pointed out, you can design the program and its
Don, I can see how it might look like economists aren’t very interested in randomised trials, but I think it might instead be that there just aren’t any randomised trials for us to write about. So instead, we spill ink on quasi-experiments. If you look to the US, where they do randomised trials, the randomised Moving to Opportunity trials have taught us a lot about neighbourhood effects, and led to several papers in top economics journals.
I’ve argued that Australia could answer the following questions via randomised trials:
– Does industry assistance create extra jobs? Does it boost R&D?
– Do job training programs boost employment and earnings?
– Do smaller classes boost the performance of Indigenous students?
– Would state provision of methadone increase or decrease addiction rates?
– What kind of prison programs lead to the highest post-release employment rates?
If a state or federal government in Australia were to run a randomised trial on one of these issues, I see no reason why an Australian economist couldn’t write it up for a top economics journal.
Andrew – Despite my grumbling about economists I agree with you. Randomised trials are a good idea.
I’ve always been impressed by the MDRC’s research on welfare to work programs.
Andrew, I have not personally carried out micro-evaluations for ages. I have tended to rely on time series evaluations of new policy initiatives and cross-country comparisons of the economic effects of alternative social models. So I am a bit out of touch.
I suppose by randomised trials and experiments you mean where some people are selected into a policy program (the treatment group) and others are not (the control group) and the relative outcomes are then compared? Isn’t that being used to some extent in departmental and commissioned evaluations of social programs? What are the alternative micro methodologies available?
Fred, that’s exactly what I mean. It has a clear analogy in the medical literature, and is known as the “gold standard” in policy evaluation.
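As a rough sketch of the mechanics, assuming the simplest possible design: people are split into two groups by chance alone, and the estimated effect is the difference in average outcomes. The function names and the earnings figures below are made up for illustration; a real trial would also need sample-size planning and a test of statistical significance.

```python
import random
from statistics import mean


def assign_groups(participants, seed=0):
    """Randomly split participants into (treatment, control) halves."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def difference_in_means(treated_outcomes, control_outcomes):
    """Estimated program effect: average treated minus average control."""
    return mean(treated_outcomes) - mean(control_outcomes)


# Ten hypothetical participant IDs, split by chance alone.
treatment, control = assign_groups(range(10))

# Invented weekly earnings measured after the program has run.
treated_earnings = [520, 480, 610, 550]
control_earnings = [500, 470, 560, 510]
effect = difference_in_means(treated_earnings, control_earnings)  # 30
```

Because chance alone decides who gets the program, the two groups should be alike on average in everything else, so the gap in outcomes can be attributed to the program rather than to who chose to sign up.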
To my knowledge, FACS did a very small randomised trial some years ago, and NSW evaluated the drug court this way (I have a description of it in my 2003 paper on randomised policy trials). Apart from that, there might be other randomised trials of social policy programs taking place in Australia at present, but if so, they’re happening very very quietly.