Robert Solow once referred to the law and economics scholar Richard Posner as writing books the way the rest of us breathe. Andrew Leigh seems to be in the same category, with his output apparently accelerating despite his no doubt gruelling schedule as an MP, not to mention being a father of three.
Anyway, I’ve not yet read his latest, but I did go to the Melbourne launch of the book, where he lavished the breadth of his learning on his audience. In his speech, I would have liked somewhat greater awareness of the foibles of what Hayek called scientism.
Randomised controlled trials definitely have some very worthwhile things to offer policy making, and Andrew’s speech makes that case compellingly. I also endorse his support for randomisation as a modus operandi – not just in all-singing, all-dancing RCTs run by academics at a cost of hundreds of thousands of dollars, but also in the everyday randomisation proposed in the Lean Start-up and practised by the most successful IT firms, like Google and Amazon.
But I’ve got an uneasy feeling about how easily randomisation takes on the mantle of ‘gold standard’ for evidence, a status repudiated by numerous scholars such as Angus Deaton and James Heckman. Here’s Hayek in 1942, though he held the same views up to his death around fifty years later:
In the hundred and twenty years or so during which this ambition to imitate Science in its methods rather than its spirit has now dominated social studies, it has contributed scarcely anything to our understanding of social phenomena… Demands for further attempts in this direction are still presented to us as the latest revolutionary innovations which, if adopted, will secure rapid undreamed of progress.
This idea that we can prove up ‘what works’ and then build a management system around it is OK as a meta-idea, but only if it’s pursued with the scientific caveats it requires. Alas, managers and politicians are impatient with such things. I fear Andrew might be a little impatient with them too. Academia pumps out graduates who have been carefully trained to build and operate any number of sophisticated models, but who have been poorly trained, if they’ve been trained at all, to understand those models’ respective merits and limitations. In the same way, it would be easy to build whole systems that generate knowledge using randomised trials but take little care to understand precisely how far that knowledge can be generalised – how constrained to its context it is. I tried to explore this terrain in my own dinner address to the Australian Evaluation Society Annual Conference last year.
Perhaps these issues are dealt with in the book. In any event, Andrew gave a great account of himself, and I warmly recommend his speech, reproduced below the fold, to all. You’ll learn a lot. I did, anyway.
Andrew Leigh’s Launch Speech for his book Randomistas: How Radical Researchers Changed Our World
In 2013, a group of Finnish doctors published the results of a randomised trial of knee surgery performed for a torn meniscus, the piece of cartilage that provides a cushion between the thighbone and shinbone. This operation, known as a meniscectomy, is performed millions of times a year, making it the most common orthopaedic procedure in countries such as Australia and the United States.
The randomised trial was based on ‘sham surgery’, in which patients consent to being assigned either to the real operation or to being cut open and sewn up again without the operation being performed. Not only is the patient assigned to true surgery or placebo surgery by the toss of a coin – they are not even told afterwards which one they received.
The 2013 randomised experiment showed that among middle-aged patients, surgery for a torn meniscus was no more effective than sham surgery. Not everyone welcomed the finding. An editorial in the journal Arthroscopy thundered that sham surgery randomised trials were ‘ludicrous’. The editors went so far as to argue that because no ‘right-minded patients’ would participate in sham surgeries, the results would ‘not be generalizable to mentally healthy patients’.
Yet sham surgeries are growing in importance, as people realise that the placebo effect in surgery is probably bigger than in any other area of medicine. A recent study found that three-quarters of patients say they feel better after surgery, but that in half the cases, those who got sham surgery experienced just as big an improvement as those who got real surgery. The results suggest that millions of people every year are undergoing surgeries that make them feel a bit better – yet they would feel just as good if they had undergone placebo surgery instead.
Despite the advocacy of surgeons such as Melbourne’s Peter Choong, sham surgery remains in its infancy. Part of the challenge comes down to how surgeons approach their job. Sydney surgeon Ian Harris points out that patients sometimes regard aggressive surgeons as heroic and conservative surgeons as cowardly.
* * *
What does a typical randomised trial look like? Suppose that we decided to test the impact of sleep on happiness by doing an experiment with the 100 people in this room. If we tossed coins, we would end up with roughly 50 people in the heads group and roughly 50 in the tails group. Now imagine we asked the heads group to get an extra hour’s sleep that evening, and then surveyed people the next night, asking them to rate how happy they were with their lives. If we found that the heads group were happier than the tails group, it would be reasonable to conclude that a little more snooze helps lose the blues.
The beauty of a randomised trial is that it gets around problems that might plague an observational analysis, such as the possibility that causation runs from happiness to sleep rather than the other way around – good-tempered people may simply tend to hit the pillow early.
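The sleep-and-happiness thought experiment can be sketched in a few lines of Python. This is a hypothetical illustration, not anything from the book: the function names, the 0 to 10 happiness scale and the +0.5 "true effect" of extra sleep are invented. One design choice differs from the coin toss. Instead of tossing a separate coin for each person, which yields only roughly 50 per group, the sketch shuffles the list and splits it in half, which guarantees exactly 50 in each.

```python
import random
from statistics import mean

def randomise(people, seed=None):
    """Shuffle participants and split them into two equal-sized groups.

    Shuffling then halving guarantees a 50/50 split; tossing a separate
    coin for each person would give only roughly 50/50.
    """
    rng = random.Random(seed)
    shuffled = list(people)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def difference_in_means(treated_scores, control_scores):
    """Estimated average effect: mean outcome of treated minus mean of control."""
    return mean(treated_scores) - mean(control_scores)

# Hypothetical example: 100 people with invented baseline happiness (0-10 scale).
rng = random.Random(42)
baseline = {person: rng.uniform(3, 8) for person in range(100)}
heads, tails = randomise(baseline, seed=1)

TRUE_EFFECT = 0.5  # invented effect of an extra hour's sleep
heads_scores = [baseline[p] + TRUE_EFFECT for p in heads]  # got the extra hour
tails_scores = [baseline[p] for p in tails]                # slept as usual

print(len(heads), len(tails))
print(round(difference_in_means(heads_scores, tails_scores), 2))
```

The two groups' baseline happiness differs slightly by chance, so the estimate lands near the invented effect of 0.5 but not exactly on it. That chance imbalance shrinks as the groups get larger.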
Randomised trials have a long history in medicine, going back to James Lind’s work on scurvy, and Ambroise Paré’s work on treating battlefield burns. In the 1800s, a randomised trial showed that bloodletting didn’t cure patients. Alas, the result came to be accepted only after doctors had decided to call one of their leading journals The Lancet.
In the 1940s, British researcher Austin Bradford Hill was working on streptomycin, a promising new treatment for tuberculosis. The disease had nearly killed Hill as a child, and still claimed the lives of nearly 200,000 Britons annually. Hill used scarcity as an argument for doing a randomised trial, rather than rolling out the treatment across the country: ‘We had no dollars and the amount we were allowed by the Treasury was enough only for, so to speak, a handful of patients. In that situation I said it would be unethical not to make a randomised controlled trial.’
A trial in 1954 randomly injected 600,000 US children with either polio vaccine or salt water. The vaccine proved effective, and immunisation of all American children began the following year. The 1960s saw randomised trials used to test drugs for diabetes and blood pressure, and the contraceptive pill.
Along the way, plenty of randomised trials have exposed treatments that don’t work. Today, only one in ten drugs that look promising ends up making it onto the market.
In each case, those taking the new drug are compared against people taking a fake drug, or placebo. For alleviating discomfort, the placebo effect works in surprising ways. For example, placebo injections produce a larger effect than placebo pills. Even the colour of a tablet changes the way in which patients perceive its effect. Thanks to randomised trials, we know that if you want to reduce depression, you should give the patient a yellow tablet. For reducing pain, use a white pill. For lowering anxiety, offer a green one. Sedatives work best when delivered in blue pills, while stimulants are most effective as red pills. The makers of the movie The Matrix clearly knew this when they devised a moment for the hero to choose between a blue pill and a red pill.
For my own part, randomised trials have helped shape how I look after my health. I used to take a daily multivitamin tablet, until I read a study that found that for otherwise healthy people, there is no evidence that extra vitamins make you live longer. Nor do the randomised trials support fish oil supplements. I wear compression socks after an Australian randomised trial of marathoners showed that they aid recovery, and I remove my sons’ bandaids quickly rather than slowly after a study at James Cook University reported that fast removal was less painful.
The randomistas are reshaping social policy too.
In Melbourne, the ‘Journey to Social Inclusion’ experiment was Australia’s first randomised trial of a homelessness program. The intervention lasted for three years, and provided the 40 people in the treatment group with intensive support from a social worker. This caseworker might help them find housing, reconnect with family and access job training. Another 40 people in the control group did not receive any extra support.
What might we expect from the program? If you’re like me, you’d have hoped that three years of intensive support would see all participants healthy, clean and employed. But by and large, that’s not what the program found. Those who were randomly selected into the program were indeed more likely to have housing, and less likely to be in physical pain. But Journey to Social Inclusion had no impact on reducing drug use or improving mental health. At the end of three years, just two people in the treatment group had a job – the same number as in the control group.
The Journey to Social Inclusion program is a reminder of how hard it is to turn around the living standards of the most disadvantaged. Hollywood loves to depict overnight transformations, but the more common trajectory for someone recovering from deep trauma looks more like two steps forward and one step back.
Unless we properly evaluate programs designed to help the long-term homeless, there’s a risk that people of goodwill – social workers, public servants and philanthropists – will fall into the trap of thinking it’s easy to change lives. There are plenty of evaluations of Australian homelessness programs that have produced better results than this one. But because none of those evaluations was as rigorously conducted as this one, there’s a good chance they’re overstating their achievements.
Researchers in Canberra have run world-leading randomised trials of ‘restorative justice conferencing’ – bringing offender and victim together to discuss what the perpetrator should do to repair the harm. Cases judged suitable for restorative justice are randomly allocated either to it or to the traditional process. Studies in Australia and around the world conclude not only that restorative justice reduces crime, but also that it helps victims. In one study, victims of violence were asked whether they would harm the offender if they got the chance. When cases went to court, nearly half the victims said afterwards that they still wanted to take revenge, compared with fewer than one in ten victims whose cases went through restorative justice.
If only we had randomised evidence on the impact of prisons. Then again, it’s hard to imagine that any prison authority would agree to run an experiment to answer this question. Courts and parole boards aim to dispense equal justice, not rely on luck. To have enough statistical power would require thousands of prisoners. There would need to be big differences in the sentences of the two groups, based on nothing more than chance. The cries of unfairness would be deafening…
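Why would it take thousands of prisoners? Here is a rough sketch in Python. It assumes a two-sided test at the 5 per cent significance level with 80 per cent power, and it uses the usual normal-approximation formula for comparing two proportions. The reoffending rates are invented for illustration, since the speech gives none, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(p_control, p_treated, alpha=0.05, power=0.8):
    """Approximate participants needed in EACH group to detect a difference
    between two proportions with a two-sided test (normal approximation)."""
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)             # about 0.84 for 80% power
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    effect = p_control - p_treated
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# Hypothetical reoffending rates, invented for illustration only.
print(sample_size_per_group(0.50, 0.45))  # a five-point difference
print(sample_size_per_group(0.50, 0.40))  # a ten-point difference
```

With these invented rates, reliably detecting a five-point fall in reoffending needs a little over 1,500 prisoners per group, so thousands in total. A ten-point fall needs fewer than 400 per group. The smaller the effect you hope to detect, the larger the trial must be, and the required size grows with the inverse square of that effect.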
Or so you might think. In 1970, the California parole board agreed to run just such an experiment. That year, 3000 prisoners who were coming up for release were divided into two groups. Using a table of random numbers, half of the prisoners were selected to have their sentences shortened by six months, while the rest served their regular term. After release, the authorities looked to see who reoffended. They found no difference between the two groups, suggesting that another six months behind bars didn’t make the streets any safer.
In the classroom, we’re learning a lot from randomised trials.
The Bill & Melinda Gates Foundation conducted a randomised trial of coaching programs for teachers. Each month, teachers sent videos of their lessons to an expert coach, who worked with them to eliminate bad habits and try new techniques. By the end of the year, the students of teachers in the coaching program had made gains equivalent to several additional months of learning.
Another study looked at the Promise Academy, a school in Harlem that operates on a ‘no excuses’ model, with classes sometimes running from 8am to 7pm. Across the United States, the average black high school student is two to four years behind his or her white counterparts. Students who won a lottery to attend the Promise Academy improved their performance by enough to close the black–white test score gap. As lead researcher Roland Fryer points out, this overturns the fatalistic view that poverty is entrenched, and schools are incapable of making a transformational difference. He claims that the achievements of the Promise Academy are ‘the equivalent of curing cancer for these kids’.
Developing countries are awash with randomised trials. In Indonesia, a randomised trial tested how randomly doubling teachers’ pay affected students. In India, a randomised trial covering 19 million people estimated the impact on corruption of rolling out biometrically identified smartcards.
When the Mexican city of Acayucan found that the council only had money to pave about half its streets, the mayor saw an opportunity to avert some voter anger and learn about the impacts of road paving. Rather than selecting the roads herself, she let researchers randomly choose which streets to upgrade. In Kenya, economists worked with the national electricity utility to randomly give some households a discount on their connection fee. By varying the subsidy, the researchers were able to see how much households valued being connected to the grid.
Businesses are running randomised trials too.
Quora, a question-and-answer website, devotes a tenth of its staff to running randomised trials, and is conducting about thirty experiments at any given time. Amazon is virtually built on randomised trials. As one commentator observes, ‘every pixel on the home page has had to justify its existence through repeated testing of alternative layouts’. In retail, if you’re wondering why half of all prices end in nine, you can blame the use of randomised marketing trials.
If you have a Coles FlyBuys card, you’re part of a randomised trial. One in 100 cards is randomly selected to be a control group, which does not receive any promotional material. This lets the company benchmark the impact of its promotions.
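Here is a Python sketch of how a random holdout group lets a company benchmark its promotions. The speech does not describe how Coles or flybuys actually implement theirs, so everything here is hypothetical: the function names, the independent 1-in-100 draw for each card, and the spend figures.

```python
import random

def assign_holdout(card_ids, holdout_rate=0.01, seed=None):
    """Flag each card independently, with probability holdout_rate, as part of
    the control group that receives no promotional material."""
    rng = random.Random(seed)
    return {card: rng.random() < holdout_rate for card in card_ids}

def promotion_lift(spend, is_holdout):
    """Average spend of promoted cardholders minus average spend of the
    holdout group: the estimated effect of the promotions."""
    promoted = [spend[c] for c, held in is_holdout.items() if not held]
    held_out = [spend[c] for c, held in is_holdout.items() if held]
    return sum(promoted) / len(promoted) - sum(held_out) / len(held_out)

# Hypothetical example: 100,000 cards; spend figures are invented,
# with promotions assumed to add 5 dollars on average.
flags = assign_holdout(range(100_000), seed=3)
rng = random.Random(4)
spend = {c: rng.gauss(100, 20) + (0 if held else 5) for c, held in flags.items()}
print(sum(flags.values()), "cards held out")
print(round(promotion_lift(spend, flags), 1))
```

Selection into the holdout depends only on a random draw, so the holdout group differs from everyone else only by chance and by not receiving promotions. The gap in average spend therefore estimates what the promotions are worth.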
The shade of blue on the Google toolbar is the result of a randomised trial run by Marissa Mayer, then a vice-president at Google. She proposed an experiment that tested 40 different shades of blue. With billions of clicks, even a small difference means big bucks. One estimate is that finding the perfect colour for the toolbar added US$200 million to Google’s bottom line.
Google’s scientists have access to around 15 exabytes of data, and the company handles around 40,000 searches each second. Yet even with all that data, Google still runs randomised trials, which suggests that big data is no substitute for them. And if Google gets value from randomised experiments, the same must go for every other researcher on the planet.
Running a randomised experiment in business is often called ‘A/B testing’, and has become integral to the operation of firms such as Netflix, eBay, Intuit, Humana, Chrysler, United Airlines, Lyft and Uber. One US executive says that his firm has three cardinal rules: ‘you don’t harass women, you don’t steal and you’ve got to have a control group’. Yes, that’s right – you can lose your job for not having a control group.
* * *
You can even use randomised trials in your own life. Last year, I used Google ads to run a small experiment of my own. Anyone who searched the web might have seen an ad for a new book about randomised trials. Web surfers were randomly shown one of twelve possible book titles. My editors and I each had our favourite titles, but we had agreed to leave the final decision to a randomised experiment.
A week later, over 4000 people had seen one of the advertisements. The worst-performing title (not a single person clicked on it) was Randomistas: How a Powerful Tool Changed Our World. Second place was Randomistas: The Secret Power of Experiments. And the clear winner was Randomistas: How Radical Researchers Changed Our World. The experiment took about an hour to set up, and cost me about $50.
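Here is a Python sketch of the logic behind a title test like this one. It rests on several assumptions. The speech names only three of the twelve titles, so only those three appear. The click probabilities are invented. The simulation also stands in for how an ad platform would actually serve the ads, which is not modelled here. The function names are hypothetical, and `two_proportion_z` is a standard pooled two-proportion z-test added for illustration.

```python
import random
from collections import Counter
from math import sqrt

# Three of the twelve titles named in the speech; the other nine are not listed.
TITLES = [
    "Randomistas: How Radical Researchers Changed Our World",
    "Randomistas: The Secret Power of Experiments",
    "Randomistas: How a Powerful Tool Changed Our World",
]

def run_title_test(n_impressions, click_prob, seed=None):
    """Simulate an ad test: each impression shows one title chosen at random,
    and the viewer clicks with that title's (hypothetical) probability."""
    rng = random.Random(seed)
    shown, clicked = Counter(), Counter()
    for _ in range(n_impressions):
        title = rng.choice(TITLES)  # random assignment, like a coin toss
        shown[title] += 1
        if rng.random() < click_prob[title]:
            clicked[title] += 1
    return shown, clicked

def click_through_rates(shown, clicked):
    """Clicks per impression for each title that was shown at least once."""
    return {t: clicked[t] / shown[t] for t in shown}

def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    """z statistic for the difference between two click rates (pooled SE).
    Undefined if neither title received any clicks."""
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (clicks_a / n_a - clicks_b / n_b) / se

# The click probabilities below are invented for illustration only.
probs = dict(zip(TITLES, [0.02, 0.01, 0.0]))
shown, clicked = run_title_test(4000, probs, seed=7)
rates = click_through_rates(shown, clicked)
print(max(rates, key=rates.get))
```

Click rates are small, and a few thousand impressions leave only a few hundred per title when twelve titles are tested, so chance alone can reorder closely matched titles. That is why the sketch includes a z statistic. A value beyond about ±1.96 suggests a gap unlikely to be due to chance at the conventional 5 per cent level.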
A few years earlier, I had written a book on inequality for the same publisher. My editor wanted to call it Fair Enough? My mother suggested Battlers and Billionaires. After running Google ads for a few days, we found that the click rate for my mother’s title was nearly three times higher. My editor graciously conceded that the evidence was in, and Battlers and Billionaires hit the shelves the following year.
* * *
In the early 2000s, successful businessman Blake Mycoskie visited villages outside Buenos Aires, and was struck by what he saw: ‘I knew somewhere in the back of my mind that poor children around the world often went barefoot, but now, for the first time, I saw the real effects of being shoeless: the blisters, the sores, the infections.’
To provide shoes to those children, Mycoskie founded ‘Shoes for Better Tomorrows’, which was soon shortened to TOMS. The company made its customers a one-for-one promise: buy a pair of shoes and TOMS will donate a pair to a needy child. TOMS has given away over 60 million pairs of shoes.
Six years in, Mycoskie and his team wanted to know what impact TOMS was having, so they made the brave decision to let economists randomise shoe distribution across eighteen communities in El Salvador. The study showed that the canvas loafers didn’t go to waste: most children wore their new shoes most of the time. But the children’s health wasn’t any better, as the TOMS shoes were generally replacing older footwear. Free shoes didn’t improve children’s self-esteem, but did make them feel more dependent on outsiders.
Let’s be clear about what this meant. Corporate philanthropy wasn’t an add-on for TOMS – it was the firm’s founding credo. Now a randomised trial showed that among recipients in El Salvador, free shoes weren’t doing much to improve child outcomes, and may even have been fostering a sense of dependency. Yet rather than trying to discredit the evaluation, TOMS responded promptly.
As lead researcher Bruce Wydick wrote: ‘TOMS is perhaps the most nimble organization any of us has ever worked with, an organization that truly cares about what it is doing, seeks evidence-based results on its program, and is committed to re-orienting the nature of its intervention in order to maximize results. In response to children saying that the canvas loafer isn’t their first choice, they now often give away sports shoes . . . In response to the dependency issue, they now want to pursue giving the shoes to kids as rewards for school attendance and performance . . . Never once as researchers did we feel pressure to hide results that could shed an unfavourable light on the company… we applaud them for their transparency and commitment to evidence-based action among the poor.’
No-one should fault Blake Mycoskie for setting up TOMS Shoes on the basis of the best evidence available at the time. As the poet W.H. Auden once put it, ‘We may not know very much, but we do know something, and while we must always be prepared to change our minds, we must act as best we can in the light of what we do know.’
But when new facts arrived, TOMS shifted. And because of that, the TOMS randomised trial doesn’t look like a failure at all. Blake Mycoskie’s goal in establishing the firm was to improve the health of poor children. The company evaluated its approach. It didn’t work. So it changed tack. The philosophy of test-learn-adapt is at the heart of randomisation.
Randomised trials flourish where modesty meets numeracy. An experimenting society doesn’t just mean we do more rigorous evaluation; it also means we pay more attention to the facts. We are less dogmatic, more honest, more open to criticism, less defensive. We are more willing to change our theories when the data prove them wrong.
Ethically done, randomised experiments can change our world for the better. Time to toss a few more coins?