In medicine, bad evaluation comes with a body count. Until a randomised trial showed it to be ineffective, hydroxychloroquine as a treatment for COVID cost more than 10,000 lives. Before a randomised trial proved them dangerous, antiarrhythmic drugs for errant heartbeats cost more than 50,000 lives. In the century before they were put to a randomised trial, radical mastectomies for breast cancer disfigured 500,000 women while producing no better outcomes than targeted surgery.
Expert opinion and low‑quality before‑after studies supported these harmful treatments. If randomised trials had not put a stop to them, the toll might have been greater still. Yet while randomised trials are common in medicine, rigorous evaluation remains rare in policymaking. A study from the Committee for Economic Development of Australia think tank examined a sample of 20 Australian Government programs conducted between 2015 and 2022, with a total expenditure of more than $200 billion.
CEDA found 95 per cent were not properly evaluated. Its analysis of state and territory government evaluations reported similar results. Across the board, CEDA estimates fewer than 1.5 per cent of government evaluations use a randomised design.
This finding echoes the Productivity Commission’s 2020 report into the evaluation of Indigenous programs, which concluded that ‘both the quality and usefulness of evaluations of policies and programs affecting Aboriginal and Torres Strait Islander people are lacking’, and ‘evaluation is often an afterthought rather than built into policy design’.
That’s where the Australian Centre for Evaluation comes in. Established a year ago, the centre aims to expand the quality and quantity of evaluation across the public service. Collaborating with departments, the centre is busily initiating rigorous evaluations of programs across a range of agencies. While the Australian Centre for Evaluation works across government, the Paul Ramsay Foundation has recently launched a $2 million grant round to support experimental evaluations conducted by non‑profits with a social impact mission. The Paul Ramsay Foundation gives a few examples, including programs aimed at improving education outcomes for young people with disabilities, reducing domestic and family violence, or helping jobless people find work. The announcement signals a commitment to rigorous evaluation from Australia’s largest philanthropic foundation.
The few randomised policy trials conducted so far have produced some unexpected findings. Some had argued that parents would be more likely to get their children to school when subject to the threat of losing welfare payments. In 2016, researchers, including Harvard’s Michael Hiscox, carried out a randomised trial in the Northern Territory of the Improving School Enrolment and Attendance through Welfare Reform Measure program. About 400 children were assigned to the treatment group, and the same number to the control group. Despite threats of payment suspension – and actual payment suspensions for some families – the program had no effect on attendance.
Another randomised trial, conducted by the University of NSW’s Richard Holden and his collaborators, looked at the examples used on school literacy tests. A randomised trial across 1135 students in Dubbo compared how students perform when tests are made culturally relevant – for example, by replacing a passage about lighthouses with a passage about the big dish in nearby Parkes. The improvement was sizeable, amounting to half the gap between Indigenous and non‑Indigenous students.
Another surprise came from a randomised trial of Health4Life, a program co‑designed with students and educators with the aim of reducing risky behaviour by teens. A randomised trial led by Sydney University’s Katrina Champion across 85 schools in Brisbane, Perth and Sydney found that Health4Life had no measurable impacts on alcohol use, tobacco smoking, recreational screen time, physical inactivity, poor diet or poor sleep.
What is distinctive about these studies is that they are not simply pilot programs or observational studies, but randomised trials, in which participants are assigned to the treatment and control groups by the toss of a coin. Randomisation ensures that, before the experiment starts, the two groups are as similar as possible. This means any difference we observe between them afterwards can be attributed to the intervention rather than to pre‑existing differences between the groups. In formal terms, randomisation provides an ideal counterfactual.
Without randomisation, what might go wrong? One danger is that the comparison group might be different from the outset. Students and schools that sign up for a new program could be quite different from those who say no. This is why medical researchers rely on randomised trials not only to assess the impact of new drugs, but increasingly to test the impact of diet, exercise, surgery and more.
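The logic here can be sketched in a short simulation. Everything in it is an illustrative assumption rather than data from any of the trials above: we invent a program whose true effect is 2 points, plus a hidden ‘motivation’ trait that raises outcomes on its own and makes people more likely to volunteer. A naive comparison of volunteers with non‑volunteers then overstates the effect, while coin‑toss assignment recovers it.

```python
import random

random.seed(42)

TRUE_EFFECT = 2.0  # assumed true program effect, for illustration only


def outcome(person, treated):
    """Outcome = motivation (a confounder) + program effect + noise."""
    return (3.0 * person["motivation"]
            + (TRUE_EFFECT if treated else 0.0)
            + random.gauss(0, 1))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def simulate(n=10_000):
    people = [{"motivation": random.gauss(0, 1)} for _ in range(n)]

    # Naive comparison: motivated people are more likely to opt in,
    # so the treated group differs from the comparison group at the outset.
    for p in people:
        p["opted_in"] = random.random() < (0.7 if p["motivation"] > 0 else 0.5)
    naive = (mean(outcome(p, True) for p in people if p["opted_in"])
             - mean(outcome(p, False) for p in people if not p["opted_in"]))

    # Randomised trial: a coin toss decides assignment, so the two
    # groups are similar before the program starts.
    for p in people:
        p["heads"] = random.random() < 0.5
    randomised = (mean(outcome(p, True) for p in people if p["heads"])
                  - mean(outcome(p, False) for p in people if not p["heads"]))
    return naive, randomised


naive, randomised = simulate()
print(f"naive estimate: {naive:.2f}, randomised estimate: {randomised:.2f}")
```

Under these made‑up numbers, the naive comparison of volunteers with everyone else is biased upwards by the motivation gap, while the randomised estimate lands close to the true effect of 2 points.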
A ‘what works?’ approach to government promotes democratic accountability. It makes government more effective and efficient – producing better public services for every dollar we raise in taxes. But it especially matters for the most vulnerable. When government doesn’t work, the richest can turn to private options. But the poorest have nowhere else to turn, since they rely on well‑functioning government services for healthcare, education, public safety and social services. Getting impact evaluation right can deliver not just a more productive government, but a more egalitarian society.