As readers of this blog will know, I regard the state of the economics profession as a scandal, and have for years. It only occasionally matters all that much, since no matter how good the discipline was, it would mostly be condemned to ignorance – the world is too complex to understand. But as people like DeLong and Krugman have been pointing out, simply bizarre arguments keep being made – like the claim that US unemployment is mainly structural. (It ‘structurally’ went from around 5 to around 10 per cent following a financial crisis, and it’s structural! Sure it is. And the Great Depression was a spontaneous holiday – sadly that’s not a joke. Economics is a profession in which people much cleverer than me think things that are absurd.) This is just one example; pretty much every other week sane economists fight off idiotic propositions championed by highly credentialled economists. Like the ‘bond vigilantes’ who were supposedly lurking, ready to strike at any moment during a prolonged and deep recession. Or the claim that low interest rates present a danger of deflation that higher interest rates do not – yes folks, this view was put by Minneapolis Fed President Narayana Kocherlakota.
Anyway, I knew most of the pathology documented in this critique of medical research and publication, but it’s somehow shocking nevertheless. Of course it’s not really news to anyone but the most naive that the drug industry corrupts both the conduct and the reporting of drug research. But it’s amazing how much of the problem is driven by something much more mundane: publication bias – the fact that journals publish results, not non-results, even though science is mostly made up of non-results, tests for possible correlations that turned out not to be there. Publication bias is a bad, bad thing, and of course it gets made much worse because academics need publications and go hunting for them. Anyway, the article is about Professor John Ioannidis – whose name in Greek I would have thought would be something like Ioanis Ioannidis:
In poring over medical journals, he was struck by how many findings of all types were refuted by later findings. Of course, medical-science “never minds” are hardly secret. And they sometimes make headlines, as when in recent years large studies or growing consensuses of researchers concluded that mammograms, colonoscopies, and PSA tests are far less useful cancer-detection tools than we had been told; or when widely prescribed antidepressants such as Prozac, Zoloft, and Paxil were revealed to be no more effective than a placebo for most cases of depression; or when we learned that staying out of the sun entirely can actually increase cancer risks; or when we were told that the advice to drink lots of water during intense exercise was potentially fatal; or when, last April, we were informed that taking fish oil, exercising, and doing puzzles doesn’t really help fend off Alzheimer’s disease, as long claimed. Peer-reviewed studies have come to opposite conclusions on whether using cell phones can cause brain cancer, whether sleeping more than eight hours a night is healthful or dangerous, whether taking aspirin every day is more likely to save your life or cut it short, and whether routine angioplasty works better than pills to unclog heart arteries.
But beyond the headlines, Ioannidis was shocked at the range and reach of the reversals he was seeing in everyday medical research. “Randomized controlled trials,” which compare how one group responds to a treatment against how an identical group fares without the treatment, had long been considered nearly unshakable evidence, but they, too, ended up being wrong some of the time. . . . This array suggested a bigger, underlying dysfunction, and Ioannidis thought he knew what it was. “The studies were biased,” he says. “Sometimes they were overtly biased. Sometimes it was difficult to see the bias, but it was there.” . . . Perhaps only a minority of researchers were succumbing to this bias, but their distorted findings were having an outsize effect on published research. To get funding and tenured positions, and often merely to stay afloat, researchers have to get their work published in well-regarded journals, where rejection rates can climb above 90 percent. . . .
In the late 1990s, Ioannidis set up a base at the University of Ioannina. He pulled together his team, which remains largely intact today, and started chipping away at the problem in a series of papers that pointed out specific ways certain studies were getting misleading results. Other meta-researchers were also starting to spotlight disturbingly high rates of error in the medical literature. But Ioannidis wanted to get the big picture across, and to do so with solid data, clear reasoning, and good statistical analysis. . . . In 2005, he unleashed two papers that challenged the foundations of medical research.
He chose to publish one paper, fittingly, in the online journal PLoS Medicine, which is committed to running any methodologically sound article without regard to how “interesting” the results may be. In the paper, Ioannidis laid out a detailed mathematical proof that, assuming modest levels of researcher bias, typically imperfect research techniques, and the well-known tendency to focus on exciting rather than highly plausible theories, researchers will come up with wrong findings most of the time. Simply put, if you’re attracted to ideas that have a good chance of being wrong, and if you’re motivated to prove them right, and if you have a little wiggle room in how you assemble the evidence, you’ll probably succeed in proving wrong theories right. . . . “You can question some of the details of John’s calculations, but it’s hard to argue that the essential ideas aren’t absolutely correct,” says Doug Altman, an Oxford University researcher who directs the Centre for Statistics in Medicine.
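An aside on that ‘mathematical proof’: as I recall the PLoS Medicine paper, the no-bias case is just Bayes’ rule applied to prior odds, power and the significance threshold. A minimal sketch in Python – the parameter values below are mine, purely illustrative:

```python
def ppv(R: float, alpha: float = 0.05, beta: float = 0.2) -> float:
    """Chance a 'statistically significant' finding is true (no-bias case).

    R     : prior odds that the tested relationship is real
    alpha : type I error rate (the conventional 0.05)
    beta  : type II error rate (so power = 1 - beta)
    """
    return (1 - beta) * R / (R + alpha - beta * R)

# A well-powered test of a plausible, even-odds hypothesis holds up...
print(ppv(R=1.0))   # ~0.94
# ...but a long-shot, 'exciting' hypothesis mostly doesn't.
print(ppv(R=0.05))  # ~0.44: most positive findings are false
```

Add researcher bias and multiple teams racing each other, as the paper goes on to do, and the numbers get worse still.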
Still, Ioannidis anticipated that the community might shrug off his findings: sure, a lot of dubious research makes it into journals, but we researchers and physicians know to ignore it and focus on the good stuff, so what’s the big deal? The other paper headed off that claim. He zoomed in on 49 of the most highly regarded research findings in medicine over the previous 13 years, as judged by the science community’s two standard measures: the papers had appeared in the journals most widely cited in research articles, and the 49 articles themselves were the most widely cited articles in these journals. These were articles that helped lead to the widespread popularity of treatments such as the use of hormone-replacement therapy for menopausal women, vitamin E to reduce the risk of heart disease, coronary stents to ward off heart attacks, and daily low-dose aspirin to control blood pressure and prevent heart attacks and strokes. Ioannidis was putting his contentions to the test not against run-of-the-mill research, or even merely well-accepted research, but against the absolute tip of the research pyramid. Of the 49 articles, 45 claimed to have uncovered effective interventions. Thirty-four of these claims had been retested, and 14 of these, or 41 percent, had been convincingly shown to be wrong or significantly exaggerated. If between a third and a half of the most acclaimed research in medicine was proving untrustworthy, the scope and impact of the problem were undeniable. That article was published in the Journal of the American Medical Association. . . .
Ioannidis points out that obviously questionable findings cram the pages of top medical journals, not to mention the morning headlines. Consider, he says, the endless stream of results from nutritional studies in which researchers follow thousands of people for some number of years, tracking what they eat and what supplements they take, and how their health changes over the course of the study. “Then the researchers start asking, ‘What did vitamin E do? What did vitamin C or D or A do? What changed with calorie intake, or protein or fat intake? What happened to cholesterol levels? Who got what type of cancer?’” he says. “They run everything through the mill, one at a time, and they start finding associations, and eventually conclude that vitamin X lowers the risk of cancer Y, or this food helps with the risk of that disease.” In a single week this fall, Google’s news page offered these headlines: “More Omega-3 Fats Didn’t Aid Heart Patients”; “Fruits, Vegetables Cut Cancer Risk for Smokers”; “Soy May Ease Sleep Problems in Older Women”; and dozens of similar stories. . . .
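The statistical trap in ‘running everything through the mill’ is easy to demonstrate. Here is a toy simulation – the 40-foods-by-20-outcomes grid is invented for illustration – testing hundreds of food–disease pairs in which no real effect exists. The conventional 5 per cent significance threshold duly delivers a stream of spurious ‘findings’:

```python
import random

random.seed(1)
# A made-up study: 40 foods/supplements x 20 health outcomes, all pure noise.
n_tests, alpha = 40 * 20, 0.05
# Under the null hypothesis, each test's p-value is uniform on [0, 1].
false_positives = sum(random.random() < alpha for _ in range(n_tests))
print(f"{false_positives} 'significant' associations out of {n_tests}")
# Expect about 40 headline-ready findings from data that contain none.
```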
Most journal editors don’t even claim to protect against the problems that plague these studies. University and government research overseers rarely step in to directly enforce research quality, and when they do, the science community goes ballistic over the outside interference. The ultimate protection against research error and bias is supposed to come from the way scientists constantly retest each other’s results—except they don’t. Only the most prominent findings are likely to be put to the test, because there’s likely to be publication payoff in firming up the proof, or contradicting it.
But even for medicine’s most influential studies, the evidence sometimes remains surprisingly narrow. Of those 45 super-cited studies that Ioannidis focused on, 11 had never been retested. Perhaps worse, Ioannidis found that even when a research error is outed, it typically persists for years or even decades. He looked at three prominent health studies from the 1980s and 1990s that were each later soundly refuted, and discovered that researchers continued to cite the original results as correct more often than as flawed—in one case for at least 12 years after the results were discredited.
An extremely weird part of the story is where the Prof asks himself a question. “If I did a study and the results showed that in fact there wasn’t really much bias in research, would I be willing to publish it?” he asks. “That would create a real psychological conflict for me.” So he’s just as biased as all the scientists he blows the whistle on, it seems.
All of which leads me to my conclusion, which is that it really is a disgrace that there isn’t some more concerted effort to develop protocols to avoid some of the worst of this. Deirdre McCloskey published work many years ago showing how economists’ articles almost invariably reported the statistical significance of their findings without reporting their economic significance (the latter is a matter of judgement, but it is what really matters). This created quite a stir, and she did the same survey a decade later. I expected things would have got quite a lot better. They hadn’t – from memory they had got worse, though they may have got (marginally) better.
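McCloskey’s distinction is easy to show with invented numbers: given a large enough sample, an economically negligible effect sails straight past the conventional significance threshold:

```python
import math

# Invented numbers for illustration: a trivially small 'effect'
# estimated from an enormous sample.
effect, sd, n = 0.001, 1.0, 10_000_000
standard_error = sd / math.sqrt(n)
z = effect / standard_error
print(f"z = {z:.2f}")  # ~3.16, p < 0.002: statistically 'significant',
                       # economically meaningless.
```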
It is harder in economics because it’s so ideologically loaded, technically dense and ultimately difficult ever really to prove anything of real significance. So even if McCloskey’s critique had been taken to heart – as it should have been – it wouldn’t have made a huge difference to our economic understanding, which will never be able to get too far from the informed, largely atheoretical commonsense of the chief economists of banks, flavoured by some really basic big theories of Smith, Ricardo, Keynes etc.
But medical science? Well, it really can aspire to be a lot more useful than that. It’s a biological science and we know a lot about the basic problems. So would it be so hard for those in positions of power in these disciplines, and the gatekeepers of the top journals, to try to develop protocols to reduce the effects of these problems? I would have thought it is the kind of thing that could be influenced by a few acknowledged leaders in their fields – Nobel Prize winners getting together and making a fuss. And if that doesn’t work, whilst I accept that governments should not interfere directly in professional ethics where individual research is concerned, could they not help such a group to form and start promoting its message? We have vast bureaucratic effort from both universities and governments dedicated to measuring scientists’ position in the pecking order – research quality frameworks and the variants of the same kind of thing elsewhere – something that may or may not make sense as a way of ranking scientists but certainly makes many of these kinds of problems worse. Might we not aspire to a quality framework which actively tried to work against these well-known pathologies of science as it’s practised today, and did a better job of generating good science? It’s hard to imagine a more important public good.
“So would it be so hard for those in positions of power in these disciplines, and the gatekeepers of the top journals, to try to develop protocols to reduce the effects of these problems?”
Yes, because the top journals are in the business of publishing new and interesting findings (that’s why they’re top journals), and so there is no real room for replication etc. That would be true of some areas covered by the top two journals in the world (Nature and Science), where they want catchy papers that generate publicity. This means that saying “we didn’t replicate X, Y and Z” can’t get published unless it’s a fraud claim.
“We have vast bureaucratic effort from both universities and governments dedicated to measuring scientists’ position in the pecking order … Might we not aspire to a quality framework which actively tried to work against these well-known pathologies of science as it’s practised today, and did a better job of generating good science?”
We could aspire to it, but it seems awfully unlikely. The most important thing in universities and many research centres today is money, and that’s quite clearly in conflict with a lot of this medical research (i.e., no-one will give you millions to show that their amazing cures don’t work). It’s also the case that almost all of the evaluation measures are thought up by micro-managers who couldn’t care less about these sorts of problems, since their main goal is to maximize a number, not to query the validity of the number – and again, those two goals are in conflict with each other.
On this note, there was a funny letter to Nature some months ago in which a top scientist, who I think was from Portugal, complained about this. As he pointed out, his university would be happy for him to publish the letter, because it meant they could say they had an article in Nature – even though the letter itself was a complaint about such measurements.
Tell me again… what is the purpose of peer review?
Thanks for your comment Conrad, but I think we need to be a bit crafty about this. As far as trying to do this from within the discipline(s) is concerned, I think there are plenty of fairly incremental things that could be done. Couldn’t the top journals keep some kind of official record of developments regarding specific papers, with fields like “replicated/not replicated” and “strongly/weakly/not confirmed in replication studies” – something like the sketch below? Could there not be places built on the net for research that does not produce a result? I guess these are some of the things that the open science movement is working on.
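Just to illustrate how modest a thing I have in mind – every field name and value below is hypothetical, not any existing standard:

```python
# A purely hypothetical sketch of the per-paper record a journal could
# maintain; none of these field names is a standard.
paper_record = {
    "doi": "10.xxxx/example",                # placeholder identifier
    "replication_status": "not replicated",  # or "replicated"
    "confirmation": "weakly confirmed",      # strongly/weakly/not confirmed
    "replication_dois": [],                  # studies that have retested it
    "last_updated": "2010-10-18",
}
```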
On your comment that everyone’s incentives are the same, I don’t think that’s right. A small but excellent institution has an interest in advertising its own excellence by bucking this system. I mean, obviously the current system is a well entrenched equilibrium of forces, but that’s true of pretty much anything one might want to reform. And we’ve reformed much harder things than this.
There are international governmental and university collaborations on global public goods. And this is the big daddy of global public goods – though it’s not identified in the link I’ve just provided. Now in fact the idea of the UN coming to the rescue doesn’t exactly fill me with hope, but if the right kinds of leaders were to get behind this, I would have thought there could be some substantial change.
“Couldn’t the top journals keep some kind of official record of developments regarding specific papers, with fields like ‘replicated/not replicated’ and ‘strongly/weakly/not confirmed in replication studies’?”
I’m sure that’s possible, but no-one gets grants to run replications of things, and replications aren’t regarded very highly (almost all journals specify that work must be novel), so most people don’t think it is worth doing given limited time. Indeed, you’d be surprised at how uncommon straight-out replications are, even in areas where they’re exceptionally easy to do (let alone areas where they’re not).
Even excluding those problems, there is a massive bias against publishing failures to replicate, and many areas don’t treat a failure to replicate as evidence against the initial study, since the difference may come from other factors (e.g., running a shoddy experiment). Indeed, in some areas the only way you can get a failure to replicate published is to run it as part of something else and then “accidentally” notice in another paper that you produced a different result – or, alternatively, to use a much more technical experimental design, which is not possible in some areas, and very few people do it anyway even in areas where you can (I doubt many people know how it can be done, including reviewers who then won’t understand your article, and dirty reviewers who don’t want you to publish something that shows they were wrong, and who will get you rejected for that too).
“Could there not be places built on the net for research that does not produce a result?”
I’ve heard many people make this suggestion — it would be very simple to set up. But the problems are essentially the same as those I mentioned above (i.e., you spend your time doing it for little reward under the current system — although it wouldn’t be as difficult as actually failing to replicate someone else’s work).
“On your comment that everyone’s incentives are the same, I don’t think that’s right. A small but excellent institution has an interest in advertising its own excellence by bucking this system.”
That’s just an observation of mine based on what happens in reality — I’m not aware of any university in Australia that really does this. To be more precise than before: I think it’s better thought of on a graded scale, but even being not especially far down the scale will lead to the problems you are talking about. In addition, even if you could create places like the ones you are thinking of (perhaps some of the top US universities already qualify), I don’t think it would make much difference, since all the other places will still produce enough stuff by themselves to continually perpetuate the problem. If you’re talking about Australian universities, for example, I don’t think there’s any that wouldn’t promote you to full professor if you got, say, $2 million from _any_ source, no matter how dodgy (e.g., the Adolph Eichmann foundation to study individual differences).
“There are international governmental and university collaborations on global public goods. And this is the big daddy of global public goods”
I work with someone who has done really great stuff on that, and on public attitudes to the type of thing you are talking about. Alternatively, I also work with people who produce the type of thing you are complaining about. I bet you can’t guess who gets paid more and has more power. What this highlights is that it doesn’t really matter how well we understand the problem (I think many of us already understand it well), since, unless you can tackle it on a global scale – which seems impossible – individual universities and research centres will still have a vested interest in producing unreplicable stuff, or just drivel, and this stuff will get into the scientific literature.
Speaking of medical drivel (and if you want a laugh), I couldn’t help but notice this as a prime example in the paper today.
The link doesn’t seem to be working. So here it is again:
http://news.theage.com.au/breaking-news-world/technophobia-begins-in-the-womb-experts-20101017-16osz.html
I found the linked article a bit of a disappointment really. Once you get past the slightly sensational title, it’s nowhere near as luridly polemical as this post suggests. In fact I got to the end of page 2 without feeling the least bit scandalised. As the article’s author, David H. Freedman, notes:
If it’s a scandal that medical researchers get it wrong, then why isn’t it equally scandalous that physicists, and researchers in all those other fields of science that weren’t explicitly named, are getting it wrong? There are numerous areas of pure and applied science where getting it wrong can have adverse social consequences – including public health consequences. Agronomy, for example.
It’s worth noting the concluding part of the article, where Freedman observes that Ioannidis’s work hasn’t outraged the medical research community – in fact they’ve positively embraced it, and they recognise that publication bias is a serious problem. That raises serious questions, for me, about where the disgrace is, and whether a “concerted effort to try to develop protocols to avoid some of the worst of this” is even necessary. What you’re in effect saying we should demand is a more visible effort from medical scientists to get it right; the hazards of such demands are described in the penultimate paragraph of the article:
To finish, a not quite gratuitous link to a review of a recent book on scientific fraud.
Well said, Gummo. There are manifest pitfalls in the establishment of a robust evidence base, and knowledge accrues and advances in a continuing dynamic. But look at what it can deliver: HIV has gone from wtf?!! in 1981 to a chronic, manageable disease (for those with access to effective antiretroviral therapy) in one generation.
“Tell me again… what is the purpose of peer review?”
To establish credibility of method. Not validity of finding. I think it’s important to remember that.
Conrad,
I’m not sure we think that differently. But it’s odd that your reference to the obstacles is a kind of counsel of despair. Obstacles are obstacles. I hear this all the time in economics: we couldn’t do [insert good policy idea] because [insert all those who would oppose it]. On that basis we wouldn’t have tried tariff reform, tax reform, compulsory super, corporatisation or anything really, as it’s easy to tell a story of the people who won’t like it.
There are, I agree, things that are not realistically in contention. Personally I’d like to see all those who blather on about reducing our consumption campaign for rich countries to send 10% of their produce to poor countries. I don’t think that’s possible, so I don’t really talk about it.
If one could get some serious leadership, and start asking some hard questions about how we build a research quality framework, then one ought to be able to make some progress.
But of course, perhaps not.
Gummo and Geoff are playing “Medical science: Good or bad”. They’re plumping for ‘good’. I guess they must think that I think it’s bad. Anyway, for the record, I agree with Gummo that med science isn’t necessarily worse than lots of other areas of science (something I more or less said in the post – it’s certainly not as batty as economics) and I agree with Geoff that transforming HIV from a death sentence into a chronic disease was a neat trick.
I talked about a lot of these issues HERE. The key protection against false positives is supposed to be verification studies. I am staggered that 11 of the 45 studies had never been retested. In fact, I rather doubt that this is representative of normal practice.
As a medical researcher I’m seriously thinking of ditching my PhD studies and going private sector, primarily because of the whole publishing mess. I have plenty of interesting non-results – some that would even be of interest to clinicians – but getting them published is near impossible without spinning the paper into some hideous misrepresentation of what the data actually show.
As far as peer review goes, I think there are some genuinely good reviewers who give great feedback, but plenty of others who don’t even seem to read the paper. One clinician colleague of mine even had a decent short communication sit on the editor’s desk for nearly six months, totally ignored, until the editor was repeatedly contacted – and then the piece was quickly rejected… gah…
I don’t have any experience with government research rankings, except to say most academics I’ve talked to are extremely frustrated with the system.
I’d seriously consider having grants for the replication of work, particularly that of other medical scientists in Australia. The upside is that it would encourage collaborations, and the grant money could be used for pilot studies to further characterise the work, or even to do something entirely different. The downside is that after 10 years all the HIV labs will be collaborating, as will the TB labs, the influenza labs and so on, so there will be less incentive to objectively consider the results of other labs. Not to mention the complications of political rivals checking each other’s work.
No, Nicholas, I was playing “Medical science: no worse than any other discipline” and challenging the way you’ve presented the state of medical science as especially scandalous – your opinion that it is especially a disgrace, and that it’s incumbent on medical scientists, especially, to get it right, particularly given that second-last paragraph of the linked article:
Of course that’s not a problem for economists – the world simply expects them to be completely wrong as a matter of course.
You’ve got me there, Gummo,
I feel like Mr Voles.
Mr Voles didn’t try hard enough:
“Ah well, you don’t have any first hand knowledge of that, do you – you’re relying on a historical record that’s been falsified by post-modernist academics.”
Michael Nielsen emailed me with this corroborating link:
http://crookedtimber.org/2006/09/19/attractive-models/
The Atlantic article is not to be mistaken for “science,” but Ioannidis’ material is available free on the internet, and his 2005 PLoS article on this topic is not difficult for the non-scientist to understand; it covers bias and offers six corollaries. My favorite: the hotter the scientific field (with more scientific teams involved), the less likely the research findings are to be true.
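That corollary is just probability at work: when many teams chase the same question and the first positive result gets published, the odds that the published claim is false go up. A toy Monte Carlo sketch – all parameter values here are illustrative, not taken from the paper:

```python
import random

random.seed(0)

def ppv_first_claim(R=0.2, alpha=0.05, beta=0.2, teams=1, trials=200_000):
    """Share of 'at least one team found it' claims that are actually true.

    R     : prior odds a tested relationship is real
    alpha : each team's false-positive rate
    beta  : each team's false-negative rate (power = 1 - beta)
    """
    true_claims = false_claims = 0
    for _ in range(trials):
        real = random.random() < R / (1 + R)    # is the effect real?
        p_find = (1 - beta) if real else alpha  # per-team chance of a 'find'
        if any(random.random() < p_find for _ in range(teams)):
            if real:
                true_claims += 1
            else:
                false_claims += 1
    return true_claims / (true_claims + false_claims)

print(ppv_first_claim(teams=1))   # roughly 0.76: one careful team
print(ppv_first_claim(teams=10))  # roughly 0.33: a 'hot', crowded field
```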
As a consumer of health care, I’ve seen all sorts of miracles come and go; the most startling was HRT. A recent JAMA article is still debating that one, though the WHI pretty much ended it in 2002.
What I find shocking is how often government social programs are built on baseless medical findings – like the current obesity + poverty + race + farm markets + vegetables grants available to anyone who wants to apply.