Education Research and Administrative Data by David N. Figlio, Krzysztof Karbownik, Kjell G. Salvanes
Thanks to extraordinary and exponential improvements in data storage and computing capacities, it is now possible to collect, manage, and analyze data in magnitudes and in manners that would have been inconceivable just a short time ago. As the world has developed this remarkable capacity to store and analyze data, so have the world’s governments developed large-scale, comprehensive data files on tax programs, workforce information, benefit programs, health, and education. While these data are collected for purely administrative purposes, they represent remarkable new opportunities for expanding our knowledge. This chapter describes some of the benefits and challenges associated with the use of administrative data in education research. We also offer specific case studies of data that have been developed in both the Nordic countries and the United States, and offer an (incomplete) inventory of data sets used by social scientists to study education questions on every inhabited continent on earth.
If we gave up on privacy altogether and were able to correlate credit card data, hospital data, and tax data, we’d have the possibility of really getting the answers to most health/lifestyle questions and adding some twenty years to life expectancy. I live in hope.
We don’t have to give up on privacy – we should just defend it as a worthwhile, not absolute goal.
Hmmm, life expectancy at birth for males in Australia is approximately 80 years and for females a bit more than 84 years (82 years overall average) – for the decade ending 2011 anyway.
So, somehow coupling all of every individual’s credit card data, hospital data and tax data would magically increase that to 100 and 104+ (avg 102). Well, I’m always willing to believe in the magic of data, but could you explain just a little about how such a wondrous change comes about – especially for those with platinum credit cards, no hospitalisation and who hide most of their taxable income? And those who don’t have a credit card, can’t afford to go to hospital (unless dragged there by an ambulance) and don’t earn enough to pay income tax (and probably don’t pay council rates or even much GST either)?
Or don’t they count ? Or isn’t there enough of them to affect the data ? Or are you only concerned with the middle class ?
OK, I’m game. The problem with medical research is that it’s virtually impossible to get a large enough research population to establish a statistically significant link for anything that’s less blatant than “Cigarettes are bad for you”. If we were able to look at the entire population of users of prescription drug X, stratified by income, and their hospital outcomes, that would be a vast step forward; you would then be able to control precisely for “drug X users in poor people with diagnosis A compared with non-users in poor people with diagnosis A” and “drug X users in rich people with diagnosis A compared with non-users in rich people with diagnosis A”. Going further, and with slightly less specificity (seeing that people who buy alcohol and bacon don’t necessarily consume it all themselves), we could compare through credit cards rich drinker households with rich non-drinker households, and ideally drug X users _and_ drinkers in rich people with diagnosis A with drug X users _and_ non-drinkers in rich people with diagnosis A. We’re talking a Snow map on a worldwide scale that would enable us to pick out several thousand Broad Street pumps instantly. It would also help to introduce gene-level data, but that’ll probably be fed into medical records anyway in a couple of years. Haven’t you got a social epidemiology question that could be answered from the alldata base? One that would answer a question as to whether a particular intervention worked, or which intervention worked best, or which factors pointed at a hitherto unsuspected intervention?
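For what it's worth, the stratified comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not anyone's actual method: the field names (`income`, `diagnosis`, `uses_drug_x`, `adverse_outcome`) and the eight records are hypothetical and made up purely to show the shape of the calculation.

```python
from collections import defaultdict


def stratified_rates(records):
    """Adverse-outcome rate for drug X users vs non-users within each
    (income, diagnosis) stratum.

    Returns {(income, diagnosis): {uses_drug_x: rate}}.
    """
    # (income, diagnosis, uses_drug_x) -> [adverse outcomes, people]
    counts = defaultdict(lambda: [0, 0])
    for r in records:
        key = (r["income"], r["diagnosis"], r["uses_drug_x"])
        counts[key][0] += r["adverse_outcome"]
        counts[key][1] += 1

    rates = defaultdict(dict)
    for (income, diagnosis, uses), (bad, n) in counts.items():
        rates[(income, diagnosis)][uses] = bad / n
    return dict(rates)


# Hypothetical records, invented for illustration only (not real data).
records = [
    {"income": "poor", "diagnosis": "A", "uses_drug_x": True, "adverse_outcome": 1},
    {"income": "poor", "diagnosis": "A", "uses_drug_x": True, "adverse_outcome": 0},
    {"income": "poor", "diagnosis": "A", "uses_drug_x": False, "adverse_outcome": 0},
    {"income": "poor", "diagnosis": "A", "uses_drug_x": False, "adverse_outcome": 0},
    {"income": "rich", "diagnosis": "A", "uses_drug_x": True, "adverse_outcome": 0},
    {"income": "rich", "diagnosis": "A", "uses_drug_x": True, "adverse_outcome": 0},
    {"income": "rich", "diagnosis": "A", "uses_drug_x": False, "adverse_outcome": 1},
    {"income": "rich", "diagnosis": "A", "uses_drug_x": False, "adverse_outcome": 0},
]

for stratum, by_use in sorted(stratified_rates(records).items()):
    print(stratum, "users:", by_use.get(True), "non-users:", by_use.get(False))
```

The point of comparing within each stratum rather than across the whole population is that income and diagnosis are held fixed, so a difference between users and non-users cannot simply reflect who tends to be prescribed the drug. It is still a correlation, of course.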
Oh, and I realise that I slipped up by saying ‘Hospital data’. I should have specied medical records at all levels – hospitals and GPs and specialists.
specified. dammit.
Well maybe ‘specied’ is right anyway, the poor are a different species, aren’t they ? :-)
Ok, so lots of very imprecise ‘statistical correlations’, but what exactly are they worth ? What is the causal chain that is going to go from lots of statistical correlations to an extra 20 years of life expectancy at birth ?
The big improvements in life expectancy have come from hygiene (on a personal and national scale, eg mass sewerage), hygienic nutritional food, and a fair amount of it, and some attention to work demands. But they are all low hanging fruit – even massive individual level medical interventions don’t increase average life expectancy at birth by much – and certainly way less than 20 years.
So, well defined causal chain, please, as to why we will get an extra 20 years.
The tobacco causal sequence was that it was discovered that based on epidemiology there was a correlation between smoking and dying, people stopped smoking, and people lived longer. If we discover a correlation between, say, rye bread (or rye bread and fennel and tom yum paste) and heart disease people will probably eat less rye bread (or rye bread and fennel and tom yum paste) and live longer. More specifically, if we discover that the use of Doomeral and Diepreodone in combination correlates strongly with heart stoppage we may well stop prescribing Doomeral and Diepreodone in combination, reducing deaths. I’m not proposing any new scientific practice; we try to get exactly these stats now by means of asking people in large population studies, but those are expensive and partial so we don’t do many and we don’t do them particularly well. Did you see the Four Corners yesterday on testing medical procedures? Like that, only x 1000.
Yeah, yeah, but like I said, large amounts of individual based medical intervention only increases life expectancy at birth by a few years, not 20.
Now even though I omitted mass immunisation/vaccination from the big gain major factors (a momentary slip of the brain) it’s those factors that have made large increases in at birth life expectancy, not individual medical interventions.
Sure, picking up the smoking/cancer connection prolonged a few million lives by a few years, but, to repeat myself, that makes only a minor increase in at birth life expectancy, which is averaged over the whole population.
Now I ask again, what in all these statistical correlations is going to make a 20 year difference ?
Chrisb Hi
Could better data pick up the medical treatments that are worse than useless? (as well as those that are expensive and pointless)
So can we take it that you watched 4 Corners last Monday ?
And you were impressed by the major market distortion that a strong and protective trade union (the Australian Medical Association) can produce ?
On the other hand, all the big gain factors you cite as having been the drivers in increasing life expectancy – immunisation, birth protocols, etc – were, as you also say, basically exercised some decades ago, and are already incorporated in our present life expectancy; while life expectancy in Australia is still rising by some three months a year. Which, if it isn’t coming from immunisation, etc., is coming from medical advances. Which, if it could be supercharged by having for the first time the evidence we need to see what practices do extend life, would increase substantially (after all, if we managed to increase that four times, we’d all live forever…. no, not a serious suggestion).
IIRC the at birth life expectancy of Australians was approximately 58 at Federation – about 57 for males and about 59 for females. So, on those numbers, at birth expectancy has increased by 24 years in an elapsed time of 114 years, or about 2.5 months per year. Not far off your 3 months per year as a more recent count.
Now I can’t recall any episodes of mass starvation in Australia (unlike, say, China) and I can’t recall any really major disease epidemics (other than the ‘flu epidemic earlyish last century), and we’ve had sewerage and hygiene in Australia for all that time (at least).
So, do you have a hypothesis as to why the at birth expectancy should have so consistently increased long before the sophisticated medical interventions you’re discussing came about ?
To be pedantic, the figures I’m using (ABS) are 55/58 at Federation and 47/47 in 1881: since 1881, 0.25/0.28 years per year, or over 3 months.
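The two sets of per-year figures quoted in this exchange are easy to check. A minimal sketch, assuming Federation means 1901, that the "114 years" elapsed puts the end point at 2015, and that the present-day values are the roughly 80/84/82 quoted earlier in the thread (the function name is just for illustration):

```python
def gain_per_year(start_le, end_le, start_year, end_year):
    """Average annual gain in life expectancy at birth, in years per year."""
    return (end_le - start_le) / (end_year - start_year)


END_YEAR = 2015  # assumption: 1901 + the 114 elapsed years mentioned above

# Recalled figures: about 58 at Federation, about 82 overall now.
recalled = gain_per_year(58, 82, 1901, END_YEAR)
print(f"since Federation: {recalled * 12:.1f} months per year")

# ABS figures as quoted: 47/47 in 1881, against about 80 (male) / 84 (female) now.
male = gain_per_year(47, 80, 1881, END_YEAR)
female = gain_per_year(47, 84, 1881, END_YEAR)
print(f"since 1881: {male:.2f} (male) / {female:.2f} (female) years per year")
```

On those assumptions both commenters' numbers hold up: roughly 2.5 months per year measured from Federation, and 0.25/0.28 years per year (three months or a little more) measured from 1881.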
More generally, you ask “do you have a hypothesis as to why the at birth expectancy should have so consistently increased long before the sophisticated medical interventions you’re discussing came about ?” Well, yes, lots – see the immunisation introduction schedules – but aren’t we deviating from the topic? I wasn’t saying that all advances in life expectancy in the past were due to sophisticated medical interventions, and falsifying that wouldn’t affect my argument. I was and am saying that advances in the future could, privacy-free, be due to _either_ sophisticated medical interventions based on better knowledge _or_ new population health initiatives based on better knowledge. A twenty-year improvement in life expectancy that would in the past have taken eighty years to work through could be done in a decade or two.
Ah well, 55/58 versus my recall of 57/59 – not so far wrong for an old and fallible memory.
But no, I don’t reckon I’m off-topic: I’m saying that even “sophisticated medical interventions based on better knowledge” simply will not produce anything even remotely as large as a 20 year increase in the at birth expectancy.
Nor will “new population health initiatives based on better knowledge” produce it either. Mostly because there aren’t any such initiatives, privacy or not.
We are not talking about actually extending human lifetimes, just preventing premature death, and there just isn’t a large enough scale of premature death. So, even if every single premature death was avoided, a 20 year hike in at birth expectancy wouldn’t result.
We know that life-extending possibilities exist, because they’ve applied every year since 1881 (apart from a brief glitch in the sixties). I think they can be speeded up considerably under certain circumstances; you, on somewhat unclear a priori grounds, think this is impossible. I think we’ve carried the argument as far as we’re going to get it.