Comparing the Census to alternative data or information: What is the right counterfactual?

You might think that on a post about counterfactuals, we’d have a picture of sliding doors together with two contrasting pictures of Gwyneth Paltrow. But you’d be wrong. We’re full of surprises here at Lateral Economics.

By Matt Balmford; Gene Tunny; and Nicholas Gruen

Our inaugural blog post for the Valuing the Census project provided an overview of our strategy for estimating the benefits of the Census to the Australian community.

From here, we’ll be blogging about various issues that we’re grappling with in making this assessment.

To assess the value of the Census, we need to estimate the additional value it generates over the alternatives that would be used in its absence.

These alternatives are the counterfactual scenarios against which we assess the benefits of the Census.

That value, then, is the extra precision or insight the Census provides for a given use, compared with the next best alternative.

(In later blog posts we’ll get into what Census-related data or information means, as well as what the impacts of this extra precision or insight might be.)

But what are the alternatives to the Census?

They might include personal experience, stylised facts, the ABS’s survey-based official statistics, market research data, or many other administrative or non-official data sources.

Alternatives will obviously vary by issue and use, and not all alternative data will be available to all people (for example, due to commercial or confidentiality restrictions) or be easy for them to understand and/or use.

In a hypothetical world without a Census, would the ABS do something else to fill some of the gaps created?

Would the ABS or other agencies create a new next-best alternative source of data for some uses and, if so, at what cost?

The potential counterfactual scenarios are varied and prompt consideration of a wide range of issues.

For example, consider these hypothetical scenarios:

  1. Data sourced today from the Census becomes unavailable. Users would need to make the best use of whatever alternative data is currently available from the ABS or other sources. This would save the Australian government the whole incremental cost of the Census (around $500m in 2016).
  2. A Census is conducted less frequently (for example, every ten years as in the US). Census data would become less useful as it became more out of date, but costs would also fall (presumably by around half), and some of the savings could be invested in expanded ABS surveys to plug the worst gaps.
  3. The traditional five-yearly Census is replaced with greater use of ongoing large-scale sample surveys (closer to the French rolling census) and integrated administrative data in official statistics. It is unclear what this would cost, or how quality would be affected.
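
To put rough numbers on the cost side of the first two scenarios, here is a minimal Python sketch. It spreads the approximate $500m cost quoted for 2016 evenly over the years between Censuses and assumes, purely for illustration, that a single Census would cost the same if it were run less often. The function name and the even-spreading approach are illustrative assumptions, not ABS figures or methods.

```python
# Rough annualised Census cost under different frequencies.
# Assumption: each Census costs about the same regardless of how often it runs.
CENSUS_COST_M = 500.0  # approx. incremental cost in $m (2016 figure quoted in the post)


def annualised_cost(cost_per_census_m: float, years_between: int) -> float:
    """Spread the cost of one Census evenly over the years between Censuses."""
    if years_between <= 0:
        raise ValueError("years_between must be positive")
    return cost_per_census_m / years_between


five_yearly = annualised_cost(CENSUS_COST_M, 5)   # current frequency
ten_yearly = annualised_cost(CENSUS_COST_M, 10)   # scenario 2

# Gross annual savings relative to the current five-yearly Census.
saving_scenario_1 = five_yearly               # no Census at all
saving_scenario_2 = five_yearly - ten_yearly  # before any reinvestment in surveys

print(f"Scenario 1 saves about ${saving_scenario_1:.0f}m a year")
print(f"Scenario 2 saves about ${saving_scenario_2:.0f}m a year")
```

These are gross figures. They ignore any fixed costs that do not scale with frequency, and the saving in the second scenario would shrink by whatever is reinvested in expanded surveys. The third scenario is left out because its costs are unclear.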

Our current thinking is that, given the purpose of this exercise, it would be simplest and most straightforward to consider the first scenario, that is, ‘with’ and ‘without’ the Census.

This reduces the number of permutations and uncertainties in play.

Since our valuation will draw in part from stakeholders’ perspectives on alternatives to Census-related data, the thought experiment needs to be clear and consistent for them, too.

We are hoping that stakeholders will be able to understand this ‘with’ and ‘without’ scenario.

For example, since 2006 the Census has incorporated information on educational attainment (i.e., highest year of school completed or level of highest non-school qualification).

This helps users to investigate the relationship between levels of education and employment outcomes, income and other socioeconomic variables. Educational attainment is also often used as a proxy measure of socioeconomic status.

While various sample surveys also look at this issue, their sample sizes are too small to give sufficient detail for small population groups (for example, Aboriginal and Torres Strait Islander peoples) or for small regions or localities.
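
To see why sample size bites for small groups, the following Python sketch uses the textbook margin-of-error formula for a proportion estimated from a simple random sample. All the inputs (a 20,000-person survey, a group making up 3% of the population, a locality holding 1% of that group) are hypothetical numbers chosen for illustration, not figures from any ABS survey, and real surveys use more complex designs than simple random sampling.

```python
import math


def margin_of_error(p: float, n: float, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion p from a simple random sample of size n."""
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p * (1.0 - p) / n)


# Hypothetical inputs, for illustration only.
SURVEY_SAMPLE = 20_000   # people in a national sample survey
GROUP_SHARE = 0.03       # share of the population in a small group
LOCALITY_SHARE = 0.01    # share of that group living in one locality
P = 0.5                  # proportion being estimated (worst case for precision)

# Expected number of survey respondents at each level of detail.
expected_samples = {
    "whole population": SURVEY_SAMPLE,
    "small group": SURVEY_SAMPLE * GROUP_SHARE,
    "small group in one locality": SURVEY_SAMPLE * GROUP_SHARE * LOCALITY_SHARE,
}

for label, n in expected_samples.items():
    moe = margin_of_error(P, n)
    print(f"{label}: n = {n:.0f}, margin of error = +/-{moe * 100:.1f} percentage points")
```

Under these assumptions the margin of error grows from under one percentage point for the whole population to roughly four points for the small group and about forty points for the group within one locality, which is too wide to be useful. A Census largely avoids this sampling error because it aims to count everyone.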

This is the sort of comparison between Census-related and alternative data and information that we’ll be looking to explore further with stakeholders.

In doing this, we acknowledge that simply discontinuing the Census is not a realistic scenario. The kinds of options set out in the second and third scenarios, which statistical agencies worldwide are debating, are more likely.

However, we think it is a reasonable position to take for a first and already challenging Australian assessment of Census value, and one that is not trying to assess the relative desirability of different Census models.

It is also the position taken, with varying degrees of explicitness, in recent UK and NZ analysis on similar topics.

Do you agree with our current thinking?

Are there other approaches we could take that will be both valid and practical?

If so, please let us know by commenting on this post or emailing us at [email protected].


5 Responses to Comparing the Census to alternative data or information: What is the right counterfactual?

  1. paul frijters says:

    sure, with/without is an obvious way to get at an estimate of ‘current value’.
    However, you then need to do more than just try to get some notion of how much the additional data improves understanding and how much that added understanding is worth.
    You also need to consider what these statisticians and surveyors will be doing otherwise: how will they use their time? Their alternative options might create a lot of value. What would these statisticians and surveyors be paid in the private sector? Probably more than in the public sector, so there’s an added economic benefit right there.
    Of course you might consider the alternative possibility, which is that the census gives jobs to people who would otherwise be unemployed. Then the act of the census itself is like a workfare program.
    So the with/without calculus needs a good look at what the activity of the census itself is: who it involves, what the alternative jobs would be of those involved, possible economic stimulus elements in areas that would otherwise not get stimulus, etc. The end product might be useless but the activity might not be (or vice versa).

    Another way to go about it is to have some notion of entire alternative information systems that you could have.

  2. conrad says:

    I can add:

    4. Replaced with smaller groups that are paid for their participation and are constantly sampled. This would get rid of a lot of error from people who don’t want to give up their info for one reason or another or just can’t be bothered filling it in.

    Another thing you could do is go through all the questions and decide which ones are actually likely to be useful for anything, whether they are useful but don’t need massive sampling, or are simply easy to get elsewhere. For example, surely the immigration records have data close enough to where a person is born that you don’t need everyone to fill in that stuff. I also don’t see why marriage records are needed at all, and surely that data could be got from other sources also (cf. de facto status), etc.

    Going through the questions might let you cut out a lot of the survey in terms of its usefulness and just try and stick a money amount on the smaller amount of information that hopefully generates most of the value.

    An alternative way to think about this would be to flip the question and ask something like: “What are the 20 most important bits of data that governments need to plan properly?” This way you could think about whether the census is vital in them or not. If the answer is none of these, obviously the 500 million is going to be harder to justify.

  3. “What are the 20 most important bits of data that governments need to plan properly”?
    Seems a good question to start a discussion from.

  4. Brad says:

    There are several ways in which a population can be counted. A conventional modern population census attempts to provide a count of all the people within a territory at a specified time. A census normally involves eight elements: (1) a definition of the population to be considered as part of the census; (2) determination of the content to be included on the census questionnaire, usually based on an extensive examination of users’ needs; (3) careful testing of alternative questionnaire wordings and formats; (4) systematic preparation of lists of dwellings in which the population lives; (5) the hiring and training of enumerators; (6) the distribution of questionnaires and use of enumerators to question either the whole population or only those people who did not satisfactorily complete their questionnaires; (7) the processing and analysis of the census questionnaires; and (8) dissemination of statistical summaries using a variety of media. Within the general description, there are many variants of the specific procedures.

  5. Chris Lloyd says:

    “This helps users to investigate the relationship between levels of education and employment outcomes, income and other socioeconomic variables.” I don’t think the census data is much good for this kind of thing. You would want HILDA for such questions.
