How do we know if the stimulus worked?

Sinclair Davidson has extracted a concession from David Gruen at the Treasury regarding some purported evidence for the efficacy of recent fiscal policy that appeared in the Budget Papers. But before we consider the specifics, it’s worth thinking through how one would discover in principle whether the stimulus worked.

If a laboratory experiment were possible, we would re-run history over a two-year period without the fiscal stimulus and measure the growth of GDP under those conditions. But we can’t, so what’s the next best thing? There are three basic options:

(i) Compare with the experience of other economies. If there are some other countries that seem to have been in comparable positions at the same time but implemented no stimulus or a much smaller one, we can compare the growth paths and cautiously attribute the difference to the fiscal policies. But what constitutes a comparable position? (In other words, what are our ceteris paribus conditions?) A similar initial decline in GDP? Similar paths for other components of spending? Obviously, if the country with the fiscal stimulus also enjoyed an export boom, the higher growth rate could be attributed to that. So you would have to choose a ‘control’ country with a similar export story. On the other hand, a surge in consumption in the stimulus country might be a desired effect of the stimulus, so it would be wrong to control for that. But what about investment? Is that, like consumption, affected by the stimulus, and if so, is it crowded in, or is it crowded out? If differences in exogenous exports or investments might interfere with the comparison, we could adjust for those. But by how much? Of course, what economists actually would do is not pair-wise comparisons, but cross-sectional, multi-variable regressions which include all the variables that might explain the variance in growth performance. That way we adjust for all the complicating factors, and the estimation procedure isolates the influence of fiscal policy and calculates the multiplier for us.

(ii) Compare with the experience of the same country on other occasions. The same logic applies as above, but instead we have time-series analysis. The problem here, however, is a shortage of comparable episodes. Significant discretionary fiscal expenditures are few and far between. So there isn’t enough variance in the data to give us a statistically significant estimate of the fiscal multiplier. Even if there were big year-to-year variations, macro theory predicts that fiscal policy makes a bigger difference to economic activity in a slump than in a boom, so constraints might have to be imposed to estimate the multiplier for slump periods specifically. In any case, the obvious problem with the time-series approach is that the current fiscal episode is only adding a few points to the data set, so it doesn’t tell us much more than we already knew: in other words, there’s not much difference between asking ‘did the stimulus work?’ after the event, and ‘will the stimulus work?’ before the event.

(iii) Compare with a counter-factual simulation in a suitable model. This could be an aggregated macroeconometric model or a multi-sector Computable General Equilibrium model. To be credible, the model has to pass a few basic tests: first, it has to track the economy’s trend growth path when the exogenous variables are at normal levels, or following their trend paths; then, it has to track the historical path of the recession when the actual data for those exogenous variables are plugged in. If the model passes these tests, then we can carry out the experiments with some confidence, at the very least, that there isn’t a better way of quantifying the effect of the stimulus. (This, after all, is what these models are for.) We can ask what would have happened if there had been no fiscal stimulus, a different kind of stimulus, or either of these in combination with a different monetary policy.
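
To make the two credibility tests and the counterfactual experiment in option (iii) concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `ToyModel` class, its three parameters, and the four periods of ‘observed’ data are made up, and a real macroeconometric or CGE model would be vastly more elaborate.

```python
from dataclasses import dataclass


@dataclass
class ToyModel:
    """Hypothetical one-equation growth model (not any real Treasury or IMF model)."""
    trend: float = 3.0          # assumed trend growth, % per year
    export_effect: float = 0.5  # assumed response of growth to export shocks
    multiplier: float = 1.0     # assumed fiscal multiplier

    def simulate(self, export_shocks, stimulus):
        # Growth in each period = trend + export effect + fiscal effect.
        return [
            self.trend + self.export_effect * x + self.multiplier * s
            for x, s in zip(export_shocks, stimulus)
        ]


def max_abs_error(simulated, observed):
    return max(abs(a - b) for a, b in zip(simulated, observed))


model = ToyModel()

# Test 1: with exogenous variables at normal levels, the model tracks trend.
passes_trend_test = model.simulate([0.0] * 4, [0.0] * 4) == [model.trend] * 4

# Test 2: with actual exogenous data plugged in, the model tracks history.
export_shocks = [-4.0, -6.0, -2.0, 0.0]  # made-up data
stimulus = [0.0, 2.0, 2.0, 1.0]          # made-up data, % of GDP
observed_growth = [1.1, 1.9, 4.1, 4.0]   # made-up data
fitted = model.simulate(export_shocks, stimulus)
passes_history_test = max_abs_error(fitted, observed_growth) < 0.25

# Experiment: same exogenous paths, but no fiscal stimulus.
counterfactual = model.simulate(export_shocks, [0.0] * 4)
estimated_impact = [a - c for a, c in zip(observed_growth, counterfactual)]

print(passes_trend_test, passes_history_test)
print([round(v, 2) for v in estimated_impact])
```

The estimated impact in each period comes out close to the assumed multiplier times the stimulus, which is a reminder that the answer is only as good as the model that produced it.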

Back to the issue at hand. Budget Statement No.2 included a scatter graph, proclaimed by Possum Comitatus to be ‘the most important chart in the budget’, showing an apparent correlation between the amount of fiscal stimulus and the ‘impact’ of fiscal stimulus in eleven countries. The impact is the difference between actual GDP growth and IMF forecasts made prior to the implementation of the stimulus. A number of points should be made about this:

1. Suppose we could actually measure the ‘impact’ of the fiscal policy in each country, by observing what would have happened absent the stimulus in a parallel universe. The point of collecting such impact measures for a whole bunch of countries would be to extract some generalisations about whether fiscal stimulus works, and how consistently. We could calculate the average multiplier and the range of variation. If the scatter diagram revealed a nice correlation, we could declare that the multiplier is pretty similar from one country to the next. Policy makers in each country, or indeed in a country outside the sample, might feel justified in concluding from the statistical analysis that fiscal stimulus has the same effect in different situations; they would know what to expect in their own country next time.

2. The parallel universe in the Treasury graph is merely an IMF growth forecast. I don’t know exactly how these forecasts are done. But there are two basic possibilities. One is that they are simple extrapolations of recent GDP trends, based on some ad hoc formula. If this is the case then, in the context of an unprecedented departure from long term trends and extreme uncertainty, they are as good as meaningless, the numbers effectively plucked out of the air. The alternative is that they are based on some kind of structural model. This might not be very sophisticated, but at a minimum it would involve some assumptions about how discretionary fiscal changes are transmitted, that is, about spending multipliers, which determine what will happen when investment and exports fall without any offsetting additional government spending. In that case, the outcomes cannot be thought of as data from which we can draw inferences about the effect of fiscal policy, but are rather themselves inferences from what we already know, or think we know, about the effect of fiscal policy.

3. There is also the issue of whether the counterfactual for each country assumes that a fiscal stimulus proceeds in all the other countries. If the IMF forecast shows what would have happened in Country A if no countries implemented a stimulus, then comparison with the actual growth path, in which Country A benefited from a coordinated, global fiscal expansion, will exaggerate the impact of the domestic stimulus in isolation. But then, it would be wrong to conclude from that observation that fiscal policy is overrated as a response to recession.

4. Before proceeding further, it’s worth making the point that, if one is going to use a model simulation as a control, it makes little sense to use forecasts made before the event, rather than counterfactual simulations. The aim is to isolate the effect of the fiscal stimulus by keeping the paths of all other exogenous variables the same, so why not run the model using ex-post observations rather than ex-ante projections of those variables?

5. Assuming that the IMF forecasts are based on a structural model, however simple, and setting that last issue to one side — let’s suppose the projections for exogenous variables were spot on, so there’s no difference between the forecast and the counterfactual simulations — what difference does it make that the ‘impact’ is measured by comparing the actual path of GDP with a model simulation rather than an observable counterfactual? The answer is: all the difference in the world. All the comparison tells us is how similar the structural models were in the first place. Because we’ve worked hard to make each individual country model credible, according to procedure (iii) above, we can be confident that it’s giving us the best estimate we’re likely to get of the impact. But there’s no point in trying to confirm one model’s estimate by looking at another. If two models both revealed fiscal multipliers of 1.5, all that would show us is that the models are similar in design. A correlation between stimulus and outcome across a number of countries would be equally meaningless. I was a bit surprised to see such a dodgy exercise — at least as far as I understand it — in a Treasury document in the first place, and to see John Quiggin calling it a ‘striking result’.
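
Point 3 above can be put as simple arithmetic. The sketch below uses invented numbers for a hypothetical Country A (the forecast, the domestic multiplier, and the spillover from other countries’ stimulus are all assumptions) to show how attributing the whole gap between actual growth and a no-stimulus-anywhere forecast to the domestic stimulus overstates its effect.

```python
def implied_multiplier(actual_growth, forecast_growth, stimulus):
    """Multiplier implied if the entire forecast gap is credited to domestic stimulus."""
    return (actual_growth - forecast_growth) / stimulus


# Hypothetical Country A; all figures are made up, in percentage points of GDP.
forecast_no_stimulus_anywhere = -2.0
domestic_multiplier = 1.0      # assumed 'true' effect of the domestic stimulus
domestic_stimulus = 2.0
spillover_from_abroad = 0.5    # assumed boost from other countries' stimulus

actual_growth = (
    forecast_no_stimulus_anywhere
    + domestic_multiplier * domestic_stimulus
    + spillover_from_abroad
)

implied = implied_multiplier(
    actual_growth, forecast_no_stimulus_anywhere, domestic_stimulus
)
print(implied)  # 1.25, against an assumed domestic multiplier of 1.0
```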

Davidson protests, and the Treasury now accepts, that the ‘striking result’ breaks down when more ‘data points’ are included. Gruen also experiments with other data sets, with mixed results. But it seems to me, unless I’m seriously misunderstanding something, that the results of all of these regression exercises are pretty spurious.
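
For contrast, the advantage of the multi-variable approach described under option (i) is easy to demonstrate on synthetic data. The following sketch assumes a true multiplier of 1.0 and an export effect of 0.5, and builds in a correlation between stimulus and export growth, as in the export-boom example above. The data, the parameter values, and the helper `ols` are all invented for the illustration and say nothing about any actual economy.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100  # number of synthetic economies (hypothetical)

TRUE_MULTIPLIER = 1.0  # assumed effect of stimulus on growth
EXPORT_EFFECT = 0.5    # assumed effect of export growth on growth

stimulus = rng.uniform(0.0, 5.0, n)  # % of GDP
# Export growth is correlated with stimulus by construction.
exports = 0.8 * stimulus + rng.normal(0.0, 1.0, n)
growth = (
    TRUE_MULTIPLIER * stimulus
    + EXPORT_EFFECT * exports
    + rng.normal(0.0, 0.1, n)
)


def ols(y, *regressors):
    """Least-squares coefficients, intercept first."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Single-variable regression: the export boom is credited to the stimulus.
naive_multiplier = ols(growth, stimulus)[1]
# Multi-variable regression: exports are controlled for.
full_multiplier = ols(growth, stimulus, exports)[1]

print(f"single-variable estimate: {naive_multiplier:.2f}")
print(f"multi-variable estimate:  {full_multiplier:.2f}")
```

With these assumptions the single-variable slope should land well above the assumed multiplier, while the regression that includes exports should land close to it. The catch, of course, is that real samples are far smaller than a hundred economies.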

It’s worth looking at some more sensible attempts to assess the effectiveness of stimulus measures around the world, such as the efforts of the (President’s) Council of Economic Advisers, but that’s a topic for another post.

This entry was posted in Economics and public policy.
11 years ago

Great post, James. I was astounded that there was no effort made to control for other factors. Especially curious was the lack of consideration of actual monetary policy outcomes (and different monetary policy scenarios) and their effects on the efficacy of stimulus. Surely an accommodative vs restrictive monetary policy makes a big difference? Zero Lower Bound (ZLB) considerations are relevant for a number of countries (e.g. Australia could ease monetary policy much more significantly than, say, the US, as we never hit the ZLB). Finally, and probably most importantly, the seemingly ad hoc selection of countries in the original regression was strange and little justification was provided. It really looked like an amateur effort and it demeans the good people at Treasury.

Fred Argy
11 years ago

It is a very helpful analysis, James. But you seem to end up with no firm conclusion. That seems to me somewhat out of place.

I still find the macroeconomic experience very relevant. Remember that the major Euro countries did not offer enough of a fiscal stimulus and they got a relatively bad outcome. The countries that did the most with fiscal stimulus include the USA, Korea, China and Australia. They got much better results.

You also note that “if the country with a fiscal stimulus also enjoyed an export boom the higher growth rate could be attributed to that”. But Australia suffered a big decline in commodity prices, from an index of 120 to 80 in 2009 (RBA index). This is one reason why the RBA took instant monetary action. Commodity prices went up in 2010 but have since declined markedly. This may also be one reason why some mining companies have recently revised down their projects (apart from the resource rental tax).

John Quiggin
11 years ago

James, IIRC, I said the result was striking (which it was) but that the analysis would certainly be subject to subsequent debate (which it was).

Coming to the substantive question, there are really two points at issue here:

1. You assume that the IMF projections made early in 2009 incorporate a predicted effect of fiscal stimulus. I assumed (but haven’t checked) that the projections were made before decisions on a fiscal response had been adopted. If that’s right, then the projections are, in effect, estimates of outcomes in a parallel universe with no stimulus (or maybe, a uniform small stimulus in all countries).

2. The choice of countries to test is a difficult question. On the one hand, you don’t want to cherrypick your data, on the other hand, the countries excluded in the initial Treasury analysis seem very different from the rest. The OECD is the obvious choice, but then countries like Iceland present a problem as Gruen notes. A regression on OECD countries weighted by GDP, and allowing some kind of asymmetric response to fiscal contraction might be the best choice.

Paul Frijters
11 years ago


Nice, but this is not the end of it. Let me make you the following falsifiable predictions:

1. In the coming years, some respected economist (perhaps even a Nobel Prize winner) is going to come up with a theory-based growth regression ‘proving’ the stimulus packages were all counter-productive and led to a longer-term decrease in growth because of the inefficiency of all the spending. That economist will also say that the lack of complete meltdown is due to monetary measures and trade measures.

2. Someone is going to say all the spending has had no effect whatsoever.

3. Someone is going to say, on the basis of extensive empirical analysis, that the stimuli were good things.

4. The political consensus is going to be that they were good things to do because there was no complete meltdown and too much political capital is at stake to say it wasn’t a success. The success story-line is already in place and would take an extraordinary intellectual consensus-effort to dislodge at this point, which I can’t see happening.

5. We will never know the truth, precisely because there is too much discretion in the empirical and modeling aspects of the problem.

If you ask me, I’d say in the case of Australia it has been a success for various reasons, not the least of which is the therapeutic effect of implementing policies (we didn’t just sit on our hands. We did something. We were proactive. Such symbolism is of great importance in politics). Of far greater economic importance than the stimulus was, IMO, that we didn’t make any big mistakes, like closing the borders or increasing interest rates.

11 years ago

…a transparent multivariate cross-sectional analysis sounds more promising than a single-variable regression where all the other factors are packed into an opaque counterfactual. Unfortunately, the more countries we include in the sample, the more candidates there are for the set of explanators, reducing the number of degrees of freedom.

Very true. But then doesn’t that beg the question: why did the Government rely on it so heavily (obviously Treasury are merely following orders)? If the sample size is too small to even incorporate two or three regressors then what value is it? Isn’t it safer to just say that no definitive conclusions can be reached from the data and move on?

Instead of claiming that the analysis showed the efficacy of fiscal stimulus, why bother with it at all? By putting it out in the public domain so prominently, they were almost inviting someone to trash it (deservedly).

I’d love to know what the worker bees in Treasury think. I certainly would not want to have my name associated with analysis of such dubious quality.

Nicholas Gruen
11 years ago


I’d be surprised if it wasn’t the worker bees’ idea.

11 years ago

Nicholas, are you seriously suggesting that anyone who had done anything more than 1st year stats would have suggested such a thing with a straight face? This looks more like something cooked up in a minister’s office. Granted, Treasury sunk a lot of reputational capital into the stimulus package so they also have a big stake in fostering the impression it was successful. I remain highly sceptical.