Sinclair Davidson has extracted a concession from David Gruen at the Treasury regarding some purported evidence for the efficacy of recent fiscal policy that appeared in the Budget Papers. But before we consider the specifics, it’s worth thinking through how one would discover in principle whether the stimulus worked.
If a laboratory experiment were possible, we would re-run history over a two-year period without the fiscal stimulus and measure the growth of GDP under those conditions. But we can’t, so what’s the next best thing? There are three basic options:
(i) Compare with the experience of other economies. If there are some other countries that seem to have been in comparable positions at the same time but implemented no stimulus or a much smaller one, we can compare the growth paths and cautiously attribute the difference to the fiscal policies. But what constitutes a comparable position? (In other words, what are our ceteris paribus conditions?) A similar initial decline in GDP? Similar paths for other components of spending? Obviously, if the country with the fiscal stimulus also enjoyed an export boom, the higher growth rate could be attributed to that. So you would have to choose a ‘control’ country with a similar export story. On the other hand, a surge in consumption in the stimulus country might be a desired effect of the stimulus, so it would be wrong to control for that. But what about investment? Is that, like consumption, affected by the stimulus, and if so, is it crowded in, or is it crowded out? If differences in exogenous exports or investments might interfere with the comparison, we could adjust for those. But by how much? Of course, what economists actually would do is not pair-wise comparisons, but cross-sectional, multi-variable regressions which include all the variables that might explain the variance in growth performance. That way we adjust for all the complicating factors, and the estimation procedure isolates the influence of fiscal policy and calculates the multiplier for us.
(ii) Compare with the experience of the same country on other occasions. The same logic applies as above, but instead we have time-series analysis. The problem here, however, is a shortage of comparable episodes. Significant discretionary fiscal expenditures are few and far between. So there isn’t enough variance in the data to give us a statistically significant estimate of the fiscal multiplier. Even if there were big year-to-year variations, macro theory predicts that fiscal policy makes a bigger difference to economic activity in a slump than in a boom, so constraints might have to be imposed to estimate the multiplier for slump periods specifically. In any case, the obvious problem with the time-series approach is that the current fiscal episode is only adding a few points to the data set, so it doesn’t tell us much more than we already knew: in other words, there’s not much difference between asking ‘did the stimulus work?’ after the event, and ‘will the stimulus work?’ before the event.
(iii) Compare with a counter-factual simulation in a suitable model. This could be an aggregated macroeconometric model or a multi-sector Computable General Equilibrium model. To be credible, the model has to pass a few basic tests: first, it has to track the economy’s trend growth path when the exogenous variables are at normal levels, or following their trend paths; then, it has to track the historical path of the recession when the actual data for those exogenous variables are plugged in. If the model passes these tests, then we can carry out the experiments with some confidence, at the very least, that there isn’t a better way of quantifying the effect of the stimulus. (This, after all, is what these models are for.) We can ask what would have happened if there had been no fiscal stimulus, a different kind of stimulus, or either of these in combination with a different monetary policy.
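To make option (iii) concrete, here is a minimal sketch of a counterfactual simulation in a toy income-expenditure model. Everything in it is an assumption made for illustration: the function name `simulate_gdp`, the marginal propensity to consume, and all the spending figures are invented, and a real macroeconometric or CGE model would be vastly more elaborate. The point is only the procedure: run the model twice with identical exogenous paths, once with the stimulus and once without, and read off the difference.

```python
# Toy counterfactual simulation. All figures are hypothetical.

def simulate_gdp(investment, exports, government, autonomous=20.0, mpc=0.5):
    """Solve Y = autonomous + mpc * Y + I + X + G for each period."""
    return [
        (autonomous + i + x + g) / (1.0 - mpc)
        for i, x, g in zip(investment, exports, government)
    ]

# Exogenous paths: investment and exports fall in the slump (invented numbers).
investment = [30.0, 24.0, 22.0]
exports = [20.0, 16.0, 17.0]

# Government spending with and without the discretionary stimulus.
baseline_g = [30.0, 30.0, 30.0]
stimulus = [0.0, 4.0, 2.0]
actual_g = [g + s for g, s in zip(baseline_g, stimulus)]

with_stimulus = simulate_gdp(investment, exports, actual_g)
without_stimulus = simulate_gdp(investment, exports, baseline_g)

# The estimated impact is the gap between the two simulated paths.
impact = [a - b for a, b in zip(with_stimulus, without_stimulus)]
print(impact)
```

Note that in this sketch the impact in each period is just the stimulus times 1 / (1 - mpc), so the answer is only as good as the assumed behavioural parameter. That is why the model has to earn its credibility by tracking the trend and the recession first.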
Back to the issue at hand. Budget Statement No.2 included a scatter graph, proclaimed by Possum Comitatus to be ‘the most important chart in the budget’, showing an apparent correlation between the amount of fiscal stimulus and the ‘impact’ of fiscal stimulus in eleven countries. The impact is the difference between actual GDP growth and IMF forecasts made prior to the implementation of the stimulus. A number of points should be made about this:
1. Suppose we could actually measure the ‘impact’ of the fiscal policy in each country, by observing what would have happened absent the stimulus in a parallel universe. The point of collecting such impact measures for a whole bunch of countries would be to extract some generalisations about whether fiscal stimulus works, and how consistently. We could calculate the average multiplier and the range of variation. If the scatter diagram revealed a nice correlation, we could declare that the multiplier is pretty similar from one country to the next. Policy makers in each country, or indeed in countries outside the sample, might feel justified in concluding from the statistical analysis that fiscal stimulus has the same effect in different situations; they would know what to expect in their own country next time.
2. The parallel universe in the Treasury graph is merely an IMF growth forecast. I don’t know exactly how these forecasts are done. But there are two basic possibilities. One is that they are simple extrapolations of recent GDP trends, based on some ad hoc formula. If this is the case then, in the context of an unprecedented departure from long-term trends and extreme uncertainty, they are as good as meaningless, the numbers effectively plucked out of the air. The alternative is that they are based on some kind of structural model. This might not be very sophisticated, but at a minimum it would involve some assumptions about how discretionary fiscal changes are transmitted through spending multipliers, which determine what will happen when investment and exports fall without any offsetting additional government spending. In that case, the outcomes cannot be thought of as data from which we can draw inferences about the effect of fiscal policy, but are rather themselves inferences from what we already know, or think we know, about the effect of fiscal policy.
3. There is also the issue of whether the counterfactual for each country assumes that a fiscal stimulus proceeds in all the other countries. If the IMF forecast shows what would have happened in Country A if no countries implemented a stimulus, then comparison with the actual growth path, in which Country A benefited from a coordinated, global fiscal expansion, will exaggerate the impact of the domestic stimulus in isolation. But it would be wrong to conclude from that observation that fiscal policy is overrated as a response to recession.
4. Before proceeding further, it’s worth making the point that, if one is going to use a model simulation as a control, it makes little sense to use forecasts made before the event, rather than counterfactual simulations. The aim is to isolate the effect of the fiscal stimulus by keeping the paths of all other exogenous variables the same, so why not run the model using ex-post observations rather than ex-ante projections of those variables?
5. Assuming that the IMF forecasts are based on a structural model, however simple, and setting that last issue to one side — let’s suppose the projections for exogenous variables were spot on, so there’s no difference between the forecast and the counterfactual simulations — what difference does it make that the ‘impact’ is measured by comparing the actual path of GDP with a model simulation rather than an observable counterfactual? The answer is: all the difference in the world. All the comparison tells us is how similar the structural models were in the first place. Because we’ve worked hard to make each individual country model credible, according to procedure (iii) above, we can be confident that it’s giving us the best estimate we’re likely to get of the impact. But there’s no point in trying to confirm one model’s estimate by looking at another. If two models both revealed fiscal multipliers of 1.5, all that would show us is that the models are similar in design. A correlation between stimulus and outcome across a number of countries would be equally meaningless. I was a bit surprised to see such a dodgy exercise — at least as far as I understand it — in a Treasury document in the first place, and to see John Quiggin calling it a ‘striking result’.
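The argument in point 5 can be illustrated with a small sketch. It assumes, purely for illustration, that each country's ‘impact’ figure is the output of a country model with a built-in multiplier close to 1.5. The stimulus sizes, the multipliers, and the helper `pearson_r` are all invented here; none of them come from the Treasury chart or from any IMF model.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical stimulus sizes (per cent of GDP) for five countries.
stimulus_pct_gdp = [0.5, 1.0, 2.0, 3.0, 4.5]

# Hypothetical multipliers assumed inside each country's model.
model_multiplier = [1.5, 1.4, 1.6, 1.5, 1.5]

# Each 'impact' is a model output: assumed multiplier times stimulus.
model_impact = [m * s for m, s in zip(model_multiplier, stimulus_pct_gdp)]

r = pearson_r(stimulus_pct_gdp, model_impact)
print(round(r, 3))
```

Under these assumptions the correlation comes out very close to one, but no observation about the world went into producing it. It reflects only the fact that the five hypothetical models were built with similar multipliers.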
Davidson protests, and the Treasury now accepts, that the ‘striking result’ breaks down when more ‘data points’ are included. Gruen also experiments with other data sets, with mixed results. But it seems to me, unless I’m seriously misunderstanding something, that the results of all of these regression exercises are pretty spurious.
It’s worth looking at some more sensible attempts to assess the effectiveness of stimulus measures around the world, such as the efforts of the (President’s) Council of Economic Advisers, but that’s a topic for another post.