Quantifying Institutions 3 – A glimpse of a glimpse?

In the first post in this series I talked about recent empirical work on institutions and development and the problems I had with the use of constructed indices for measuring institutions. In the second post I talked about a particular paper I decided to retest and the alternative ways people had tried to test institutional hypotheses empirically.

In this one I will talk about an institutional measure I attempted and the results I found.

This measure is based on a very fundamental institution in all human societies: language. Following on from the promising results about colonial institutions I had found, I decided to use the extent of a colonial language’s spread as a proxy.

The penetration of a coloniser’s language should represent the penetration of colonialism and associated institutions into the institutional fabric of a society. If you’re speaking the coloniser’s tongue, you’re playing the coloniser’s game. Language is a very basic institution of interaction, and it’s a game you need to be absorbed in to play. To have reached such a fundamental institution, you’ve already immersed yourself in the other institutions that surround it. You’re playing the extractive, grabbing, zero-sum games of colonised society: games that can destroy the expectation that gain can be mutual, for instance, and that in general destroy much of the trust in transactions and other social and economic exchange.

It is important to note that this isn’t Whorfism. The characteristics of the language itself are of no consequence here. Likewise, it is not reflective of the native institutions found in the colonial language’s home society. It is merely the language in which the games of colonial society were played.

I have had a long fascination with languages as institutions and their relationship to other institutions, but all my other questions were too large for Honours. This was a good opportunity to indulge myself, so I was eager to embrace the concept.

There had been efforts to use language in empirical research before. Some were foolish, like the Anglophiles determined to show the democratic legacies of the Empire in contrast to Gallic tyranny. Some slightly less virtuous but unrelated to institutions, such as determining ability to trade with the global economy.

I decided that I could use the extent of colonial language penetration at the end of the colonial period as a proxy for the extent to which colonialism’s grabbing institutions had been present. This would allow me to test the Norwegians’ results with an alternative variable. Importantly, the measure dates from the end of the colonial era, so the language data could be taken as exogenous to the growth period being measured.

Bear with me as I outline the rather simple empirical work I did. The results were strong, so I’m still very suspicious of them and eager for feedback on the result.

Their empirics followed on from some fairly ad hoc work done by Sachs and Warner in the mid 90s, which covered growth from 1970 to 1990. They had used 5 independent variables to explore differences in the average annual growth rate (per head of economically active population) from 1970 to 1990 across a wide panel of countries.

These were:

Initial income at the beginning of the period: This had a negative correlation, as hypothesised by the catch-up effect.

Openness to trade: A ludicrous measure based on the average of a yearly binary judgement assigned by the authors.

Resource abundance: Measured as primary exports as a share of GNP in 1970. Negative correlation due to the resource curse. There are of course issues with this as a measure of resource abundance, but they are not too relevant at the moment.

Various institutional indices: These were used collectively as one amalgamated variable, and I intended to replace this alone.

Average yearly real gross investment, 1970-1989.

The Norwegians’ empirics simply involved replicating the Sachs and Warner regressions with a different data set, an amalgamated institutional index and a suspicious change in specification to logs with no explanation. To this they added a single additional variable.

Institutional quality * Exports as a share of 1970 GNP.

This was intended to showcase the interaction between institutions and resources. This wasn’t too relevant though.
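For readers who want to see the shape of this specification, here is a minimal sketch of a growth regression with the five regressors plus the institutions-by-resources interaction term. It assumes plain ordinary least squares fitted with numpy. The function names, the sample size, the coefficient values and all of the data are invented for illustration. None of it is taken from Sachs and Warner, the Norwegians, or my own data set, and it ignores the log specification.

```python
import numpy as np


def design_matrix(initial_income, openness, resources, institutions, investment):
    """Stack a constant, the five regressors and the
    institutions x resources interaction into one matrix."""
    interaction = institutions * resources
    return np.column_stack([
        np.ones_like(resources),
        initial_income,
        openness,
        resources,
        institutions,
        investment,
        interaction,
    ])


def fit_ols(X, y):
    """Ordinary least squares coefficients via numpy's least-squares solver."""
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs


# Synthetic data for illustration only; none of it comes from the papers.
rng = np.random.default_rng(0)
n = 80  # arbitrary sample size
initial_income = rng.normal(8.0, 1.0, n)
openness = rng.uniform(0.0, 1.0, n)
resources = rng.uniform(0.0, 0.5, n)      # primary exports / GNP
institutions = rng.uniform(0.0, 1.0, n)   # index or proxy scaled to [0, 1]
investment = rng.uniform(0.05, 0.35, n)

X = design_matrix(initial_income, openness, resources, institutions, investment)
true_beta = np.array([10.0, -1.0, 2.0, -5.0, 1.0, 8.0, 6.0])  # invented values
growth = X @ true_beta + rng.normal(0.0, 0.5, n)  # add noise to the fake outcome

names = ["const", "initial_income", "openness", "resources",
         "institutions", "investment", "institutions_x_resources"]
for name, b in zip(names, fit_ols(X, growth)):
    print(f"{name:>26}: {b: .3f}")
```

Swapping an alternative institutional measure into this setup only means passing a different array as the institutions argument, which is all that replacing the amalgamated index amounts to mechanically.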

I chose this model to test my variable in. In hindsight I might have looked for a model more focused on institutions, but at the time it seemed useful.

For my own variable I managed to find an encyclopaedia of languages from 1964. For each country or colony, it provided a population (rounded to the nearest hundred thousand) and language speakers (rounded to the nearest thousand or ten thousand). As crude as this data was, it was the only set I could get for the very end of the colonial era. Fortunately most of the countries became independent in a very short period around the mid 60s but it was still only approximate for most of them. This ensured the variable was external to the growth being studied (1970 to 1990).

The variable I created was simply the percentage of the population in 1964 that spoke the colonial language. This was expressed as a decimal between 0 and 1 (like the indices I was replacing), and I specified it in the regression as 1-pccls (pccls being percentage colonial language speakers) so it would likewise have a positive sign if my hypothesis was correct.
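As a minimal sketch of how such a variable can be built: the function name and the figures below are hypothetical and are not taken from the 1964 encyclopaedia.

```python
def colonial_language_proxy(speakers, population):
    """Return 1 - pccls, where pccls is the share of the population
    speaking the colonial language (a decimal between 0 and 1)."""
    if population <= 0:
        raise ValueError("population must be positive")
    pccls = speakers / population
    if not 0.0 <= pccls <= 1.0:
        raise ValueError("speakers must be between 0 and population")
    return 1.0 - pccls


# Hypothetical figures for illustration only (not from the source data):
# 400,000 colonial-language speakers in a population of 8,000,000.
example = colonial_language_proxy(400_000, 8_000_000)
print(round(example, 3))
```

A higher value means fewer colonial-language speakers, so under the hypothesis a higher value should go with faster growth and the coefficient should be positive.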

I excluded all non former colonies because my variable did not make sense in their context. I also excluded new world colonies (including Australia). This was because my reading in linguistic history had shown that the effect of disease had contributed greatly to the spread of non-indigenous languages there; both because depopulation allowed settlement, and because the surviving natives were mainly of mixed parentage. This meant that the colonial speakers had come to their language by background and not enmeshment in the colonial institutions. This left Asian and African colonies for which I had sufficient data.

Because the Norwegians did, I added variables in turn. I’ll start with regression 2. Regression one merely established that a resource curse existed.

Here my measure failed to jump through the 5% significance hoop.

But then…

The 0.5% hoop. 0.5%!

Interestingly, the Norwegians’ institutional measure became insignificant when investment was added. It had only just jumped through the 5% hoop in the previous regression. The unexplained use of logs and the close proximity to the arbitrary 5% value had made me slightly suspicious of their results, so the performance of the alternate value was even more striking.

A crudely fashioned variable from rounded data in an old encyclopaedia was more significant in explaining divergence than several indices together.

The next regression failed to replicate the finding of interaction between the resources and institutions, which was ostensibly the main finding of the research.

My variable accounted only for about 5% of average annual growth variation between the countries, which is modest, but it is more than twice what the existing variables showed. I was also intrigued that investment increased significance. I would have hypothesised co-linearity (where two regressors are related), since a good institutional environment should increase expectation of return on investment, and thus increase investment.
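One rough way to look at the collinearity worry is to correlate the institutional proxy with investment and turn that correlation into a variance inflation factor. The sketch below uses invented numbers and hypothetical helper names. The VIF formula shown is the special case where a regressor is explained by just one other regressor; with more regressors, r squared would be replaced by the R squared from regressing it on all the others.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_regressors(r):
    """Variance inflation factor 1 / (1 - r^2) for a regressor that is
    correlated with a single other regressor."""
    return 1.0 / (1.0 - r ** 2)


# Invented values for illustration only; not the data used in the post.
proxy = [0.95, 0.90, 0.99, 0.80, 0.97, 0.85]       # 1 - pccls
investment = [0.20, 0.15, 0.24, 0.10, 0.19, 0.14]  # investment share

r = pearson_r(proxy, investment)
print(f"r = {r:.2f}, VIF = {vif_two_regressors(r):.2f}")
```

A VIF near 1 suggests the two regressors carry largely separate information; a large value suggests the standard errors on both are being inflated by their overlap.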

When I tested both the Norwegians’ variables and my own on an identical (but smaller) data set of ex-colonies, my variable remained highly significant, whereas the institutional index used by the Norwegians didn’t come close to significance. Without the rich countries, which were the basis of what was considered good institutions, the correlation was lost.

This is a very, very modest piece of research with severe limitations. There are limits from the data, and I’m sure many people would have problems with how I treat language acquisition in colonial societies (I have problems myself and I hope a few I considered will arise in comments). It is also limited to a narrow context of old-world former colonies. It does suggest a way to progress, however. We can create decent variables to measure institutional quality and examine their origins from real-world data. The data for my variable was extremely crude, but it seemingly performed well.

Is there something massive that I, and the limited number of people who have looked at my work, have overlooked that explains why this variable is so significant? It seems far too strong. Furthermore, is this approach of finding real-life variables as proxies for institutions a good way to salvage this kind of research, and what (if any) other proxies could be used?

I hope this stands up because if the current research goes on in its current vein then either institutional approaches will be considered discredited, or else will be neutered and reduced to a limited and meaningless concept.

About Richard Tsukamasa Green

Richard Tsukamasa Green is an economist. Public employment means he can't post on policy much anymore. Also found at @RHTGreen on twitter.
This entry was posted in Economics and public policy, History.
Paul Frijters
11 years ago


Let’s re-cap: you say that the rest of the literature uses dodgy institutional measures based, at best, on vague cultural concepts and lacks a clear conceptual framework within which to come up with better measures. What you end up doing is to take 37 numbers out of a 1964 book that tells us the sheer percentage of a population deemed to speak English and use that in a cross-sectional GDP growth regression for a bunch of African and Asian countries. You don’t tell us which countries ‘do the work’ (i.e. which ones fit and which don’t), nor do you distinguish between types of English spoken or link the English to anything one might think of as an institution.

Let’s think what this variable will mainly pick up that the others in the regression don’t. Since we’re looking at explaining cross-sectional variation in growth it essentially tells us that former English colonies did better than non-English colonies in this period, which of course is a finding highly contingent on kicking out all the non-colonies (like China) and on which non-English colonies are included. For one, such findings are notorious for being outlier dependent, i.e. they can be driven by high-growth outliers (such as Botswana, which I am guessing has a relatively high number of English speakers in 1964 given its colonial ancestry), hence sample selection is a key thing to look at.

Basing too much on a regression of 5 variables on 37 data points is not warranted though. There’s a conditional correlation with various innocent candidate explanations which might have nothing to do with institutions.

Minor point:
– it seems to me everything is highly significant in the second regression (the one adding the investment variable INV7089), hence I can’t see the basis of the claim that the Norwegians’ ‘ludicrous’ openness variable became insignificant.

11 years ago

I think India would make a counter example to almost everything you have presented here. On the language front, India was invaded by the Persian Empire, absorbing language, social institutions and religious institutions. The words of Zarathustra and Mohamed enlightened India long before Brown Bess spoke to the Punjab. The latter wave of Anglo-Saxon colonists brought their own language and institutions which themselves were an eclectic collection. Other concepts such as reincarnation and a stable, modern multitheistic framework allowing broad spectrum of belief are fundamentally Indian, and have stubbornly persisted regardless of foreign influence.

The problem being that both language and institutions are the product of many sources so although you can take quantitative measurements on one particular event where one culture is imposed on another, there is no way to disentangle multiple such events. For example, the ideas of Moses reached India by two distinct paths (via Mohamed; and via Christ, Peter, Luther, Henry) so from a quantitative perspective which of those paths is significant? There’s no way to tell.

If you’re speaking the coloniser’s tongue, you’re playing the coloniser’s game. Language is a very basic institution of interaction, and its a game you need to be absorbed in to play. To have reached such a fundamental institution, you’ve already immersed yourselves in the other institutions that surround it. You’re playing in the extractive, grabbing, zero-sum games of colonised society. Those that can destroy the expectation that gain can be mutual for instance, and in general destroy much of the trust in transactions and other social and economic exchange.

Colonists bring technology, usually evidenced by a superior military position or a superior transport and mobility infrastructure (no surprise, the two are always closely related). Consider the British construction of railways in India: of course they needed local labour to achieve their targets, and a local signing up as a railway administrator or engineer loses some of their own independence and culture in exchange for learning the ways of the colonists. On the other hand, it’s a plum job, and early adopters of new technology often reap a lot of personal benefits. Here we are a century later: India is well and truly independent, but railway jobs are still plum jobs and as a nation they are only too happy to keep expanding on the system that the British put in place.

Forgive my bluntness but please put away the zero-sum flagellation and save it to only be brought out on the ceremonial occasions hosted by your university Social Science department.

Resource Abundance Measured as primary exports as a share of GNP in 1970. Negative correlation due to the resource curse. There are of course issues with this as a measure of resource abundance, but they are not too relevant at the moment.

The most obvious issue being that a low-tech nation will automatically become resource abundant by your measure, and the negative correlation could just as easily be the low-tech curse.

A more subtle problem is that although primary industries tend to get lumped into a single bucket by economists, what goes on at ground level is quite different. Going back to India again, the British planted tea and also extracted a whole lot of tea. Fast forward to the modern world and those tea plantations are still productive, still furiously exporting valuable primary resources, and probably will continue to do so until humans discover a cure for caffeine addiction.

On the other hand, the Spanish were primarily interested in the Americas for gold and silver, and the vast majority of that gold and silver is gone now. As the market price of metals goes up, and people search harder, the total quantity available as a resource in the ground irrevocably gets smaller. A similar example further back is when the Romans invaded the Celts in search of tin: Rome got the tin they wanted and Celtic tin mines have long ago ceased any productive output.

The lesson is that mining makes a lot of money for a short time, agriculture makes steady money over a long time. From the point of view of nation building they should never be lumped together.

11 years ago

I don’t quite follow what makes an expansionist Persian empire intrinsically different from expansionist British or Dutch empires (except some stretch of time). Maybe you have good reasons to be interested in a particular timespan, I just feel that if there is a principle at work here then it should apply broadly (or else you should be able to explain why not). Besides that, even if you do want to ignore earlier conquests… you are nevertheless measuring a transition from “A” to “B” so you have a pretty good idea of what is “B” (i.e. the input from the conquering culture), but understanding the transition also requires knowledge of the “A” (where they started at before each wave of colonization).

Regarding the railways…

I’ll agree that it is difficult to conclusively prove that any institution is in the national interests rather than in its own particular interests. I’d put this down as one of the fundamental problems of economics and I don’t pretend to be able to solve this in any quantitative manner. However, Indian railways are big, they are growing and they are profitable, and these things do tend to be cited as “good things” by the majority of economists (and politicians, and CEOs for that matter).


Railways traverse through the length and breadth of the country covering 63,140 route kms as on 31.3.2002, comprising broad gauge (45,099 kms), meter gauge (14,776 kms) and narrow gauge (3,265 kms). As the principal constituent of the nation’s transport system, Indian Railways own a fleet of 2,16,717 wagons (units), 39,236 coaches and 7,739 number of locomotives and manage to run 14,444 trains daily, including about 8,702 passenger trains. They carry more than a million tonne of freight traffic and about 14 million passengers covering 6,856 number of stations daily.



[…] freely admit I partially fell into this trap (albeit somewhat consciously) with my own work, where I only attempted to measure a certain type of "bad" institution, rather than any kind of […]