The Evaluator General

I recently sent a couple of emails explaining the Evaluator General and also did an extended interview explaining the ideas in the context of Matt Jones’ Public Policy class at Melbourne Uni. The first email below is the one I sent him proposing that we explain the Evaluator General in terms of the course of my own thinking in developing it.


Given the subtlety of the idea of the Evaluator General – most people think of it as one idea when it’s several – one way to understand it is to go through the way in which it was the product of my own history in thinking about certain problems.

Steve Jobs talked about how life is like joining the dots backwards in order to go forwards, and the Evaluator General is a response to these points.

  • In 1983, working for John Button as Industry Minister, I encountered the Toyota production system and its extraordinary radicalism – the kind of thing that (for once) it’s not an exaggeration to describe as a paradigm shift. For some of the flavour of this, check out the productivity chart below and this video by an American Toyota engineer.
  • Narrated back to myself through a few decades of thinking, I see this as standing for
    • The importance of building accountability from the ground up from the perspective of those who are being held accountable, not those holding them to account.
    • The gravitational force of the latter (wrong) way of doing it being almost of black hole magnitude – we are close to the event horizon. Warren Buffett has a term for it from his point of view: “the institutional imperative”. He’s talking about the institutional imperative to grow – to aggrandise the business and its managers, rather than to husband capital to the advantage of its owners. In government the institutional imperatives are different – but business and government share one institutional imperative, which is that of bureaucracy. This is summarised in my little aphorism “if truth is the first casualty of war, candour is the first casualty of bureaucracy”.
    • The resulting tendency for systems of accountability to become systems of accountability theatre. In that regard, this essay is intended as a practical ‘prequel’ to the idea of the Evaluator General with this speech to the Australian Evaluation Society being the philosophical prequel though reading that one is only optional :)
    • Be that as it may, there are some miraculous cases where the institutional imperative has been avoided (as Warren Buffett has avoided it). They include
      • Open-source software;
      • The Toyota production system

Not coincidentally, in both, the profound, subtle and pervasive problem of truth-telling from the bottom to the top of the hierarchy appears to have been solved.

  • Arriving at The Australian Centre for Social Innovation (TACSI) in 2009 I discovered human-centred design and co-design, a powerful tool in seeking to deliver services from the perspective of the people you claim to be helping. But, despite wider claims being made for it, it is only that – a tool. It is not the system and the system is broadly correct in thinking of it as just one possible way to improve services. Neither human-centred design (or co-design) nor any other new tool contains within itself any clear recipe for the system to perform the tasks it must perform to
    • nurture good practice
    • learn from it
    • preserve and endlessly improve and optimise that learning
    • expand what works, change and if necessary abolish what doesn’t and progressively fit the relevant parts together as their roles and the division of labour between them change with learning how to improve them.
  • The Evaluator General is my attempt to build a system that might
    • Help innovations like TACSI’s be introduced;
    • Validate them against the evidence in such a way as to protect them from the institutional imperatives of managers further up the hierarchy;
    • Expand successes and improve or contract failures.
  • I did this by reference to a modern political principle within the Westminster system, which is to structurally separate doing and knowing. Thus Treasury is the line department responsible for advice and action to optimise growth and the ABS measures how well we’re doing in a way that’s independent of the Treasury – but nevertheless closely collaborative with it.
  • Another principle, which I’ve realised in retrospect is of great significance here, is one I came across when thinking about political questions. That is the ancient Athenian term isegoria, or equality of speech (or “ισηγορια” if you’re Plato, Aristotle or you’re just trying to be a smartarse). Toyota was my first engagement with isegoria, but it rumbles on through my life – and is of great significance to public policy.
  • A second ancient Athenian principle is of particular significance regarding bureaucracies and this subject – parrhēsia (or in Greek, παρρησία, since you asked). The closest notion we have to it in English is that of speaking truth to power. The difference, though, is that parrhēsia is a relational concept in which the inferior party ‘speaks all’ – which is to say speaks the truth boldly to power – with there being an implied duty on the other to listen and respond justly. This might be to a king or, as in the case of Socrates’ parrhēsiastic performance at his trial, to ‘the people’. The court comprised randomly selected citizens. I speak more about this here. [1]

In summary, though people typically think of my Evaluator General as a top-down compliance type mechanism – using independence to browbeat the system towards addressing the objectives given it from the top – it’s actually two things and neither works very well without the other.

  • Independence
  • That independence is there not to perform ‘accountability theatre’ by imposing it from the top, but to build an accountability system (as Toyota did) based on the self-accountability of those in the field. This is what science does. And as Richard Feynman says, “The first principle is that you must not fool yourself and you are the easiest person to fool”. This is also Adam Smith’s idea when he talks of the impartial spectator as the foundation of morality (and, implicitly, of knowledge).
    • In Toyota, that’s workers on the line (and beyond them suppliers and customers).
    • In government programs, it’s ‘street level bureaucrats’ (and the communities they service). So it’s teachers, their students and their communities, nurses and doctors, their patients and communities, prison warders etc.


The second email explains the Evaluator General from scratch:

As usual, I learned much from your speech on economics and the third sector. The one thing among the measures I wanted to see in the speech was some commentary on our failure to properly build evidence-based policy and practice. The cost/benefit analysis you guys have done is obviously important and even this is missing in much of the interface between government and the charitable sector.

But I think there’s something much more fundamental that receives virtually no attention because I think the people who should be leading the debate – particularly policy economists – think they know what evidence-based policy and practice are, but in fact they do not.

The most dramatic way I can suggest the potential significance of what I’m proposing is by contrasting the labour productivity achieved with the usual top-down approach to evidence-based practice with a bottom-up evidence-based practice developed in business – by comparing US to Japanese automotive productivity over the 1970s and 80s.

My argument is as follows.

    • In the charitable sector and amongst many of the social services funded by government, we are not even at the level of top-down evidence-based practice, because, as you acknowledge in your speech, we make far less use of cost/benefit analysis than we should.
    • Part of the rhetoric of contracting out and purchasing services from the charitable sector involves the idea of innovation – we tell ourselves that we’ll expand the most successful projects and strategies and scale back those that don’t work.
    • But mysteriously, we’ve been saying this for at least two decades and it’s remarkable how little of this actually takes place.
    • I think I have the beginnings of a quite powerful explanation for why that is – there’s a catch 22 at the heart of this learning system that those running it don’t really acknowledge even to themselves. I document that here.
    • Further, why are we so bad at learning from what works out in the field – with or without ‘what works centres’? We think of accountability as essentially a top-down activity – imposed by those above on those below.  (Or alternatively ‘accredited’ by our researchers using the tools of their trade – CBAs and RCTs and then propagated into practice by tools of ‘translation’ such as What Works Centres). But if this is really how it should be done, why are full-blown CBAs and RCTs so rarely used in business (as opposed to much lighter-weight experimentation and measurement with things like A/B testing)?
    • The system must build accountability to the facts and possibilities revealing themselves in the field. But that knowledge can’t travel upwards in the hierarchy while the system is engaging in accountability theatre and those above are holding those below ‘accountable’. How can they know what those below should be improving, or what innovations will be most promising to try, if they do not understand the conditions in the field and if those in the field may be penalised on the basis of information they pass up the line? In these circumstances, candour about what is and is not working is replaced by the whiteout of accountability theatre.
    • A proper accountability system needs to:
      • be focused not just on measuring the system, but principally on measuring it with a view to learning and improving it,
      • learn from the field (or the bottom of the hierarchy) where most of the existing knowledge will be and most of the learning needs to take place,
      • have that learning objectively validated, so that ‘experts’ and the domain knowledge on which they draw remain accountable to the emerging evidence,
      • have that validated learning given appropriate weight against senior managers responding to institutional imperatives. For it is in this step that what I call ‘accountability theatre’ actively displaces true accountability for understanding what’s going on.
    • Believe it or not, this is what Toyota achieved in its development of a new way of managing car manufacture. It did so by spending literally ten times the industry standard amount on employee training, training shop floor workers to understand and manage the CNC (Computer numerical control) machine tools and then building the company’s accountability for its own productivity on the self-accountability of shop floor teams.
    • My own proposal for an Evaluator General tries to build the same system for the more complex, and ‘social’ world of delivering services to improve social wellbeing. It’s based on
      • Structurally separating doing things from knowing how well they’re performing. This occurs at the agency level within government where an agency like the statistical office will measure inflation and unemployment independently of politicians or the agencies whose performance will be measured by reference to those numbers.
      • Seeking to do this not just agency by agency, but in principle anywhere an agency works all the way down to delivery in the field.
      • Seeking to build close cooperation as well as structural independence between knowing and doing and from that
      • A system of evidence-based professional knowledge and accountability built from self-accountability in the field with learning built from that. (As the great scientist Richard Feynman put it, “the first rule of science is that you must not fool yourself, and you are the easiest person to fool”.)

  1. Added in June, 2022
This entry was posted in Economics and public policy, Ethics, Isegoria.

paul frijters
4 years ago

Hi Nick,

I think about very similar things now, so am interested in “getting” what you mean.

Some bits I get, some I don’t. You are being very abstract here, which makes it tough going for those who think in a different way. Standard economists will thus totally miss the concept of motivation and, implicitly, power. You seem to say things about both, such as when you talk about self-accountability. You seem to say that public servants are more motivated when they are more autonomous rather than told top-down what to do. Is that what you mean?

If it is, an unfortunate reality in the UK has been that top-down systems have increased massively over the last 20 years, even in private organisations, so in that sense things are moving in the opposite direction. Same in Australia. Hierarchy has been on a roll. More institutional imperatives and more performances.


I think I get the reality of accountability theater and the institutional imperative. I see a lot of it :-) And I know how a system could work where the workers are far better trained and responsible. In a broad sense what you sketch is how I think the Dutch public sector works already. This is difficult for you to refute as you have not lived there :-) Why don’t you though? Go live in an egalitarian place and see if it works how you imagine the Evaluator General system to work.

However, there are bits of the writing above where I just have no idea what you mean. I really don’t get the significance of your separation between knowing and doing. Don’t we know by doing? And doesn’t everyone change what they do because of what they know and observe? It just seems such a strange thing to propose to separate, like dividing people into two. Heads to the left, hands to the right.

I also really don’t get the Toyota reference. I have studied the Asian collectivist systems for years, which includes Japan, and they are not without institutional imperatives at all. Or without their own forms of theater. In many ways, they are worse than the Anglo-Saxons in terms of hierarchy. True, they do also have a community spirit which can be coopted for productive use, a bit like having teams operate as small villages, which is how I understand the Toyota system, but that is a very culturally-specific model that you cannot just advocate in other countries. The Americans could never work like that.

So how much do you know about Japan? Have you lived and worked there?

paul frijters
4 years ago
Reply to  Nicholas Gruen

yes, we should skype-chat about this, it’s interesting.

Some reflections. I am coming more to the view that useful episteme is just another form of phronesis. That’s also how I now think about embodied cognition and that kind of jazz, ie good theory really just is a practical tool, simply for another type of problem.

However, that’s not the main thing. Let me do two things here. One is give you the key reference on the hierarchy thing. The second is explain a bit more about the way the Northern Europeans think because it really goes to this whole buy-in, belonging, etc. stuff.

On Hierarchy, I take my cue from the 2017 Skills and Employment Survey here in the UK. It’s a recurring survey that has measured similar things for decades, sampling British workers. The key measure is “discretion as to how you do your job”. The proportion of workers who have a lot of discretion over how they do their jobs has declined to 38 per cent in 2017, down from 62 per cent in 1992.

That really is an enormous drop: from 62% (the majority) to 38%.
There is also other, corroborating, data, but this survey is the strongest evidence I have seen. Additional clues are that workplaces and organisations in general have gotten bigger in the UK (though there has been a recent growth in self-employment, which is often hard to categorise as it is often a hidden form of employment for a very large company, like Uber). Income and wealth inequality (rewards) has also grown, which I see as one of the results of greater hierarchy.

Then how the Northern Europeans really work, and I mean the Dutch, Danes, and Scandinavians, less so the Germans.

Firstly, the notion of choice and bosses is very different. Many workplaces in those countries do not have bosses in the way Anglo-Saxon workplaces have bosses. If the CEO throws her weight around too much, she will just be ignored and quickly deposed by a coalition of other managers, boards, unions, worker-collectives, or, if necessary, by politicians. The power of those up higher is far lower and whilst they have more say over how the organisation is run, there is not that sense that they own and run the place which you have here in the UK. So it is not up to bosses to decide to be more “information rich”, “better trained”, “getting buy-in from workers”. Even the notion that that is a choice bosses can make is a strange one in those countries.

Rather, one is taught from a very young age to slot in with others in cooperative teams, whereby one buys into the goal of that team. Later in life, one thus adopts various goals depending on the jobs that come round. If a Dane becomes a journalist, he becomes passionate about independent news and the ability of the whole journalist staff at his outfit to produce a good picture of what is going on in the world or some area of that world. If a Swede becomes an engineer working on a bridge project, she becomes really invested in the idea of a good bridge and will openly work towards that with her 20 colleagues doing various other aspects of that whole project, slotting in where she herself sees the best role for herself. She will also do lots of small jobs and initiatives that help her colleagues, often without those colleagues ever seeing that she helped them out in an unseen way.

So these Northern Europeans really buy into the goal of their team and organisation. They don’t just think they co-own that goal, but they even think they are the organisation. That’s why there is not such a thing as a real boss: like a village can have a mayor who does not own the village, an organisation in Northern Europe often would not suffer an Anglo-style boss because that kind of arrogance and presumption of being in charge would not be tolerated.

This type of working together is somewhat similar to how you sketch Toyota to work. I am happy to be proven wrong about the Americans not being able to work that way, though I am surprised I must say, precisely because Toyota bosses make far less money than American bosses typically do. The pressure to become a “regular American firm” must be huge in those Toyota-plants, I imagine. How on earth have they kept that tendency out?

Now, to be clear, Northern Europe is not quite as idyllic as sketched and there are more top-down organisations there too. They grew a lot in the 1990s and 2000s, though I think (but am not totally sure) the Anglo-model is receding a bit there. The whole labour market preparation and general culture is certainly still of the type I sketch: people are raised to be co-owners of the organisations they work in, not merely buying into its goals, but even co-owning the process via which goals come about.

Within that kind of culture, use of data is much more integrated. One cannot so easily cheat because other co-workers would notice the cheating. The whole business of a conscience also goes towards this: one regulates oneself to a large degree.

I think you would find it fascinating to see it in action.

Paul Frijters
4 years ago
Reply to  Nicholas Gruen

I remember seeing a graph with the changes in autonomy, and these figures I quote have been in various magazines. However, the whole thing is on a government website:

I am trying to think of how this Evaluator General could work in the Netherlands. It certainly wouldn’t work with that title, indeed. But some kind of training-providing institute plus collector of examples and methods might work. Maybe they already have those things in various sectors. I wouldn’t be surprised.

The covid-19 crisis is a good example of what the Dutch population does: the smart ones start to figure out how it all works and after a month have overtaken the medical “specialists” and then start to push those supposed specialists out of the public media.

This is the best-known of these quick thinkers, who is now already recognised by many in the population as far ahead of those public sector medical specialists:

(most is of course in Dutch, but this guy made the effort of putting some stuff into English. He runs a polling organisation, go figure).

paul frijters
4 years ago
Reply to  Nicholas Gruen

sure, there have been cover-ups in the Netherlands, but usually about things the country is embarrassed about, not just a minister. Like civilian deaths in Iraq.

However, the point I was making about the training is that that is how I think your proposal would be interpreted because many of the other aspects are already so normal there. Part of the education trajectory already.

You want far more than just a habit of measuring and reflection, Nick. You are proposing a whole culture of how to work. The great advantage of being able to say “I want you all to adopt the Danish model of working” is that you can then point to many institutions and habits in Denmark, and you have 10 million people who can tell others about how it’s done. Even if what you are sketching is an idyllic notion of the Danes, that still gives you a natural place to point to for inspiration on how it could all fit together. The problem with calling it the Gruen Method is that you are then the only source of inspiration about how to do it.

What you sketch does sound very much like egalitarian in situ science to me. The Scandi-science package? The Riks-model? The Viking system? Or just “in situ science”? Sounds poncy enough, no?

Daniel Work
4 years ago

Hi Nicholas
I just watched the whole video from this post, and I came away thinking the theory and proposed practices could benefit from what is going on in the DevOps/Agile software communities. Specifically the work of:
Prof David Snowden (Cynefin), problem solving in complex adaptive systems (like your child protection scenario).
Simon Wardley (Wardley Mapping), takes the value chain mapping from lean to the next level by integrating principles from evolutionary biology.