The power of statistics

About three decades ago, statistics a la USA arrived in rugby league. This had some interesting effects. One year (1976?) there was an epic battle between Ray Higgs (Parra) and Terry Randall (Manly) for a big prize for the leading tackler in the comp. Then someone realised that the club statisticians were putting their thumbs on the scale to inflate the count for their man, so the competition was called off.

Despite the rude remarks that I have made about econometrics, I appreciate the power of statistics, properly used. Regression analysis is a tool, and it is no better than the questions that are asked when it is used to extract answers.

This book about baseball appears to be a fantastic illustration of the proper way to use statistics (but check out Tony T’s review, comment #1, for balance). It seems that baseball until very recently was full of myths that persisted despite the highly visible nature of the game and the results, the size of the stakes and the intelligence and experience of the players, coaches, commentators and everyone else involved in playing and watching the game.

It is the story of Billy Beane, a star recruit who never actually made it in the big league, then became a highly effective administrator.

Beane was a much better baseball analyst than baseball player, and he quickly moved up the Oakland club’s hierarchy. He became interested in a simple question: what is the most efficient way to spend money on baseball players? The origins of Beane’s iconoclastic answers can be found in the writings of Bill James, a once obscure but now legendary baseball writer-statistician. While working as a night watchman for a pork-and-beans factory, James decided that he wanted to write about baseball in a way that would illuminate what really happened and why. In his view, conventional statistics were insufficiently helpful and sometimes downright misleading. Consider the area of defensive play. When a player mishandles a ball or makes a bad throw, he can be assigned an “error.” A player who accumulates a lot of errors seems like a bad fielder, whereas one with few errors seems really good. The problem is that a player may accumulate errors in part because he is unusually good at getting to the ball. If you do not get to the ball, you do not get an error (according to the chapter on scoring in The Book). So errors are a crude measure of fielding ability.

Or consider walks. Since the late nineteenth century, walks have been treated, in official statistics, as neutral–neither good nor bad. According to a nineteenth-century expert whose advice is followed to the present day, “There is but one true criterion of skill at the bat, and that is the number of times bases are made on clean hits.” Of course, many people realized that a walk is a positive event for the hitting team and a negative event for the team in the field, but this commonsense notion was not incorporated into baseball’s most common measure of batting skill, the batting average, which leaves walks out.
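To make the point about walks concrete, here is a minimal sketch comparing batting average with on-base percentage, a measure that does count walks. The two season lines are invented for illustration and do not describe any real player; hit-by-pitch and sacrifice flies default to zero to keep the arithmetic simple.

```python
def batting_average(hits, at_bats):
    """Hits per at-bat. Walks do not count as at-bats, so they vanish here."""
    return hits / at_bats


def on_base_percentage(hits, walks, at_bats, hit_by_pitch=0, sacrifice_flies=0):
    """Times on base per plate appearance, using the standard OBP formula."""
    reached = hits + walks + hit_by_pitch
    chances = at_bats + walks + hit_by_pitch + sacrifice_flies
    return reached / chances


# Hypothetical season lines, made up for this example.
free_swinger = {"hits": 150, "at_bats": 500, "walks": 20}
patient_hitter = {"hits": 135, "at_bats": 500, "walks": 100}

for name, line in [("free swinger", free_swinger), ("patient hitter", patient_hitter)]:
    avg = batting_average(line["hits"], line["at_bats"])
    obp = on_base_percentage(line["hits"], line["walks"], line["at_bats"])
    print(f"{name}: AVG {avg:.3f}, OBP {obp:.3f}")
```

On these made-up numbers the free swinger has the better batting average, but the patient hitter reaches base more often, which is the kind of gap the conventional measure hides.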

The statistical method was the only way for Beane to solve a serious problem: obtaining first-rate talent without a lot of money. After all, the New York Yankees had three times the budget of the Oakland Athletics. And if Beane did find good players, and they performed well, they would be bid away by richer teams. Owing to his low payroll, he would be forced to replace his own greatest successes. In 2001, Oakland won 102 games in the regular season, the second-highest total in baseball. They lost three players widely regarded as their best, and they were expected by many to have a catastrophic fall. Instead they used statistical methods to try to replace the lost players with new ones who would provide statistical equivalents–and they ended up winning 103 games, the most in baseball. Their payroll for that year was $34 million, less than half that of their division rivals the Seattle Mariners. In Lewis’s account, Beane was able to succeed because “the market for baseball players was so inefficient, and the general grasp of sound baseball strategy so weak, that superior management could still run circles around taller piles of cash.”

There is an even larger puzzle. Why didn’t someone like Beane come along sooner? Why didn’t baseball executives start using statistics a decade, or two decades, or three decades, earlier? Why have falsehoods and mistakes persisted? The economic stakes are extremely high, after all, and if Lewis is correct, the management of most baseball teams could have saved many millions of dollars simply by making more rational personnel decisions. Nor was the important information hard to find. James’s arguments have been around for nearly two decades. In a market as competitive as major league baseball, surely the information should have been used, and fast. What went wrong?

The problem is not that baseball professionals are stupid; it is that they are human. Like most people, including experts, they tend to rely on simple rules of thumb, on traditions, on habits, on what other experts seem to believe. Even when the stakes are high, rational behavior does not always emerge. It takes time and effort to switch from simple intuitions to careful assessments of evidence.

15 years ago

It’s a good book alright, if a little overpraised. My review.

Chris Lloyd
15 years ago

Sports and gambling are great vehicles for introducing statistics to undergrads, see for instance here. There is even an academic centre devoted to sports statistics research.

One of the reasons the statistics of sporting performance is interesting is that it is often very tough to predict. This doesn’t stop people trying hard to “explain”

15 years ago

Generally, sport stats are an irrelevance to measuring the quality of performance in team ball sports. A coach who relies on them doesn’t understand and appreciate the game or its nuances. This definitely applies to soccer and I would expect to rugby and AFL. Baseball may, however, be an exception that proves the rule.

Steve Edney
15 years ago

Stats work much better as an analysis tool when you have situations of clear individual contests, which are rare in many team ball sports. Batting and pitching in baseball generate statistics with good relevance to the players’ abilities, in the same way that batting and bowling averages are useful measures in cricket. You have (mostly) a fairly clear individual contest, and the relative ineptitude or brilliance of your team matters little.

In contrast, stats on goals or tries scored and such depend very much on the team, despite the individual’s brilliance.

15 years ago

Steve, I think it depends on the context and purpose. Stats are necessarily ex post. So perhaps useful as a rough guide to the worth of a career. And probably better indicators of performance in baseball and cricket, as you suggest, but not necessarily of ability.

But as a means of spotting raw talent, nearly worthless I’d hazard. The good coach is able to spot talent, perhaps in the midst of woeful performance, and then nurture and develop it as well as melding it with other players into a team. The “stats” will then surely follow.

Increasingly in soccer and AFL, coaches, journalists and analysts want to break the game down into stats. For me, this is just pseudo-science. It focusses on the effects rather than the causes of the play in a game.

In my opinion, American “sports” commentators are the worst abusers of stats. And the utter crap talked by their golf commentators is stupefying.

Chris Lloyd
15 years ago

A few years ago I had a great idea (I get one every few years). Bradman’s batting average is 99.94. Now think about how a batting average is calculated. If you are not out, then your score is added to the numerator but nothing is added to the denominator. This is equivalent to assuming you would go on to make your average score additional to your not out score. For example, if you add 99.94 to all of Bradman’s not out scores and assume he got out at that score then you get the same average of 99.94.
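The equivalence described above is easy to check numerically. The sketch below uses a short, invented list of innings (not Bradman’s actual record) and the helper names are only illustrative: it computes the usual average, then adds that average to every not-out score, treats those innings as dismissals, and recomputes.

```python
import math


def batting_average(innings):
    """Cricket batting average: total runs divided by number of dismissals.

    `innings` is a list of (runs, was_out) pairs.
    """
    total_runs = sum(runs for runs, _ in innings)
    dismissals = sum(1 for _, was_out in innings if was_out)
    return total_runs / dismissals


def complete_not_outs(innings, extra_runs):
    """Pretend each not-out innings continued for `extra_runs`, then ended in a dismissal."""
    return [
        (runs, True) if was_out else (runs + extra_runs, True)
        for runs, was_out in innings
    ]


# Invented innings for illustration: (runs scored, dismissed?)
innings = [(50, True), (120, False), (0, True), (30, False), (80, True)]

average = batting_average(innings)            # 280 runs / 3 dismissals
completed = complete_not_outs(innings, average)
completed_average = batting_average(completed)

print(f"usual average:     {average:.2f}")
print(f"completed average: {completed_average:.2f}")
print("same?", math.isclose(average, completed_average))
```

The algebra behind it: with R runs, D dismissals and n not-outs, the completed average is (R + n·R/D) / (D + n), which simplifies back to R/D.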

This assumption has to be wrong! My great idea was that if I can calculate Bradman’s average under a “better”