Web 2.0 is proving very adept at finding needles in haystacks that we couldn’t have found before. Netflix is a company that rents videos and relies on the ability of its algorithm to predict which movies you’re going to like from the ratings you’ve given past movies. Given the algorithm’s significance for its profitability, Netflix wanted access to the world’s best statisticians to improve it. But how do you do that? There are two problems: first, the obvious one, that in any imaginable world Netflix will be able to hire fewer than 0.1% of the world’s statisticians; and second, the slightly less obvious one, that it won’t be easy for those in Netflix to work out who’s good at statistics and who is not.
So what better way to do it than host a competition on the internet?
And so a team has just won one million dollars for improving Netflix’s suggestion algorithm by 10%. In fact it was a team of teams, as none of the teams could do it on their own. The team that won contains some very seasoned statisticians, and some rank amateurs who nevertheless pulled their weight (I deduce from their membership of the team of teams).
Innocentive has established a Web 2.0 market for ideas in which ‘seekers’ post challenges (like how to get more toothpaste out of the tube), and promise prizes. Challenges that are on their website as I write this include the production of an Open and Re-closable Fastening System for clothes (other than the known methods like Velcro) and something that will help Reduce the Placebo Effect in Clinical Drug Trials.
Anthony Goldbloom, an econometrician working in the Treasury and then the Reserve Bank, watched all this and decided to establish Kaggle.com, a data-analysis and prediction marketplace where companies can run competitions like the Netflix prize. If you read about the Netflix prize you find that running it well was quite an involved process, one that other firms might well want to contract out. Indeed, it turns out that Netflix’s second offering of a major prize has had to be cancelled owing to privacy concerns. So expertise is needed to run these things well. And as is the case with eBay and Innocentive and plenty of other internet marketplaces, there are strong economies of scale and scope, which mean that there will be benefits in pooling resources in marketplaces. So I not only predict a bright future for Kaggle, I’ve become its Chairman (I think this qualifies as a disclosure of interest).
There are meanwhile lots of things to think about. Both Anthony and I have some hefty ambitions about what the site could become. We’re both keen on the way in which people can build reputations through Kaggle for knowing what they’re doing, reputations that are often not nearly as accurately formed within organisations for reasons discussed above.
There are also many different ways to run competitions and many kinds of things one might want out of them. One might want to predict the future, say by working out the odds of Collingwood winning next week; for such an exercise some time must elapse before the prediction can be scored. The Netflix prize is a little different in the sense that the prize can be given out when relationships are identified in the data that stand up to scrutiny given existing data. In such a circumstance one is really ‘predicting the past’.
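To make ‘predicting the past’ a little more concrete, here is a minimal Python sketch of how such a competition might be scored: the organiser holds back some outcomes that are already known and compares each entry against them. The Netflix prize measured entries by root mean squared error (RMSE); the function names and the toy numbers below are purely illustrative assumptions, not Netflix’s or Kaggle’s actual code or data.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predictions and held-back outcomes."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(actual))


def relative_improvement(baseline_error, entry_error):
    """Fraction by which an entry reduces the baseline's error."""
    return (baseline_error - entry_error) / baseline_error


# Toy, made-up ratings: the "past" the organiser already knows but hides.
held_back_ratings = [4.0, 3.0, 5.0, 2.0]
baseline_predictions = [3.0, 3.0, 3.0, 3.0]  # hypothetical existing algorithm
entry_predictions = [3.5, 3.0, 4.5, 2.5]     # hypothetical competition entry

baseline_error = rmse(baseline_predictions, held_back_ratings)
entry_error = rmse(entry_predictions, held_back_ratings)
print(f"improvement: {relative_improvement(baseline_error, entry_error):.1%}")
```

Because the held-back outcomes already exist, an entry can be scored the moment it is submitted. A predicting-the-future competition would use the same comparison, but only once the real outcomes had come in.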
We’ve just launched our first predicting-the-future competition, which involves forecasting the voting for the 2010 Eurovision Song Contest. For those not familiar with the contest, it’s widely believed that voting outcomes are influenced by European politics. Contestants in Kaggle’s Forecast Eurovision Voting competition will attempt to exploit historical voting regularities, as well as other factors, to predict the voting. The winner of the Kaggle contest will collect a US$1,000 cash prize. And just as the Eurovision Song Contest has launched the high-flying careers of ABBA, Celine Dion and Riverdance, forecasting accurately will earn competitors a top ranking on Kaggle’s league table.
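As a rough illustration of what exploiting historical voting regularities could look like, the Python sketch below builds the simplest baseline I can think of: predict that each country will give another the average number of points it gave in past years. The record layout, the function name and the placeholder countries are all assumptions made up for the example; they are not the competition’s data format, and a serious entry would bring in many other factors.

```python
from collections import defaultdict


def average_past_points(history):
    """Average points each voter gave each performer across past contests.

    `history` is an iterable of (year, voter, performer, points) tuples,
    a hypothetical layout chosen for this sketch.
    """
    totals = defaultdict(lambda: [0, 0])  # (voter, performer) -> [sum, count]
    for _year, voter, performer, points in history:
        running = totals[(voter, performer)]
        running[0] += points
        running[1] += 1
    return {pair: total / count for pair, (total, count) in totals.items()}


# Made-up records with placeholder country names, not real results.
toy_history = [
    (2008, "Country A", "Country B", 12),
    (2009, "Country A", "Country B", 8),
    (2009, "Country B", "Country A", 10),
]

baseline_forecast = average_past_points(toy_history)
print(baseline_forecast[("Country A", "Country B")])
```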
A taste of the power of Kaggle is captured in its first competition. Even without prize money, the demo footy tipping comp has attracted 158 legitimate entries from 7 teams, and the leader would have tipped 74 per cent of games correctly (76 per cent is required to win the rich Sportingbet tipping comp). What’s more, this is based on a data set that was pretty much thrown together, with the central criterion being that it was easy to collect.
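For anyone wondering how a figure like 74 per cent is arrived at, a tipping comp can be scored as simply as the Python sketch below: count the games each entry tipped correctly and rank the entries by that share. The names and the toy tips are hypothetical, and this is not a description of how the Kaggle site actually computes its league table.

```python
def tipping_accuracy(tips, results):
    """Share of games for which the tipped winner matched the actual winner."""
    if len(tips) != len(results) or not results:
        raise ValueError("need one tip per game and at least one game")
    correct = sum(1 for tip, result in zip(tips, results) if tip == result)
    return correct / len(results)


def league_table(entries, results):
    """Rank entries from most to least accurate."""
    scored = [(name, tipping_accuracy(tips, results)) for name, tips in entries.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up results and tips for four games.
results = ["home", "away", "home", "home"]
entries = {
    "team_a": ["home", "away", "home", "away"],
    "team_b": ["away", "away", "away", "home"],
}

for name, accuracy in league_table(entries, results):
    print(f"{name}: {accuracy:.0%}")
```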