As I travel the country preaching the virtues of Web 2.0, it’s great to see a really interesting Web 2.0 app being launched from sunny Melbourne. Well, actually I guess it was launched while its creator was living in Sydney, but he’s just moved down to Melbourne, where he and I had lunch the other day – we went on for a surprisingly long time.

I don’t know how many Troppodillians are aware of Innocentive – but it’s a site that brings together people with technical problems to solve and people to solve them. Kaggle is the Innocentive of data.  It will begin by hosting data competitions which will get that market for econometric skill humming along. I invited Anthony Goldbloom to send me a guest post for Troppo and he obliged with the post below.

Google’s chief economist, Hal Varian, says the statistician is the “sexy job of the next ten years”. Data are so central to modern organisations that to be competitive in many industries, companies must have good algorithms in their arsenal. A bank that accurately identifies creditworthy borrowers can lend at lower interest rates and a bookseller that makes insightful recommendations to its customers will engender loyalty. Data-crunching is also essential to non-profit organisations. Governments save millions of dollars by scanning tax records and health insurance claims for signs of fraud; political parties churn through masses of socio-economic data to pinpoint swing voters; and biologists collect and analyse gene sequences to better understand and predict disease.

My new project, Kaggle, helps organisations make better use of their data. It offers a platform for data-related competitions, allowing companies, researchers, government and other organisations to post their modelling problems and have data professionals and researchers compete to produce the best solutions. The Kaggle platform lowers the barriers to hosting such competitions by allowing organisations to host contests without having to build their own infrastructure. Best of all, it costs nothing to host a competition on Kaggle.

Competitions can be used to find solutions to difficult problems or to improve models that “can’t be improved”. They work on the premise that a single organisation rarely has the best person to solve a given problem. Releasing a modelling task to the world at large gives the organisation the opportunity to tap a much wider talent pool, giving it access to the best possible solutions. Competitions are also a useful interface between academia and industry. Organisations can post their problems and have researchers apply cutting-edge methods in an attempt to find the best solution.

Using competitions to spur innovation is not a new idea. In 1714, the British Government offered a £20,000 reward for a method that could determine a ship’s longitude within 30 nautical miles (determining latitude was relatively easy; the lack of a good method for measuring longitude meant slow journeys and lost ships). And data-related competitions have already been used to good effect. Netflix, an American DVD rental service, offered USD $1m to the analyst (or team) who could improve their recommendations algorithm by 10 per cent. (The idea behind Netflix is that borrowers are asked to rate their DVDs. Netflix then makes recommendations to borrowers based on their past ratings and the ratings of other borrowers with similar taste.) USD $1m may seem like a huge prize, but according to Netflix CEO Reed Hastings, an improvement of 10 per cent was worth “well in excess of $1m”. Of course, it doesn’t take USD $1m to attract participants. In 2009, the French telecommunications company Orange held a competition in conjunction with the ACM SIGKDD Conference, offering 10,000-worth of prizes to contestants who could predict which Orange customers were likely to switch providers, upgrade their plans or buy other Orange products. The contest attracted over 8,000 entries.

The Kaggle demo is available at demo.kaggle.com. The project will host its first competition in early March.

