Data-mining start-up Kaggle launches $US3.3m competition

Australian data-mining start-up Kaggle is aiming to generate international interest after launching a $US3.3 million online data analysis competition, attracting contestants from around the world.

 

Kaggle, which began operating 11 months ago, provides statistical and analytical outsourcing, describing itself as the leading platform for data modelling and prediction competitions.

 

The business was co-founded by economic modeller Anthony Goldbloom, now chief executive, and former McKinsey consultant Jeremy Howard, Kaggle’s chief data scientist.

 

The company’s third shareholder and chairman is Nicholas Gruen, head of Lateral Economics and a former member of the Productivity Commission.

 

Kaggle is a virtual company, with its few employees located in Australia and the United States, although it is planning to relocate to Silicon Valley.

 

To date, Kaggle has run 16 data prediction competitions, earning licensing and consulting fees on the contests. The competitions range from predicting the progression of HIV to forecasting travel times on freeways.

 

The logic behind this kind of data mining is that opening up a problem to millions of people on social networks will achieve a better result than traditional research could.

 

Kaggle’s community of data scientists comprises thousands of PhDs from quantitative fields such as computer science, statistics, econometrics, maths and physics.

 

Beyond the prize money and the data, members use Kaggle to meet, network and collaborate with experts from related fields.

 

“The idea is to make data science more of a meritocracy. We want to make it easier for people who are really good to demonstrate their abilities,” Goldbloom says.

 

Kaggle’s latest competition, the Heritage Health Prize, aims to solve a health data problem. It is a two-year competition with a grand prize of $US3 million for the winner.

 

Using predictive data modelling, contestants will examine three years of historical medical data from anonymous real-life people.

 

The challenge is to create an algorithm that predicts how many days each person will spend in hospital in the year following those three years of data.

 

The winner will be the contestant whose predictions are closest on average to the actual number of hospital days for each patient. The outcome will be used exclusively by Heritage to manage its operations.
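
To illustrate what "closest on average" could mean in practice, here is a minimal Python sketch. The article does not name the scoring metric, so mean absolute error is assumed here purely for illustration; the function names, team names and figures are all hypothetical and are not taken from the competition.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and actual hospital days.

    Assumption: the article only says "closest on average"; the real
    competition may use a different metric.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one patient is required")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rank_entries(entries, actual):
    """Return entry names ordered from lowest to highest average error."""
    return sorted(entries, key=lambda name: mean_absolute_error(entries[name], actual))


# Made-up data: actual hospital days for four patients in the target year.
actual_days = [0, 2, 0, 5]

# Made-up predictions from two hypothetical entrants.
entries = {
    "team_a": [0, 1, 0, 4],
    "team_b": [1, 1, 1, 1],
}

ranking = rank_entries(entries, actual_days)
print(ranking[0])  # the entry with the lowest average error
```

Under this assumed metric, the entry whose predictions sit nearest the true day counts across all patients ranks first.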

 

The goal is to create an “early warning system” for managed care providers and provide a better way to identify which patients need care immediately to improve their health.

 

“The idea is that managed care providers responsible for a patient’s care can have a flag show up when a patient is deemed to be at a certain risk of hospitalisation,” Goldbloom says.

 

According to Jeremy Howard, the company’s first goal was to prove to itself that it could build a social network of the world’s best data scientists, who could compete to solve a problem.

 

“Now with this $3 million competition, far more people will join in… It is a bit like mathematical modelling as sport because you can race against others,” he says.
