How to Host a Data Competition: Statistical Advice for Design and Analysis of a Data Competition

Data competitions rely on real-time leaderboards to rank competitor entries and stimulate algorithm improvement. While such competitions have become popular, particularly in supervised learning formats, their implementation varies widely from host to host. Without careful planning, a supervised learning competition is vulnerable to overfitting, where the winning solutions are so closely tuned to the particular set of provided data that they cannot generalize to the underlying problem of interest to the host. Drawing on our experience, this paper outlines important considerations for strategically designing relevant and informative data sets to maximize what the host learns from a competition. It also describes a post-competition analysis that enables robust and efficient assessment of the strengths and weaknesses of solutions from different competitors, as well as greater understanding of the regions of the input space that are well-solved. The post-competition analysis, which complements the leaderboard, uses exploratory data analysis and generalized linear models (GLMs). The GLMs not only expand the range of results we can explore but also provide more detailed analysis of individual sub-questions, including similarities and differences between algorithms across different types of scenarios, universally easy or hard regions of the input space, and different learning objectives. When coupled with a strategically planned data generation approach, the methods provide richer and more informative summaries that enhance the interpretation of results beyond the leaderboard rankings alone. The methods are illustrated with a recently completed competition to evaluate algorithms capable of detecting, identifying, and locating radioactive materials in an urban environment.

