Accounting for Variance in Machine Learning Benchmarks

03/01/2021
by Xavier Bouthillier, et al.

Strong empirical evidence that one machine-learning algorithm A outperforms another one B ideally calls for multiple trials optimizing the learning pipeline over sources of variation such as data sampling, data augmentation, parameter initialization, and hyperparameter choices. This is prohibitively expensive, and corners are cut to reach conclusions. We model the whole benchmarking process, revealing that variance due to data sampling, parameter initialization, and hyperparameter choice markedly impacts the results. We analyze the predominant comparison methods used today in light of this variance. We show a counter-intuitive result: adding more sources of variation to an imperfect estimator better approximates the ideal estimator, at a 51-fold reduction in compute cost. Building on these results, we study the error rate of detecting improvements on five different deep-learning tasks/architectures. This study leads us to propose recommendations for performance comparisons.
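
To make the idea of "sources of variation" concrete, here is a minimal toy sketch, not the authors' method or code. It simulates a benchmark score whose noise comes from three independently seeded sources and compares the spread of scores when only the initialization seed is varied against the spread when all sources are varied. The function names (`run_trial`, `score_spread`), the base score of 0.90, and the noise scales are hypothetical values chosen for illustration only.

```python
import random
import statistics

# Hypothetical noise scales per source of variation (illustrative, not from the paper).
NOISE_SCALE = {"data": 0.02, "init": 0.01, "hparam": 0.015}


def run_trial(seeds):
    """Toy stand-in for one train-and-evaluate run; returns a simulated score."""
    score = 0.90  # assumed base performance
    for source, scale in NOISE_SCALE.items():
        # Each source adds a deviation determined solely by its own seed.
        score += random.Random(f"{source}-{seeds[source]}").gauss(0.0, scale)
    return score


def score_spread(varied_sources, n_trials=200, master_seed=0):
    """Standard deviation of scores when only `varied_sources` are re-seeded."""
    rng = random.Random(master_seed)
    fixed = {source: 0 for source in NOISE_SCALE}
    scores = []
    for _ in range(n_trials):
        seeds = dict(fixed)
        for source in varied_sources:
            seeds[source] = rng.randrange(10**9)
        scores.append(run_trial(seeds))
    return statistics.stdev(scores)


init_only = score_spread(["init"])
all_sources = score_spread(["data", "init", "hparam"])
print(f"spread, init seed only: {init_only:.4f}")
print(f"spread, all sources:    {all_sources:.4f}")
```

In this toy setting, re-seeding only the initialization understates the run-to-run spread, which is the kind of effect that can make a small difference between two algorithms look more reliable than it is.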

research
06/28/2018

Automatic Exploration of Machine Learning Experiments on OpenML

Understanding the influence of hyperparameters on the performance of a m...
research
11/01/2017

Data, Depth, and Design: Learning Reliable Models for Melanoma Screening

State of the art on melanoma screening evolved rapidly in the last two y...
research
05/04/2020

Cost Effective Optimization for Cost-related Hyperparameters

The increasing demand for democratizing machine learning algorithms for ...
research
06/18/2020

DisARM: An Antithetic Gradient Estimator for Binary Latent Variables

Training models with discrete latent variables is challenging due to the...
research
08/30/2020

MementoML: Performance of selected machine learning algorithm configurations on OpenML100 datasets

Finding optimal hyperparameters for the machine learning algorithm can o...
research
10/11/2020

What causes the test error? Going beyond bias-variance via ANOVA

Modern machine learning methods are often overparametrized, allowing ada...
research
07/07/2021

"Are you sure?": Preliminary Insights from Scaling Product Comparisons to Multiple Shops

Large eCommerce players introduced comparison tables as a new type of re...
