A Boo(n) for Evaluating Architecture Performance

07/05/2018
by Ondrej Bajgar, et al.

We point out important problems with the common practice of using the best single model performance for comparing deep learning architectures, and we propose a method that corrects these flaws. Each time a model is trained, one gets a different result due to random factors in the training process, which include random parameter initialization and random data shuffling. Reporting the best single model performance does not appropriately address this stochasticity. We propose a normalized expected best-out-of-n performance (Boo_n) as a way to correct these problems.
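To make the idea concrete, below is a minimal sketch of how the expected best-out-of-n performance could be estimated from m >= n independent training runs of one architecture: average the maximum over all size-n subsets of the observed results, which has a closed form in terms of order statistics. The function name and the example numbers are illustrative assumptions, and the normalization step that turns this quantity into Boo_n is omitted here; see the paper for the exact definition.

    from math import comb

    def expected_best_of_n(results, n):
        """Estimate the expected best-out-of-n performance from m >= n observed
        validation results by averaging the maximum over all size-n subsets of
        the observed runs (computed in closed form via order statistics)."""
        m = len(results)
        if n > m:
            raise ValueError("need at least n observed runs")
        xs = sorted(results)  # ascending order statistics x_(1) <= ... <= x_(m)
        total = comb(m, n)
        # x_(i) is the maximum of a uniformly random size-n subset with
        # probability C(i-1, n-1) / C(m, n)
        return sum(comb(i - 1, n - 1) * x for i, x in enumerate(xs, start=1)) / total

    # Hypothetical accuracies from 10 independent training runs of one architecture
    runs = [0.712, 0.718, 0.705, 0.720, 0.715, 0.709, 0.721, 0.714, 0.717, 0.711]
    print(expected_best_of_n(runs, n=5))

For n = 1 this reduces to the plain mean over runs; comparing architectures at the same fixed n avoids rewarding whichever architecture simply had more training runs behind its reported best result.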

research · 07/09/2021
A Topological-Framework to Improve Analysis of Machine Learning Model Performance
As both machine learning models and the datasets on which they are evalu...

research · 06/05/2019
How to Initialize your Network? Robust Initialization for WeightNorm & ResNets
Residual networks (ResNet) and weight normalization play an important ro...

research · 05/09/2023
What is the best recipe for character-level encoder-only modelling?
This paper aims to benchmark recent progress in language understanding m...

research · 08/06/2020
A critical analysis of metrics used for measuring progress in artificial intelligence
Comparing model performances on benchmark datasets is an integral part o...

research · 11/08/2019
Ruminating Word Representations with Random Noised Masker
We introduce a training method for both better word representation and p...

research · 10/21/2022
Random Actions vs Random Policies: Bootstrapping Model-Based Direct Policy Search
This paper studies the impact of the initial data gathering method on th...

research · 10/21/2021
Data splitting improves statistical performance in overparametrized regimes
While large training datasets generally offer improvement in model perfo...
