Is the Best Better? Bayesian Statistical Model Comparison for Natural Language Processing

10/06/2020
by Piotr Szymański, et al.

Recent work raises concerns about the use of standard splits to compare natural language processing models. We propose a Bayesian statistical model comparison technique which uses k-fold cross-validation across multiple data sets to estimate the likelihood that one model will outperform the other, or that the two will produce practically equivalent results. We use this technique to rank six English part-of-speech taggers across two data sets and three evaluation metrics.
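The comparison described above can be illustrated with a minimal sketch. This is not the paper's hierarchical model; it is a simplified Bayesian paired comparison with a noninformative prior, under which the posterior of the mean per-fold score difference is a Student-t distribution. The function name, the region of practical equivalence (ROPE) width, and the example fold scores are all illustrative assumptions.

```python
import numpy as np
from scipy import stats

def bayesian_compare(scores_a, scores_b, rope=0.01):
    """Posterior probabilities that model A beats B, that B beats A,
    or that the two are practically equivalent (|mean diff| <= rope).

    Assumes paired per-fold scores and a noninformative prior, so the
    posterior of the mean difference is Student-t with n-1 degrees of
    freedom, centered at the sample mean difference.
    """
    d = np.asarray(scores_a) - np.asarray(scores_b)
    n = len(d)
    post = stats.t(df=n - 1, loc=d.mean(), scale=d.std(ddof=1) / np.sqrt(n))
    p_b_wins = post.cdf(-rope)                 # posterior mass below -rope
    p_equiv = post.cdf(rope) - p_b_wins        # mass inside the ROPE
    p_a_wins = 1.0 - post.cdf(rope)            # mass above +rope
    return p_a_wins, p_equiv, p_b_wins

# Hypothetical per-fold accuracies for two taggers (10-fold CV)
a = [0.974, 0.971, 0.969, 0.975, 0.972, 0.970, 0.973, 0.971, 0.974, 0.972]
b = [0.965, 0.963, 0.960, 0.966, 0.962, 0.961, 0.964, 0.962, 0.965, 0.963]
p_a, p_eq, p_b = bayesian_compare(a, b, rope=0.005)
```

With the fold scores above, nearly all posterior mass lies above the ROPE, so the comparison reports tagger A as very likely better; the three probabilities always sum to one, giving a full probabilistic verdict rather than a binary significance decision.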


