Replicability Analysis for Natural Language Processing: Testing Significance with Multiple Datasets

09/27/2017
by Rotem Dror, et al.

With the ever-growing amounts of textual data from a large variety of languages, domains, and genres, it has become standard to evaluate NLP algorithms on multiple datasets in order to ensure consistent performance across heterogeneous setups. However, such multiple comparisons pose significant challenges to traditional statistical analysis methods in NLP and can lead to erroneous conclusions. In this paper, we propose a Replicability Analysis framework for a statistically sound analysis of multiple comparisons between algorithms for NLP tasks. We discuss the theoretical advantages of this framework over the current, statistically unjustified, practice in the NLP literature, and demonstrate its empirical value across four applications: multi-domain dependency parsing, multilingual POS tagging, cross-domain sentiment classification and word similarity prediction.


