Towards Realistic Practices In Low-Resource Natural Language Processing: The Development Set

09/04/2019
by Katharina Kann, et al.

Development sets are impractical to obtain for real low-resource languages, since using all available data for training is often more effective. However, development sets are widely used in research papers that purport to deal with low-resource natural language processing (NLP). Here, we aim to answer the following questions: Does using a development set for early stopping in the low-resource setting influence results as compared to a more realistic alternative, where the number of training epochs is tuned on development languages? And does it lead to overestimation or underestimation of performance? We repeat multiple experiments from recent work on neural models for low-resource NLP and compare results for models obtained by training with and without development sets. On average over languages, absolute accuracy differs by up to 1.4%. However, for some languages and tasks, differences are as big as 18.0% accuracy. Our results highlight the importance of realistic experimental setups in the publication of low-resource NLP research results.
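The two stopping strategies contrasted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the patience-based stopping rule, and the choice to average the best epochs across development languages are all assumptions made for illustration, and the accuracy curves are made-up toy values.

```python
from statistics import mean


def best_epoch_with_dev_set(dev_accuracies, patience=2):
    """Return the 1-indexed epoch chosen by early stopping on a dev set.

    Assumption: training stops once `patience` epochs pass without a new
    best dev accuracy; the best epoch seen so far is returned.
    """
    best_epoch, best_acc = 0, float("-inf")
    for epoch, acc in enumerate(dev_accuracies, start=1):
        if acc > best_acc:
            best_epoch, best_acc = epoch, acc
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs
    return best_epoch


def epochs_from_development_languages(curves_by_language, patience=2):
    """Choose a fixed epoch budget for a target language without a dev set.

    Assumption: the budget is the rounded mean of the early-stopping epochs
    found on other (development) languages that do have dev sets.
    """
    best_epochs = [
        best_epoch_with_dev_set(curve, patience)
        for curve in curves_by_language.values()
    ]
    return round(mean(best_epochs))


# Toy dev-accuracy curves for three hypothetical development languages.
toy_curves = {
    "lang_a": [0.40, 0.50, 0.60, 0.55, 0.54],
    "lang_b": [0.30, 0.35, 0.33, 0.32],
    "lang_c": [0.20, 0.30, 0.40, 0.50],
}
fixed_epochs = epochs_from_development_languages(toy_curves)
print(f"Train the target-language model for {fixed_epochs} epochs")
```

In the second strategy the target language needs no held-out data at all, so every available example can go into training, which is the realistic setting the abstract argues for.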

research
10/23/2020

A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios

Current developments in natural language processing offer challenges and...
research
04/19/2023

A Survey of Corpora for Germanic Low-Resource Languages and Dialects

Despite much progress in recent years, the vast majority of work in natu...
research
06/12/2020

Low-resource Languages: A Review of Past Work and Future Challenges

A current problem in NLP is massaging and processing low-resource langua...
research
11/12/2021

Exploiting all samples in low-resource sentence classification: early stopping and initialization parameters

In low resource settings, deep neural models have often shown lower perf...
research
06/01/2022

What a Creole Wants, What a Creole Needs

In recent years, the natural language processing (NLP) community has giv...
research
08/11/2021

Ensuring the Inclusive Use of Natural Language Processing in the Global Response to COVID-19

Natural language processing (NLP) plays a significant role in tools for ...
research
11/03/2017

One Model to Rule them all: Multitask and Multilingual Modelling for Lexical Analysis

When learning a new skill, you take advantage of your preexisting skills...
