Mind Your Inflections! Improving NLP for Non-Standard English with Base-Inflection Encoding

04/30/2020
by Samson Tan, et al.

Morphological inflection is a process of word formation in which base words are modified to express grammatical categories such as tense, case, voice, person, or number. World Englishes, such as Colloquial Singapore English (CSE) and African American Vernacular English (AAVE), differ from Standard English dialects in their use of inflection. Although human readers usually comprehend non-standard inflection use without difficulty, NLP systems are not as robust. We introduce a new Base-Inflection Encoding of English text that combines linguistic and statistical techniques. Fine-tuning pre-trained NLP models for downstream tasks under this encoding achieves robustness to non-standard inflection use while maintaining performance on Standard English examples. Models using this encoding also generalize better to non-standard dialects without explicit training. We suggest metrics for evaluating tokenizers, and extensive model-independent analyses demonstrate the efficacy of the encoding when it is used together with data-driven subword tokenizers.
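To make the idea concrete, here is a minimal Python sketch of what a base-inflection style encoding could look like. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not spell out the procedure, so the lookup table `TOY_LEXICON`, the function `encode`, the bracketed tag format, and the Penn Treebank style tags are all hypothetical stand-ins for a real lemmatizer and tagger.

```python
# Hypothetical toy lexicon: inflected form -> (base form, inflection tag).
# A real system would use a lemmatizer and POS tagger instead of a table.
TOY_LEXICON = {
    "goes": ("go", "VBZ"),
    "went": ("go", "VBD"),
    "walked": ("walk", "VBD"),
    "dogs": ("dog", "NNS"),
}


def encode(tokens):
    """Replace each known inflected token with its base form plus a tag symbol.

    Tokens not found in the toy lexicon are passed through unchanged.
    """
    encoded = []
    for token in tokens:
        entry = TOY_LEXICON.get(token.lower())
        if entry is None:
            encoded.append(token)
        else:
            base, tag = entry
            # The inflection becomes a separate symbol after the base form.
            encoded.extend([base, f"[{tag}]"])
    return encoded


standard = encode("She goes home".split())
non_standard = encode("She go home".split())
print(standard)
print(non_standard)
```

In this sketch the standard sentence and its non-standard variant share the same base token `go` and differ only in whether an inflection symbol follows it, which is one intuition for why such an encoding might make a downstream model less sensitive to inflection use.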


research
05/09/2020

It's Morphin' Time! Combating Linguistic Discrimination with Inflectional Perturbations

Training on only perfect Standard English corpora predisposes pre-traine...
research
04/07/2018

Simple Models for Word Formation in English Slang

We propose generative models for three types of extra-grammatical word f...
research
03/14/2022

FairLex: A Multilingual Benchmark for Evaluating Fairness in Legal Text Processing

We present a benchmark suite of four datasets for evaluating the fairnes...
research
06/22/2022

reStructured Pre-training

In this work, we try to decipher the internal connection of NLP technolo...
research
02/24/2020

Parsing Early Modern English for Linguistic Search

We investigate the question of whether advances in NLP over the last few...
research
09/26/2017

Learning to Explain Non-Standard English Words and Phrases

We describe a data-driven approach for automatically explaining new, non...
research
07/27/2023

Models of reference production: How do they withstand the test of time?

In recent years, many NLP studies have focused solely on performance imp...
