The Importance of Context in Very Low Resource Language Modeling

05/10/2022
by Lukas Edman, et al.

This paper investigates very low resource language model pretraining, where fewer than 100 thousand sentences are available. We find that, in very low resource scenarios, statistical n-gram language models outperform state-of-the-art neural models. Our experiments show that this is mainly because n-gram models focus on local context. We therefore introduce three methods to improve a neural model's performance in the low-resource setting, and find that limiting the model's self-attention is the most effective, improving downstream tasks such as NLI and POS tagging by up to 5% for the languages we test on: English, Hindi, and Turkish.
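
The abstract does not say how self-attention is limited, so the following is only a minimal sketch of one common way to restrict attention to a local context: a banded mask that lets each token attend only to neighbors within a fixed distance. The function names `local_attention_mask` and `masked_softmax`, and the `window` parameter, are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math


def local_attention_mask(seq_len, window):
    """Return a seq_len x seq_len grid of booleans.

    Entry [i][j] is True when position i may attend to position j,
    i.e. when the two positions are at most `window` tokens apart.
    `window` is an assumed hyperparameter, not a value from the paper.
    """
    return [[abs(i - j) <= window for j in range(seq_len)]
            for i in range(seq_len)]


def masked_softmax(scores, mask):
    """Row-wise softmax over attention scores, ignoring masked-out positions."""
    weights = []
    for row_scores, row_mask in zip(scores, mask):
        # Every row keeps at least its own diagonal entry, so `allowed` is non-empty.
        allowed = [s for s, m in zip(row_scores, row_mask) if m]
        top = max(allowed)  # subtract the max for numerical stability
        exps = [math.exp(s - top) if m else 0.0
                for s, m in zip(row_scores, row_mask)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Usage: five tokens, each attending to itself and one neighbor on each side.
mask = local_attention_mask(5, 1)
uniform_scores = [[0.0] * 5 for _ in range(5)]
attention = masked_softmax(uniform_scores, mask)
```

With a small `window`, each token's representation depends only on nearby tokens, which loosely mirrors the local context that an n-gram model conditions on.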


