Rare Words: A Major Problem for Contextualized Embeddings And How to Fix it by Attentive Mimicking

04/14/2019
by Timo Schick, et al.

Pretraining deep neural network architectures with a language modeling objective has brought large improvements for many natural language processing tasks. Using BERT, a recently proposed architecture of this kind, as an example, we demonstrate that despite being trained on huge amounts of data, deep language models still struggle to understand rare words. To fix this problem, we adapt Attentive Mimicking, a method designed to explicitly learn embeddings for rare words, to deep language models. To make this possible, we introduce one-token approximation, a method that allows us to use Attentive Mimicking even when the underlying language model uses subword-based tokenization, i.e., does not assign embeddings to all words. To evaluate our method, we create a novel dataset that tests the ability of language models to capture semantic properties of words without any task-specific fine-tuning. Using this dataset, we show that adding our adapted version of Attentive Mimicking to BERT does indeed substantially improve its understanding of rare words.
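The core idea behind Attentive Mimicking can be illustrated with a minimal sketch: given embeddings of the contexts in which a rare word occurs, an attention mechanism weights the more informative contexts higher, and the induced word embedding is the attention-weighted average. The function below is a simplified, hypothetical illustration of this weighting scheme (the names `context_embs` and `query`, and the use of a single dot-product query, are assumptions for exposition; the actual model learns its attention parameters and mimics gold embeddings during training).

```python
import numpy as np

def induce_embedding(context_embs: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Toy sketch of attention-weighted context averaging.

    context_embs: (n_contexts, dim) array, one vector per observed context
    query:        (dim,) vector used to score context informativeness
    """
    # Score each context by similarity to the query vector.
    scores = context_embs @ query
    # Softmax (numerically stabilized) turns scores into attention weights.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # The induced embedding is the weighted average of context embeddings,
    # so informative contexts contribute more.
    return weights @ context_embs
```

In the full method, this induced embedding would then be placed into the input embedding matrix of the deep language model; one-token approximation is what makes that target embedding space well-defined when the model otherwise splits rare words into several subword tokens.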


