BERTRAM: Improved Word Embeddings Have Big Impact on Contextualized Model Performance

10/16/2019
by Timo Schick, et al.

Pretraining deep contextualized representations using an unsupervised language modeling objective has led to large performance gains for a variety of NLP tasks. Notwithstanding their enormous success, recent work by Schick and Schütze (2019) suggests that these architectures struggle to understand many rare words. For context-independent word embeddings, this problem can be addressed by explicitly relearning representations for infrequent words. In this work, we show that the very same idea can also be applied to contextualized models and clearly improves their downstream task performance. As previous approaches for relearning word embeddings are commonly based on fairly simple bag-of-words models, they are not a suitable counterpart for complex language models based on deep neural networks. To overcome this problem, we introduce BERTRAM, a powerful architecture that is based on a pretrained BERT language model and capable of inferring high-quality representations for rare words through a deep interconnection of their surface form and the contexts in which they occur. Both on a rare word probing task and on three downstream task datasets, BERTRAM considerably improves representations for rare and medium frequency words compared to both a standalone BERT model and previous work.
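
The abstract describes BERTRAM as inferring an embedding for a rare word by combining its surface form with the contexts in which it occurs, on top of a pretrained BERT. The sketch below only illustrates that general idea and is not the authors' implementation: the class names, the character-n-gram form encoder, the gating-based fusion, and the bert-base-uncased checkpoint are all assumptions made for this example, and the fusion here is far shallower than the deep form-context interconnection the paper describes.

```python
# Illustrative sketch only: combine a surface-form encoder with a
# context encoder (pretrained BERT) to infer an embedding for a rare word.
# All names and the gating fusion are hypothetical simplifications,
# not the BERTRAM architecture itself.

from typing import List

import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast


class FormEncoder(nn.Module):
    """Embeds a word from its character n-grams (bag of n-grams, averaged)."""

    def __init__(self, ngram_vocab_size: int, dim: int):
        super().__init__()
        self.ngram_emb = nn.Embedding(ngram_vocab_size, dim)

    def forward(self, ngram_ids: torch.Tensor) -> torch.Tensor:
        # ngram_ids: (num_ngrams,) -> (dim,)
        return self.ngram_emb(ngram_ids).mean(dim=0)


class ContextEncoder(nn.Module):
    """Encodes contexts of the rare word with a pretrained BERT and averages
    the hidden states at the positions of a [MASK] placeholder."""

    def __init__(self, bert_name: str = "bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        self.tokenizer = BertTokenizerFast.from_pretrained(bert_name)

    def forward(self, contexts: List[str]) -> torch.Tensor:
        # Each context string contains the rare word replaced by [MASK].
        enc = self.tokenizer(contexts, return_tensors="pt", padding=True)
        hidden = self.bert(**enc).last_hidden_state        # (B, T, H)
        mask_pos = enc["input_ids"] == self.tokenizer.mask_token_id
        # Average the [MASK] representations over all contexts.
        return hidden[mask_pos].mean(dim=0)                # (H,)


class RareWordEmbedder(nn.Module):
    """Fuses form and context information into one input embedding."""

    def __init__(self, ngram_vocab_size: int, hidden_size: int = 768):
        super().__init__()
        self.form = FormEncoder(ngram_vocab_size, hidden_size)
        self.context = ContextEncoder()
        # A learned gate decides how much to trust form vs. context.
        self.gate = nn.Linear(2 * hidden_size, 1)

    def forward(self, ngram_ids: torch.Tensor, contexts: List[str]) -> torch.Tensor:
        f = self.form(ngram_ids)
        c = self.context(contexts)
        alpha = torch.sigmoid(self.gate(torch.cat([f, c], dim=-1)))
        return alpha * f + (1 - alpha) * c                 # (hidden_size,)
```

In a setup like this, the embedder would presumably be trained with a mimicking-style objective, as in the authors' earlier Attentive Mimicking work: learn to reproduce BERT's existing embeddings of frequent words from their forms and contexts, then place the inferred vector into BERT's input embedding matrix for the rare word. The paper itself goes further by letting form and context interact inside the deep BERT layers rather than through a single gate.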


Related research

04/02/2019 · Attentive Mimicking: Better Word Embeddings by Attending to Informative Contexts
04/14/2019 · Rare Words: A Major Problem for Contextualized Embeddings And How to Fix it by Attentive Mimicking
05/25/2023 · Not wacky vs. definitely wacky: A study of scalar adverbs in pretrained language models
04/25/2020 · Quantifying the Contextualization of Word Representations with Semantic Class Probing
08/31/2021 · Effectiveness of Deep Networks in NLP using BiDAF as an example architecture
02/24/2017 · Use Generalized Representations, But Do Not Forget Surface Features
03/21/2022 · Better Language Model with Hypernym Class Prediction
