Dissecting Contextual Word Embeddings: Architecture and Representation

08/27/2018
by Matthew E. Peters, et al.

Contextual word representations derived from pre-trained bidirectional language models (biLMs) have recently been shown to provide significant improvements to the state of the art for a wide range of NLP tasks. However, many questions remain as to how and why these models are so effective. In this paper, we present a detailed empirical study of how the choice of neural architecture (e.g., LSTM, CNN, or self-attention) influences both end-task accuracy and qualitative properties of the learned representations. We show that there is a tradeoff between speed and accuracy, but all architectures learn high-quality contextual representations that outperform word embeddings on four challenging NLP tasks. Additionally, all architectures learn representations that vary with network depth: exclusively morphological information at the word embedding layer, local syntax in the lower contextual layers, and longer-range semantics such as coreference at the upper layers. Together, these results suggest that unsupervised biLMs, independent of architecture, are learning much more about the structure of language than previously appreciated.
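The layer-wise finding above (morphology at the embedding layer, syntax lower, longer-range semantics higher) can be illustrated with a small probing script. The sketch below is an illustrative stand-in, not the authors' code: it uses HuggingFace Transformers with bert-base-uncased rather than the paper's LSTM, CNN, and self-attention biLMs, and the example sentence and cosine-based probe are assumptions chosen purely for demonstration.

```python
# Minimal layer-wise probing sketch (illustrative; uses BERT as a stand-in
# for the paper's biLMs, which are not distributed through this library).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

# A sentence where interpreting "she" requires longer-range, coreference-like context.
sentence = "The lawyer questioned the witness because she was unreliable."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.hidden_states is a tuple: (embedding layer, layer 1, ..., layer N).
hidden_states = outputs.hidden_states
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
she_idx = tokens.index("she")
static_vec = hidden_states[0][0, she_idx]  # layer-0, purely lexical embedding

for layer, states in enumerate(hidden_states):
    vec = states[0, she_idx]
    # Cosine similarity to the static embedding shows how far the contextual
    # representation drifts from purely lexical information as depth grows.
    sim = torch.cosine_similarity(vec, static_vec, dim=0)
    print(f"layer {layer:2d}: cosine to word embedding = {sim.item():.3f}")
```

On a probe like this, similarity to the static embedding typically falls with depth, consistent with the paper's observation that upper layers encode increasingly contextual, longer-range information.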

Related research

09/02/2019
How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings
Replacing static word embeddings with contextualized word representation...

04/07/2022
PALBERT: Teaching ALBERT to Ponder
Currently, pre-trained models can be considered the default choice for a...

09/09/2019
Reverse Transfer Learning: Can Word Embeddings Trained for Different NLP Tasks Improve Neural Language Models?
Natural language processing (NLP) tasks tend to suffer from a paucity of...

04/22/2021
Low Anisotropy Sense Retrofitting (LASeR): Towards Isotropic and Sense Enriched Representations
Contextual word representation models have shown massive improvements on...

12/27/2021
Understanding RoBERTa's Mood: The Role of Contextual-Embeddings as User-Representations for Depression Prediction
Many works in natural language processing have shown connections between...

08/06/2023
3D-EX: A Unified Dataset of Definitions and Dictionary Examples
Definitions are a fundamental building block in lexicography, linguistic...

06/08/2021
Obtaining Better Static Word Embeddings Using Contextual Embedding Models
The advent of contextual word embeddings – representations of words whic...
