Contextual Embeddings: When Are They Worth It?

05/18/2020
by Simran Arora, et al.

We study the settings for which deep contextual embeddings (e.g., BERT) give large improvements in performance relative to classic pretrained embeddings (e.g., GloVe), and an even simpler baseline—random word embeddings—focusing on the impact of the training set size and the linguistic properties of the task. Surprisingly, we find that both of these simpler baselines can match contextual embeddings on industry-scale data, and often perform within 5 to 10% accuracy (absolute) on benchmark tasks. Furthermore, we identify properties of data for which contextual embeddings give particularly large gains: language containing complex structure, ambiguous word usage, and words unseen in training.
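To make the comparison concrete, here is a minimal sketch (not the authors' code) of the random word embedding baseline the abstract refers to: words are mapped to fixed random vectors, sentences are represented by averaging those vectors, and a linear classifier is trained on top. The toy `texts`/`labels` data and the `featurize` helper are hypothetical placeholders; in the paper the same pipeline would be run with GloVe or BERT features on real benchmark and industry-scale datasets.

```python
# Minimal sketch, assuming a toy classification dataset stands in for a
# real benchmark task. The random embedding table is the simple baseline
# compared against GloVe and contextual (BERT) features in the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
dim = 50

# Hypothetical labelled data (placeholder for a real task).
texts = ["great movie", "terrible plot", "loved it", "boring and slow"]
labels = [1, 0, 1, 0]

# Vocabulary and a *random* embedding table: one fixed random vector per word.
vocab = {w: i for i, w in enumerate(sorted({w for t in texts for w in t.split()}))}
random_emb = rng.normal(scale=0.1, size=(len(vocab), dim))

def featurize(text, table):
    """Average the word vectors of a sentence (bag-of-embeddings)."""
    idxs = [vocab[w] for w in text.split() if w in vocab]
    return table[idxs].mean(axis=0) if idxs else np.zeros(table.shape[1])

X = np.stack([featurize(t, random_emb) for t in texts])
clf = LogisticRegression().fit(X, labels)
print("train accuracy (random embeddings):", clf.score(X, labels))
```

Swapping `random_emb` for a pretrained GloVe table, or replacing `featurize` with contextual sentence representations from BERT, reproduces the kind of head-to-head comparison the study performs while varying training set size.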

