Alternative Weighting Schemes for ELMo Embeddings

04/05/2019
by Nils Reimers, et al.

ELMo embeddings (Peters et al., 2018) had a huge impact on the NLP community, and many recent publications use these embeddings to boost the performance of downstream NLP tasks. However, integrating ELMo embeddings into existing NLP architectures is not straightforward. In contrast to traditional word embeddings, like GloVe or word2vec, the bi-directional language model of ELMo produces three 1024-dimensional vectors per token in a sentence. Peters et al. proposed to learn a task-specific weighting of these three vectors for downstream tasks. However, this weighting scheme is not feasible for certain tasks, and, as we will show, it does not necessarily yield optimal performance. We evaluate different methods for combining the three vectors from the language model in order to achieve the best possible performance on downstream NLP tasks. We observe that the third layer of the published language model often decreases performance. By learning a weighted average of only the first two layers, we are able to improve performance on many datasets. Due to the reduced complexity of the language model, we achieve a training speed-up of 19-44%.
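
To make the idea concrete, here is a minimal sketch of a learned weighted average ("scalar mix") over language-model layers, in the spirit of the weighting proposed by Peters et al. (2018). It is not the ELMo/allennlp implementation; the class name, tensor shapes, and the toy usage at the bottom are assumptions for illustration only. The example also shows how restricting the mix to the first two layers, as discussed above, is just a matter of passing fewer layers in.

```python
import torch
import torch.nn as nn


class ScalarMix(nn.Module):
    """Learned weighted average over language-model layers.

    Hypothetical minimal sketch: shapes and names are assumptions,
    not the allennlp/ELMo API.
    """

    def __init__(self, num_layers: int):
        super().__init__()
        # One scalar weight per layer, plus a global scaling factor gamma.
        self.weights = nn.Parameter(torch.zeros(num_layers))
        self.gamma = nn.Parameter(torch.ones(1))

    def forward(self, layer_outputs: torch.Tensor) -> torch.Tensor:
        # layer_outputs: (num_layers, batch, seq_len, dim)
        norm_weights = torch.softmax(self.weights, dim=0)
        weighted = (norm_weights.view(-1, 1, 1, 1) * layer_outputs).sum(dim=0)
        return self.gamma * weighted


if __name__ == "__main__":
    # Three 1024-dimensional vectors per token, as produced by ELMo.
    batch, seq_len, dim = 2, 7, 1024
    all_layers = torch.randn(3, batch, seq_len, dim)

    # Weighted average of only the first two layers (dropping the third),
    # as explored in the article.
    mix_two = ScalarMix(num_layers=2)
    embeddings = mix_two(all_layers[:2])
    print(embeddings.shape)  # torch.Size([2, 7, 1024])
```

Because the weights are ordinary parameters, they are trained jointly with the downstream model; dropping the third layer simply removes one parameter and the corresponding forward computation, which is where the reported speed-up comes from.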
