Mimicking Word Embeddings using Subword RNNs

07/21/2017
by Yuval Pinter, et al.

Word embeddings improve generalization over lexical features by placing each word in a lower-dimensional space, using distributional information obtained from unlabeled data. However, the effectiveness of word embeddings for downstream NLP tasks is limited by out-of-vocabulary (OOV) words, for which embeddings do not exist. In this paper, we present MIMICK, an approach to generating OOV word embeddings compositionally, by learning a function from spellings to distributional embeddings. Unlike prior work, MIMICK does not require re-training on the original word embedding corpus; instead, learning is performed at the type level. Intrinsic and extrinsic evaluations demonstrate the power of this simple approach. On 23 languages, MIMICK improves performance over a word-based baseline for tagging part-of-speech and morphosyntactic attributes. It is competitive with (and complementary to) a supervised character-based model in low-resource settings.
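The idea of learning a function from spellings to embeddings at the type level can be illustrated with a deliberately simplified sketch. It is not the MIMICK model: in place of a subword RNN, it uses a plainly swapped-in stand-in that averages pretrained vectors over character trigrams. The toy vocabulary, its two-dimensional vectors, and the helper names (`char_ngrams`, `fit_ngram_table`, `predict_embedding`) are all assumptions made for illustration. What the sketch shares with the approach described above is only the training signal: a word list and its existing embeddings, with no access to the original corpus.

```python
from collections import defaultdict

def char_ngrams(word, n=3):
    """Return the character n-grams of a word padded with boundary markers."""
    padded = "<" + word + ">"
    if len(padded) < n:
        return [padded]
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def fit_ngram_table(embeddings, n=3):
    """Learn one vector per n-gram: the mean embedding of the vocabulary
    words (types) that contain it. Only the word list and its pretrained
    vectors are used; no corpus is read."""
    sums = {}
    counts = defaultdict(int)
    for word, vec in embeddings.items():
        for gram in set(char_ngrams(word, n)):
            if gram not in sums:
                sums[gram] = [0.0] * len(vec)
            sums[gram] = [s + v for s, v in zip(sums[gram], vec)]
            counts[gram] += 1
    return {g: [s / counts[g] for s in total] for g, total in sums.items()}

def predict_embedding(word, table, dim, n=3):
    """Predict a vector for any spelling by averaging its known n-gram
    vectors. Falls back to a zero vector if no n-gram was seen."""
    known = [table[g] for g in char_ngrams(word, n) if g in table]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]

# Hypothetical toy embeddings: dimension 0 ~ "past tense", dimension 1 ~ "plural/3sg".
toy_embeddings = {
    "walked": [1.0, 0.0],
    "talked": [1.0, 0.0],
    "walks": [0.0, 1.0],
    "talks": [0.0, 1.0],
}
table = fit_ngram_table(toy_embeddings)

# "balked" is out of vocabulary, but shares the trigrams "alk", "lke", "ked", "ed>".
print(predict_embedding("balked", table, dim=2))  # prints [0.875, 0.125]
```

The out-of-vocabulary word lands near the vectors of the words it resembles in spelling. A character-level RNN trained to reproduce the pretrained vectors plays the same role in the paper, with far more capacity than this averaging scheme.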

research
08/03/2016

Morphological Priors for Probabilistic Neural Word Embeddings

Word embeddings allow natural language processing systems to share stati...
research
08/12/2016

Redefining part-of-speech classes with distributional semantic models

This paper studies how word embeddings trained on the British National C...
research
04/27/2020

Synonyms and Antonyms: Embedded Conflict

Since modern word embeddings are motivated by a distributional hypothesi...
research
09/22/2022

Homophone Reveals the Truth: A Reality Check for Speech2Vec

Generating spoken word embeddings that possess semantic information is a...
research
06/14/2023

Contrastive Loss is All You Need to Recover Analogies as Parallel Lines

While static word embedding models are known to represent linguistic ana...
research
09/03/2019

On the Downstream Performance of Compressed Word Embeddings

Compressing word embeddings is important for deploying NLP models in mem...
research
12/28/2021

Simple, Interpretable and Stable Method for Detecting Words with Usage Change across Corpora

The problem of comparing two bodies of text and searching for words that...
