Context encoders as a simple but powerful extension of word2vec

06/08/2017
by Franziska Horn, et al.

With a simple architecture and the ability to learn meaningful word embeddings efficiently from texts containing billions of words, word2vec remains one of the most popular neural language models used today. However, since only a single embedding is learned for every word in the vocabulary, the model fails to optimally represent words with multiple meanings. Additionally, it is not possible to create embeddings for new (out-of-vocabulary) words on the spot. Based on an intuitive interpretation of the continuous bag-of-words (CBOW) word2vec model's negative sampling training objective in terms of predicting context-based similarities, we motivate an extension of the model we call context encoders (ConEc). By multiplying the matrix of trained word2vec embeddings with a word's average context vector, out-of-vocabulary (OOV) embeddings and representations for a word with multiple meanings can be created based on the word's local contexts. The benefits of this approach are illustrated by using these word embeddings as features in the CoNLL 2003 named entity recognition (NER) task.
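The core operation the abstract describes, multiplying the trained word2vec embedding matrix with a word's average context vector, can be sketched in a few lines of NumPy. The sketch below uses a toy vocabulary and a random stand-in for the pretrained embedding matrix; all names (`W`, `word2idx`, `conec_embedding`) are illustrative assumptions, not identifiers from the paper's code.

```python
import numpy as np

# Toy setup: a stand-in for a trained word2vec embedding matrix W
# of shape (vocab_size, embedding_dim). In practice W would come
# from an actual word2vec training run.
rng = np.random.default_rng(0)
vocab = ["the", "bank", "river", "money", "deposit", "water"]
word2idx = {w: i for i, w in enumerate(vocab)}
W = rng.standard_normal((len(vocab), 50))

def context_vector(context_words, word2idx, vocab_size):
    """Normalized count vector over a word's observed context words."""
    x = np.zeros(vocab_size)
    for w in context_words:
        if w in word2idx:  # context words outside the vocabulary are skipped
            x[word2idx[w]] += 1.0
    total = x.sum()
    return x / total if total > 0 else x

def conec_embedding(context_words, W, word2idx):
    """ConEc-style embedding: W^T times the average context vector,
    i.e. a context-weighted average of the trained word embeddings."""
    x = context_vector(context_words, word2idx, W.shape[0])
    return W.T @ x

# Embedding for an out-of-vocabulary token, built purely from the
# local context it was observed in:
emb = conec_embedding(["river", "water", "the"], W, word2idx)
print(emb.shape)  # (50,)
```

Because the context vector is a normalized count vector, the result is simply the average of the embeddings of the observed context words, which is what lets the same mechanism distinguish different senses of a word: different local contexts yield different averages.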


