PBoS: Probabilistic Bag-of-Subwords for Generalizing Word Embedding

10/21/2020
by Zhao Jinman, et al.

We look into the task of generalizing word embeddings: given a set of pre-trained word vectors over a finite vocabulary, the goal is to predict embedding vectors for out-of-vocabulary words without extra contextual information. Relying solely on the spellings of words, we propose a model, along with an efficient algorithm, that simultaneously models subword segmentation and computes subword-based compositional word embeddings. We call the model probabilistic bag-of-subwords (PBoS), as it applies bag-of-subwords over all possible segmentations, weighted by their likelihoods. Inspections and an affix prediction experiment show that PBoS produces meaningful subword segmentations and subword rankings without any source of explicit morphological knowledge. Word similarity and POS tagging experiments show clear advantages of PBoS over previous subword-level models in the quality of the generated word embeddings across languages.
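
The abstract describes composing a word's vector as a bag of subwords taken over all possible segmentations, weighted by how likely each segmentation is. The sketch below is a minimal illustration of that idea under simplifying assumptions, not the authors' implementation: the subword inventory, probabilities, and names (subword_prob, subword_vec, pbos_embedding) are made up here, and the exhaustive enumeration of segmentations is exactly what the paper's efficient algorithm would avoid.

```python
# Toy illustration of a probabilistic bag-of-subwords composition.
# All data and names below are illustrative assumptions, not the paper's API.
import numpy as np

embed_dim = 50
rng = np.random.default_rng(0)

# Hypothetical subword inventory with made-up probabilities and vectors.
subword_prob = {"un": 0.3, "expect": 0.2, "ed": 0.3, "expected": 0.1, "unexpected": 0.1}
subword_vec = {s: rng.normal(size=embed_dim) for s in subword_prob}

def segmentations(word):
    """Enumerate all ways to split `word` into known subwords."""
    if word == "":
        yield []
        return
    for i in range(1, len(word) + 1):
        prefix = word[:i]
        if prefix in subword_prob:
            for rest in segmentations(word[i:]):
                yield [prefix] + rest

def pbos_embedding(word):
    """Likelihood-weighted sum of bag-of-subwords vectors over all segmentations."""
    total = np.zeros(embed_dim)
    norm = 0.0
    for seg in segmentations(word):
        # Simple independence assumption for the segmentation likelihood.
        likelihood = np.prod([subword_prob[s] for s in seg])
        # Bag-of-subwords vector for this particular segmentation.
        bag = np.sum([subword_vec[s] for s in seg], axis=0)
        total += likelihood * bag
        norm += likelihood
    return total / norm if norm > 0 else total

print(pbos_embedding("unexpected").shape)  # (50,)
```

In the generalization task itself, the subword vectors would presumably be fit so that the composed embeddings approximate the given pre-trained vectors for in-vocabulary words; the random toy vectors above only show how the likelihood weighting combines segmentations.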

Related research

Generalizing Word Embeddings using Bag of Subwords (09/12/2018)
Parsimonious Morpheme Segmentation with an Application to Enriching Word Embeddings (08/18/2019)
Morphological Priors for Probabilistic Neural Word Embeddings (08/03/2016)
Segmentation-free compositional n-gram embedding (09/04/2018)
Synonym Discovery with Etymology-based Word Embeddings (09/29/2017)
NoPPA: Non-Parametric Pairwise Attention Random Walk Model for Sentence Representation (02/24/2023)
Downsampling Strategies are Crucial for Word Embedding Reliability (08/21/2018)
