Generalizing Word Embeddings using Bag of Subwords

09/12/2018
by Jinman Zhao, et al.

We approach the problem of generalizing pre-trained word embeddings beyond fixed-size vocabularies without using additional contextual information. We propose a subword-level word vector generation model that views words as bags of character n-grams. The model is simple and fast to train, and it produces good vectors for rare or unseen words. Experiments show that our model achieves state-of-the-art performance on the English word similarity task and on joint prediction of part-of-speech tags and morphosyntactic attributes in 23 languages, suggesting the model's ability to capture the relationship between words' textual representations and their embeddings.
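To make the bag-of-subwords idea concrete, the sketch below composes a word vector from character n-gram vectors. It is a minimal, hypothetical illustration: the class name, dimensions, random initialization, and the simple averaging rule are assumptions for demonstration, not the paper's training procedure, which learns subword vectors so that the composed vectors mimic a set of pre-trained word embeddings.

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """Extract character n-grams of a word, with boundary markers < and >."""
    w = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(w[i:i + n] for i in range(len(w) - n + 1))
    return grams

class BagOfSubwords:
    """Illustrative sketch: a word vector is the average of its n-gram vectors.

    In a real model the subword vectors would be trained (e.g., to reproduce
    pre-trained embeddings for in-vocabulary words); here they are random.
    """
    def __init__(self, dim=300, n_min=3, n_max=6, seed=0):
        self.dim = dim
        self.n_min, self.n_max = n_min, n_max
        self.rng = np.random.default_rng(seed)
        self.subword_vecs = {}  # n-gram -> vector

    def _vec(self, gram):
        # Lazily create a vector for each n-gram the first time it is seen.
        if gram not in self.subword_vecs:
            self.subword_vecs[gram] = self.rng.normal(scale=0.1, size=self.dim)
        return self.subword_vecs[gram]

    def embed(self, word):
        # Any string, including out-of-vocabulary words, gets a vector
        # built purely from its character n-grams.
        grams = char_ngrams(word, self.n_min, self.n_max)
        return np.mean([self._vec(g) for g in grams], axis=0)

# Usage: vectors for unseen words are composed entirely from subwords.
model = BagOfSubwords(dim=50)
print(model.embed("unseenword").shape)  # (50,)
```

Because every vector is built from shared subword units, morphologically related words (e.g., "embed", "embedding", "embeddings") end up with overlapping n-gram bags and hence similar vectors, which is what makes the approach useful for rare and unseen words.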

Related research

10/21/2020  PBoS: Probabilistic Bag-of-Subwords for Generalizing Word Embedding
We look into the task of generalizing word embeddings: given a set of pr...

08/20/2016  Learning Word Embeddings from Intrinsic and Extrinsic Views
While word embeddings are currently predominant for natural language pro...

05/15/2018  Unsupervised Learning of Style-sensitive Word Vectors
This paper presents the first study aimed at capturing stylistic similar...

06/07/2018  Probabilistic FastText for Multi-Sense Word Embeddings
We introduce Probabilistic FastText, a new model for word embeddings tha...

03/15/2022  Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost
State-of-the-art NLP systems represent inputs with word embeddings, but ...

06/01/2017  Learning to Compute Word Embeddings On the Fly
Words in natural language follow a Zipfian distribution whereby some wor...

03/17/2021  UniParma @ SemEval 2021 Task 5: Toxic Spans Detection Using CharacterBERT and Bag-of-Words Model
With the ever-increasing availability of digital information, toxic cont...
