Learning Numeral Embeddings

12/28/2019
by Chengyue Jiang, et al.

Word embedding is an essential building block for deep learning methods for natural language processing. Although word embedding has been extensively studied over the years, the problem of how to effectively embed numerals, a special subset of words, is still underexplored. Existing word embedding methods do not learn numeral embeddings well because there are infinitely many numerals and each individual numeral appears only rarely in training corpora. In this paper, we propose two novel numeral embedding methods that can handle the out-of-vocabulary (OOV) problem for numerals. We first induce a finite set of prototype numerals using either a self-organizing map or a Gaussian mixture model. We then represent the embedding of a numeral as a weighted average of the prototype embeddings. Numeral embeddings represented in this manner can be plugged into existing word embedding learning approaches such as skip-gram for training. We evaluated our methods and demonstrated their effectiveness on four intrinsic and extrinsic tasks: word similarity, embedding numeracy, numeral prediction, and sequence labeling.
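The core idea of the abstract, representing a numeral as a weighted average of a finite set of prototype embeddings, can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the prototypes are fixed by hand rather than induced with a self-organizing map or Gaussian mixture model, and the weights come from a softmax over squared distances in log space (an assumed choice; the paper derives weights from the SOM/GMM).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical prototypes; the paper induces these with a SOM or a GMM.
prototypes = np.array([1.0, 10.0, 100.0, 1000.0, 10000.0])
dim = 8  # embedding dimension (illustrative)
prototype_embeddings = rng.normal(size=(len(prototypes), dim))

def numeral_embedding(x, temperature=1.0):
    """Embed numeral x as a weighted average of prototype embeddings.

    Weights are a softmax over negative squared log-space distances,
    so nearby prototypes contribute most. Any positive numeral, seen
    in training or not, gets an embedding, which is how this scheme
    sidesteps the OOV problem for numerals.
    """
    d = (np.log(x) - np.log(prototypes)) ** 2
    w = np.exp(-d / temperature)
    w /= w.sum()          # weights sum to 1
    return w @ prototype_embeddings

e = numeral_embedding(250.0)
print(e.shape)  # (8,)
```

Because the weights vary smoothly with the numeral's value, numerals that are numerically close receive similar embeddings, and the prototype embeddings themselves can be trained jointly with word vectors in a skip-gram objective.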


Related research

06/10/2016 - PSDVec: a Toolbox for Incremental and Scalable Word Embedding
PSDVec is a Python/Perl toolbox that learns word embeddings, i.e. the ma...

07/25/2023 - Word Sense Disambiguation as a Game of Neurosymbolic Darts
Word Sense Disambiguation (WSD) is one of the hardest tasks in natural l...

06/09/2015 - WordRank: Learning Word Embeddings via Robust Ranking
Embedding words in a vector space has gained a lot of attention in recen...

09/21/2023 - An Efficient Consolidation of Word Embedding and Deep Learning Techniques for Classifying Anticancer Peptides: FastText+BiLSTM
Anticancer peptides (ACPs) are a group of peptides that exhibit antineo...

07/22/2016 - Novel Word Embedding and Translation-based Language Modeling for Extractive Speech Summarization
Word embedding methods revolve around learning continuous distributed ve...

06/23/2019 - Smaller Text Classifiers with Discriminative Cluster Embeddings
Word embedding parameters often dominate overall model sizes in neural m...

09/12/2017 - Hash Embeddings for Efficient Word Representations
We present hash embeddings, an efficient method for representing words i...
