Rare Words Degenerate All Words

09/07/2021
by   Sangwon Yu, et al.
0

Despite advances in neural network language model, the representation degeneration problem of embeddings is still challenging. Recent studies have found that the learned output embeddings are degenerated into a narrow-cone distribution which makes the similarity between each embeddings positive. They analyzed the cause of the degeneration problem has been demonstrated as common to most embeddings. However, we found that the degeneration problem is especially originated from the training of embeddings of rare words. In this study, we analyze the intrinsic mechanism of the degeneration of rare word embeddings with respect of their gradient about the negative log-likelihood loss function. Furthermore, we theoretically and empirically demonstrate that the degeneration of rare word embeddings causes the degeneration of non-rare word embeddings, and that the overall degeneration problem can be alleviated by preventing the degeneration of rare word embeddings. Based on our analyses, we propose a novel method, Adaptive Gradient Partial Scaling(AGPS), to address the degeneration problem. Experimental results demonstrate the effectiveness of the proposed method qualitatively and quantitatively.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
08/20/2016

Learning Word Embeddings from Intrinsic and Extrinsic Views

While word embeddings are currently predominant for natural language pro...
research
06/01/2017

Learning to Compute Word Embeddings On the Fly

Words in natural language follow a Zipfian distribution whereby some wor...
research
03/18/2018

Rare Feature Selection in High Dimensions

It is common in modern prediction problems for many predictor variables ...
research
08/28/2018

Card-660: Cambridge Rare Word Dataset - a Reliable Benchmark for Infrequent Word Representation Models

Rare word representation has recently enjoyed a surge of interest, owing...
research
07/28/2019

Representation Degeneration Problem in Training Natural Language Generation Models

We study an interesting problem in training neural network-based models ...
research
04/02/2019

Attentive Mimicking: Better Word Embeddings by Attending to Informative Contexts

Learning high-quality embeddings for rare words is a hard problem becaus...
research
07/06/2020

Reflection-based Word Attribute Transfer

Word embeddings, which often represent such analogic relations as king -...

Please sign up or login with your details

Forgot password? Click here to reset