Self-Normalization Properties of Language Modeling

06/04/2018
by Jacob Goldberger, et al.

Self-normalizing discriminative models approximate the normalized probability of a class without having to compute the partition function. In the context of language modeling, this property is particularly appealing, as it may significantly reduce the run-times incurred by large word vocabularies. In this study, we provide a comprehensive investigation of language modeling self-normalization. First, we theoretically analyze the inherent self-normalization properties of Noise Contrastive Estimation (NCE) language models. Then, we compare them empirically to softmax-based approaches, which are self-normalized using explicit regularization, and suggest a hybrid model with compelling properties. Finally, we uncover a surprising negative correlation between self-normalization and perplexity across the board, as well as some regularity in the observed errors, which could potentially be used to improve self-normalization algorithms in the future.
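
To make the terms in the abstract concrete, here is a minimal sketch, not the authors' implementation. It contrasts the exact softmax log-probability, which needs the log partition function log Z over the whole vocabulary, with a self-normalized estimate that simply uses the raw score as log p (i.e. assumes log Z = 0). The explicit regularizer shown, alpha * (log Z)^2, is an assumption: it is one common choice in the literature, and the abstract does not state the exact form the paper uses. The NCE loss is the standard binary-classification form with k noise samples drawn from a noise distribution q. All function names and parameters (`alpha`, `k`, `q_target`, `noise_q`) are hypothetical.

```python
import math
from typing import Sequence


def log_partition(scores: Sequence[float]) -> float:
    """log Z = log(sum_w exp(s_w)), computed stably by shifting by the max score."""
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))


def softmax_log_prob(scores: Sequence[float], target: int) -> float:
    """Exact normalized log-probability; needs a sum over the whole vocabulary."""
    return scores[target] - log_partition(scores)


def self_normalized_log_prob(scores: Sequence[float], target: int) -> float:
    """Self-normalized estimate: use the raw score as log p, i.e. assume log Z = 0."""
    return scores[target]


def regularized_softmax_loss(scores: Sequence[float], target: int, alpha: float = 0.1) -> float:
    """Softmax cross-entropy plus an ASSUMED penalty alpha * (log Z)^2 pushing log Z toward 0."""
    log_z = log_partition(scores)
    return -(scores[target] - log_z) + alpha * log_z ** 2


def _log_sigmoid(x: float) -> float:
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def nce_loss(score_target: float, q_target: float,
             noise_scores: Sequence[float], noise_q: Sequence[float], k: int) -> float:
    """NCE loss that treats exp(s) as an unnormalized probability (no partition function).

    P(D=1 | w) = exp(s_w) / (exp(s_w) + k * q(w)) = sigmoid(s_w - log(k * q(w))).
    """
    # The true word should be classified as data (D = 1).
    loss = -_log_sigmoid(score_target - math.log(k * q_target))
    for s, q in zip(noise_scores, noise_q):
        # Each noise word should be classified as noise: log(1 - sigmoid(d)) = log_sigmoid(-d).
        loss -= _log_sigmoid(-(s - math.log(k * q)))
    return loss
```

For the scores `[2.0, 1.0, 0.1]`, `log_partition` returns about 2.42, so the self-normalized estimate would overstate every log-probability by that amount; the penalty term in `regularized_softmax_loss` is what pushes such vectors toward log Z = 0 during training. When the scores are already log-probabilities, as in `[log 0.5, log 0.3, log 0.2]`, log Z is 0 and the two estimates coincide.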

research
11/11/2021

Self-Normalized Importance Sampling for Neural Language Modeling

To mitigate the problem of having to traverse over the full vocabulary i...
research
10/22/2020

Autoregressive Modeling is Misspecified for Some Sequence Distributions

Should sequences be modeled autoregressively—one symbol at a time? How m...
research
10/27/2022

Vanishing Component Analysis with Contrastive Normalization

Vanishing component analysis (VCA) computes approximate generators of va...
research
06/12/2015

On the accuracy of self-normalized log-linear models

Calculation of the log-normalizer is a major computational obstacle in a...
research
06/22/2022

Modeling Emergent Lexicon Formation with a Self-Reinforcing Stochastic Process

We introduce FiLex, a self-reinforcing stochastic process which models f...
research
08/05/2023

On problematic practice of using normalization in Self-modeling/Multivariate Curve Resolution (S/MCR)

The paper is briefly dealing with greater or lesser misused normalizatio...
research
12/27/2021

Self-normalized Classification of Parkinson's Disease DaTscan Images

Classifying SPECT images requires a preprocessing step which normalizes ...