
Learning a Word-Level Language Model with Sentence-Level Noise Contrastive Estimation for Contextual Sentence Probability Estimation

by Heewoong Park, et al.

Inferring the probability distribution of sentences or word sequences is a key process in natural language processing. While word-level language models (LMs) have been widely adopted for computing the joint probabilities of word sequences, they have difficulty capturing a context long enough for sentence probability estimation (SPE). To overcome this, recent studies introduced training methods using sentence-level noise-contrastive estimation (NCE) with recurrent neural networks (RNNs). In this work, we attempt to extend it to contextual SPE, which aims to estimate the conditional probability of a sentence given a preceding text. The proposed NCE samples negative sentences independently of the preceding text, so that the trained model assigns higher probabilities to sentences that are more consistent with the context. We apply our method to a simple word-level RNN LM to focus on the effect of the sentence-level NCE training rather than on the network architecture. The quality of estimation was evaluated on multiple-choice cloze-style questions, including both human-written and automatically generated questions. The experimental results show that the proposed method improved the SPE quality of the word-level RNN LM.
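The sentence-level NCE objective described above trains the model as a binary classifier that distinguishes the true next sentence from k noise sentences drawn independently of the context. A minimal sketch of that per-example loss is below; the function and argument names are illustrative (not from the paper), and the model/noise scores are assumed to be precomputed log-scores.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically direct log(sigmoid(x)); fine for moderate inputs."""
    return -math.log1p(math.exp(-x))


def sentence_nce_loss(target_score: float,
                      noise_scores: list[float],
                      target_log_noise_prob: float,
                      noise_log_probs: list[float]) -> float:
    """Sentence-level NCE loss for one (context, sentence) pair.

    target_score        -- model's unnormalized log-score for the true
                           sentence given its preceding context
    noise_scores        -- model scores for k noise sentences (sampled
                           independently of the context)
    *_log_noise_prob(s) -- log-probabilities of each sentence under the
                           noise distribution

    The classifier logit for sentence s is
        score(s | context) - log(k) - log q(s),
    where q is the noise distribution; the true sentence is pushed
    toward class D=1 and noise sentences toward D=0.
    """
    k = len(noise_scores)
    log_k = math.log(k)
    # True sentence should be classified as data (D=1).
    loss = -log_sigmoid(target_score - log_k - target_log_noise_prob)
    # Noise sentences should be classified as noise (D=0):
    # log(1 - sigmoid(x)) == log_sigmoid(-x).
    for s, lq in zip(noise_scores, noise_log_probs):
        loss -= log_sigmoid(-(s - log_k - lq))
    return loss
```

Because the noise sentences are drawn without conditioning on the context, the classifier can only succeed by scoring context-consistent sentences higher, which is exactly the behavior the abstract describes.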



Contrastive Entropy: A new evaluation metric for unnormalized language models

Perplexity (per word) is the most widely used metric for evaluating lang...

A Geometric Method to Obtain the Generation Probability of a Sentence

"How to generate a sentence" is the most critical and difficult problem ...

A Joint Probabilistic Classification Model of Relevant and Irrelevant Sentences in Mathematical Word Problems

Estimating the difficulty level of math word problems is an important ta...

Sentence level estimation of psycholinguistic norms using joint multidimensional annotations

Psycholinguistic normatives represent various affective and mental const...

Local word statistics affect reading times independently of surprisal

Surprisal theory has provided a unifying framework for understanding man...

Backward and Forward Language Modeling for Constrained Sentence Generation

Recent language models, especially those based on recurrent neural netwo...

Learning to Break the Loop: Analyzing and Mitigating Repetitions for Neural Text Generation

While large-scale neural language models, such as GPT2 and BART, have ac...