
Learning a Word-Level Language Model with Sentence-Level Noise Contrastive Estimation for Contextual Sentence Probability Estimation

03/14/2021
by   Heewoong Park, et al.

Inferring the probability distribution of sentences or word sequences is a key task in natural language processing. While word-level language models (LMs) are widely used to compute the joint probabilities of word sequences, they struggle to capture context long enough for sentence probability estimation (SPE). To overcome this, recent studies introduced training methods using sentence-level noise-contrastive estimation (NCE) with recurrent neural networks (RNNs). In this work, we extend this approach to contextual SPE, which aims to estimate the conditional probability of a sentence given the preceding text. The proposed NCE samples negative sentences independently of the preceding text, so that the trained model assigns higher probabilities to sentences that are more consistent with the context. We apply our method to a simple word-level RNN LM in order to focus on the effect of sentence-level NCE training rather than on the network architecture. Estimation quality was evaluated on multiple-choice cloze-style questions, both human-composed and automatically generated. The experimental results show that the proposed method improves SPE quality for the word-level RNN LM.
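The training objective described above can be sketched with the standard binary NCE formulation (Gutmann and Hyvärinen): for each context, the model scores one true sentence against k noise sentences drawn from a context-independent noise distribution q, and is trained to tell them apart. The function names below are hypothetical and this is a minimal illustration of generic sentence-level NCE, not necessarily the exact variant used in the paper.

```python
import math

def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)) = -log(1 + exp(-z)).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def sentence_nce_loss(pos_score, neg_scores, log_q_pos, log_q_negs, k):
    """Binary NCE loss for one context.

    pos_score   -- unnormalized LM log-score of the true next sentence
    neg_scores  -- log-scores of the k noise sentences (sampled from q,
                   independently of the context)
    log_q_pos   -- log q(true sentence) under the noise distribution
    log_q_negs  -- log q(noise sentence) for each noise sentence
    """
    log_k = math.log(k)
    # Logit of the posterior that a sentence came from the data
    # distribution rather than from noise: s(x) - log(k * q(x)).
    pos_logit = pos_score - (log_k + log_q_pos)
    loss = -log_sigmoid(pos_logit)
    for s, lq in zip(neg_scores, log_q_negs):
        # Push noise sentences toward the "noise" label.
        loss -= log_sigmoid(-(s - (log_k + lq)))
    return loss
```

Because the noise sentences are sampled without looking at the context, only the true continuation can benefit from contextual cues, which is what drives the model toward context-consistent sentence probabilities.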

