Exposing the Implicit Energy Networks behind Masked Language Models via Metropolis–Hastings

06/04/2021
by Kartik Goyal et al.

While recent work has shown that scores from models trained with the ubiquitous masked language modeling (MLM) objective effectively discriminate between probable and improbable sequences, it remains an open question whether these MLMs specify a principled probability distribution over the space of possible sequences. In this paper, we interpret MLMs as energy-based sequence models and propose two energy parametrizations derivable from trained MLMs. To draw samples correctly from these models, we develop a tractable sampling scheme based on the Metropolis–Hastings Monte Carlo algorithm. In our approach, samples are proposed from the same masked conditionals used for training the masked language models, and they are accepted or rejected based on their energy values under the target distribution. We validate the effectiveness of the proposed parametrizations by examining the quality of samples drawn from these energy-based models on the conditional generation task of machine translation. We justify our sampling algorithm both theoretically and empirically by showing that the masked conditionals on their own do not yield a Markov chain whose stationary distribution matches the target distribution, and that our approach generates higher-quality samples than other recently proposed undirected generation approaches (Wang et al., 2019; Ghazvininejad et al., 2019).
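To make the sampler described above concrete, here is a minimal sketch, not the authors' released code. It assumes a HuggingFace-style masked LM (`bert-base-uncased` is a stand-in checkpoint), uses a pseudo-log-likelihood energy as one illustrative parametrization (the paper proposes two specific parametrizations that may differ in detail), and the helper names `masked_conditional`, `energy`, and `mh_step` are our own. Proposals come from the MLM's masked conditionals at a randomly chosen position, and a standard Metropolis–Hastings accept/reject step corrects the chain toward the energy-based target.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Stand-in checkpoint; any masked LM with a [MASK] token works the same way.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()


@torch.no_grad()
def masked_conditional(ids, pos):
    """The MLM's conditional p(. | x_{-pos}): the distribution over
    tokens at `pos` after masking that position."""
    masked = ids.clone()
    masked[0, pos] = tokenizer.mask_token_id
    return torch.softmax(model(masked).logits[0, pos], dim=-1)


@torch.no_grad()
def energy(ids):
    """Negative pseudo-log-likelihood: one illustrative energy choice
    (O(T) forward passes per call; a sketch, not an efficient version)."""
    total = 0.0
    for pos in range(1, ids.shape[1] - 1):  # skip [CLS] and [SEP]
        probs = masked_conditional(ids, pos)
        total -= torch.log(probs[ids[0, pos]]).item()
    return total


@torch.no_grad()
def mh_step(ids):
    """One Metropolis-Hastings transition: propose from a masked
    conditional, accept or reject using the energies."""
    pos = int(torch.randint(1, ids.shape[1] - 1, (1,)))
    # x and x' share the context x_{-pos}, so this single distribution
    # scores both the forward and the reverse proposal.
    probs = masked_conditional(ids, pos)
    new_tok = int(torch.multinomial(probs, 1))
    proposal = ids.clone()
    proposal[0, pos] = new_tok
    # log alpha = -E(x') + E(x) + log q(x | x') - log q(x' | x)
    log_alpha = (energy(ids) - energy(proposal)
                 + torch.log(probs[ids[0, pos]]).item()
                 - torch.log(probs[new_tok]).item())
    if torch.log(torch.rand(1)).item() < log_alpha:
        return proposal  # accept the proposed sequence
    return ids           # reject: keep the current sequence


ids = tokenizer("the cat sat on the mat", return_tensors="pt").input_ids
for _ in range(20):
    ids = mh_step(ids)
print(tokenizer.decode(ids[0], skip_special_tokens=True))
```

Two details in the sketch mirror the paper's argument: because the current and proposed sequences differ only at the masked position, the same conditional distribution scores both the forward and reverse proposals, and it is the accept/reject step, not the masked conditionals alone, that makes the chain's stationary distribution match the energy-based target.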


Related research

Resampling Gradients Vanish in Differentiable Sequential Monte Carlo Samplers (04/27/2023)
Annealed Importance Sampling (AIS) moves particles along a Markov chain ...

Structured Voronoi Sampling (06/05/2023)
Recently, there has been a growing interest in the development of gradie...

Energy-Based Reranking: Improving Neural Machine Translation Using Energy-Based Models (09/20/2020)
The discrepancy between maximum likelihood estimation (MLE) and task mea...

Sampling from Discrete Energy-Based Models with Quality/Efficiency Trade-offs (12/10/2021)
Energy-Based Models (EBMs) allow for extremely flexible specifications o...

Energy-Based Models for Code Generation under Compilability Constraints (06/09/2021)
Neural language models can be successfully trained on source code, leadi...

Harvesting Brownian Motion: Zero Energy Computational Sampling (09/13/2023)
The key factor currently limiting the advancement of computational power...

Residual Energy-Based Models for Text Generation (04/22/2020)
Text generation is ubiquitous in many NLP tasks, from summarization, to ...
