Joint Stochastic Approximation and Its Application to Learning Discrete Latent Variable Models

05/28/2020
by   Zhijian Ou, et al.

Despite progress in introducing auxiliary amortized inference models, learning discrete latent variable models remains challenging. In this paper, we show that the difficulty of obtaining reliable stochastic gradients for the inference model and the drawback of indirectly optimizing the target log-likelihood can both be gracefully addressed by a new method based on stochastic approximation (SA) theory of the Robbins-Monro type. Specifically, we propose to directly maximize the target log-likelihood while simultaneously minimizing the inclusive divergence between the posterior and the inference model. The resulting learning algorithm is called joint SA (JSA). To the best of our knowledge, JSA is the first method that couples an SA version of the EM (expectation-maximization) algorithm (SAEM) with an adaptive MCMC procedure. Experiments on several benchmark generative modeling and structured prediction tasks show that JSA consistently outperforms recent competitive algorithms, with faster convergence, better final likelihoods, and lower variance of gradient estimates.
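To make the recipe in the abstract concrete, the sketch below applies the JSA idea to a toy two-component Gaussian mixture: a Metropolis-Hastings move uses the inference model as an independence proposal, the model parameters are updated by ascending the gradient of the joint log-likelihood at the sampled latent, and the inference parameters are updated by ascending the log-probability of that latent (the inclusive-divergence direction). The model forms, parameter names, and hyperparameters here are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy generative model (assumed for illustration):
#   z ~ Bernoulli(sigmoid(theta_pi)),  x | z ~ N(mu_z, 1)
# Inference model: q(z=1 | x) = sigmoid(w * x + b)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def log_joint(x, z, theta):
    """log p_theta(x, z) for the toy mixture."""
    pi = sigmoid(theta["pi"])
    log_pz = np.log(pi) if z == 1 else np.log(1.0 - pi)
    mu = theta["mu"][z]
    log_px = -0.5 * (x - mu) ** 2 - 0.5 * np.log(2.0 * np.pi)
    return log_pz + log_px

def log_q(x, z, phi):
    """log q_phi(z | x) for the logistic inference model."""
    p1 = sigmoid(phi["w"] * x + phi["b"])
    return np.log(p1) if z == 1 else np.log(1.0 - p1)

def jsa_step(x, z_cache, theta, phi, lr=0.01):
    # 1) MH move targeting the posterior, proposing from q (independence sampler).
    z_prop = int(rng.random() < sigmoid(phi["w"] * x + phi["b"]))
    log_ratio = (log_joint(x, z_prop, theta) - log_q(x, z_prop, phi)) \
              - (log_joint(x, z_cache, theta) - log_q(x, z_cache, phi))
    if np.log(rng.random()) < log_ratio:
        z_cache = z_prop
    z = z_cache
    # 2) SA update of theta: ascend grad of log p_theta(x, z).
    pi = sigmoid(theta["pi"])
    theta["pi"] += lr * ((1.0 if z == 1 else 0.0) - pi)   # d log p(z) / d theta_pi
    theta["mu"][z] += lr * (x - theta["mu"][z])           # d log p(x|z) / d mu_z
    # 3) SA update of phi: ascend grad of log q_phi(z | x),
    #    which stochastically minimizes the inclusive divergence KL(p || q).
    p1 = sigmoid(phi["w"] * x + phi["b"])
    g = (1.0 if z == 1 else 0.0) - p1
    phi["w"] += lr * g * x
    phi["b"] += lr * g
    return z_cache

# Synthetic data from an assumed ground-truth mixture with means -2 and +2.
true_mu = np.array([-2.0, 2.0])
z_true = rng.integers(0, 2, size=500)
data = true_mu[z_true] + rng.normal(size=500)

theta = {"pi": 0.0, "mu": np.array([-1.0, 1.0])}
phi = {"w": 0.0, "b": 0.0}
cache = rng.integers(0, 2, size=len(data))  # one cached latent per training example

for epoch in range(30):
    for i, x in enumerate(data):
        cache[i] = jsa_step(float(x), int(cache[i]), theta, phi)
```

Caching one latent sample per training example, and refreshing it with a single MH step each time the example is visited, is the key design choice: the chain's samples are approximately posterior draws, so both parameter updates are simple stochastic-gradient steps with no reparameterization or score-function variance reduction needed.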


