Relaxations for inference in restricted Boltzmann machines

12/21/2013 ∙ by Sida I. Wang, et al.

We propose a relaxation-based approximate inference algorithm that samples near-MAP configurations of a binary pairwise Markov random field. We experiment on MAP inference tasks in several restricted Boltzmann machines. We also use our underlying sampler to estimate the log-partition function of restricted Boltzmann machines and compare against other sampling-based methods.




1 Background and setup

A binary pairwise Markov random field (MRF) over $n$ variables $x \in \{0,1\}^n$ models a probability distribution $p(x) \propto \exp(x^\top A x)$. The non-diagonal entries of the matrix $A \in \mathbb{R}^{n \times n}$ encode pairwise potentials between variables while its diagonal entries encode unary potentials. The exponentiated linear term $x^\top A x$ is the negative energy or simply the score of the MRF. A restricted Boltzmann machine (RBM) is a particular MRF whose variables are split into two classes, visible and hidden, and in which intra-class pairwise potentials are disallowed.
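As a minimal sketch of this setup, the following assembles the symmetric matrix of an RBM viewed as a pairwise MRF and evaluates its score. The parameter names `W`, `b`, `c` and both function names are hypothetical, as is the convention of splitting each weight evenly between the two symmetric entries; the source does not fix a parameterization.

```python
import numpy as np

def rbm_to_mrf_matrix(W, b, c):
    """Build a symmetric MRF matrix A from (hypothetical) RBM parameters.

    W: (n_visible, n_hidden) weights; b, c: visible and hidden biases.
    For binary x = [v; h] in {0,1}^n this gives
    x^T A x = v^T W h + b^T v + c^T h.
    """
    nv, nh = W.shape
    A = np.zeros((nv + nh, nv + nh))
    # Each visible-hidden pair appears twice in x^T A x, so halve the weights.
    A[:nv, nv:] = W / 2.0
    A[nv:, :nv] = W.T / 2.0
    # Unary potentials sit on the diagonal (x_i^2 = x_i for binary x_i).
    # Intra-class off-diagonal entries stay zero: no visible-visible or
    # hidden-hidden pairwise potentials.
    A[:nv, :nv] = np.diag(b)
    A[nv:, nv:] = np.diag(c)
    return A

def score(A, x):
    """Score (negative energy) x^T A x of a configuration x."""
    return float(x @ A @ x)
```

The block structure makes the restriction visible: the only nonzero off-diagonal entries couple a visible variable with a hidden one.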


We write $\mathbb{S}_n$ for the set of symmetric $n \times n$ real matrices, and $\mathcal{S}^k$ to denote the unit sphere $\{x \in \mathbb{R}^k : \|x\|_2 = 1\}$. All vectors are columns unless stated otherwise.

1.1 Integer quadratic programming

Finding the maximum a posteriori (MAP) value of a discrete pairwise MRF can be cast as an integer quadratic program (IQP) given by

$$\max_{x \in \{-1,1\}^n} x^\top A x.$$

Note that we have the domain constraint $x \in \{-1,1\}^n$ rather than $x \in \{0,1\}^n$. We relate the two in the section on hypercubes.
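For very small problems the IQP can be solved by exhaustive search, which gives a reference value for checking approximate methods. The sketch below assumes the objective is the quadratic form $x^\top A x$ over $x \in \{-1,1\}^n$; the function name `brute_force_iqp` is hypothetical and the search is not part of the proposed algorithm.

```python
import itertools
import numpy as np

def brute_force_iqp(A):
    """Exhaustively maximize x^T A x over x in {-1, +1}^n.

    Cost grows as 2^n, so this is only usable for small n.
    Returns the first maximizer found and its objective value.
    """
    n = A.shape[0]
    best_x, best_val = None, -np.inf
    for signs in itertools.product((-1.0, 1.0), repeat=n):
        x = np.array(signs)
        val = float(x @ A @ x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val
```

Because the objective is even in $x$, every maximizer comes with its negation; the search simply returns whichever it meets first.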

2 Relaxations

Solving the IQP is NP-hard in general. In fact, the MAX-CUT problem is a special case. Even the cases where $A$ encodes an RBM are NP-hard in general (Alon and Naor, 2006). We can trade off exactness for efficiency and instead optimize a relaxed (indefinite) quadratic program (QP):

$$\max_{x \in [-1,1]^n} x^\top A x.$$

Such a relaxation is tight for positive semidefinite (PSD) $A$: global optima of the QP and the IQP have equal objective values. (We can always ensure tightness when $A$ is not PSD, as in Ravikumar and Lafferty, 2006.) Therefore the QP is just as hard in general as the IQP, even though it affords optimization by gradient-based methods in place of combinatorial search.
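One way to see the tightness claim: when $A$ is PSD, the objective $x^\top A x$ is convex in each coordinate, so moving a coordinate to the better of its two endpoints never lowers the objective. The sketch below illustrates this with a coordinate-wise rounding of a point in the box $[-1,1]^n$ to a vertex of the hypercube. It is an illustration under that assumption, not the authors' procedure, and the function names are hypothetical.

```python
import numpy as np

def objective(A, x):
    """Quadratic objective x^T A x."""
    return float(x @ A @ x)

def round_to_vertex(A, x):
    """Round x in [-1, 1]^n to a vertex in {-1, +1}^n, one coordinate at a time.

    Assumes A is symmetric PSD (nonnegative diagonal suffices), so the
    objective restricted to any single coordinate is convex and is
    maximized over [-1, 1] at one of the two endpoints.
    """
    x = np.array(x, dtype=float)
    for i in range(len(x)):
        lo, hi = x.copy(), x.copy()
        lo[i], hi[i] = -1.0, 1.0
        # Keep whichever endpoint gives the larger objective.
        x = hi if objective(A, hi) >= objective(A, lo) else lo
    return x
```

Applied to a global optimum of the relaxed problem, this rounding returns an integral point with at least the same objective value, so the relaxed and integer optima coincide in value.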

The following semidefinite program (SDP) is a looser relaxation of eqn:iqp obtained by extending to higher ambient dimension: