Lifted Hybrid Variational Inference

01/08/2020 ∙ by Yuqiao Chen, et al. ∙ The University of Texas at Dallas

A variety of lifted inference algorithms, which exploit model symmetry to reduce computational cost, have been proposed to render inference tractable in probabilistic relational models. Most existing lifted inference algorithms operate only over discrete domains or continuous domains with restricted potential functions, e.g., Gaussian. We investigate two approximate lifted variational approaches that are applicable to hybrid domains and expressive enough to capture multi-modality. We demonstrate that the proposed variational methods are both scalable and able to take advantage of approximate model symmetries, even in the presence of a large amount of continuous evidence. We also show that our approach compares favorably against existing message-passing based approaches in a variety of settings. Finally, we present a sufficient condition for the Bethe approximation to yield a non-trivial estimate over the marginal polytope.

Introduction

Lifted methods have recently gained popularity due to their ability to handle previously intractable probabilistic inference queries over Markov random fields (MRFs) and their generalizations. These approaches work by exploiting symmetries in the given model to construct groups of indistinguishable random variables, which can then be used to collapse the model into a simpler one on which inference is more tractable.

High-level approaches to lifted inference include message-passing algorithms such as lifted belief propagation [21, 13] and lifted variational methods [2, 8]. The common theme across these methods is the construction of a lifted graph on which the corresponding inference algorithms are run. The message-passing algorithms are applied directly on the lifted graph, while lifted variational methods encode symmetries in the model as equality constraints in the variational objective. These two approaches are directly related via the same variational objective, known as the Bethe free energy [24, 25]. While successful, these methods were designed for discrete relational MRFs.

Existing work on lifting with continuous domains has focused primarily on Gaussian graphical models [3, 6, 1]. Other lifted inference methods for generic hybrid domains [5] use expectation maximization to learn variational approximations of the MRF potentials (which requires sampling from the potentials, and hence implicitly assumes the potentials are normalizable) and then perform lifted variable elimination or MCMC on the resulting variational model. While applicable to generic models, this approach is somewhat complex and can be expensive on lifted graphs with large treewidth.

Our aim is to provide a general framework for lifted variational inference that can be applied to both continuous and discrete domains with generic potential functions. Our approach is based on mixtures of mean-field models and a choice of entropy approximation. We consider two entropy approximations, one based on the Bethe free energy, whose local optima are closely related to fixed points of BP [24], and a lower bound on the differential entropy based on Jensen’s inequality [9].

We make the following key contributions in this work: (1) We develop a general lifted hybrid variational approach for probabilistic inference. (2) We consider two different types of approximations based on mixtures of mean-field models. To our knowledge a systematic comparison of these two different approximations for continuous models does not exist in the literature. (3) We provide theoretical justification for the Bethe free energy in the continuous case by providing a sufficient condition for it to be bounded from below over the marginal polytope. (4) We demonstrate the superiority of our approach empirically against particle-based message-passing algorithms and variational mean field. A key attribute of our work is that it does not make any distributional or model assumptions and can be applied to arbitrary factor graphs.

Preliminaries

Given a hypergraph $G = (V, C)$ with node set $V$ and hyper-edge/clique set $C$, such that each node $i \in V$ corresponds to a random variable $x_i$, a Markov random field (MRF) defines a joint probability distribution

$$p(x) = \frac{1}{Z} \prod_{c \in C} \psi_c(x_c), \qquad (1)$$

where each $\psi_c$ is a non-negative potential function associated with a clique $c$ in $C$, $x_c$ denotes the variables in the scope of $c$, and $Z$ is a normalizing constant that ensures $p$ is a probability density. In this work, we consider hybrid MRFs that may contain both discrete and continuous random variables, so that each variable's domain may either be finite or uncountable. For example, if all of the variables have continuous domains and the product of potential functions is integrable, then the normalization constant exists, i.e., $Z = \int \prod_{c \in C} \psi_c(x_c)\, dx < \infty$. The hypergraph is often visualized as a factor graph that has vertices for both the cliques/factors and the variables, with an edge joining the factor node for clique $c$ to the variable node for $x_i$ whenever $i \in c$.
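As a small worked instance of (1) (an illustrative example, not a model from our experiments), consider a chain of three continuous variables with pairwise cliques $\{1,2\}$ and $\{2,3\}$:

$$p(x_1, x_2, x_3) = \frac{1}{Z}\, \psi_{12}(x_1, x_2)\, \psi_{23}(x_2, x_3), \qquad Z = \int \psi_{12}(x_1, x_2)\, \psi_{23}(x_2, x_3)\, dx_1\, dx_2\, dx_3.$$

Its factor graph has two factor nodes, one per pairwise clique, with the variable node for $x_2$ connected to both.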

We consider two probabilistic inference tasks for a given MRF: (1) marginal inference, i.e., computing the marginal probability distribution $p(x_A)$ of a set of variables $A \subseteq V$ (with computation of the partition function $Z$ as a special case), and (2) maximum a posteriori (MAP) inference, i.e., computing a mode of the distribution $p(x)$. In many applications, we will be given observed values $e$ for a set of evidence variables $E \subseteq V$, and the corresponding conditional marginal / MAP inference tasks involve computing $p(x_A \mid x_E = e)$ instead of $p(x_A)$.

Variational Inference

Variational inference (VI) solves the inference problem approximately by minimizing some divergence measure $D(q \,\|\, p)$, often chosen to be the Kullback-Leibler (KL) divergence, between distributions of the form (1) and a family of more tractable approximating distributions $\mathcal{Q}$, to obtain a surrogate distribution $q^*$.

The set $\mathcal{Q}$ is typically chosen to trade off between the computational ease of inference in the surrogate model and its ability to model complex distributions. When $D$ is the KL divergence, the optimization problem is equivalent to minimizing the variational free energy

$$F(q) = -\sum_{c \in C} \mathbb{E}_q[\log \psi_c(x_c)] - H(q),$$

where $H(q)$ denotes the entropy of the distribution $q$. Assuming one can find a good $q^* \in \mathcal{Q}$, the simpler model can be used as a surrogate for inference. One of the most common choices for $\mathcal{Q}$ is the mean-field approximation, in which $\mathcal{Q}$ is selected to be the set of completely factorized distributions. This choice is popular as the optimization problem is relatively easy to solve for distributions of this form.

Lifted Probabilistic Inference via Color Passing

Lifted inference exploits symmetries that exist in the MRF in order to reduce the complexity of inference. This is typically done by grouping symmetric variables or cliques together into a single super variable/clique and then tying together the corresponding marginals of all variables in the same super variable/clique [2]. Detecting symmetries can be done by using either a top-down [21] or bottom-up [13] approach. We use the color passing (CP) algorithm [13], a bottom-up approach that can be applied to arbitrary MRFs. In CP, all variable and factor nodes are initially colored based on their domain/evidence and their potential functions. Variables with the same domain or with the same evidence value are assigned the same color. Each clique node stacks the colors of its neighboring nodes in order, appends its own color, and forms a new color. Each variable node collects its neighboring cliques' colors and is assigned a new color. The process is repeated until the colors converge. The color information can be considered as neighborhood structure information, and grouping nodes with the same color can be used to compress the graph. In this work, we also keep track of the number of factor nodes in each super factor and the number of variable nodes in each super variable; these counts appear as weights in the lifted objective.
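The following sketch illustrates the CP refinement loop on a factor graph stored as adjacency lists; the function and variable names are ours for illustration only, not the implementation used in the experiments.

```python
from collections import defaultdict

def _relabel(signatures):
    """Map distinct structural signatures to consecutive integer colors (one round)."""
    table = {}
    return {key: table.setdefault(sig, len(table)) for key, sig in signatures.items()}

def color_passing(var_colors, fac_colors, var_neighbors, fac_neighbors, max_iters=100):
    """var_neighbors[i]: factors adjacent to variable i; fac_neighbors[a]: ordered scope
    of factor a. Initial colors encode domain/evidence and the potential type."""
    num_colors = len(set(var_colors.values())) + len(set(fac_colors.values()))
    for _ in range(max_iters):
        # Each factor stacks its neighbors' colors in order and appends its own color.
        fac_colors = _relabel({a: (tuple(var_colors[i] for i in scope), fac_colors[a])
                               for a, scope in fac_neighbors.items()})
        # Each variable collects its neighboring factors' colors plus its own color.
        var_colors = _relabel({i: (tuple(sorted(fac_colors[a] for a in nbrs)), var_colors[i])
                               for i, nbrs in var_neighbors.items()})
        new_num = len(set(var_colors.values())) + len(set(fac_colors.values()))
        if new_num == num_colors:  # the partition stopped refining, i.e., colors converged
            break
        num_colors = new_num
    # Variables sharing a color form the super variables of the compressed graph.
    supers = defaultdict(list)
    for i, c in var_colors.items():
        supers[c].append(i)
    return var_colors, fac_colors, dict(supers)
```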

Proposed Approaches

Our goal is to develop a distribution-independent, model-agnostic hybrid lifted inference algorithm that can operate on an arbitrary factor graph. To overcome the severe limitations of unimodal variational distributions, e.g., mean-field models, in the hybrid setting, we choose our approximating family to be a family of mixture distributions, and, following [12] and [9], we require each mixture component to fully factorize (this assumption is mainly for efficiency; our approach does not require it for effectiveness). Specifically,

$$q(x) = \sum_{k=1}^{K} w_k \prod_{i \in V} q_{ik}(x_i), \qquad (2)$$

where $K$ is the number of mixture components, $w_k$ is the weight of the $k$-th mixture component (a shared parameter across all marginal distributions), and $\sum_k w_k = 1$. Each $q_{ik}$ is some valid distribution with parameters $\eta_{ik}$, e.g., a Gaussian or Beta distribution in the continuous case and a categorical distribution in the discrete case.
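As a concrete sketch of evaluating (2) with Gaussian component marginals (illustrative only; the names are not from the paper's code):

```python
import numpy as np

def mixture_mean_field_density(x, weights, means, stds):
    """Evaluate q(x) = sum_k w_k prod_i N(x_i; mean_{k,i}, std_{k,i}^2) at one assignment
    of n continuous variables. Shapes: weights (K,), means/stds (K, n), x (n,)."""
    x = np.asarray(x, float)[None, :]                                     # (1, n)
    comp = np.exp(-0.5 * ((x - means) / stds) ** 2) / (stds * np.sqrt(2 * np.pi))
    return float(np.asarray(weights, float) @ np.prod(comp, axis=1))     # sum over components

# Example: two mixture components over two variables.
print(mixture_mean_field_density([0.0, 1.0], [0.3, 0.7],
                                 np.array([[0.0, 0.0], [1.0, 1.0]]),
                                 np.ones((2, 2))))
```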

Entropy Approximations

Ideally, we would find the appropriate model parameters by directly minimizing the KL divergence. Unfortunately, the computation of the entropy is intractable for arbitrary variational distributions of the form (2). A notable exception is the case with $K = 1$, which is equivalent to the naïve mean-field approximation. In the general case, we consider two tractable entropy approximations: one based on the Bethe free energy approximation from statistical physics and one based on Jensen's inequality.

The Bethe Entropy: approximates $H(q)$ as if the graph associated with $q$ were tree-structured:

$$H_{\text{Bethe}}(q) = \sum_{c \in C} H(q_c) - \sum_{i \in V} \big(|\text{nb}(i)| - 1\big)\, H(q_i),$$

where $\text{nb}(i)$ is the set of cliques that contain node $i$ in their scope, and $q_c$ and $q_i$ denote clique and univariate marginals of $q$. The Bethe free energy (BFE) is then defined as

$$F_{\text{Bethe}}(q) = -\sum_{c \in C} \mathbb{E}_{q_c}[\log \psi_c(x_c)] - H_{\text{Bethe}}(q).$$

The BFE approximation is exact whenever the hypergraph is acyclic, i.e., tree-structured. While variational methods seek to optimize the variational objective directly, message-passing algorithms such as belief propagation (BP) can also be used to find local optima of the BFE [24]. As these types of message-passing algorithms can suffer from convergence issues, gradient-based methods that optimize the variational objective directly are sometimes preferred in practice [23].

Jensen's Inequality: Non-parametric variational inference (NPVI) approximates the entropy using Jensen's inequality [9]:

$$H(q) \;\ge\; -\sum_{k=1}^{K} w_k \log \sum_{j=1}^{K} w_j \int q_k(x)\, q_j(x)\, dx, \qquad (3)$$

where $q_k(x) = \prod_{i} q_{ik}(x_i)$ denotes the $k$-th mixture component.
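To make (3) concrete, the following sketch (our own illustration, assuming one-dimensional Gaussian mixture components) evaluates the bound using the closed-form overlap integrals $\int \mathcal{N}(x;\mu_k,\sigma_k^2)\,\mathcal{N}(x;\mu_j,\sigma_j^2)\,dx = \mathcal{N}(\mu_k - \mu_j;\, 0,\, \sigma_k^2 + \sigma_j^2)$:

```python
import numpy as np

def npvi_entropy_bound(weights, means, variances):
    """Jensen's-inequality lower bound (3) on the entropy of a 1-D Gaussian mixture
    q(x) = sum_k w_k N(x; mu_k, s2_k), using closed-form pairwise overlap integrals."""
    w = np.asarray(weights, float)
    mu = np.asarray(means, float)
    s2 = np.asarray(variances, float)
    diff2 = (mu[:, None] - mu[None, :]) ** 2      # (mu_k - mu_j)^2
    var_sum = s2[:, None] + s2[None, :]           # s2_k + s2_j
    overlap = np.exp(-0.5 * diff2 / var_sum) / np.sqrt(2 * np.pi * var_sum)
    inner = overlap @ w                           # sum_j w_j * overlap[k, j]
    return -np.sum(w * np.log(inner))             # lower bound on H(q)

# Example: two well-separated components give a bound close to the true entropy.
print(npvi_entropy_bound([0.5, 0.5], [-5.0, 5.0], [1.0, 1.0]))
```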

There are two reasons to prefer the Bethe entropy approximation over (3). First, the BFE is exact on trees; for tree-structured models, it is likely to outperform NPVI. Second, (3) does not factorize over the graph, potentially making it less useful in distributed settings.

Conversely, one advantage of (3) over the Bethe entropy is that it yields a provable lower bound on the partition function, assuming exact computation. The Bethe entropy only provably translates into a lower bound on tree-structured models or for special classes of potential functions [18, 19, 20]. Another known drawback of the BFE approximation is that, in the case of continuous random variables, it need not be bounded from below over the so-called local marginal polytope, a further relaxation of the variational problem in which the optimization over distributions is replaced by a simpler optimization over clique and univariate marginal distributions that agree on their univariate marginals. This unboundedness can occur even if $p$ corresponds to a multivariate Gaussian distribution [7], which potentially makes the BFE highly undesirable for continuous MRFs in practice.

However, for the optimization problem considered here (over a subset of what is referred to as the marginal polytope), it is known that the BFE is bounded from below for Gaussian models. Here, we prove that the BFE is bounded from below over the marginal polytope for a larger class of probability distributions.

Theorem 1.

If there exists a collection of densities, one for each variable $i \in V$, such that the free energy of the corresponding fully factorized distribution is finite, then $\inf_{q \in \mathcal{M}} F_{\text{Bethe}}(q) > -\infty$, where $\mathcal{M}$ is the set of all probability densities over $x$; i.e., the BFE is bounded from below.

A number of natural distributions satisfy the condition of the theorem: mean-field distributions, multivariate Gaussian distributions and their mixtures, bounded densities with compact support, etc. The proof of this sufficient condition constructs a lower bound on the BFE that is a convex function of the marginals. Lagrange duality is then used to demonstrate that the lower bound is finite under the condition of the theorem.

Lifting

Once model symmetries are detected using color passing or an alternative method, they can be encoded into the variational objective by introducing constraints on the marginal distributions, e.g., adding a constraint that all variables in the same super node have equivalent marginals. This is the approach taken by [2] for lifted variational inference in discrete MRFs. For us, this leads to the following set of constraints.

$$q_i = q_j \quad \text{for all variables } x_i, x_j \text{ in the same super variable.} \qquad (4)$$

If preferred, these constraints could be incorporated into the objective as soft penalty terms to encourage the solution to contain the appropriate symmetries as discovered by color-passing. However, adding constraints of this form to the objective does not reduce the cost of performing inference in the lifted model. In order to make lifting tractable, we observe that the following constraints are sufficient for (4) to hold.

$$q_{ik} = q_{jk} \quad \text{for all components } k \text{ and all } x_i, x_j \text{ in the same super variable,} \qquad (5)$$

i.e., the component parameters are shared across all variables in the same super variable.

Under the constraint (5), we can simplify the variational objective by accounting for the shared parameters. As an example of how this works for the Bethe free energy approximation, consider a compressed graph with a set of super variables $\hat{X}$ and a set of super factors $\hat{F}$, where each super variable and each super factor corresponds to a group of variables and factors in the original graph. Variables in the same super variable share the same parameterized marginals. Based on this observation, the unlifted BFE can be computed by evaluating each distinct term only once and weighting it by the number of ground terms it represents.
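One way to write the resulting lifted objective (a sketch in our own notation, where $|\hat{f}|$ and $|\hat{x}|$ denote the number of ground factors in super factor $\hat{f}$ and ground variables in super variable $\hat{x}$, and $d_{\hat{x}}$ is the number of factors adjacent to each variable in $\hat{x}$) is

$$F_{\text{Bethe}}(q) \;=\; -\sum_{\hat{f} \in \hat{F}} |\hat{f}| \left( \mathbb{E}_{q_{\hat{f}}}[\log \psi_{\hat{f}}] + H(q_{\hat{f}}) \right) \;+\; \sum_{\hat{x} \in \hat{X}} |\hat{x}|\, (d_{\hat{x}} - 1)\, H(q_{\hat{x}}),$$

so each expectation and entropy term is computed only once per super node and weighted by the corresponding count.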

Although the optimal value of the variational objective under the constraints (4) or (5) is always greater than or equal to that of the unconstrained problem, we expect gradient descent on the constrained optimization problem to converge faster and to a better solution, as the optimal solution should contain these symmetries. The intuition for this is that the solutions to the unconstrained optimization problem, i.e., approximate inference in the unlifted model, can include both solutions that do and do not respect the model symmetries.

Algorithm

Given a ground MRF (possibly conditioned on evidence), we first obtain the variational distribution $q$ by gradient descent on the variational objective with respect to the parameters of the mean-field variational mixture (2), where each continuous marginal component is taken to be a Gaussian distribution and each discrete one a categorical distribution. The lifted variational inference algorithms additionally exploit symmetries by using (5) to simplify the objective and only optimize over the variational parameters associated with the super variables; after the optimization procedure, all the original variables contained in each super variable are assigned the same variational marginal parameters/distributions as in (5). The expectations in the variational objectives can be approximated in different ways, e.g., sampling, Stein variational methods [15], etc. We approximate the expectations using Gaussian quadrature with a fixed number of quadrature points [10]. Once $q$ is obtained, given a set of query variables $A$, marginal inference is approximated by $q(x_A)$, and (marginal) MAP inference is approximated by maximizing $q(x_A)$ via coordinate/gradient ascent.
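For instance, the one-dimensional expectation terms can be approximated as in the following sketch (illustrative code using probabilists' Gauss-Hermite quadrature; the names are ours, not the paper's implementation):

```python
import numpy as np

def expected_log_potential(log_psi, mean, std, num_points=8):
    """Approximate E_{x ~ N(mean, std^2)}[log_psi(x)] with Gauss-Hermite quadrature,
    as used for the expectation terms in the variational objective (1-D case)."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(num_points)  # probabilists' rule
    samples = mean + std * nodes                                     # change of variables
    return np.sum(weights * log_psi(samples)) / np.sqrt(2 * np.pi)   # weights sum to sqrt(2*pi)

# Example: a (log) Gaussian pairwise potential evaluated at a fixed neighbor value of 1.0.
log_psi = lambda x: -0.5 * (x - 1.0) ** 2
print(expected_log_potential(log_psi, mean=0.0, std=1.0))  # about -1.0, the analytic value
```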

Coarse-to-Fine Lifting

A common issue in lifting methods is that introducing evidence breaks model symmetries as variables with different evidence values should be considered as different even if they have similar neighborhood structure in the graphical model. This issue is worse when variables can have continuous values: it is unlikely that two otherwise symmetric variables will receive the same exact evidence values. As a result, even with a small amount of evidence, many of the model symmetries may be destroyed, making lifting less useful. To counteract this effect, we propose a coarse-to-fine (C2F) approximate lifting method in the variational setting which is based on the assumption that the stationary points of a coarsely compressed graph and a finely compressed graph should be somewhat similar. A number of coarse-to-fine lifting schemes, which start with coarse approximate symmetries and gradually refine them, have been proposed for discrete MRFs [11, 8]. Our approach is aimed specifically at introducing approximate symmetries to handle the above issue with continuous evidence.

Our C2F approximate lifting uses $k$-means clustering to group the continuous evidence values into $k$ clusters $C_1, \dots, C_k$. For each cluster $C_j$, we denote the corresponding set of observation nodes as $O_j$. Each observed variable in $O_j$ is treated as having the same evidence distribution with the mean and variance of cluster $C_j$. With this formalism, the evidence clustering is coarse when $k$ is small, but we can exploit more approximate symmetries, resulting in a more compressed lifted graph. As $k$ increases, the evidence variables are more finely divided.

To apply this lifting process in variational inference, we interleave the refinement of the compressed graph with gradient descent. The clustering is initialized with a coarse grouping of the evidence, and CP is run until convergence to obtain a coarse compressed graph. Then, we perform gradient descent on the coarse compressed graph with the modified variational method. After a specified number of iterations, we refine the coarse compressed graph by splitting evidence clusters. We employ the $k$-means algorithm to determine the new evidence clusters, after which a refined compressed graph can be obtained through CP. We keep iterating this process until no evidence group can be further split, e.g., when only one value remains or the variance of each cluster is below a specified threshold, and the optimization converges to a stationary point. A precise description of this process can be found in Algorithm 1. It is not necessary to rerun CP from the start after each split: we simply assign each new evidence group a new color and resume CP from its previous stopping point.

1: Input: A factor graph G with variables X, factors F, evidence E, and splitting threshold t
2: Return: The model parameters η and w
3: C, (η, w) ← initial clustering of the continuous evidence and initial model parameters, respectively
4: G' ← run CP starting from initial colors based on domain/evidence and potential functions
5: repeat
6:     (η, w) ← run grad. descent on the variational obj. of G'
7:     for each cluster C_j in C do
8:         if Var(C_j) > t then
9:             C_j', C_j'' ← divide C_j in two using k-means
10:            C ← (C \ {C_j}) ∪ {C_j', C_j''}
11:        end if
12:    end for
13:    Assign new colors to evidence according to C
14:    G' ← resume CP with the new evidence colors
15: until converged
Algorithm 1 Coarse-to-Fine Lifted VI
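A minimal numpy sketch of the evidence-splitting step in Algorithm 1 (lines 7-12), assuming one-dimensional evidence values and our own simple 2-means routine in place of an off-the-shelf k-means:

```python
import numpy as np

def split_cluster(values, iters=20):
    """Divide one cluster of continuous evidence values in two with a simple 1-D 2-means."""
    values = np.asarray(values, dtype=float)
    centers = np.array([values.min(), values.max()])
    for _ in range(iters):
        assign = np.abs(values[:, None] - centers[None, :]).argmin(axis=1)
        for c in (0, 1):
            if np.any(assign == c):
                centers[c] = values[assign == c].mean()
    return [values[assign == 0], values[assign == 1]]

def refine_evidence_clusters(clusters, threshold):
    """One refinement pass: split every cluster whose variance exceeds the threshold."""
    refined = []
    for cluster in clusters:
        if len(cluster) > 1 and np.var(cluster) > threshold:
            refined.extend(split_cluster(cluster))
        else:
            refined.append(np.asarray(cluster, dtype=float))
    return refined
```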

Experiments

We investigate the performance of the above lifted variational approach on a variety of both real and synthetic models. We aim to answer the following questions explicitly:

  • (Q1) Do the proposed variational approaches yield accurate MAP and marginal inference results?

  • (Q2) Does lifting result in significant speed-ups versus an unlifted variational method?

  • (Q3) Does C2F lifting yield accurate results more quickly for queries with continuous evidence?

To answer these questions, we compare the performance of our variational approach with different entropy approximations (BVI for the Bethe approximation and NPVI for the Jensen lower bound) against message-passing algorithms, including our own lifted version of Expectation Particle BP (EPBP) [14] and Gaussian BP (GaBP). To illustrate the generality of the proposed approach, we consider several settings: Relational Gaussian Models (RGMs) [3], Relational Kalman Filters (RKFs) [6], and (Hybrid) Markov Logic Networks (HMLNs) [17, 22].

For evaluation, we generally report the error of MAP predictions and the KL divergence averaged across all univariate marginals in the (ground) conditional MRF. As the models in the RGM and RKF experiments are Gaussian MRFs, their marginal means and variances can be computed exactly by matrix operations. For the HMLN experiments, exact ground truth can be obtained by direct methods when the number of random variables in the conditional MRF is small. Timing experiments, unless otherwise noted, were performed on a single core of a machine with a 2.2 GHz Intel Core i7-8750H CPU and 16 GB of memory. Source code is implemented in Python 3.6 and is available online at github.com/leodd/Lifted-Hybrid-Variational-Inference.

Hybrid MLNs

We first consider a toy HMLN with known ground truth marginals in order to assess the accuracy of the different variational approaches in the hybrid setting. Then, we showcase the efficiency of our methods, particularly via lifting, on larger-scale HMLNs of practical interest, comparing against a state-of-the-art sampling baseline.

Toy Problem

We construct a hybrid MLN for a position domain with two weighted formulas relating object, box, and position atoms. The objects come from two different classes in a physics simulation, boxes form a third class of instances, and the position atoms take real values corresponding to object positions. The first formula states that an object is attracted to another when they are in the same box; the second states that if one object is attracted to another, then the latter is likely at one position, and otherwise is likely at a different position. The attraction and containment predicates have a discrete Boolean domain, while the position atom is real-valued.

Unlike standard MLNs, where the value of a formula can be computed with logical operations, an HMLN defines continuous operations for hybrid formulas. In the second formula, the soft equality between a position atom and a target value is shorthand for a real-valued feature function of their difference; note that the corresponding linear Gaussian potential is not normalizable. The marginals of the position atoms will generally be multimodal, specifically mixtures of Gaussians; with multiple object instances, unimodal variational approximations like mean-field will likely be inaccurate.

Figure 1: Visualization of typical marginal distribution estimates produced by BVI and NPVI on the toy HMLN experiment. The lifted solutions were similar.

Note that the number of mixture components in the exact marginal distributions equals the number of joint discrete configurations in the ground MRF, which is exponential in the number of discrete variables, so we consider a small model in which exact inference is still tractable, generating 2, 3, and 2 instances of the three classes, respectively. The resulting ground MRF contains a small number of discrete random variables, so the exact marginals, which are mixtures with one component per joint discrete configuration, can still be computed by brute force. We emphasize that the small model size is only for the purpose of evaluation against brute-force exact inference; our methods can scale to much larger models.

We performed marginal inference on the continuous nodes and report results against ground truth in Table 1, using BVI, NPVI, and their lifted versions (dubbed L-BVI and L-NPVI). All methods tend to give improved performance as the number of mixture components $K$ increases, indicating that the number of mixture components is indeed important for accuracy in multimodal settings. However, increasing $K$ generally makes the optimization problem more difficult, requiring more iterations for convergence. We note that even though L-BVI reported a lower error at one setting of $K$, the KL-divergence at that setting was larger than at other values of $K$, indicating that it converged to a good local optimum for the MAP task but not as good a one for the marginal inference task. This distinction can be seen more broadly across the two entropy approximations for this problem, as BVI/L-BVI generally gave better fits to the marginals than NPVI/L-NPVI, whereas NPVI/L-NPVI performed better at estimating the marginal modes. See Figure 1 for an example illustration of marginals produced by the different entropy approximations.

It is also worth noting that lifting seems to act as a regularizer here: when the number of mixture components is small, e.g., at $K = 1$, both lifted versions outperformed their unlifted counterparts. This suggests that lifting may both reduce computational cost (we observed a clear speedup on this model) and encourage the optimization procedure to end up in better local optima, which positively answers Q1 and Q2.

Table 1: Results of the variational methods on the toy HMLN, reporting the average KL-divergence and average error of BVI, L-BVI, NPVI, and L-NPVI.

Larger-scale Problems

Next, we consider two larger scale HMLNs of practical interest. The Paper Popularity HMLN domain is determined by three weighted formulas over continuous popularity atoms for papers and topics and two Boolean atoms indicating whether two topics are in the same session and whether a paper belongs to a topic, respectively. The first formula specifies a prior on paper popularity, the second clause states that two topics tend to have the same popularity if they are in the same session, and the third clause states that if a paper is in a topic, then the popularity of the topic and the paper are likely to be the same. We instantiate 300 paper instances and 10 topic instances and ground the model to obtain the MRF. We generated random evidence for the model: a fraction of the papers and of the topics were assigned a popularity drawn from a uniform distribution, and for a subset of the papers and all of the topics, we assigned values to the two Boolean atoms for all possible groundings using a Bernoulli distribution.

Figure 2: Comparison of the negative log probability of the approximate MAP assignment versus running time on (a) Paper Popularity and (b) Robot Mapping.

The Robot Mapping HMLN domain contains both discrete and continuous relational variables and a set of formulas, as described in the Alchemy tutorial [22]. The instances and evidence come from real-world robot scanning data, which result in a grounded MRF with a large number of random variables and factors.

Results: We performed MAP inference with the variational methods and evaluated them against Hybrid MaxWalkSAT (HMWS) [22]. Each method is evaluated by computing the energy, essentially the unnormalized negative log probability, of the approximate MAP configuration that it produces. For HMWS, we fixed the greedy probability and the standard deviation of the Gaussian noise, and disabled re-running for a fair comparison. For the variational methods, we used Adam with a fixed learning rate for optimization.

As can be seen in Figure 2, in both domains, the MAP assignment produced by the variational methods is significantly better than the one produced by HMWS, providing evidence for Q1. In addition, given the amount of continuous evidence, there is not a significant performance difference between BVI and Lifted BVI. However, C2F BVI takes significantly less time to converge to a good solution than both BVI and Lifted BVI, providing strong evidence for Q3, namely that C2F yields better accuracy more quickly.

Relational Gaussian Model

In this experiment, we performed approximate inference on a Relational Gaussian Model (RGM) for the recession domain from [7]. The RGM has three relational atoms and one additional random variable, where the relational atoms are defined over two sets of instances, the categories of markets and banks, respectively. For testing, we generated a set of market and bank instances and used the ground graph as input. To assess the impact of lifting and C2F, we randomly chose a fraction of the variables in the model, assigned each a value drawn uniformly at random from a fixed interval, and then performed conditional MAP and marginal inference.

Table 2: Evaluation of EPBP, GaBP, BVI, L-BVI, C2F-BVI, NPVI, and L-NPVI on the RGM, reporting average error and average KL-divergence.

Figure 3: Comparison of the rate of convergence of BVI, L-BVI, and C2F-BVI with evidence on the RGM.

Figure 3 plots the value of the free energy, i.e., the variational objective, versus time for BVI, L-BVI, and C2F-BVI. All three methods used the Adam optimizer with the same learning rate. The plot shows that C2F-BVI converges faster than L-BVI, which is in turn faster than BVI. Note that the sawtooth shape of the C2F-BVI curve is a result of evidence splitting. This provides evidence that lifting and C2F do reduce the time cost of inference, answering Q2 and Q3 in the affirmative.

To assess the accuracy of the variational methods, we randomly chose varying fractions of the random variables and generated evidence values as in the previous task. We randomly generated five evidence settings and evaluated all variational methods with the same setup as above, as well as EPBP with 20 sampling points and GaBP. All algorithms were run until convergence and compared against the ground truth. As Table 2 shows, on this simple unimodal model, all variational methods have very low error and low KL-divergence, while EPBP has higher error introduced by the sampling procedure, providing evidence for Q1 that the variational approach is accurate in this case. In this unimodal case, NPVI appears to outperform BVI. We note that, as a general observation, NPVI tends to do better than BVI at estimating the mode of the distribution, but in multimodal settings tends to result in a higher KL-divergence than BVI.

Relational Kalman Filtering

To further investigate Q1, we performed an experiment with Relational Kalman Filters (RKFs). A standard Kalman filter (KF) models the transition of a dynamic system as $x_{t+1} = A x_t + w_t$ with observations $z_t = B x_t + v_t$, where $A$ denotes the transition matrix and $B$ represents the observation matrix. A key assumption in the KF is that the transition and observation noise follow normal distributions, i.e., $w_t \sim \mathcal{N}(0, Q)$ and $v_t \sim \mathcal{N}(0, R)$, for covariance matrices $Q$ and $R$. The RKF model defines a lifted KF, i.e., similar state variables and similar observation nodes share the same transition and observation model.

In this experiment, we use groundwater level data extracted from the Republican River Compact Association model [16]. The data set contains over 850 months of water level records for 3,420 wells. We followed the same data preprocessing steps as in [4], where wells in the same area are grouped together and are assumed to share the same transition and observation model. We test our algorithms on two different structure settings, a tree-structured model and a model with cycles. For the tree-structured model, the transition, observation, and covariance matrices are defined in terms of the identity matrix; for the general model, the transition matrix additionally involves the matrix of ones, which couples the state variables, and the other matrices are as before. We chose 20 months of records as the observations of the model. Note that the model has a linear Gaussian potential whose dimension equals the number of state variables, which makes inference challenging. For simplicity, we expressed the model as a product of pairwise potentials.
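The pairwise decomposition relies on the fact that a multivariate linear Gaussian potential is the exponential of a quadratic form, which splits into univariate and pairwise terms; as a sketch, with $y = (x_t, x_{t+1})$, symmetric precision matrix $M$, and linear term $b$ determined by $A$ and $Q$,

$$\exp\!\Big(-\tfrac{1}{2}\, y^{\top} M y + b^{\top} y\Big) \;=\; \prod_{i} \exp\!\Big(b_i y_i - \tfrac{1}{2} M_{ii} y_i^2\Big) \prod_{i < j} \exp\!\big(-M_{ij}\, y_i y_j\big),$$

so the high-dimensional Gaussian factor can be replaced by a product of univariate and pairwise potentials over the same variables.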

We compared our variational methods and EPBP against GaBP. The variational methods used the Adam optimizer with a fixed learning rate; EPBP used a fixed number of particle points. Table 3 reports the resulting average difference and KL-divergence of the MAP estimate of the last time step's nodes against GaBP. The variational methods are quite accurate in this model, owing mostly to the unimodality of the marginals, providing evidence in support of Q1 for this setting, even when the graph contains cycles. The variational methods also obtained better KL-divergence than EPBP, and the KL-divergence of all methods would likely decrease further with additional mixture components/particles.

Significant speed-ups are obtained by the lifted methods, with a small further improvement in the case of C2F. Note that the reported times are intended for relative comparison of lifting, and while they do indicate significant performance improvements from lifting, answering Q2 affirmatively, they should not be used to compare EPBP and BVI, as neither implementation was optimized for performance in this case. As the variational methods can also be efficiently implemented in TensorFlow, we also provide timing experiments in a high-performance setting for BVI/NPVI and their lifted versions on both the tree and cycle models; in this setting, NPVI/L-NPVI gave identical performance to BVI/L-BVI.

Table 3: Accuracy (average difference from GaBP) and running time of EPBP, BVI, L-BVI, and C2F-BVI for RKFs on the tree-structured and cycle models.

Discussion

We presented a distribution-independent, model-agnostic hybrid lifted inference algorithm that makes minimal assumptions on the underlying distribution. In addition, we presented a simple coarse-to-fine approach for handling the symmetry breaking that results from continuous evidence. We showed experimentally that the lifted and coarse-to-fine variational methods compare favorably in terms of accuracy against exact and particle-based methods for MAP and marginal inference tasks and can yield speed-ups over their non-lifted counterparts that range from moderate to significant, depending on the amount of evidence and the distribution of the evidence values. Finally, we provided a sufficient condition under which the BFE over the marginal polytope is bounded from below, showing that the BFE approximation yields a nontrivial approximation to the partition function in these cases.

References

  • [1] B. Ahmadi, K. Kersting, and S. Sanner (2011) Multi-evidence lifted message passing, with application to PageRank and the Kalman filter. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI).
  • [2] H. H. Bui, T. N. Huynh, and D. Sontag (2014) Lifted tree-reweighted variational inference. In Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence (UAI).
  • [3] J. Choi, E. Amir, and D. J. Hill (2010) Lifted inference for relational continuous models. In Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI).
  • [4] J. Choi, E. Amir, T. Xu, and A. J. Valocchi (2015) Learning relational Kalman filtering. In Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI).
  • [5] J. Choi and E. Amir (2012) Lifted relational variational inference. In Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI).
  • [6] J. Choi, A. Guzman-Rivera, and E. Amir (2011) Lifted relational Kalman filtering. In Twenty-Fifth AAAI Conference on Artificial Intelligence (AAAI).
  • [7] B. Cseke and T. Heskes (2011) Properties of Bethe free energies and message passing in Gaussian models. Journal of Artificial Intelligence Research, pp. 1–24.
  • [8] N. Gallo and A. Ihler (2018) Lifted generalized dual decomposition. In Thirty-Second AAAI Conference on Artificial Intelligence (AAAI).
  • [9] S. J. Gershman, M. D. Hoffman, and D. M. Blei (2012) Nonparametric variational inference. In Proceedings of the 29th International Conference on Machine Learning (ICML), pp. 235–242.
  • [10] G. H. Golub and J. H. Welsch (1969) Calculation of Gauss quadrature rules. Mathematics of Computation 23 (106), pp. 221–230.
  • [11] H. Habeeb, A. Anand, Mausam, and P. Singla (2017) Coarse-to-fine lifted MAP inference in computer vision. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI).
  • [12] T. S. Jaakkola and M. I. Jordan (1998) Improving the mean field approximation via the use of mixture distributions. In Learning in Graphical Models, pp. 163–173.
  • [13] K. Kersting, B. Ahmadi, and S. Natarajan (2009) Counting belief propagation. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI).
  • [14] T. Lienart, Y. W. Teh, and A. Doucet (2015) Expectation particle belief propagation. In Neural Information Processing Systems (NeurIPS).
  • [15] Q. Liu, J. Lee, and M. Jordan (2016) A kernelized Stein discrepancy for goodness-of-fit tests. In International Conference on Machine Learning (ICML), pp. 276–284.
  • [16] V. McKusick (2003) Final report for the special master with certificate of adoption of RRCA groundwater model. State of Kansas v. State of Nebraska and State of Colorado, in the Supreme Court of the United States.
  • [17] M. Richardson and P. Domingos (2006) Markov logic networks. Machine Learning 62.
  • [18] N. Ruozzi (2012) The Bethe partition function of log-supermodular graphical models. In Neural Information Processing Systems (NeurIPS).
  • [19] N. Ruozzi (2013) Beyond log-supermodularity: lower bounds and the Bethe partition function. In Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI).
  • [20] N. Ruozzi (2017) A lower bound on the partition function of attractive graphical models in the continuous case. In Artificial Intelligence and Statistics (AISTATS).
  • [21] P. Singla and P. Domingos (2008) Lifted first-order belief propagation. In Twenty-Third AAAI Conference on Artificial Intelligence (AAAI).
  • [22] J. Wang and P. Domingos (2008) Hybrid Markov logic networks. In Twenty-Third AAAI Conference on Artificial Intelligence (AAAI).
  • [23] M. Welling and Y. W. Teh (2001) Belief optimization for binary networks: a stable alternative to loopy belief propagation. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI).
  • [24] J. S. Yedidia, W. T. Freeman, and Y. Weiss (2001) Bethe free energy, Kikuchi approximations, and belief propagation algorithms. Neural Information Processing Systems (NeurIPS).
  • [25] J. S. Yedidia, W. T. Freeman, and Y. Weiss (2005) Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Transactions on Information Theory 51 (7), pp. 2282–2312.