# Block-Value Symmetries in Probabilistic Graphical Models

Several lifted inference algorithms for probabilistic graphical models first merge symmetric states into a single cluster (orbit) and then use these orbits for downstream inference, via variations of orbital MCMC [Niepert, 2012]. These orbits are represented compactly using permutations over variables and over variable-value (VV) pairs, but such permutations can miss several state symmetries in a domain. We define the notion of permutations over block-value (BV) pairs, where a block is a set of variables. BV symmetries strictly generalize VV symmetries, and can capture many more symmetries as block sizes increase. To operationalize the use of BV permutations in lifted inference, we describe 1) an algorithm to compute BV permutations given a block partition of the variables, 2) BV-MCMC, an extension of orbital MCMC that can sample from BV orbits, and 3) a heuristic to suggest good block partitions. Our experiments show that BV-MCMC can mix much faster than vanilla MCMC and orbital MCMC over VV permutations.


## 1 Introduction

A lifted inference algorithm for probabilistic graphical models (PGMs) performs inference on a smaller model, which is constructed by merging together states (or variables) of the original model [Poole2003, de Salvo Braz et al.2005, Kimmig et al.2015]. Two main kinds of lifted inference algorithms exist: those where lifting is tied to an existing inference procedure such as belief propagation [Singla and Domingos2008, Kersting et al.2009], Gibbs sampling [Venugopal and Gogate2012], weighted model counting [Gogate and Domingos2011], variational inference [Bui et al.2013] [Mladenov et al.2012]; and those that merge symmetric states/variables independent of the procedure [Niepert2012, Van den Broeck and Niepert2015, Anand et al.2016].

One approach for generating symmetries is to compute isomorphisms over a graphical representation of the PGM. This merges symmetric states into a single cluster (orbit), which is compactly represented as permutations over a polynomial representation. Permutations over variables [Niepert2012] and over variable-value (VV) pairs [Anand et al.2017] have been studied, with the latter being a generalization of the former that captures many more state symmetries. While more general, VV permutations clearly do not capture all possible state symmetries in a domain. For example, state is symmetric to in Figure 1(b), but VV permutations cannot represent it.

A natural question arises: are there more general representations which can capture (a subset of) this larger set of symmetries? We note that the problem of computing all possible symmetries is intractable, since there is an exponential number of permutations over an exponentially large state space, each of which could be a symmetry (or not). Nevertheless, we hope there are representations which can capture additional symmetries compared to current approaches in bounded polynomial time. Moreover, it would be interesting to come up with a representation that enables computation of larger and larger sets of symmetries at an additional cost that is controlled by a parameter of the representation.

As a significant step toward this research question, we develop the novel notion of symmetries defined over block-value (BV) pairs. Here, a block is a set of variables, and its value is an assignment to these variables. Intuitively, BV pairs can capture VV pairs that are not permuted independently but are instead permuted together in subsets. For example, it can capture symmetry of states and via a BV permutation which maps and .

Clearly, symmetries defined over BV pairs are a strict generalization of those over VV pairs, since each VV pair is a BV pair with a block of size 1. Our blocks can be of varying sizes, and the size of each block essentially controls the set of symmetries that can be captured: the larger the blocks, the more symmetries can be captured, at an additional cost that is exponential in the maximum block size.

In this paper, we formally develop the notion of symmetries as permutations defined over a subset of BV pairs. Some of these permutations will be invalid (when blocks overlap with each other) and their application may lead to an inconsistent state. In order to ensure valid permutations, we require that the blocks come from a disjoint set of blocks, referred to as a block partition. Given a block partition, we show how to compute the corresponding set of symmetries by reducing the problem to one of graph isomorphism. We also show that our BV symmetries can be thought of as VV symmetries, albeit over a transformed graphical model, where the new variables represent the blocks in the original graph.

Next, we show that jointly considering symmetries obtained from different block partitions can result in capturing symmetries not obtainable from any single one. Since there is an exponential number of such block partitions, we provide an efficient heuristic for obtaining a promising partition of blocks, referred to as a candidate set.

Use of BV symmetries in an MCMC framework requires uniform sampling of a state from each orbit, i.e., a set of symmetric states. This turns out to be a non-trivial task when the orbits are defined over symmetries corresponding to different block partitions. In response, we design an aggregate Markov chain which samples from orbits corresponding to each (individual) candidate set in turn. We prove that our aggregate Markov chain converges to the desired distribution. As a proof of the utility of our BV symmetries, we show that their usage results in significantly faster mixing times on two different domains.

The outline of this paper is as follows. We start with some background on variable and VV symmetries in Section 2. This is followed by the exposition of our symmetries defined over BV pairs (Section 3). Section 4 describes our algorithm for using BV symmetries in MCMC. This is followed by our heuristic to compute promising candidate sets in Section 5. We present our experimental evaluation (Section 6) and conclude the paper with directions for future work.

## 2 Background

Let denote a set of discrete valued random variables. We will use the symbol to denote the value taken by the variable . We will assume that each of the variables comes from the same domain . A state is an assignment to all the variables in the set . Further, gives the value of variable in state . We will use to denote the set of all possible states.

A Graphical Model [Koller and Friedman2009] is a set of pairs where is a feature function defined over the variables in the set and is its associated weight.

###### Definition 1.

Action of on results in a new graphical model where the occurrence of in each feature in is replaced by . Given a graphical model , a permutation of the variables in is said to be a variable symmetry of if the action of on results back in .

Given a state , the action of on , denoted by , results in a new state such that if and then .

The set of all variable symmetries forms a group called the variable automorphic group of and is denoted by . partitions the states into equivalence classes or orbits which are as defined below.

###### Definition 2.

Given a variable automorphic group , the orbit of a state under the effect of is defined as .

Intuitively, the orbit of a state is the set of all states reachable from under the action of any permutation in the automorphic group.
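The orbit construction can be illustrated with a minimal sketch. It assumes a variable permutation is encoded as a tuple `perm` in which the value of variable `i` moves to variable `perm[i]`, and that the group is given by a list of generators; both the encoding and the function names are illustrative choices, not notation from the paper.

```python
from collections import deque


def apply_variable_permutation(perm, state):
    """Move the value of variable i to variable perm[i] (assumed encoding)."""
    new_state = [None] * len(state)
    for i, value in enumerate(state):
        new_state[perm[i]] = value
    return tuple(new_state)


def orbit(state, generators):
    """Breadth-first closure of `state` under the generating permutations."""
    seen = {tuple(state)}
    queue = deque(seen)
    while queue:
        current = queue.popleft()
        for perm in generators:
            nxt = apply_variable_permutation(perm, current)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical example: three binary variables, one generator swapping
# variables 0 and 1 and a second generator swapping variables 1 and 2.
swap_01 = (1, 0, 2)
swap_12 = (0, 2, 1)
small_orbit = orbit((1, 0, 0), [swap_01])
full_orbit = orbit((1, 0, 0), [swap_01, swap_12])
```

Explicit enumeration like this is only feasible for tiny orbits; it is meant to make the definition concrete rather than to serve as an inference routine.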

We note that variable symmetries are probability preserving transformations [Niepert2012]. Let denote the distribution defined by a graphical model where is the probability of a state .

###### Theorem 1.

If is a variable automorphic group of , then , , .

Anand et al. [2017] extend the notion of variable symmetries to those defined over variable-value (VV) pairs. Let denote a VV pair and let denote the set of all possible such pairs. Let denote a permutation over the set . Action of on state , denoted by , results in a state , such that , if , then .

There are some VV permutations which when applied to a state may result in an inconsistent state. For instance, let , and , then results in an inconsistent state with multiple values being assigned to . Therefore, the notion of valid VV permutation needs to be defined which when applied to any state always results in a consistent state [Anand et al.2017].

###### Definition 3.

A VV permutation over is said to be a valid VV permutation if whenever there exists a VV pair such that , then for all the VV pairs of the form where , where .
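The validity condition can be checked mechanically. The sketch below assumes a VV permutation is stored as a dictionary `phi` from `(variable, value)` pairs to `(variable, value)` pairs and that a state is a dictionary from variables to values; these encodings and the example permutations are hypothetical.

```python
def is_valid_vv_permutation(phi):
    """Valid iff all pairs of one variable map to pairs of one fixed variable."""
    target_of = {}
    for (var, _value), (new_var, _new_value) in phi.items():
        # Remember the first target variable seen for `var`; any later
        # disagreement means two values of `var` land on different variables.
        if target_of.setdefault(var, new_var) != new_var:
            return False
    return True


def apply_vv_permutation(phi, state):
    """Apply a valid VV permutation to a state given as {variable: value}."""
    new_state = {}
    for var, value in state.items():
        new_var, new_value = phi[(var, value)]
        new_state[new_var] = new_value
    return new_state


# Hypothetical valid permutation: swap variables A and B, flipping the value
# that A contributes.
valid_phi = {
    ("A", 0): ("B", 1), ("A", 1): ("B", 0),
    ("B", 0): ("A", 0), ("B", 1): ("A", 1),
}
# Hypothetical invalid permutation: the pairs of A are split between A and B,
# so some states would receive two values for the same variable.
invalid_phi = {
    ("A", 0): ("B", 0), ("B", 0): ("A", 0),
    ("A", 1): ("A", 1), ("B", 1): ("B", 1),
}
permuted = apply_vv_permutation(valid_phi, {"A": 0, "B": 0})
```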

###### Definition 4.

Action of on results in a new graphical model where the occurrence of in each feature in is replaced by . We say that is a VV symmetry of , if action of on results back in .

Similar to variable symmetries, the set of all VV symmetries forms a group called the VV automorphic group of and is denoted by . Analogously, partitions the states into orbits defined as .

In the following, we will often refer to the automorphic groups and as symmetry groups of . It can be easily seen that VV symmetries subsume variable symmetries and like variable symmetries, they are also probability preserving transformations.

###### Theorem 2.

If is a VV automorphic group of , then , ,

The orbits so obtained through variable (VV) symmetries can then be exploited for faster mixing by Markov Chain Monte Carlo (MCMC) based methods as described below.

### 2.1 Orbital-MCMC

Markov Chain Monte Carlo (MCMC) methods [Koller and Friedman2009] are a popular class of algorithms for approximate inference in Probabilistic Graphical Models. Starting with a random state, these methods set up a Markov chain over the state space whose stationary distribution is the same as the desired distribution. Convergence is guaranteed in the limit of a large number of samples coming from the Markov chain.

Orbital MCMC and VV-MCMC improve MCMC methods by exploiting Variable and VV symmetries, respectively. Given a Markov chain and a symmetry group , starting from a sample , any subsequent sample is obtained in 2 steps: a) An intermediate state is obtained according to b) The next sample is obtained by sampling a state uniformly from the orbit (Variable or VV) of the intermediate state . Sampling a state from the orbit of the intermediate state is done using the Product Replacement Algorithm [Celler et al.1995, Pak2000]. This two step chain so obtained converges to the true stationary distribution and has been shown to have better mixing both theoretically [Niepert2012] and empirically [Niepert2012, Anand et al.2017]. The key insight exploited by these algorithms is the fact that all the states in any given orbit have the same probability.

## 3 Block-Value Symmetries

In this section, we will present symmetries defined over blocks of variables, referred to as BV Symmetries which strictly generalize the earlier notions of symmetries defined over VV pairs. As a motivating example, Figure 1 shows two Graphical Models and . For ease of explanation these have been represented in terms of potential tables. These can easily be converted to the weighted feature representation, as defined previously. In , state has the same joint probability as and in , state has the same joint probability as . However, none of these can be captured by Variable or VV symmetries. We start with some definitions.

###### Definition 5.

Let denote a set of variables () which we will refer to as a block. Similarly, let denote a set of (corresponding) assignments to the variables in the block . Then, we refer to the pair as a Block-Value (BV) pair.

###### Definition 6.

A BV pair is said to be consistent with a state s if , where is the value for variable in block .

Let denote some subset of all possible BV pairs defined over blocks of size less than or equal to $r$. For ease of notation, we will drop superscript r and denote as where r is a pre-specified constant for maximum block size. Then, we are interested in defining permutations over the elements of the set . Considering any set of block-value pairs in and allowing permutation among them may lead to inconsistent states. Consider a graphical model defined over four variables: . Let us consider all possible blocks of size 2. Then, a BV permutation permuting the singleton block to itself (with identity mapping on values) while at the same time permuting the block to the block is clearly inconsistent, since ’s value cannot be determined uniquely. A natural way to avoid this inconsistency is to restrict each variable to be part of a single block while applying permutations. Therefore, we restrict our attention to sets of blocks which are non-overlapping.

###### Definition 7.

Let denote a set of blocks. We define to be a partition if each variable appears in exactly one block in . For a partition , we define the block value set as a set of BV pairs where each block is present with all of its possible assignments.

We would now like to define permutations over the block value set , which we refer to as BV-permutations. To begin, we define the action of a BV-permutation on a state s. The action of a BV-permutation on a state results in a state such that , is consistent with if and only if is consistent with .

However, similar to the case of VV symmetries, any bijection from may not always result in a consistent state. For instance, consider a graphical model with 4 variables. Let the partition . Consider the state . In case is defined as = and = , the action of results in an inconsistent state, since the action of would result in a state with equal to both 0 and 1 simultaneously. To address this issue, we define a BV-permutation to be valid only under certain conditions.

###### Definition 8.

A BV-permutation is said to be valid if such that

Intuitively a BV-permutation is valid if it maps all assignments of a block to assignments of a fixed block .
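The action and validity of a BV-permutation can be sketched in the same spirit. The encoding below is an assumption made for illustration: a block is a tuple of variable names, a BV pair is a `(block, values)` tuple, and a BV-permutation is a dictionary `psi` over the block value set of a partition.

```python
from itertools import product


def is_valid_bv_permutation(psi):
    """Valid iff every assignment of a block maps to one fixed target block."""
    target_block = {}
    for (block, _values), (new_block, _new_values) in psi.items():
        if target_block.setdefault(block, new_block) != new_block:
            return False
    return True


def apply_bv_permutation(psi, partition, state):
    """Map the BV pair consistent with `state` in each block through psi."""
    new_state = {}
    for block in partition:
        values = tuple(state[var] for var in block)
        new_block, new_values = psi[(block, values)]
        new_state.update(zip(new_block, new_values))
    return new_state


# Hypothetical partition of four binary variables into two blocks of size 2.
B1, B2 = ("X1", "X2"), ("X3", "X4")
partition = [B1, B2]

# Swap the two blocks and reverse the order of the values inside each block.
psi = {}
for a, b in product((0, 1), repeat=2):
    psi[(B1, (a, b))] = (B2, (b, a))
    psi[(B2, (a, b))] = (B1, (b, a))

state = {"X1": 0, "X2": 1, "X3": 0, "X4": 0}
new_state = apply_bv_permutation(psi, partition, state)

# Redirecting a single assignment of B1 to B1 itself breaks validity
# (this modified map is only used to exercise the check).
psi_bad = dict(psi)
psi_bad[(B1, (0, 0))] = (B1, (0, 0))
```

Note that the example permutation moves values of `X1` and `X2` jointly, which is exactly the kind of move a VV permutation over single variables cannot express.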

At this point, it is tempting to define a new graphical model where each block is a multi-valued variable, with the domain of this variable describing all of the possible assignments. This would be useful for a lucid exposition of symmetries. To do this, we must also suitably transform the set of features to this new set of variables. Given a block partition , we transform the set of features such that for each block either all the variables in this block appear in the feature or none of them appear in the feature, while keeping all features logically invariant. We denote the set of all variables over which feature is defined as . Further, for a block and a feature , let i.e., contains the additional variables in the block which are not part of feature .

###### Definition 9.

Given a variable , which appears in a block and a feature , a block consistent representation of the feature, denoted by , is defined over the variables , such that, where , denote an assignment to all the variables in and , respectively.

For instance consider the feature . Let the block be . Then the block consistent feature is given by .

We extend the idea of block consistent representation to get a partition consistent representation .

###### Definition 10.

A partition consistent representation of a feature , is defined by iteratively converting the feature to its block consistent representation for each .

The set of partition consistent features has the property that for all , or , i.e. all variables in each block either appear completely, or do not appear at all in any given feature. This property allows us to define a transformed graphical model over a set of multi valued variables , where each variable represents a block . The domain size of is the number of possible assignments of the variables in the block . The set of features in this new model is simply the set of transformed features . As the blocks are non overlapping, such a transformation can always be carried out.

Since the transformation of features to partition consistent features always preserves logical equivalence, it seems natural to wonder about the relationship between the graphical models and . We first note that each state in can be mapped to a unique state in by simply iterating over all the blocks , checking which BV pair is consistent with the state and assigning the appropriate value to the corresponding variable . In a similar manner, each state can be mapped to a unique state in .

###### Theorem 3.

Let denote a state in and let be the corresponding state in . Then, this correspondence is probability preserving i.e., where and are the distributions defined by and , respectively.
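The correspondence between the two models can be checked by brute force on a toy example. The sketch below assumes log-linear features given as `(scope, function, weight)` triples and lifts each feature to block variables by ignoring the extra variables of every block it touches; the model, weights, and helper names are hypothetical.

```python
import math
from itertools import product


def to_block_state(partition, state):
    """Map a variable-level state to one value tuple per block."""
    return {block: tuple(state[v] for v in block) for block in partition}


def to_variable_state(block_state):
    """Inverse map: unpack block values back to individual variables."""
    return {v: x for block, values in block_state.items()
            for v, x in zip(block, values)}


def lift_feature(scope, feature, partition):
    """Return a logically equivalent feature defined over whole blocks."""
    block_scope = [b for b in partition if any(v in scope for v in b)]

    def block_feature(block_state):
        flat = to_variable_state({b: block_state[b] for b in block_scope})
        return feature(*(flat[v] for v in scope))

    return block_scope, block_feature


def weight(state, features):
    """Unnormalized probability in the original model."""
    return math.exp(sum(w * f(*(state[v] for v in scope))
                        for scope, f, w in features))


def block_weight(block_state, lifted):
    """Unnormalized probability in the transformed model."""
    return math.exp(sum(w * f(block_state) for f, w in lifted))


# Hypothetical model over four binary variables and two blocks.
partition = [("X1", "X2"), ("X3", "X4")]
features = [
    (("X1", "X3"), lambda a, b: float(a == b), 0.7),
    (("X2",), lambda a: float(a), -0.3),
]
lifted = [(lift_feature(scope, f, partition)[1], w) for scope, f, w in features]

variables = [v for block in partition for v in block]
all_states = [dict(zip(variables, values))
              for values in product((0, 1), repeat=len(variables))]
max_gap = max(abs(weight(s, features)
                  - block_weight(to_block_state(partition, s), lifted))
              for s in all_states)
```

Since the unnormalized weights agree on every pair of corresponding states, the normalized distributions agree as well, which is the content of the theorem for this toy model.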

Similar to the mapping between states, every BV-permutation of corresponds to an equivalent VV-permutation of obtained by replacing each BV pair in by the corresponding VV pair in (and vice-versa). Since the distributions defined by the two graphical models are equivalent, we can define BV symmetries in as follows:

###### Definition 11.

Under a given partition , a BV-permutation of a graphical model is a BV-symmetry of if the corresponding permutation under is a VV-symmetry of .

We can now state the following results for BV-symmetries.

###### Theorem 4.

BV-symmetries are probability preserving transformations, i.e., for a BV-symmetry , for all states .

It is easy to see that the set of all BV symmetries under a given partition forms a group . Similar to the VV orbits, we define the BV orbit of a state as .

When the partition is such that each variable appears in a block by itself, all the BV-symmetries are nothing but VV-symmetries.

###### Theorem 5.

Any VV-symmetry can be represented as a BV-symmetry for an appropriate choice of .

Computing BV Symmetries: Since a BV symmetry of a graphical model is defined in terms of a VV symmetry of a transformed graphical model , BV symmetries can be computed by constructing the transformed graphical model and then computing VV symmetries on as described by Anand et al. [2017].

## 4 Aggregate Orbital Markov Chains

Given a block partition , the BV symmetry group of can be found by computing the VV symmetry group in the auxiliary graphical model . We further set up a Markov chain BV-MCMC() over to exploit BV symmetries where is a parameter.

###### Definition 12.

Given a graphical model , a Markov chain and a BV symmetry group , one can define a BV-MCMC() Markov chain as follows: From the current sample
a) Sample a state from original Markov chain
b) i) With probability , sample a state uniformly from BV orbit of and return as next sample.
ii) With probability , set state and return it as the next sample

The BV-MCMC() Markov chain is defined similarly to VV-MCMC, except that it takes an orbital move only with the given probability instead of taking it always. When the parameter equals 1, it is similar to VV-MCMC, and when the parameter equals 0 it reduces to the original Markov chain. When the orbital move is always taken, it is sometimes observed that the gain due to symmetries is overshadowed by the computational overhead of the orbital step. The parameter captures a compromise between these two contradictory effects.
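A single transition of this chain can be sketched as follows. The parameter name `alpha`, the toy two-variable model, and the explicit orbit lookup are all assumptions made for illustration; in particular, the orbit is sampled here from an enumerated list rather than with the Product Replacement Algorithm used in the paper.

```python
import random


def bv_mcmc_step(state, base_step, sample_from_orbit, alpha, rng):
    """One transition: a base move, then an orbital move with probability alpha."""
    intermediate = base_step(state, rng)
    if rng.random() < alpha:
        return sample_from_orbit(intermediate, rng)
    return intermediate


def make_orbit_sampler(orbits):
    """Build a uniform orbit sampler from an explicit list of orbits."""
    lookup = {s: orbit for orbit in orbits for s in orbit}

    def sample(state, rng):
        return rng.choice(lookup[state])

    return sample


# Hypothetical model: two binary variables whose unnormalized weights are
# invariant under swapping the variables.
WEIGHTS = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 2.0, (1, 1): 4.0}
ORBITS = [[(0, 0)], [(0, 1), (1, 0)], [(1, 1)]]


def gibbs_step(state, rng):
    """Resample one randomly chosen variable from its conditional."""
    i = rng.randrange(len(state))
    candidates = [state[:i] + (v,) + state[i + 1:] for v in (0, 1)]
    return rng.choices(candidates, weights=[WEIGHTS[c] for c in candidates])[0]


rng = random.Random(0)
orbit_sampler = make_orbit_sampler(ORBITS)
state = (0, 0)
visits = {s: 0 for s in WEIGHTS}
for _ in range(2000):
    state = bv_mcmc_step(state, gibbs_step, orbit_sampler, alpha=0.5, rng=rng)
    visits[state] += 1
```

Because every state in an orbit has the same probability, the orbital move leaves the target distribution unchanged; `alpha` only trades the cost of that move against the extra mixing it provides.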

###### Theorem 6.

Given a Graphical Model , if the original Markov chain is regular, then, BV-MCMC() Markov chain , constructed as above, is regular and converges to the unique stationary distribution of the original Markov chain .

It should be noted that two different block partitions may capture different BV symmetries and hence may have different BV symmetry groups. In order to fully utilize all symmetries which may be present in multiple block partitions, we propose the idea of Aggregate Orbital Markov Chain.

Consider $K$ different block partitions . We set up $K$ independent BV-MCMC() Markov chains, where each chain generates samples as per BV-MCMC() corresponding to partition . Let these chains be , and let the corresponding automorphism groups be . Given an intermediate state , we would like to sample uniformly from the union of orbits . Since these orbits may overlap with each other, it is unclear how to sample a state uniformly from the union of orbits. We circumvent this problem by setting up a new Markov chain, the Aggregate Orbital Markov Chain. This chain utilizes all available symmetries and converges to the true stationary distribution.

###### Definition 13.

Given different BV-MCMC() Markov chains, , an Aggregate Orbital Markov Chain can be constructed in the following way: Starting from state a) Sample a BV-MCMC() Markov chain uniformly from b) Sample a state according to .

###### Theorem 7.

The aggregate orbital Markov chain constructed from BV-MCMC() Markov chains, , all of which have stationary distribution , is regular and converges to the same stationary distribution .

###### Proof.

Given that each of the $K$ BV-MCMC() Markov chains is regular, we first prove that the aggregate Markov chain is regular. In each step of the aggregate chain, one of the BV-MCMC() chains is applied, and since there is a non-zero probability of returning to the same state in each BV-MCMC() chain, there is a non-zero probability of returning to the same state in the aggregate chain. Hence, the aggregate chain so defined is regular and therefore converges to a unique stationary distribution [Koller and Friedman2009].
It remains to show that the stationary distribution of the aggregate chain is $\pi$. Let $T^{*}(s \to s')$ represent the transition probability of going from state $s$ to $s'$ in the aggregate chain. We need to show that

$$\pi(s') = \sum_{s \in S} \pi(s)\, T^{*}(s \to s') \tag{1}$$

Let $T_k(s \to s')$ represent the transition probability of going from state $s$ to $s'$ in the $k$-th BV-MCMC() chain. Then

$$\sum_{s \in S} \pi(s)\, T^{*}(s \to s') = \sum_{s \in S} \pi(s)\, \frac{1}{K} \sum_{k=1}^{K} T_k(s \to s') \tag{2}$$

$$= \frac{1}{K} \sum_{k=1}^{K} \sum_{s \in S} \pi(s)\, T_k(s \to s') = \frac{1}{K} \sum_{k=1}^{K} \pi(s') = \pi(s') \tag{3}$$

Equation 2 follows from the definition of the aggregate chain, while Equation 3 holds since each of the $K$ chains has stationary distribution $\pi$.
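The stationarity argument can be checked numerically on a small example. The sketch below builds two hypothetical transition matrices that both leave the same distribution `pi` invariant, averages them as in the aggregate chain, and measures how far `pi` moves under the averaged kernel; the three-state model is an illustrative assumption.

```python
def mix_kernels(kernels):
    """Uniform mixture of transition matrices: T*(i, j) = (1/K) * sum_k T_k(i, j)."""
    k = len(kernels)
    n = len(kernels[0])
    return [[sum(T[i][j] for T in kernels) / k for j in range(n)]
            for i in range(n)]


def push_forward(pi, T):
    """Distribution after one step: (pi T)(j) = sum_i pi(i) * T(i, j)."""
    n = len(pi)
    return [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]


# Hypothetical target over three states; states 0 and 1 are symmetric.
pi = [0.25, 0.25, 0.5]

# Kernel 1: an independence sampler that draws the next state from pi.
T1 = [pi[:] for _ in pi]

# Kernel 2: an orbital move that picks uniformly within the orbit {0, 1}
# and leaves state 2 fixed.
T2 = [
    [0.5, 0.5, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
]

T_star = mix_kernels([T1, T2])
residual = max(abs(a - b) for a, b in zip(push_forward(pi, T_star), pi))
```

A residual of (numerically) zero confirms that the uniform mixture preserves `pi` for this example, mirroring Equations 2 and 3.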

The aggregate Markov chain so obtained not only converges to the correct stationary distribution but also results in faster mixing, since it can exploit the symmetries associated with each of the individual orbital Markov chains.

## 5 Heuristics for Block Partitions

We have so far computed BV symmetries given a specific block partition. We now discuss our heuristic that suggests candidate block partitions for downstream symmetry computation (see supplementary material for pseudo-code). At a high level, our heuristic has the following two desiderata. Firstly, it ensures that there are no overlapping blocks, i.e., each variable is in exactly one block. Secondly, it guesses which blocks might exhibit BV-symmetries, and encourages such blocks in a partition.

The heuristic takes the hyperparameter $r$, the maximum size of a block, as an input. It considers only those blocks (up to size $r$) in which for each variable in the block, there exists at least one other variable from the same block, such that some clause in contains both of them. This prunes away blocks in which variables do not directly interact with each other, and thus are unlikely to produce symmetries. Note that these candidate blocks can have overlapping variables and hence not all can be included in a block partition.

For these candidate blocks, for each block-value pair, the heuristic computes a weight signature. The weight signature is computed by multiplying weights of all the clauses that are made true by the specific block-value assignment. The heuristic then buckets all BV pairs of the same size based on their weight signatures. The cardinality of each bucket (i.e., the number of BV pairs of the same size that have the same weight signature) is calculated and stored.

The heuristic samples a block partition as follows. At each step it samples a bucket with probability proportional to its cardinality and once a bucket is selected, then it samples a block from that bucket uniformly at random, as long as the sampled block doesn’t conflict with existing blocks in the current partition i.e., it has no variables in common with them. This process is repeated until all variables are included in the partition. In the degenerate case, if a variable can’t be sampled from any block of size 2 or higher, then it gets sampled as an independent block of size 1. Once a partition is fully sampled, it is stored and the process is reset to generate another random block partition.

This heuristic encourages sampling of blocks that are part of a larger bucket in the hope that multiple blocks from the same bucket will likely yield BV symmetries in the downstream computation. At the same time, the non-conflicting condition and existence of single variable blocks jointly ensure that each sample is indeed a bona fide block partition.
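The sampling procedure described above can be sketched as follows. This is one possible reading of the prose, not the pseudo-code from the supplementary material: clauses are assumed to be `(scope, predicate, weight)` triples, a clause counts toward a signature only when its scope lies inside the block, and a bucket is skipped once all of its blocks conflict with the current partition. All names and the example clauses are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations, product


def candidate_blocks(variables, clauses, r):
    """Blocks of size 2..r where each variable shares a clause with another member."""
    def interacts(u, v):
        return any(u in scope and v in scope for scope, _, _ in clauses)

    blocks = []
    for size in range(2, r + 1):
        for block in combinations(variables, size):
            if all(any(interacts(u, v) for v in block if v != u) for u in block):
                blocks.append(block)
    return blocks


def weight_signature(block, values, clauses):
    """Product of weights of clauses inside the block that the assignment satisfies."""
    assignment = dict(zip(block, values))
    signature = 1.0
    for scope, predicate, weight in clauses:
        if set(scope) <= set(block) and predicate(*(assignment[v] for v in scope)):
            signature *= weight
    return signature


def build_buckets(blocks, clauses, domain=(0, 1)):
    """Group BV pairs by (block size, weight signature)."""
    buckets = defaultdict(list)
    for block in blocks:
        for values in product(domain, repeat=len(block)):
            key = (len(block), round(weight_signature(block, values, clauses), 9))
            buckets[key].append((block, values))
    return buckets


def sample_partition(variables, buckets, rng):
    """Sample non-conflicting blocks, favoring large buckets; pad with singletons."""
    covered, partition = set(), []
    while True:
        usable = {}
        for key, pairs in buckets.items():
            free = sorted({b for b, _ in pairs if covered.isdisjoint(b)})
            if free:
                usable[key] = (len(pairs), free)
        if not usable:
            break
        keys = list(usable)
        key = rng.choices(keys, weights=[usable[k][0] for k in keys])[0]
        block = rng.choice(usable[key][1])
        partition.append(block)
        covered.update(block)
    partition.extend((v,) for v in variables if v not in covered)
    return partition


# Hypothetical model with four binary variables and three weighted clauses.
variables = ["A", "B", "C", "D"]
clauses = [
    (("A", "B"), lambda a, b: bool(a or b), 2.0),
    (("C", "D"), lambda c, d: bool(c or d), 2.0),
    (("B", "C"), lambda b, c: bool(b and c), 1.5),
]
blocks = candidate_blocks(variables, clauses, r=2)
buckets = build_buckets(blocks, clauses)
partition = sample_partition(variables, buckets, random.Random(0))
```

Calling `sample_partition` repeatedly with different seeds would yield the list of candidate partitions handed to the symmetry computation.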

## 6 Experiments

Our experiments attempt to answer two key research questions. (1) Are there realistic domains where BV symmetries exist but VV symmetries do not? (2) For such domains, how much faster can an MCMC chain mix when using BV symmetries compared to when using VV symmetries or not using any symmetries?

### 6.1 Domains

To answer the first question, we construct two domains. The first domain models the effect of an academic course on an individual’s employability, whereas the second domain models the choices a student makes in completing their course credits. Both domains additionally model the effect of one’s social network in these settings. Table 1 specifies the weighted first order formulas for both the domains.

Job Search: In this domain, there are people on a social network, looking for a job. Given the AI hype these days, their employability is directly linked with whether they have learned machine learning (ML) or not. Each person has an option of taking the ML course, which is denoted by . Furthermore, the variable denotes whether two people and are connected in the social network or not. Finally, the variable denotes whether gets employment or not.

In this Markov Logic Network (MLN)[Domingos and Lowd2009], each person participates in three kinds of formulas. The first one with weight indicates the (unnormalized) probability of the person getting a job and taking the ML course (). The second formula with weight indicates the chance of the person getting a job while not taking the course (). Our domain assigns different weights and for each person, modeling the fact that each person may have a different capacity to learn ML, and that other factors may also determine whether they get a job or not. Finally, is more likely to take the course if their friends take the course. This is modeled by an additional formula for each pair , with a fixed weight .

In this domain, there are hardly any VV symmetries, since every will likely have different weights. However, there are intra-block BV symmetries for the block () for every . This is because, within the potential table of this block, the block values (0, 0) and (1, 0) are symmetric and can be permuted.

Student Curriculum: In this domain, there are students who need to register for two courses, one from Mathematics and one from Computer Science to complete their course credits. There are two courses (basic or advanced) on offer in both disciplines. Variables and denote whether the student would take the advanced course in each discipline. Since courses for Mathematics and CS could be related, each student needs to give a joint preference amongst the 4 available options. This is modeled as a potential table over () with weights chosen randomly from a fixed set of parameters. Further, some students may also be friends. Since students are more likely to register in courses with their friends, we model this as an additional formula, which increases the probability of registering for a course in case a friend registers for the same.

In this domain, VV pairs can only capture symmetries when the potential tables (over and ) for two students are exactly the same. However, there are a lot more inter-block BV symmetries since it is more likely to find pairs of students, whose potential tables use the same set of weights, but in a different order.

### 6.2 Comparison of MCMC Convergence

We now answer our second research question by comparing the convergence of three Markov chains: Vanilla-MCMC, VV-MCMC, and BV-MCMC(). All three use Gibbs sampling as the base MCMC chain. All experiments are done on Intel Core i7 machines. Following previous work, and for fair comparison, we implement all three Markov chains in the group-theoretic package GAP [GAP2015]. This allows the use of off-the-shelf group theoretic operations. The code for generating candidate lists is written in C++. We solve graph isomorphism problems using the Saucy software [Darga et al.2008]. We release our implementation for future use by the community.

In all experiments, we keep the maximum block size in a block partition to be two. For each chain, we plot the KL divergence between the true marginals and the computed marginals for different runtimes. We estimate the true marginals by running the Gibbs sampling algorithm for a sufficiently long period of time. Each algorithm is run 20 times to compute error bars indicating 95% confidence intervals.
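The error measure can be sketched as below. The exact form used in the experiments is not spelled out, so this version assumes single-variable marginals, a KL divergence taken from the true marginal to the estimated one, an average over variables, and a small smoothing constant to avoid division by zero; all of these are illustrative choices.

```python
import math


def estimate_marginals(samples, num_values=2, smoothing=1e-6):
    """Per-variable empirical marginals from a list of sampled states."""
    n_vars = len(samples[0])
    counts = [[smoothing] * num_values for _ in range(n_vars)]
    for sample in samples:
        for i, value in enumerate(sample):
            counts[i][value] += 1.0
    return [[c / sum(row) for c in row] for row in counts]


def marginal_kl(true_marginals, estimated_marginals):
    """Average over variables of KL(true marginal || estimated marginal)."""
    total = 0.0
    for p_row, q_row in zip(true_marginals, estimated_marginals):
        total += sum(p * math.log(p / q) for p, q in zip(p_row, q_row) if p > 0)
    return total / len(true_marginals)


# Hypothetical samples over two binary variables.
samples = [(0, 1), (1, 1), (0, 0), (1, 0)]
estimated = estimate_marginals(samples)
error = marginal_kl([[0.5, 0.5], [0.5, 0.5]], estimated)
```

Plotting `error` against wall-clock time for each chain gives curves of the kind reported in the figures.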

For VV-MCMC and BV-MCMC, the run time on the x-axis includes the pre-processing time of computing symmetries as well. For BV-MCMC, this includes the time for generating candidate lists, running Saucy for each candidate list, and initializing the Product Replacement algorithm for each candidate list. The total preprocessing time is around 1.6 sec for the Job Search domain and around 0.6 sec for the Student Curriculum domain.

Figure 8 shows that BV-MCMC substantially outperforms VV-MCMC and Vanilla-MCMC in both domains. The parameter is set to 1.0 for the Job Search domain and 0.02 for the Student Curriculum domain. Since these domains do not have many VV symmetries, VV-MCMC only marginally outperforms Vanilla-MCMC. On the other hand, BV-MCMC is able to exploit a considerably larger number of symmetries, which leads to faster mixing. BV-MCMC scales well with domain size, significantly outperforming the other algorithms as the domain size is changed from 30 to 50 people in Job Search and from 600 to 1200 in Student Curriculum. This is due in particular to more symmetries being captured by BV-MCMC for larger domain sizes. (Most of the error bars are negligible in size.)

Figures 8(c) and 8(f) plot the variation with the introduction of 10% evidence in each domain. BV-MCMC still outperforms VV-MCMC and Vanilla-MCMC and is robust to the presence of evidence.

Finally, we also test the sensitivity of BV-MCMC to the parameter. Figure 9 plots this variation on both domains. We find that for Job Search, a high value performs the best, whereas a lower value is better in Student Curriculum. This is because Job Search mostly has intra-block BV symmetries, which can be computed and applied efficiently. This makes sampling an orbital step rather efficient. On the other hand, for Student Curriculum, the inter-block symmetry between different pairs of people makes the orbital step costlier, and reducing the fraction of times an orbital move is taken improves the overall performance.

## 7 Conclusions

Permutations defined over variables or variable-value (VV) pairs miss a significant fraction of state symmetries. We define permutations over block-value (BV) pairs, which enable a subset of variables (a block) and their assignment to jointly permute to another subset. This representation is exponential in the maximum block size, but captures more and more state symmetries as the block size increases.

Novel challenges arise when building the framework and algorithms for BV permutations. First, we recognize that not all BV permutations lead to valid state symmetries. For soundness, we impose a sufficient condition that each BV permutation must be defined on blocks with non-overlapping variables. Second, to compute BV symmetries, we describe a graph-isomorphism based solution. However, this solution expects a block partition as input, and we cannot run it over all possible block partitions because they are exponential in number. In response, we provide a heuristic that outputs candidate block partitions that are likely to lead to BV symmetries. Finally, since the orbits from different block partitions may have overlapping variables, they cannot be explicitly composed in compact form. This makes it difficult to sample uniformly from the aggregate orbit (aggregated over all block partitions). To address this challenge, we modify the Orbital MCMC algorithm so that, in the orbital step, it samples uniformly from the orbit of any one of the block partitions (BV-MCMC). We prove that this aggregate Markov chain also converges to the true posterior.

Our experiments show that there are domains in which BV symmetries exist but VV symmetries may not. We find that BV-MCMC mixes much more rapidly than base MCMC or VV-MCMC, due to the additional mixing from orbital BV moves. Overall, our work provides a unified representation for existing research on permutation groups for state symmetries. In the future, we wish to extend this notion to approximate symmetries, so that they can be helpful in many more realistic domains, as done in earlier work [Habeeb et al.2017].

## Acknowledgements

We thank anonymous reviewers for their comments and suggestions and Happy Mittal for useful discussions. Ankit Anand is supported by the TCS Fellowship. Mausam is supported by grants from Google and Bloomberg. Parag Singla is supported by the DARPA Explainable Artificial Intelligence (XAI) Program with number N66001-17-2-4032. Both Mausam and Parag Singla are supported by the Visvesvaraya Young Faculty Fellowships by Govt. of India and IBM SUR awards. Any opinions, findings, conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views or official policies, either expressed or implied, of the funding agencies.

## References

• [Anand et al.2016] A. Anand, A. Grover, Mausam, and P. Singla. Contextual Symmetries in Probabilistic Graphical Models. In IJCAI, 2016.
• [Anand et al.2017] A. Anand, R. Noothigattu, P. Singla, and Mausam. Non-Count Symmetries in Boolean & Multi-Valued Prob. Graphical Models. In AISTATS, 2017.
• [Bui et al.2013] H. Bui, T. Huynh, and S. Riedel. Automorphism groups of graphical models and lifted variational inference. In UAI, 2013.
• [Celler et al.1995] F. Celler, C. R. Leedham-Green, S. H. Murray, A. C. Niemeyer, and E. A. O’Brien. Generating random elements of a finite group. Communications in Algebra, 23(13):4931–4948, 1995.
• [Darga et al.2008] P. T. Darga, K. A. Sakallah, and I. L. Markov. Faster Symmetry Discovery using Sparsity of Symmetries. In Design Automation Conference, 2008.
• [de Salvo Braz et al.2005] R. de Salvo Braz, E. Amir, and D. Roth. Lifted First-Order Probabilistic Inference. In IJCAI, 2005.
• [Domingos and Lowd2009] P. Domingos and D. Lowd. Markov Logic: An Interface Layer for Artificial Intelligence. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers, 2009.
• [GAP2015] The GAP Group. GAP – Groups, Algorithms, and Programming, Version 4.7.9, 2015.
• [Gogate and Domingos2011] V. Gogate and P. Domingos. Probabilistic Theorem Proving. In UAI, 2011.
• [Habeeb et al.2017] H. Habeeb, A. Anand, Mausam, and P. Singla. Coarse-to-fine Lifted MAP Inference in Computer Vision. In IJCAI, 2017.
• [Kersting et al.2009] K. Kersting, B. Ahmadi, and S. Natarajan. Counting Belief Propagation. In UAI, 2009.
• [Kimmig et al.2015] A. Kimmig, L. Mihalkova, and L. Getoor. Lifted Graphical Models: A Survey. Machine Learning, 99(1):1–45, 2015.
• [Koller and Friedman2009] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
• [Madan et al.2018] G. Madan, A. Anand, Mausam, and P. Singla. Block Value Symmetries in Probabilistic Graphical Models. In UAI, 2018.
• [Mladenov et al.2012] M. Mladenov, B. Ahmadi, and K. Kersting. Lifted Linear Programming. In AISTATS, 2012.
• [Niepert and den Broeck2014] M. Niepert and G. Van den Broeck. Tractability through Exchangeability: A New Perspective on Efficient Probabilistic Inference. In AAAI, 2014.
• [Niepert2012] M. Niepert. Markov Chains on Orbits of Permutation Groups. In UAI, 2012.
• [Pak2000] I. Pak. The Product Replacement Algorithm is Polynomial. In Foundations of Computer Science, 2000.
• [Poole2003] D. Poole. First-Order Probabilistic Inference. In IJCAI, 2003.
• [Singla and Domingos2008] P. Singla and P. Domingos. Lifted First-Order Belief Propagation. In AAAI, 2008.
• [Van den Broeck and Niepert2015] G. Van den Broeck and M. Niepert. Lifted Probabilistic Inference for Asymmetric Graphical Models. In AAAI, 2015.
• [Venugopal and Gogate2012] D. Venugopal and V. Gogate. On Lifting the Gibbs Sampling Algorithm. In NIPS, 2012.

## Algorithmic and Implementation Details for Finding Block Partitions

This section provides algorithmic and implementation details for the heuristic used to find the candidate set of block partitions (Section 5). Obtaining a good candidate set involves three broad steps, each of which is described below in turn.

Algorithm 1: The procedure takes the maximum block size as a parameter and computes potentially useful blocks of at most that size. It iterates over each of the features in turn and selects all possible subsets, up to the maximum block size, of the variables that are part of that feature (lines 2-7). This automatically eliminates all blocks within the size limit that are composed of variables which never appear together in any feature of the graphical model.
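The enumeration in Algorithm 1 can be sketched as below. This is an illustrative reading rather than the authors' code: the name `useful_blocks`, the representation of a feature by the tuple of variables in its scope, and the choice to emit every subset size from one up to the maximum are assumptions made for this example.

```python
from itertools import combinations

def useful_blocks(feature_scopes, max_block_size):
    """Collect every variable subset (up to max_block_size) that lies inside
    the scope of at least one feature.

    feature_scopes is an iterable of variable tuples, one per feature
    (a hypothetical representation chosen for this sketch).
    """
    blocks = set()
    for scope in feature_scopes:
        variables = sorted(set(scope))  # canonical order so blocks deduplicate
        for size in range(1, max_block_size + 1):
            for block in combinations(variables, size):
                blocks.add(block)
    return blocks

if __name__ == "__main__":
    scopes = [("a", "b", "c"), ("c", "d")]
    for block in sorted(useful_blocks(scopes, 2)):
        print(block)
```

In this toy model the pair `("a", "d")` is never generated because no feature mentions both variables, which is the pruning effect the paragraph describes.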

Algorithm 2: For the useful blocks obtained above, our heuristic constructs a weight signature for each of the block-value pairs. The procedure computes a weight signature from all the features consistent with the input BV pair. We define the feature blanket of a variable as the set of features in which that variable appears. In line 1, we construct the feature blanket of a block by taking the union of the feature blankets of all the variables appearing in the block. Line 2 initializes the signature as an empty multiset. We construct the weight signature by iterating over the features present in the feature blanket of the block. For each such feature, we check whether the given BV pair is consistent with it, i.e., whether the feature is satisfied by the block-value pair. The weight of the feature is inserted into the signature if the consistency requirement is met (line 5). The weight signature obtained after iterating over all the features in the blanket is returned as the weight signature for the BV pair.
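A minimal sketch of the signature computation follows. Several modelling choices here are assumptions and not taken from the paper: each feature is represented as a conjunction, stored as a dictionary from variable to required value together with a weight; a BV pair is treated as consistent with a feature when it does not contradict the feature on the variables they share; and the multiset is encoded as a sorted tuple so that it can serve as a dictionary key. The name `weight_signature` is likewise hypothetical.

```python
def weight_signature(block, values, features):
    """Return the multiset of weights (as a sorted tuple) of the features in
    the block's feature blanket that are consistent with the BV pair.

    features is a list of (scope, weight) pairs, where scope maps each
    variable of a conjunctive feature to its required value (assumed format).
    """
    assignment = dict(zip(block, values))
    signature = []
    for scope, weight in features:
        shared = [v for v in block if v in scope]
        if not shared:
            continue                    # feature is outside the feature blanket
        if all(scope[v] == assignment[v] for v in shared):
            signature.append(weight)    # consistent: record the feature weight
    return tuple(sorted(signature))

if __name__ == "__main__":
    toy_features = [
        ({"a": 1, "b": 1}, 2.0),
        ({"b": 0, "c": 1}, 0.5),
        ({"d": 1}, 3.0),
    ]
    print(weight_signature(("a", "b"), (1, 1), toy_features))
```

Two BV pairs with different signatures cannot be mapped to each other by a weight-preserving permutation, which is why the signature is useful as a grouping key.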

Algorithm 3: This algorithm makes use of the two procedures described above and outlines the complete process for generating multiple block partitions. It takes as input a graphical model and the maximum block size. After obtaining the useful blocks, a weight-signature dictionary is constructed, with a weight signature as the key and a list of blocks as the value. For each block, we iterate over all value assignments of that block to form all possible BV pairs (lines 3, 4). For each BV pair, the signature procedure computes the weight signature for that BV pair (line 5). If the signature has already been seen in the dictionary, the current block is appended to the list of blocks corresponding to that signature (lines 6, 7). Otherwise, the new weight signature is added to the dictionary along with a list containing only the current block (lines 8-10).
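The dictionary-building loop can be sketched as follows. The function name `build_signature_dict`, the `domains` mapping, and passing the signature routine in as a callable `signature_fn` are assumptions made to keep the example self-contained. The sketch follows the description literally, so a block is appended once per BV pair and may therefore appear several times under the same signature.

```python
from collections import defaultdict
from itertools import product

def build_signature_dict(blocks, domains, signature_fn):
    """Group blocks by the weight signature of each of their BV pairs.

    blocks: iterable of variable tuples.
    domains: dict mapping each variable to its list of values (assumed format).
    signature_fn: callable (block, values) -> hashable signature.
    """
    sig_dict = defaultdict(list)
    for block in blocks:
        # Enumerate every joint assignment of the block to form BV pairs.
        for values in product(*(domains[v] for v in block)):
            sig_dict[signature_fn(block, values)].append(block)
    return dict(sig_dict)

if __name__ == "__main__":
    toy_domains = {"a": [0, 1], "b": [0, 1]}
    toy_signature = lambda block, values: (sum(values),)  # stand-in signature
    print(build_signature_dict([("a",), ("b",)], toy_domains, toy_signature))
```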

Once the weight-signature dictionary is built, we generate useful candidate lists by picking blocks using the dictionary (loop at line 14). Line 15 initializes an empty candidate list. Blocks are added to the candidate list iteratively until all the variables are included (line 16). A two-step sampling procedure is used. The first step samples a weight signature with probability proportional to the size of its corresponding list of blocks (line 17). The second step samples a block uniformly from the list of blocks selected in the first step. The sampled block is added to the current candidate list if it does not overlap with the blocks already in the list (lines 19-21); otherwise, a new block is sampled as above. Once all variables are covered, the candidate list is complete, and the process is repeated until the required number of lists has been generated.
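The two-step sampling loop can be sketched as below. The function name `sample_candidate_lists`, its argument layout, and the `max_tries` guard against an uncoverable variable set are assumptions added for this illustration; the description above does not specify a termination safeguard.

```python
import random

def sample_candidate_lists(sig_dict, variables, num_lists, rng, max_tries=10000):
    """Build num_lists candidate block partitions by rejection sampling.

    sig_dict maps a weight signature to its list of blocks. max_tries is a
    safeguard introduced in this sketch, not part of the described procedure.
    """
    signatures = list(sig_dict)
    sizes = [len(sig_dict[s]) for s in signatures]
    all_vars = set(variables)
    candidate_lists = []
    for _ in range(num_lists):
        candidate, covered, tries = [], set(), 0
        while covered != all_vars:
            tries += 1
            if tries > max_tries:
                raise RuntimeError("could not cover all variables")
            # Step 1: signature with probability proportional to its list size.
            sig = rng.choices(signatures, weights=sizes, k=1)[0]
            # Step 2: block chosen uniformly from that signature's list.
            block = rng.choice(sig_dict[sig])
            if covered.isdisjoint(block):   # reject overlapping blocks
                candidate.append(block)
                covered.update(block)
        candidate_lists.append(candidate)
    return candidate_lists

if __name__ == "__main__":
    toy = {
        "s1": [("a", "b"), ("c", "d")],
        "s2": [("a",), ("b",), ("c",), ("d",), ("b", "c")],
    }
    for partition in sample_candidate_lists(toy, "abcd", 3, random.Random(0)):
        print(partition)
```

Weighting the first step by list size favors signatures shared by many blocks, which are the ones most likely to admit a permutation between blocks.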