Probabilistic Graphical Models and Tensor Networks: A Hybrid Framework

06/29/2021, by Jacob Miller et al., Princeton University

We investigate a correspondence between two formalisms for discrete probabilistic modeling: probabilistic graphical models (PGMs) and tensor networks (TNs), a powerful modeling framework for simulating complex quantum systems. The graphical calculus of PGMs and TNs exhibits many similarities, with discrete undirected graphical models (UGMs) being a special case of TNs. However, more general probabilistic TN models such as Born machines (BMs) employ complex-valued hidden states to produce novel forms of correlation among the probabilities. While representing a new modeling resource for capturing structure in discrete probability distributions, this behavior also renders the direct application of standard PGM tools impossible. We aim to bridge this gap by introducing a hybrid PGM-TN formalism that integrates quantum-like correlations into PGM models in a principled manner, using the physically-motivated concept of decoherence. We first prove that applying decoherence to the entirety of a BM model converts it into a discrete UGM, and conversely, that any subgraph of a discrete UGM can be represented as a decohered BM. This method allows a broad family of probabilistic TN models to be encoded as partially decohered BMs, a fact we leverage to combine the representational strengths of both model families. We experimentally verify the performance of such hybrid models in a sequential modeling task, and identify promising uses of our method within the context of existing applications of graphical models.


1 Introduction

Probabilistic graphical models (PGMs) are a framework for encoding conditional independence information about multivariate distributions as graph-based representations, whose generality and interpretability have made them an indispensable tool for probabilistic modeling. Undirected graphical models (UGMs), also known as Markov random fields, form a general class of PGMs with a diverse range of applications in fields such as computer vision [wang2013markov], natural language processing [wang2019bert], and biology [mora2011biological]. More recently, the graphical structure of discrete UGMs has been shown to be closely related to that of tensor networks (TNs) [duality2018], a state-of-the-art modeling framework first developed for quantum many-body physics [verstraete2008matrix, Orus2014]. The use of TNs in machine learning, for example in model compression [novikov2015tensorizing, cichocki2016tensor], in proving separations in expressivity between deep and shallow learning methods [cohen2016expressive, levine2018deep], and as standalone learning models [milesdavid, novikov2017exponential], has been a subject of growing interest.

In this work, we explore the correspondence between UGMs and TNs in the setting of probabilistic modeling. Whereas UGMs are specifically designed to represent probability distributions, general TNs represent high-dimensional tensors whose values can be positive, negative, or even complex. While restricting TN parameters to take on non-negative values results in an exact equivalence with UGMs [duality2018], it also limits their expressivity. More general probabilistic models built from TNs, as exemplified by the Born machine (BM) [bornmachine2018] model family, employ complex latent states that permit them to utilize novel forms of interference phenomena in structuring their learned distributions. While this provides a new resource for probabilistic modeling, it also limits the applicability of foundational PGM concepts such as conditional independence.

We make use of the physics-inspired concept of decoherence [zurek2003decoherence] to develop a hybrid framework for probabilistic modeling, which allows for the coexistence of tools and concepts from UGMs alongside quantum-like interference behavior. We use this framework to define the family of decohered Born machines (DBMs), which we prove is sufficiently expressive to reproduce any probability distribution expressible by discrete UGMs or BMs, along with more general families of TN-based models. We further show that DBMs satisfy a conditional independence property relative to their decohered regions, with the operation of decoherence permitting the values of latent random variables to be conditioned on in an identical manner as in UGMs. Finally, we verify the empirical benefits of such models on a sequential modeling task.

Related Work

Our work builds on the duality results of [duality2018], which establish a graphical correspondence between discrete UGMs and TNs, by further accounting for the distinct probabilistic behavior of both model classes. Much work across physics, machine learning, stochastic modeling, and automata theory has introduced and explored novel properties of quantum-inspired probabilistic models [zhao2010norm, bailly2011quadratic, FV2012, gao2017efficient, pestun2017tensor, pestun2017language, bornmachine2018, stoudenmire2018learning, stokes_terilla, benedetti2019generative, BST_2020, miller2021tensor, gao2021enhancing], almost all of which explicitly or implicitly employ tensor networks. The relative expressivity of these models was explored in [glasser2019, adhikary2021quantum], where quantum-inspired models were proven to be inequivalent to graphical models. Fully-quantum generalizations of various graphical models were investigated in [leifer2008quantum].

2 Preliminaries

We work with real and complex finite-dimensional vector spaces $\mathbb{F}^d$, where $\mathbb{F}$ denotes one of $\mathbb{R}$ or $\mathbb{C}$ when the distinction is not needed. We take an $n$th order tensor, or $n$-tensor, over $\mathbb{F}$ to be a scalar-valued map from an $n$-fold Cartesian product of index sets $[d_1] \times \cdots \times [d_n]$, where $[d] := \{1, \ldots, d\}$ and where the vector space of all $n$-tensors is denoted by $\mathbb{F}^{d_1 \times \cdots \times d_n}$. Matrices, vectors, and scalars over $\mathbb{F}$ respectively correspond to 2-tensors, 1-tensors, and 0-tensors, whereas higher-order tensors refers to any $n$-tensor for $n \ge 3$. The elements of $T$ are individual values of $T$ on input tuples, and are written as $T_{i_1 \cdots i_n}$, while the $k$th mode of $T$ refers to the $k$th argument of $T$. The contraction of a vector $v \in \mathbb{F}^{d_k}$ with the $k$th mode of an $n$-tensor $T$ is the $(n-1)$-tensor whose elements satisfy $(T \times_k v)_{i_1 \cdots i_{k-1} i_{k+1} \cdots i_n} = \sum_{i_k=1}^{d_k} T_{i_1 \cdots i_n} v_{i_k}$. Although dense representations of $n$-tensors require $O(d^n)$ parameters to specify, where $d = \max_k d_k$, we will see later how tensor networks bypass this exponential scaling for many families of higher-order tensors. As one simple example, the tensor product of any $m$-tensor $T$ and $n$-tensor $T'$ is the $(m+n)$-tensor $T \otimes T'$ whose elements are given by $(T \otimes T')_{i_1 \cdots i_m j_1 \cdots j_n} = T_{i_1 \cdots i_m} T'_{j_1 \cdots j_n}$. We use $\mathbb{R}_{\ge 0}$ to indicate the non-negative real numbers, and take the 2-norm of a tensor $T$ to be the scalar $\|T\| = \big(\sum_{i_1, \ldots, i_n} |T_{i_1 \cdots i_n}|^2\big)^{1/2}$. Finally, we use $M^\dagger$ to indicate the conjugate transpose of a complex vector or matrix $M$.
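To make this notation concrete, here is a minimal NumPy sketch of mode contraction, the tensor product, and the 2-norm on small dense tensors; it is illustrative only and not code from the paper.

```python
import numpy as np

# Minimal sketch of the tensor operations defined above, with NumPy arrays
# standing in for dense n-tensors.
T = np.random.randn(2, 3, 4)          # a 3-tensor with index sets of size 2, 3, 4
v = np.random.randn(3)                # a vector to contract with the 2nd mode of T

# Contracting v with the 2nd mode of T yields a 2-tensor, here of shape (2, 4).
Tv = np.einsum('ijk,j->ik', T, v)

# The tensor product of an m-tensor and an n-tensor is an (m+n)-tensor.
A, B = np.random.randn(2, 2), np.random.randn(3)
AB = np.einsum('ij,k->ijk', A, B)     # shape (2, 2, 3)

# The 2-norm of a tensor: square root of the summed squared magnitudes of its elements.
norm_T = np.sqrt(np.sum(np.abs(T) ** 2))
assert np.isclose(norm_T, np.linalg.norm(T.ravel()))
```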

We focus exclusively on undirected graphs $G = (V, E)$, whose vertex and edge sets are denoted by $V$ and $E$. In anticipating the needs of tensor networks, we allow the existence of edges which are incident to only one node, which we refer to as visible (i.e. dangling) edges. We use $E_V$ to indicate the set of all visible edges, and $E_H$ to indicate the set of all hidden edges, which are edges adjacent to two nodes. Graphs without dangling edges will be called proper graphs. For any node $v$, we denote the set of edges incident to $v$ by $E(v)$. A clique of $G$ is a maximal subset $C \subseteq V$ such that every pair of nodes in $C$ is connected by an edge, and we use $\mathcal{C}$ to denote the set of all cliques of $G$. We define a cut set of $G$ to be any set of edges $S$ such that the removal of all edges in $S$ from $G$ partitions the graph into two disjoint non-empty sub-graphs.

We work with random variables (RVs), indicated by uppercase letters such as $X$, and their possible outcomes, indicated by lowercase equivalents such as $x$. RVs and their outcomes are frequently indexed with values chosen from an index set, for example $X_i$ for $i \in A$, in which case the notation $X_A$ indicates the joint RV $(X_i)_{i \in A}$. A similar notation is used for multivariate functions $f(x_A)$, as well as for tensor elements $T_{x_A}$, and a related notation is used for spaces of tensors. Given three disjoint sets of random variables $X_A$, $X_B$, and $X_C$, we use $X_A \perp X_B \mid X_C$ to indicate the conditional independence of $X_A$ and $X_B$ given $X_C$, and $X_A \perp X_B$ to indicate the (unconditional) independence of $X_A$ and $X_B$.

2.1 Undirected Graphical Models

Probabilistic graphical models (PGMs) represent multivariate probability distributions using a proper graph $G = (V, E)$ whose nodes $v \in V$ each correspond to distinct RVs $X_v$. We focus on undirected graphical models (UGMs), whose probability distributions are determined by a collection of clique potentials $\phi_C$, non-negative valued functions of the RVs associated with the nodes in $C$, where $C$ ranges over all cliques of $G$. Given a UGM with clique potentials $\{\phi_C\}_{C \in \mathcal{C}}$ defined on a graph with $n$ nodes, the probability distribution represented by the UGM is

$P(x_1, \ldots, x_n) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \phi_C(x_C), \qquad Z = \sum_{x_1, \ldots, x_n} \prod_{C \in \mathcal{C}} \phi_C(x_C). \qquad (1)$

For brevity, we will often omit normalization factors such as $Z$ in the following, with the understanding that such terms must ultimately be added to ensure a valid probability distribution. UGMs satisfy an intuitive conditional independence property involving disjoint subsets of nodes $A, B, S \subseteq V$ for which the removal of $S$ leaves the nodes of $A$ and $B$ in separate disconnected subgraphs of $G$. In this case, the RVs associated with these nodes satisfy $X_A \perp X_B \mid X_S$.
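As a small worked example, the following NumPy sketch evaluates Equation (1) on a hypothetical three-variable chain UGM with binary RVs (not a model from the paper), and checks the conditional independence property across the separator node.

```python
import numpy as np

# Hypothetical chain UGM X1 - X2 - X3 with two pairwise clique potentials.
d = 2                                          # each RV is binary
phi_12 = np.random.rand(d, d)                  # non-negative potential on clique {X1, X2}
phi_23 = np.random.rand(d, d)                  # non-negative potential on clique {X2, X3}

# Equation (1): product of clique potentials, normalized by the partition function Z.
unnorm = np.einsum('ij,jk->ijk', phi_12, phi_23)
Z = unnorm.sum()
P = unnorm / Z

# X1 is independent of X3 given X2: the conditional is a product of its marginals.
for x2 in range(d):
    cond = P[:, x2, :] / P[:, x2, :].sum()
    assert np.allclose(cond, np.outer(cond.sum(1), cond.sum(0)))
```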

While the definition above is the standard presentation of UGMs, to permit an easier comparison with tensor networks we will more frequently view them using a dual graphical formulation. In this dual picture, nodes represent clique potentials and edges represent RVs.

3 Tensor Networks

Tensor networks (TNs) provide a general means to efficiently represent higher-order tensors in terms of smaller tensor cores, in much the same way as UGMs efficiently represent multivariate probability distributions in terms of smaller clique potentials. Tensor contraction is crucial in the structure of TNs, and generally involves the multiplication of an $m$-tensor $T$ and an $n$-tensor $T'$ along a pair of modes of equal dimension $d$, to yield a single output $(m+n-2)$-tensor whose elements are given (here for a contraction of the last mode of $T$ with the first mode of $T'$) by

$(T \cdot T')_{i_1 \cdots i_{m-1} j_2 \cdots j_n} = \sum_{k=1}^{d} T_{i_1 \cdots i_{m-1} k}\, T'_{k j_2 \cdots j_n}. \qquad (2)$

Although appearing complex in its general form, it is worth verifying from Equation (2) that tensor contraction generalizes matrix-vector and matrix-matrix multiplication, along with vector inner product and scalar multiplication. Tensor contraction is associative, in the sense that multiple contractions between multiple tensors yield the same output regardless of the order of contraction, with different contraction orderings often having vastly different memory and compute requirements.
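The following NumPy sketch illustrates Equation (2) for contraction of the last mode of one tensor with the first mode of another, together with the matrix-vector special case and the associativity of contraction; it is illustrative only, not code from the paper.

```python
import numpy as np

# Equation (2): contracting the last mode of a 3-tensor with the first mode of
# a matrix gives a new 3-tensor.
A = np.random.randn(2, 3, 4)
B = np.random.randn(4, 5)
C = np.einsum('ijk,kl->ijl', A, B)                # shape (2, 3, 5)

# Special case: matrix-vector multiplication is a tensor contraction.
M, v = np.random.randn(3, 4), np.random.randn(4)
assert np.allclose(np.einsum('ij,j->i', M, v), M @ v)

# Associativity: contracting three tensors in either order gives the same output,
# even though different contraction orders can have very different costs.
X, Y, Z = np.random.randn(2, 10), np.random.randn(10, 10), np.random.randn(10, 2)
assert np.allclose((X @ Y) @ Z, X @ (Y @ Z))
```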

Tensor network diagrams [penrose71] provide an intuitive formalism for reasoning about computations involving tensor contraction using undirected graphs. In a TN diagram, each $n$-tensor $T$ is represented as a node of degree $n$, and each mode of $T$ is represented as an edge incident to that node. Tensor contraction between two tensors along a pair of modes is depicted by connecting the corresponding edges of the nodes, with the actual operation of tensor contraction depicted by merging the nodes representing both input tensors into a single node which shares the visible edges of both input nodes. In this manner, a TN diagram with $n$ visible edges and any number of hidden edges specifies a sequence of tensor contractions whose output will always be an $n$-tensor. For example, a TN diagram of three cores contracted in a line (diagram omitted) expresses the tensor contraction used in the SVD to express a matrix as the product of three smaller matrices. As a degenerate case, the tensor product of two tensors is depicted by drawing them adjacent to each other, with no connected edges.

Figure 1: Tensor network and copy tensor notations. (a) Basic notations for TNs, where nodes represent tensors and edges represent tensor modes. Vectors denote elements of an orthonormal basis used to express tensors as arrays. (b–c) Vector inner products and matrix multiplication are simple examples of tensor contraction. (d) Tensor contraction is associative, with the contraction of three tensors by first contracting the first pair (shown) giving the same result as first contracting the second pair. (e) Copy tensors are denoted by a black dot with $n$ edges. (f) Copy tensors act on basis vectors by copying them to all visible edges and (g) they permit any connected network of copy tensors to be arbitrarily rearranged, provided the total number of visible edges remains unchanged. Copy tensors also allow the graphical expression of basis-dependent operations, including (h) marginalizing over a RV in a probability distribution, (i) the element-wise product of tensors, and (j) the creation of diagonal matrices from a vector of diagonal values.

Tensor networks use a fixed TN diagram to efficiently parameterize a family of higher-order tensors in terms of a family of smaller dense tensor cores, as stated in the following:

Definition 1.

A tensor network consists of a graph $G = (V, E)$, along with a positive integer valued map $D$ assigning each edge $e \in E$ to a vector space $\mathbb{F}^{D(e)}$ of dimension $D(e)$, and a map assigning each node $v \in V$ to a tensor core $T_v$ whose shape is determined by the dimensions assigned to the edges incident to $v$. The tensor represented by a tensor network with $|V|$ nodes is that resulting from a contraction of all tensor cores $\{T_v\}_{v \in V}$ according to the tensor network diagram defined by $G$.

Dimensions assigned to hidden edges are referred to as bond dimensions, and for a fixed graph $G$ they represent the primary hyperparameters setting the tradeoff between a TN's compute/memory efficiency and its expressivity. A simple but illustrative example of a TN is a low-rank matrix factorization, whose graph $G$ is the line graph on two nodes, and whose single hidden edge is associated with a bond dimension equal to the rank of the parameterized matrices.

Copy Tensors

A simple family of tensors plays a key role in understanding the relationship between UGMs and TNs. Given an orthonormal basis $\{e_i\}_{i=1}^{d}$ for a vector space $\mathbb{F}^d$, for each $n$ we define the $n$th order copy tensor associated with this basis to be $\delta_n = \sum_{i=1}^{d} e_i^{\otimes n}$. When $\delta_n$ is contracted with any of the basis vectors $e_i$, the result is a tensor product of $n-1$ independent copies of $e_i$ (Figure 1f). This convenient property only holds for vectors chosen from the basis defining the copy tensor, leading to a one-to-one correspondence between copy tensor families and orthonormal bases [coecke_pavlovic_vicary_2013]. The copy tensors $\delta_1$ and $\delta_2$ respectively correspond to the $d$-dimensional all-ones vector and identity matrix, with the former allowing the expression of sums over tensor elements.

A general copy tensor $\delta_n$ is depicted graphically as a single black dot with $n$ edges (Figure 1e), while $\delta_2$ is depicted as an undecorated edge. These satisfy a useful closure property under tensor contraction, whereby any connected network of copy tensors is identical to a single copy tensor with the same number of visible edges [fong2019invitation, Theorem 6.45]. This property allows connected networks of copy tensors to be rearranged in any manner, so long as the number of visible edges remains unchanged (Figure 1g). While general tensor network diagrams remain unaffected by changes of basis in the hidden edges, the use of copy tensors permits the graphical depiction of a larger family of basis-dependent linear algebraic operations (Figure 1h–j).
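A minimal NumPy sketch of copy tensors in the standard basis follows, illustrating the copying property and the identification of the first- and second-order copy tensors with the all-ones vector and identity matrix. The helper function name is hypothetical; this is not the paper's code.

```python
import numpy as np

def copy_tensor(d, n):
    """n-th order copy tensor for a d-dimensional space (standard basis):
    1 at every diagonal position (i, i, ..., i), 0 elsewhere."""
    delta = np.zeros((d,) * n)
    for i in range(d):
        delta[(i,) * n] = 1.0
    return delta

d = 3
delta3 = copy_tensor(d, 3)

# Copying property (Figure 1f): contracting a basis vector into one mode yields
# the tensor product of copies of that basis vector on the remaining modes.
e1 = np.eye(d)[1]
out = np.einsum('ijk,i->jk', delta3, e1)
assert np.allclose(out, np.outer(e1, e1))

# delta_1 is the all-ones vector (a sum over an index); delta_2 is the identity.
assert np.allclose(copy_tensor(d, 1), np.ones(d))
assert np.allclose(copy_tensor(d, 2), np.eye(d))
```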

3.1 Undirected Graphical Models as Non-Negative Tensor Networks

Figure 2: Tensor network description of UGMs. (a) Example of the duality between UGM-style graphical notation, where nodes are associated with RVs, and TN-style graphical notation, where nodes are associated with clique potentials. Factor graphs act as an intermediate representation, with duality simply interchanging variable and factor nodes. (b) Marginalization of a probabilistic model is represented in TN notation by contracting the corresponding visible edges by first-order copy tensors. (c) Conditioning is represented by contracting the corresponding visible edges by outcome-dependent basis vectors, with conditional independence of the resulting distribution achieved through properties of copy tensors. (d) Any TN with non-negative cores can be converted to a UGM by promoting its hidden edges into visible edges by the use of third-order copy tensors. Marginalizing these new latent RVs recovers the original distribution over the visible edges.

Discrete multivariate probability distributions are an important example of higher-order tensors, with the individual probabilities of a distribution over $n$ discrete RVs forming the elements of an $n$-tensor. More generally, any non-negative tensor $T$, whose elements all satisfy $T_{i_1 \cdots i_n} \ge 0$, can be converted into a probability distribution by normalizing as $P = T / Z$, where $Z = \sum_{i_1, \ldots, i_n} T_{i_1 \cdots i_n}$.

The connection between multivariate probability distributions and the structure of higher-order tensors extends further, with the independence relation $X_A \perp X_B$ between two disjoint sets of RVs being equivalent to the factorization of their joint probability distribution as the tensor product $P = P_A \otimes P_B$. Methods used to efficiently represent and learn higher-order tensors, such as tensor networks, can be applied to probabilistic modeling, provided that some means of parameterizing only non-negative tensors is employed. We discuss two important approaches for achieving this probabilistic parameterization, one equivalent to undirected graphical models and the other to Born machines.

It was shown in [duality2018, Theorem 2.1] that the data defining a UGM is equivalent to that defining a TN, but with dual graphical notations that interchange the roles of nodes and edges. Converting from a UGM to an equivalent TN involves expressing each clique potential $\phi_C$ on a clique of size $k$ as a $k$th-order tensor core, depicted as a degree-$k$ node of the TN diagram. Meanwhile, each UGM node representing a discrete RV is replaced by a copy tensor (copy tensors were used implicitly in [duality2018], in the form of hyperedges within a defining hypergraph) of degree equal to the number of clique potentials the RV occurs in, plus one additional visible edge permitting the values of the RV to appear in the probability distribution described by the TN (Figure 2a). Since every tensor core consists of non-negative elements, the resultant TN is guaranteed to describe a non-negative higher-order tensor. We refer to this family of TN models as non-negative tensor networks.

In the dual graphical notation of TNs, marginalization of and conditioning on RVs in UGMs are achieved by contracting each visible edge of the associated TN with either a first-order copy tensor $\delta_1$ (marginalization) or an outcome-dependent basis vector (conditioning) respectively. Computing the resulting distribution over the remaining RVs is then a straightforward application of tensor contraction [duality2018], where any nodes of the TN with no remaining visible edges are merged together (Figure 2b). Furthermore, since variables are associated to copy tensors, the conditional independence property of UGMs can be proven using the copying property of copy tensors (Figure 2c). The appropriate formulation of conditional independence is slightly different in the dual TN notation, owing to the association of RVs to edges rather than nodes. In this graphical framework, conditional independence arises when a conditioning set of RVs $X_C$ forms a cut set of the underlying TN graph, in which case the RVs $X_A$ and $X_B$ associated with the two partitions of the graph induced by this cut set will satisfy $X_A \perp X_B \mid X_C$.

The reverse direction of converting non-negative TNs into UGMs is also straightforward, though care is needed with the treatment of hidden edges that aren't connected to copy tensors. In such cases, we can replace any hidden edge with a third-order copy tensor $\delta_3$, yielding a new visible edge which encodes a latent RV in an enlarged distribution. This enlargement process is reversible, in the sense that marginalizing over all latent RVs associated with hidden edges yields the original distribution, allowing hidden edges of a non-negative TN to be treated as visible edges without any loss of generality (Figure 2d). We will see shortly that this property is not shared by more general probabilistic TN models.
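The following NumPy sketch illustrates Figure 2d on a hypothetical two-core non-negative TN (a non-negative matrix factorization): the hidden edge is promoted to a latent RV with a third-order copy tensor, and marginalizing that RV recovers the original tensor. It is not code from the paper.

```python
import numpy as np

dv, dh = 2, 3                                   # visible and bond (hidden) dimensions
A = np.random.rand(dv, dh)                      # non-negative cores
B = np.random.rand(dh, dv)

# Original non-negative 2-tensor over the two visible edges.
T = np.einsum('ia,aj->ij', A, B)

# Promote the hidden edge to a latent RV via a third-order copy tensor: the
# enlarged tensor gains an extra index ranging over the bond dimension.
delta3 = np.zeros((dh, dh, dh))
for a in range(dh):
    delta3[a, a, a] = 1.0
T_enlarged = np.einsum('ia,abc,cj->ibj', A, delta3, B)

# Marginalizing the new latent RV recovers the original tensor exactly.
assert np.allclose(T_enlarged.sum(axis=1), T)
```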

4 Born Machines

Figure 3: Overview of Born machines, a family of TN-based probabilistic models. (a) Born machines represent a general higher-order tensor as a TN, whose elements are converted to probabilities via the Born rule. This can be used to express the probability distribution itself as a composite TN diagram. (b–c) Unlike non-negative TNs, any attempt to read out the hidden edges of BMs as latent RVs alters the overall distribution, a manifestation of the “observer effect” of quantum mechanics. Converting a hidden edge to a RV and then marginalizing results in a different distribution.

While UGMs represent one means of parameterizing non-negative tensors for probabilistic modeling, an alternate approach is suggested by quantum physics. Discrete quantum systems are fully described by complex-valued wavefunctions, higher-order tensors which yield probabilities under the Born rule of quantum mechanics. The efficacy of TNs in learning quantum wavefunctions inspired the Born machine (BM) model [bornmachine2018].

Definition 2.

A Born machine consists of a tensor network over a graph $G$ containing $n$ visible edges, whose associated $n$-tensor $T$ is converted into a probability distribution via the Born rule $P(x_1, \ldots, x_n) = |T_{x_1 \cdots x_n}|^2 / \|T\|^2$, where $\|T\|$ denotes the 2-norm of $T$.

The Born rule permits the (unnormalized) probability distribution associated with a BM to be expressed as a single composite TN, consisting of two copies of the TN parameterizing $T$, one with all core tensor values complex-conjugated, and where all pairs of visible edges have been merged via third-order copy tensors (Figure 3a). Expressing the BM distribution as a single composite TN allows efficient marginal and conditional inference procedures to be applied in a manner analogous to UGMs, namely by contracting the visible edges of the composite TN with the all-ones vector $\delta_1$ (marginalization) or outcome-dependent basis vectors (conditioning), and then contracting regions of the TN with no remaining visible edges. The “doubled up” nature of the composite TN means that intermediate states occurring during inference are described by density matrices, which are positive semidefinite matrices employed in quantum mechanics whose non-negative diagonal entries correspond to (unnormalized) probabilities, and whose off-diagonal elements are referred to as “coherences.”

The existence of non-zero coherences gives BMs the ability to utilize quantum-like interference phenomena in modeling probability distributions, but also makes it difficult to interpret the operation of a BM by assigning latent RVs to its edges, as was possible with non-negative TNs. While we can force a new RV into existence by extracting the diagonal elements of intermediate density matrices using copy tensors (Figure 3b), this causes the elimination of all coherences in density matrices passing through the edge, with the result that the distribution after marginalizing the new latent variable differs from the original BM distribution (Figure 3c). This fact, which can be seen as an analogue of the measurement-induced “observer effect” in quantum mechanics, represents a tradeoff between expressivity and interpretability in BMs that isn’t available in PGMs.
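As a concrete illustration, here is a minimal NumPy sketch of a matrix-product-state-shaped Born machine over three binary RVs with random complex cores, showing the Born rule and the density matrix that appears during marginal inference. The cores and shapes are assumptions for illustration, not a model from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, D = 2, 4                                        # physical and bond dimensions
A1 = rng.standard_normal((d, D)) + 1j * rng.standard_normal((d, D))
A2 = rng.standard_normal((D, d, D)) + 1j * rng.standard_normal((D, d, D))
A3 = rng.standard_normal((D, d)) + 1j * rng.standard_normal((D, d))

# Contract the underlying TN to get the 3-tensor T, then apply the Born rule.
T = np.einsum('ia,ajb,bk->ijk', A1, A2, A3)
P = np.abs(T) ** 2
P /= P.sum()                                       # normalize by the squared 2-norm

# Marginalizing the first RV in the composite TN yields a density matrix on the
# first bond: positive semidefinite, with coherences in its off-diagonal entries.
rho = np.einsum('ia,ib->ab', A1, A1.conj())
assert np.all(np.linalg.eigvalsh((rho + rho.conj().T) / 2) > -1e-10)
```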

5 A Hybrid Framework

Figure 4: Decoherence and the decohered Born machine (DBM) model. (a) The decoherence operator, which removes coherences from hidden states in BMs, leaving a diagonal matrix which encodes a latent RV. Decoherence operators permit the readout of latent RVs in a reversible manner, with marginalization of the latent RV yielding the original distribution (third diagram). (b) Examples of decohered Born machines (DBMs) based on a three-core TN with two hidden edges. Choosing one of these as the decohered edge set results in a decoherence operator being placed at the corresponding location in the composite TN, which allows the decohered edge to be expressed as a new latent RV. (c) Sketch of the proof of Theorem 1, that every fully-decohered BM is equivalent to a UGM. Copy tensor rewriting rules permit the factorization of fully-decohered BMs into non-negative valued tensors which form clique potentials of an equivalent UGM. (d) Example of the conditional independence property for the DBM above. Conditioning on a value of the latent RV at the decohered edge leads to the conditioning value being copied to all attached cores, which in turn leads to a factorization of the conditional distribution into two independent pieces.

While the graphical structure of Born machines is useful for defining area laws, which characterize the attainable mutual information between subsets of RVs [eisert2010colloquium, lu2021], BMs do not permit the formulation of conditional independence results, something which is a major benefit of PGMs. A primary reason for this is the inability to freely assign latent RVs to the hidden edges of BMs without disturbing the original distribution. However, by accounting for this disturbance in a principled manner, it is possible to combine the representational advantages available to BMs with the conditional independence guarantees available to PGMs.

A crucial tool is the concept of decoherence, whereby all off-diagonal coherences of a hidden density matrix are set to zero, leaving only a probability distribution on the diagonal of the operator. This can be carried out by the action of a decoherence operator, a fourth-order copy tensor acting on density matrices (Figure 4a). This operator is the natural result of converting a hidden edge of a BM into a latent RV and then marginalizing. We can use this idea to decohere certain edges of a BM in advance, leading to the notion of decohered Born machine models (Figure 4b).

Definition 3.

A decohered Born machine (DBM) consists of a Born machine over a graph $G$ along with a subset of hidden edges $\mathcal{D} \subseteq E_H$, referred to as the model's decohered edges. The probability distribution represented by a DBM is given by the composite TN associated to the original BM, but with each pair of hidden edges in $\mathcal{D}$ replaced by a decoherence operator. A DBM for which $\mathcal{D} = E_H$ is referred to as a fully-decohered Born machine.
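A minimal NumPy sketch of the decoherence operator of Definition 3 acting on a density matrix follows: written as a fourth-order copy tensor, it removes all off-diagonal coherences while preserving the diagonal (and hence the trace). The index convention used here is an assumption for illustration; this is not the paper's code.

```python
import numpy as np

D = 3
decohere = np.zeros((D, D, D, D))                 # fourth-order copy tensor
for a in range(D):
    decohere[a, a, a, a] = 1.0

rng = np.random.default_rng(1)
M = rng.standard_normal((D, D)) + 1j * rng.standard_normal((D, D))
rho = M @ M.conj().T                              # a generic positive semidefinite matrix

# Acting with the decoherence operator zeroes all off-diagonal coherences.
rho_dec = np.einsum('abcd,cd->ab', decohere, rho)
assert np.allclose(rho_dec, np.diag(np.diag(rho)))

# The diagonal is a non-negative (unnormalized) distribution over the latent RV,
# and the trace is unchanged, so marginalizing recovers the same total mass.
assert np.isclose(rho_dec.trace(), rho.trace())
```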

Having Definition 3 in hand, we would like to first characterize the expressivity of DBMs. It is clear that standard BMs are a special case of DBMs, where the decohered edge set is taken to be empty. On the other hand, we show in the following two results that fully-decohered BMs are equivalent in expressive power to discrete UGMs.

Theorem 1.

The probability distribution expressed by a fully-decohered Born machine with tensor cores $T_v$, one for each node $v \in V$, is identical to that of a discrete undirected graphical model with clique potentials $\phi_v$ of the same shape, and whose values are given by $\phi_v(x_{E(v)}) = |T_v(x_{E(v)})|^2$, where $x_{E(v)}$ contains the RVs from all edges adjacent to $v$.

The proof of Theorem 1 is given in the supplemental material, with the basic idea illustrated in Figure 4c. Each decoherence operator can be written as the product of two third-order copy tensors, each of which can be assigned to one pair of TN cores adjacent to the decohered edge. In the case that all edges of a DBM are decohered, these copy tensors allow each pair of cores $T_v$ and $\overline{T_v}$ to be replaced by their element-wise product, giving an effective clique potential with non-negative values. The UGM formed by these clique potentials has the same graphical structure as the TN describing the original BM (up to graphical duality). Conversely, the correspondence given in Theorem 1 suggests a simple method for representing any discrete UGM as a fully-decohered BM.
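The following NumPy sketch checks the content of Theorem 1 on a minimal two-core BM with a single hidden edge, using hypothetical random complex cores: the fully-decohered BM distribution coincides with the UGM whose clique potentials are the squared magnitudes of the cores.

```python
import numpy as np

rng = np.random.default_rng(2)
dv, dh = 2, 3
A = rng.standard_normal((dv, dh)) + 1j * rng.standard_normal((dv, dh))
B = rng.standard_normal((dh, dv)) + 1j * rng.standard_normal((dh, dv))

# Fully-decohered BM: the decoherence operator on the hidden edge forces the bond
# indices of the two copies of the underlying TN to agree.
P_dbm = np.einsum('ia,aj,ia,aj->ij', A, B, A.conj(), B.conj()).real

# Equivalent UGM: clique potentials are the element-wise products of each core with
# its complex conjugate, i.e. the squared magnitudes of the core entries.
phi_A, phi_B = np.abs(A) ** 2, np.abs(B) ** 2
P_ugm = np.einsum('ia,aj->ij', phi_A, phi_B)

assert np.allclose(P_dbm, P_ugm)
```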

Corollary 1.

The probability distribution of any discrete undirected graphical model with clique potentials $\phi_C$ is identical to that of any fully-decohered Born machine with tensor cores $T_v$ of the same shape, and whose elements are given by $T_v(x) = \sqrt{\phi_C(x)}\, e^{i \theta(x)}$, where $\theta$ can be any real-valued function, and where $v$ indicates the TN node dual to the clique $C$.

Although standard BMs and UGMs operate very differently, and in the case of line graphs have been proven to have inequivalent expressive power [glasser2019], we see that DBMs offer a unified means of representing both families of models with an identical parameterization. We further prove in the supplemental material that DBMs are equivalent in expressivity to locally purified states, a model family generalizing both BMs and UGMs.

Another motivation for the use of DBMs is in enabling conditional independence guarantees within the setting of quantum-inspired TN models. The ability to replace any decoherence operator by a fifth-order copy tensor with a visible edge lets us assign RVs to all decohered edges, such that marginalizing over these new RVs yields the original DBM distribution (Figure 4a). These new RVs behave identically to those of a UGM, letting us demonstrate a conditional independence property.

Theorem 2.

Consider a DBM with underlying graph $G$ and decohered edges $\mathcal{D}$, along with a subset $S \subseteq \mathcal{D}$ which forms a cut set for $G$. If we denote by $X_S$ the set of RVs associated to $S$, and denote by $X_A$ and $X_B$ the sets of RVs associated to the two partitions of $G$ induced by the cut set $S$, then the DBM distribution satisfies the conditional independence property $X_A \perp X_B \mid X_S$.

While the complete proof of Theorem 2 is given in the supplemental material, the idea is simple (Figure 4d). The insertion of decoherence operators, which are examples of copy tensors, into the composite TN giving the DBM distribution allows any basis vector used for conditioning to be copied to all edges incident to the copy tensor. This in turn removes any direct correlations between the nodes on either side of the decohered edge, so that conditioning on a collection of RVs associated with a cut set of decohered edges results in a factorization of the conditional composite TN into a tensor product of two independent pieces.
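The following NumPy sketch illustrates Theorem 2 on a hypothetical three-core chain BM whose first hidden edge is decohered, so that the latent RV read out from that edge forms a cut set: conditioning on it factorizes the conditional distribution. This is illustrative only, not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
dv, dh = 2, 3
A = rng.standard_normal((dv, dh)) + 1j * rng.standard_normal((dv, dh))
B = rng.standard_normal((dh, dv, dh)) + 1j * rng.standard_normal((dh, dv, dh))
C = rng.standard_normal((dh, dv)) + 1j * rng.standard_normal((dh, dv))

# Joint distribution over (X1, Y, X2, X3), where Y is the latent RV read out from
# the decohered edge: the decoherence operator forces the two bond copies to agree.
P = np.einsum('iy,yjb,bk,iy,yjc,ck->iyjk',
              A, B, C, A.conj(), B.conj(), C.conj()).real
P /= P.sum()

# Conditioning on Y = y factorizes the conditional distribution:
# X1 is independent of (X2, X3) given Y.
for y in range(dh):
    cond = P[:, y, :, :] / P[:, y, :, :].sum()
    p_x1 = cond.sum(axis=(1, 2))
    p_x23 = cond.sum(axis=0)
    assert np.allclose(cond, p_x1[:, None, None] * p_x23[None, :, :])
```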

6 Experiments

Figure 5: DBM (red) vs. UGM (orange) models defined on a graph corresponding to a hidden Markov model (HMM). The DBM exhibits a lower average negative log likelihood (NLL) on held-out data compared to an equivalent UGM on the Bars and Stripes dataset. Curves are means taken over 15 replications, shaded regions are 1 std dev. Star markers indicate minima. Left to right: increasing the number of hidden states in the HMM leads to faster learning by the DBM relative to the UGM.

Having discussed the representational advantages of DBMs for structured discrete probability distributions, we now investigate their empirical advantages over UGMs with a similar graphical structure. We define both DBM and UGM models on an undirected version of the graph defining a hidden Markov model (HMM), and evaluate their relative performance on a sequential (flattened) version of the Bars and Stripes dataset [mackay2003information]. Bars and Stripes consists of two kinds of binary images: ones with only horizontal bars, and ones with only vertical stripes. Intuitively, we expect that the interference behavior available in DBMs can capture correlations more efficiently when the distribution being modeled exhibits regular periodic behavior. The 1D periodic patterns seen in the vertical stripe images give us an ideal setting for testing this expectation.

Figure 5 shows the results of optimizing a UGM and DBM to maximize marginal likelihood over observations of flattened images. Our models were implemented and trained with JAX [jax2018github]. The results are favorable to the DBM, which achieves both a lower average NLL on held-out data than the UGM, as well as progressively smaller training times as the hidden dimensions of the models are increased. We conjecture that DBM models will more broadly have performance advantages in modeling real-world data processes that exhibit long-term periodic behavior, such as geophysical processes, mechanical systems like human gait, and more general stochastic processes with cyclical dynamics.

7 Conclusion

We use the physically-motivated notion of decoherence to define decohered Born machines (DBMs), a new family of probabilistic models that serve as a bridge between PGMs and TNs. As shown in Theorem 1 and Corollary 1, fully decohering a BM gives rise to a UGM, and conversely any subgraph of a UGM can be viewed as the decohered version of some BM. Crucial to this back-and-forth passage is the use of copy tensors, which further allows conditional independence guarantees in the context of TN modeling and provides an additional correspondence between the two modeling frameworks. An immediate limitation of our results surrounding DBMs is the focus on UGMs only. An extension to directed graphical models is left for future work, as is a deeper investigation into what kinds of problems could most benefit from utilizing quantum interference effects in the manner proposed. It is possible that DBMs would improve the performance of existing graphical model inference and learning algorithms by replacing sub-regions of the model with quantum-style ingredients, although a more systematic exploration of this question is needed. The integration of “classical” and “quantum” ingredients represented by a DBM further makes it a natural candidate for quantum machine learning, as decoherence represents a natural form of noise present in quantum hardware in the noisy intermediate-scale quantum (NISQ) era [Preskill2018quantumcomputing].

The authors thank Guillaume Verdon, Antonio Martinez, and Stefan Leichenauer for helpful discussions, and Jae Hyeon Yoo for engineering support. Geoffrey Roeder is supported in part by the National Sciences and Engineering Research Council of Canada (grant no. PGSD3-518716-2018).

References

  • [1] Sandesh Adhikary, Siddarth Srinivasan, Jacob Miller, Guillaume Rabusseau, and Byron Boots. Quantum tensor networks, stochastic processes, and weighted automata. In International Conference on Artificial Intelligence and Statistics, pages 2080–2088. PMLR, 2021.
  • [2] Raphael Bailly. Quadratic weighted automata: Spectral algorithm and likelihood maximization. In Asian Conference on Machine Learning, pages 147–163. PMLR, 2011.
  • [3] Marcello Benedetti, Delfina Garcia-Pintos, Oscar Perdomo, Vicente Leyton-Ortega, Yunseong Nam, and Alejandro Perdomo-Ortiz. A generative modeling approach for benchmarking and training shallow quantum circuits. npj Quantum Information, 5(1):1–9, 2019.
  • [4] James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. JAX: composable transformations of Python+NumPy programs, 2018.
  • [5] Tai-Danae Bradley, E M Stoudenmire, and John Terilla. Modeling sequences with quantum states: a look under the hood. Machine Learning: Science and Technology, 1(3):035008, 2020.
  • [6] Andrzej Cichocki, Namgil Lee, Ivan Oseledets, Anh-Huy Phan, Qibin Zhao, and Danilo P Mandic. Tensor networks for dimensionality reduction and large-scale optimization: Part 1 low-rank tensor decompositions. Foundations and Trends in Machine Learning, 9(4-5):249–429, 2016.
  • [7] Bob Coecke, Dusko Pavlovic, and Jamie Vicary. A new description of orthogonal bases. Mathematical Structures in Computer Science, 23(3):555–567, 2013.
  • [8] Nadav Cohen, Or Sharir, and Amnon Shashua. On the expressive power of deep learning: A tensor analysis. In Conference on Learning Theory, pages 698–728. PMLR, 2016.
  • [9] Ralph DeMarr. Nonnegative matrices with nonnegative inverses. Proceedings of the American Mathematical Society, 35(1):307–308, 1972.
  • [10] Jens Eisert, Marcus Cramer, and Martin B Plenio. Colloquium: Area laws for the entanglement entropy. Reviews of Modern Physics, 82(1):277, 2010.
  • [11] Andrew J. Ferris and Guifre Vidal. Perfect sampling with unitary tensor networks. Phys. Rev. B, 85:165146, 2012.
  • [12] Brendan Fong and David I. Spivak. An Invitation to Applied Category Theory: Seven Sketches in Compositionality. Cambridge University Press, 2019.
  • [13] Xun Gao, Eric R Anschuetz, Sheng-Tao Wang, J Ignacio Cirac, and Mikhail D Lukin. Enhancing generative models via quantum correlations. arXiv preprint arXiv:2101.08354, 2021.
  • [14] Xun Gao, Zhengyu Zhang, and Luming Duan. An efficient quantum algorithm for generative machine learning. arXiv preprint arXiv:1711.02038, 2017.
  • [15] Ivan Glasser, Ryan Sweke, Nicola Pancotti, Jens Eisert, and J. Ignacio Cirac. Expressive power of tensor-network factorizations for probabilistic modeling. In Advances in Neural Information Processing Systems 32, pages 1496–1508, 2019.
  • [16] Zhao-Yu Han, Jun Wang, Heng Fan, Lei Wang, and Pan Zhang. Unsupervised generative modeling using matrix product states. Phys. Rev. X, 8:031012, 2018.
  • [17] Charles R Harris, K Jarrod Millman, Stéfan J van der Walt, Ralf Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, Julian Taylor, Sebastian Berg, Nathaniel J Smith, et al. Array programming with NumPy. Nature, 585(7825):357–362, 2020.
  • [18] J. D. Hunter. Matplotlib: A 2d graphics environment. Computing in Science & Engineering, 9(3):90–95, 2007.
  • [19] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • [20] Matthew S Leifer and David Poulin. Quantum graphical models and belief propagation. Annals of Physics, 323(8):1899–1946, 2008.
  • [21] Yoav Levine, David Yakira, Nadav Cohen, and Amnon Shashua. Deep learning and quantum entanglement: Fundamental connections with implications to network design. In International Conference on Learning Representations, 2018.
  • [22] Sirui Lu, Márton Kanász-Nagy, Ivan Kukuljan, and J Ignacio Cirac. Tensor networks and efficient descriptions of classical data. arXiv preprint arXiv:2103.06872, 2021.
  • [23] David JC MacKay. Information theory, inference and learning algorithms. Cambridge University Press, 2003.
  • [24] Jacob Miller, Guillaume Rabusseau, and John Terilla. Tensor networks for probabilistic sequence modeling. In International Conference on Artificial Intelligence and Statistics, 2021.
  • [25] Thierry Mora and William Bialek. Are biological systems poised at criticality? Journal of Statistical Physics, 144(2):268–302, 2011.
  • [26] Alexander Novikov, Dmitry Podoprikhin, Anton Osokin, and Dmitry Vetrov. Tensorizing neural networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1, pages 442–450, 2015.
  • [27] Alexander Novikov, Mikhail Trofimov, and Ivan Oseledets. Exponential machines. In International Conference on Learning Representations, 2017.
  • [28] Román Orús. A Practical Introduction to Tensor Networks: Matrix Product States and Projected Entangled Pair States. Annals Phys., 349:117–158, 2014.
  • [29] Roger Penrose. Applications of negative dimensional tensors. Combinatorial Mathematics and its Applications, 1:221–244, 1971.
  • [30] Vasily Pestun, John Terilla, and Yiannis Vlassopoulos. Language as a matrix product state. arXiv preprint arXiv:1711.01416, 2017.
  • [31] Vasily Pestun and Yiannis Vlassopoulos. Tensor network language model. arXiv preprint arXiv:1710.10248, 2017.
  • [32] John Preskill. Quantum Computing in the NISQ era and beyond. Quantum, 2:79, 2018.
  • [33] Elina Robeva and Anna Seigal. Duality of Graphical Models and Tensor Networks. Information and Inference: A Journal of the IMA, 8(2):273–288, 06 2018.
  • [34] James Stokes and John Terilla. Probabilistic modeling with matrix product states. Entropy, 21(12), 2019.
  • [35] E Miles Stoudenmire. Learning relevant features of data with multi-scale tensor networks. Quantum Science and Technology, 3(3):034003, 2018.
  • [36] EM Stoudenmire and David J Schwab. Supervised learning with tensor networks. Advances in Neural Information Processing Systems, pages 4806–4814, 2016.
  • [37] Frank Verstraete, Valentin Murg, and J Ignacio Cirac. Matrix product states, projected entangled pair states, and variational renormalization group methods for quantum spin systems. Advances in Physics, 57(2):143–224, 2008.
  • [38] Alex Wang and Kyunghyun Cho. Bert has a mouth, and it must speak: Bert as a markov random field language model. NAACL HLT 2019, page 30, 2019.
  • [39] Chaohui Wang, Nikos Komodakis, and Nikos Paragios. Markov random field modeling, inference & learning in computer vision & image understanding: A survey. Computer Vision and Image Understanding, 117(11):1610–1627, 2013.
  • [40] Ming-Jie Zhao and Herbert Jaeger. Norm-observable operator models. Neural Computation, 22(7):1927–1959, 2010.
  • [41] Wojciech Hubert Zurek. Decoherence, einselection, and the quantum origins of the classical. Reviews of Modern Physics, 75(3):715, 2003.

Appendix A Decohered Born Machines Compute Unnormalized Probability Distributions

Here we show that every decohered Born machine (DBM) defines a valid (unnormalized) probability distribution, that is, that the tensor elements of a DBM are non-negative. This can be seen from the fact that the probability distribution represented by a DBM is obtainable as a marginalization of the distribution represented by a larger Born machine (BM) model. By definition, a DBM is a BM with the property that a subset $\mathcal{D}$ of the hidden edges of its underlying TN are decohered. Recall that decoherence here involves the contraction of $|\mathcal{D}|$ third-order copy tensors, where $|\mathcal{D}|$ is the size of the set $\mathcal{D}$, and observe that such contraction can be achieved by marginalizing over new latent variables introduced at the corresponding hidden edges. The tensor elements of the DBM then take the form of a marginalization $P(x) = \sum_{y} \tilde{P}(x, y)$, where $\tilde{P}$ is the distribution associated to a larger BM containing all copy tensors associated with decohered edges. These tensor elements are clearly non-negative, proving that DBMs always describe non-negative tensors.

Appendix B Expressivity of Decohered Born Machines

Figure 6: A tensor network (TN) associated with a graph $G$. The edge set is partitioned into visible edges and hidden edges, with the four visible edges giving a fourth-order tensor. A Born machine (BM) uses two copies of this underlying TN, one with all cores complex-conjugated, to define a fourth-order composite TN whose associated tensor is an unnormalized probability distribution associated with four random variables. A decohered Born machine (DBM) uses a similar composite TN, but allows for a decoherence operator to be inserted in some edges, as specified by a set of decohered edges (illustrated in the composite TN shown).

We prove several results which give a general characterization of the expressivity of decohered Born machines (DBMs), showing them to be capable of reproducing a range of classical and quantum-inspired probabilistic models. In Section B.1, we prove that fully-decohered Born machines are equivalent in expressivity to undirected graphical models (UGMs), with the equivalence in question preserving the number of parameters of the two model classes. This result, which parallels the tautological equivalence of non-decohered DBMs and standard BMs, is followed in Section B.3 by a result showing the equivalence of DBMs and locally purified states (LPS), an expressive model class introduced in [15].

We first review the definition of a DBM and some terminology for its graphical structure. Tensor networks (TNs) are defined in terms of a graph $G = (V, E)$ whose edges are allowed to be incident to either two nodes or one node in $V$, and we will refer to the respective disjoint sets of edges as hidden edges $E_H$ and visible edges $E_V$. Visible edges are associated with the modes of the tensor described by the TN, with the number of visible edges in $E_V$ equal to the order of the tensor. Nodes which aren't incident to any visible edges are referred to as hidden nodes of the TN. Every node $v \in V$ is associated with a tensor core $T_v$ of the TN, with the order of $T_v$ being equal to the degree of $v$ within $G$.

Recall that every BM is completely determined by a TN description of a higher-order tensor $T$, whose values are then converted into probabilities via the Born rule. We call the TN describing $T$ the underlying TN, and the Born rule implies that the probability distribution can be described as a single composite TN formed from two copies of the underlying TN, with all pairs of visible edges joined by third-order copy tensors. We sometimes use the phrase composite edge to refer to any pair of “doubled up” edges in the composite TN, in which case the composite TN can be seen as occupying the same graph as the underlying TN, but where each hidden edge corresponds to a composite edge. The probability distribution defined by a DBM is given by replacing certain composite edges in the composite TN by decoherence operators, according to whether those edges belong to a set of decohered edges $\mathcal{D}$ (Figure 6).

B.1 Proof of Theorem 1

We first restate Theorem 1, before providing a complete proof.

Theorem 1.

The probability distribution expressed by a fully-decohered Born machine with tensor cores $T_v$, one for each node $v \in V$, is identical to that of a discrete undirected graphical model with clique potentials $\phi_v$ of the same shape, whose values are given by $\phi_v(x_{E(v)}) = |T_v(x_{E(v)})|^2$, where $x_{E(v)}$ contains the RVs from all edges adjacent to $v$.

Figure 7: Example of Theorem 1 showing the conversion of a fully-decohered BM into an equivalent UGM. By rewriting each decoherence operator as a product of third-order copy tensors, we can rewrite every pair of BM core tensors $T_v$ and $\overline{T_v}$ as a single core tensor $\phi_v$, whose values are guaranteed to be non-negative. This can consequently be used as a clique potential for a UGM.
Proof.

We show that the composite TN defining the probability distribution of a fully-decohered BM can be rewritten as a TN on the same graph as the underlying TN, with identical bond dimensions but where all cores take non-negative values. By virtue of the equivalence of non-negative TNs and UGMs [33, Theorem 2.1], this suffices to prove Theorem 1.

Fully-decohered BMs are defined as DBMs for which $\mathcal{D} = E_H$, so that every composite edge within the composite TN has been replaced by a decoherence operator. Since the decoherence operator is itself a copy tensor, we can use the equality of different connected networks of copy tensors (Figure 1g) to express it as a contraction of two third-order copy tensors along a single edge. Decohered edges are hidden edges and are therefore incident to two distinct (pairs of) nodes in the composite TN. This allows us to move each copy of $\delta_3$ onto a separate pair of nodes incident to the composite edge (Figure 7). We group together each pair of nodes $T_v$ and $\overline{T_v}$, along with all copy tensors incident to it, and contract each of these groups into a single tensor, which we call $\phi_v$.

It is clear that each tensor $\phi_v$ consists of a pair of cores $T_v$ and $\overline{T_v}$ with all pairs of edges joined together by separate copies of $\delta_3$. Since this arrangement of copy tensors corresponds to the element-wise product of $T_v$ and $\overline{T_v}$, this implies that the elements of $\phi_v$ satisfy $\phi_v(x_{E(v)}) = |T_v(x_{E(v)})|^2$, with $x_{E(v)}$ denoting the collection of indices associated with the edges incident to $v$ (these correspond to $x_C$ for some clique $C$ in the dual graph). Since each core $\phi_v$ has the same shape as $T_v$, has non-negative values, and is arranged in a TN with the same graph as the underlying TN, this proves Theorem 1.

B.2 Proof of Corollary 1

Corollary 1.

The probability distribution of any discrete undirected graphical model with clique potentials $\phi_C$ is identical to that of any fully-decohered Born machine with tensor cores $T_v$ of the same shape, whose elements are given by $T_v(x) = \sqrt{\phi_C(x)}\, e^{i \theta(x)}$, where $\theta$ can be any real-valued function, and where $v$ indicates the TN node dual to the clique $C$.

Proof.

The statement of Corollary 1 gives an explicit formula for constructing BM cores from clique potentials, using an arbitrary real-valued phase function $\theta$. It can be immediately verified that the conversion from BM cores to effective clique potentials under full decoherence (Theorem 1) recovers the same clique potentials we started with, proving Corollary 1. Note that the values of the complex phases have no impact on the decohered cores.

B.3 Decohered Born Machines are Equivalent to Locally Purified States

Figure 8: A locally purified state (LPS) model is similar to a BM, but with an additional purification edge added to each node of the underlying TN. Although a small graphical change, this gives LPS greater expressive capabilities than BMs [15]. We show here that DBMs are equivalent to LPS in expressivity.

Although the definition of locally purified states (LPS) in [15] assumes a one-dimensional line graph for the TN, we give here a natural generalization to LPS defined on more general graphs.

Definition 4.

A locally purified state (LPS) consists of a tensor network over a graph containing $2N$ visible edges, where each of the $N$ cores contains exactly two visible edges, one of which is designated as a purification edge, and the set of purification edges is denoted by $P$. The $N$-variable probability distribution defined by an LPS is given by constructing the composite TN for a BM from these cores, whose associated tensor has order $2N$, and then marginalizing over all purification edges.

An illustration of this model family is given in Figure 8. By choosing all purification edges to have dimension 1, LPS reproduce standard BMs, whereas [15, Lemma 3] gives a construction allowing LPS to reproduce probability distributions defined by general UGMs. Owing to this expressiveness, and to corresponding results for uniform variants of LPS [1], we can think of LPS as representing the most general family of quantum-inspired probabilistic models. We now prove that DBMs are equivalent in expressivity to LPS, by first showing that LPS can be expressed as DBMs (Theorem 3), and then showing that DBMs can be expressed as LPS (Theorem 4).

Theorem 3.

Consider an LPS whose underlying TN uses a graph with a given number of nodes, visible edges, and hidden edges. The probability distribution represented by this LPS can be reproduced by a DBM over a graph with twice as many nodes, the same visible edges, and one additional hidden edge for each purification edge of the LPS, where the decohered edge set is in one-to-one correspondence with the purification edges of the LPS.

Proof.

Starting with a given LPS, we construct a TN matching the description in the Theorem statement, whose interpretation as a DBM will recover the desired distribution. We begin with the underlying TN for the LPS, whose nodes each have one purification edge. We connect each purification edge to a new hidden node, whose associated tensor is the first-order copy tensor with dimension equal to that of the purification edge. This converts all of the purification edges into hidden edges, which form the decohered edges of the DBM.

Given this new TN and choice of decoherence edges, the equivalence of the DBM distribution and the original LPS distribution arises from inserting decoherence operators in the composite edges connected to the new hidden nodes, and then using copy tensor rewriting rules to express the composite TN of the DBM as that of the LPS (Figure 9a). Given that the new hidden nodes are associated with constant tensors with no free parameters, and given that all of the cores defining the LPS are kept unchanged in the DBM, the overall parameter count is unchanged. This completes the proof of Theorem 3.

Figure 9: (a) Conversion from an LPS to a DBM. Using diagram rewriting rules, each purification edge joining a pair of LPS cores is expressed as a larger network of copy tensors, which allows the edge to be seen as a decoherence operator between the original pair of nodes and a new pair of dummy nodes. The result is a DBM associated with an underlying TN with twice as many nodes, and one decohered edge for every purification edge in the LPS. (b) Conversion from a DBM to an LPS. In this case, we choose a function mapping the two decohered edges to nodes 2 and 4, respectively. The dotted boxes show how this can be viewed as defining new cores as the contraction of the DBM cores with adjacent copy tensors. The result is an LPS, where we have used dotted edges to indicate trivial purification edges of dimension 1.
Theorem 4.

Consider a DBM defined on a graph $G$ with $N$ nodes and a set of decohered edges $\mathcal{D}$. Given any function $f$ assigning decohered edges to nodes of $G$ incident to those edges, we can construct an LPS with $N$ nodes which represents the same probability distribution as the DBM. This LPS is defined by a TN with an identical graphical structure to the TN underlying the original DBM, but with the addition of a purification edge at each node $v$ of dimension $\prod_{e \in f^{-1}(v)} D(e)$, where $D(e)$ is the bond dimension of edge $e$ and $f^{-1}(v)$ is the set of decohered edges mapped to node $v$.

Proof.

Despite the somewhat complicated formulation of Theorem 4, the idea is simple. In contrast to standard BMs, DBMs and LPS both permit direct vertical edges within the composite TN defining the model's probability distribution, and the proof consists of shifting these vertical edges from decohered edges to the nodes themselves. In the case where multiple vertical edges are moved to a single node, all of these can be merged into one single purification edge by taking the tensor product of the associated vector spaces. This gives the purification dimension appearing in the Theorem statement, with the overall procedure illustrated in Figure 9b. For nodes which are not assigned any decohered edges, a trivial purification edge of dimension 1 is used. This completes the proof of Theorem 4.

Appendix C Proof of Theorem 2

Theorem 2.

Consider a DBM with underlying graph $G$ and decohered edges $\mathcal{D}$, along with a subset $S \subseteq \mathcal{D}$ which forms a cut set for $G$. If we denote by $X_S$ the set of RVs associated to $S$, and by $X_A$ and $X_B$ the sets of RVs associated to the two partitions of $G$ induced by the cut set $S$, then the DBM distribution satisfies $X_A \perp X_B \mid X_S$.

Figure 10: Illustration of the conditional independence of decohered Born machines (DBMs), for a DBM over four visible RVs. A latent RV is associated with the single decohered edge of the DBM, which forms a cut set for the underlying graph. Conditioning on the value of this latent RV splits the composite TN into two independent pieces, with the result being a probability distribution in which the RVs on the two sides of the cut are independent.
Proof.

From the definition of a cut set, the removal of $S$ from the graph $G$ for the underlying TN partitions $G$ into two disjoint pieces, and the same property holds true for the composite TN giving the DBM probability distribution. Figure 10 illustrates how conditioning on a decohered edge of a DBM results in the splitting of the associated decoherence operator into a tensor product of two rank-1 matrices, which propagate the conditioning value to both pairs of incident nodes. Consequently, each composite edge whose value is conditioned on will be removed from the composite TN, and if the set of conditioning edges forms a cut set for $G$, this will result in the separation of the post-conditional composite TN into two disconnected pieces. This implies the independence of the composite random variables $X_A$ and $X_B$ in the conditional distribution, which completes our proof.

Appendix D Experimental Details

The code used to generate Figure 5 is given in the included notebook, which will install necessary libraries, retrain the models, and regenerate the figure if run in sequence. Instructions are in a README file. The saved parameters are also included. We use NumPy [17] (BSD-compatible licence) and JAX [4] (Apache 2.0 licence) for scientific computation, and Matplotlib [18] (GPL-compatible licence) for visualization. In this section, we discuss the details needed for independent implementation and verification.

Figure 5 answers the model selection problem: which model family yields better performance on held-out data? We implement a Hidden Markov Model (HMM) as both a UGM and as a DBM, and learn parameters to maximize marginal likelihood on the Bars and Stripes dataset [23].

D.1 Model Complexity: Ensuring Equal UGM and DBM Parameter Counts

To make a fair comparison between models, we must ensure that they have identical model complexity, measured here as total parameter count. On the same underlying graph, a DBM will have double the number of parameters of a UGM, because each parameter in a DBM is complex-valued and has both a real and an imaginary component. To match parameter counts, we implement a mixture of two UGMs with independent parameters $\theta_1$ and $\theta_2$, where each consists of a collection of clique potential values. The probability of an observation $x$ is given by the convex combination

$p(x) = \lambda\, p_{\theta_1}(x) + (1 - \lambda)\, p_{\theta_2}(x), \qquad (3)$

where $p_{\theta_i}$ denotes the probability assigned by a UGM with clique potential functions contained in $\theta_i$. Note that the UGM is now represented as a mixture distribution where the mixture weight $\lambda$ is found as a point estimate. To ensure the models are representing the same family of distributions, we also make the DBM a mixture. We do so by splitting the weights of a DBM into their magnitude and phase components, with the former specified by clique potentials in the parameter set $\theta_1$ and the latter by phase functions in a disjoint parameter set $\theta_{\mathrm{ph}}$. The magnitude components are used to compute an HMM as in the UGM case, yielding a component $p_{\theta_1}$. The complex phase components are then used to assign complex phases to the magnitudes in $\theta_1$, leading to a Born machine probability distribution denoted by $p_{\mathrm{BM}}$. The probability of an observation $x$ is given by

$p(x) = \lambda\, p_{\theta_1}(x) + (1 - \lambda)\, p_{\mathrm{BM}}(x), \qquad (4)$

where we again make a point estimate of the mixture weight $\lambda$. By sharing $\theta_1$ between the two components of the DBM, we match parameter counts between the two models, with both models containing the same total number of parameters.
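A minimal sketch of the mixture construction in Equations (3) and (4) follows, with hypothetical function and argument names (sigmoid, log_p1, log_p2, raw_weight); this is not the experiment code, which is provided in the accompanying notebook.

```python
import numpy as np

def sigmoid(z):
    # Map an unconstrained real number to a mixture weight in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def mixture_log_likelihood(x, log_p1, log_p2, raw_weight):
    """Log-likelihood of a two-component mixture in the style of Eq. (3)/(4).

    log_p1, log_p2: functions returning the log-probability of observation x under
    the two components (two UGMs, or an HMM-like UGM and a BM built from the
    magnitude/phase split). raw_weight: unconstrained real, mapped to (0, 1).
    """
    lam = sigmoid(raw_weight)                     # point-estimated mixture weight
    return np.logaddexp(np.log(lam) + log_p1(x), np.log(1 - lam) + log_p2(x))

# Toy usage with constant placeholder components.
ll = mixture_log_likelihood(None, lambda x: -1.0, lambda x: -2.0, raw_weight=0.3)
```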

D.2 Dataset, Training, and Hyperparameters

For the experiments, we generated a Bars and Stripes dataset of images, and varied the HMM hidden dimension between 2 and 32. Code for generating the dataset is included in the UGM and DBM training notebook, as well as in a standalone notebook. To represent the images as 1D sequences, we use a horizontal raster scan from upper left to bottom right. Figure 5 reports results for a subset of these hidden dimensions, with 2 and 32 having given similar results to 4 and 16. The observation sequence length was fixed across all experiments.

Training for each hidden dimension was replicated 15 times, with each replication trained on a different 70% train, 30% test split. The sequence of splits was fixed across the different hidden dimensions, controlled by initializing the random seed to 0 for each experiment.

Parameters for each model are: hidden-to-hidden transition probabilities, observation probabilities, the distribution over the initial hidden state, and the convex mixture weight. For the UGM, we have two independent sets of these HMM parameters. For the DBM, we have an additional set of complex phases. Both probabilities and phases are given an unconstrained parameterization in log space. The mixture weight was parameterized through the inverse of a sigmoid, i.e., as an unconstrained real number. Probabilities and mixture weights were initialized as independent standard normal random variables. The complex phase components were initialized independently at random.

We conducted full batch gradient descent on the training split for 30 epochs for each hidden dimension and each replication, using an adaptive gradient optimizer [19]. Experiments were run on a MacBookPro15,1 model with a single 2.6 GHz 6-Core Intel i7 and 16GB of memory. Each curve in Figure 5 is the mean over 15 replications, and the shaded areas are 1 standard deviation.

Appendix E Gauge Freedom in Probabilistic Tensor Networks

Figure 11: (a) Action of a gauge transformation on a hidden edge of a TN. The insertion of an invertible matrix $G$ and its inverse leads to the adjacent tensor cores being transformed into new tensor cores, which nonetheless describe the same overall tensor when all hidden edges are contracted together. The use of copy tensors in TNs generally restricts this gauge freedom. (b) The restriction of a TN to have cores with entirely non-negative entries forces any gauge transformation to factorize as $G = PD$, for a permutation matrix $P$ and a diagonal matrix $D$ with strictly positive entries. We show how this restricted gauge freedom mostly commutes with any copy tensor inserted into the hidden edge, with copy tensor rewriting rules allowing us to express such gauge transformations as a trivial permutation of the outcomes of the latent RV associated with the hidden edge. This explains why hidden edges of non-negative TNs can be expressed as latent RVs without loss of generality, allowing a faithful representation as a UGM.

Tensor networks matching the description given in Definition 1 exhibit a form of symmetry in their parameters commonly referred to as gauge freedom. This symmetry is generated by edge-dependent gauge transformations, wherein an invertible matrix and its inverse are inserted in a hidden bond of a TN, and then applied to the two tensor cores on nodes incident to that hidden edge. This results in a change in the parameters of the two incident core tensors, which nonetheless leaves the global tensor parameterized by the TN unchanged. The phenomenon of gauge freedom ultimately arises from the close connection between TNs and multilinear algebra, where gauge transformations on a given hidden edge correspond to changes in basis in the vector space associated with the hidden edge.

The use of copy tensors in a TN leads to a preferred choice of basis, and thereby breaks the full gauge freedom of any edge incident to a copy tensor node. It is therefore surprising that for non-negative TNs, i.e. those with all core tensors taking non-negative values, hidden edges were shown to be expressible as latent RVs without loss of generality, via the insertion of copy tensors in the hidden state space (Section 3.1). The generality of this operation means that any non-negative TN can be converted into a UGM by associating hidden edges with latent RVs, where the original distribution over only visible edges is recovered by marginalizing over hidden edges. This fact is a key ingredient in the exact duality between non-negative TNs and UGMs, and differs from quantum-style probabilistic TN models. For example, attempting to observe the latent states associated to hidden edges in a BM will generally lead to a change in the distribution over visible edges, even after marginalizing out these new latent RVs.

We observe here that the generality in associating hidden edges of a non-negative TN to latent RVs is a consequence of the fact that non-negative TNs already have significantly diminished gauge freedom. More precisely, in order for a gauge transformation on a hidden edge to maintain the non-negativity of both tensor cores incident to that edge, we must generally have the change of basis matrix $G$, as well as its inverse $G^{-1}$, possess only non-negative entries. This is a strong limitation, and is equivalent to the gauge transformation factorizing as a product $G = PD$, where $P$ is a permutation matrix and $D$ is a diagonal matrix with strictly positive entries [9]. We illustrate in Figure 11 how this restricted gauge freedom maintains the overall structure of the copy tensor inserted into a hidden edge, with the result being an irrelevant permutation of the discrete values of the hidden latent variable associated with that edge.
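The following NumPy sketch illustrates gauge freedom on the single hidden edge of a two-core TN, and the restricted permutation-times-positive-diagonal form that preserves non-negativity; the cores and matrices are random placeholders, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(4)
dv, dh = 2, 3
A = rng.standard_normal((dv, dh))
B = rng.standard_normal((dh, dv))
T = A @ B                                          # global tensor (here a matrix)

# Inserting an invertible G and its inverse on the hidden edge changes the cores
# but leaves the global tensor unchanged.
G = np.eye(dh) + 0.1 * rng.standard_normal((dh, dh))   # well-conditioned, invertible
A_new, B_new = A @ G, np.linalg.inv(G) @ B
assert np.allclose(A_new @ B_new, T)

# For non-negative cores, the gauge transformations preserving non-negativity factor
# as a permutation times a strictly positive diagonal; such a transformation merely
# permutes and rescales the values of the latent RV on that edge.
P = np.eye(dh)[[2, 0, 1]]                          # permutation matrix
Dpos = np.diag(rng.random(dh) + 0.5)               # strictly positive diagonal
G_nn = P @ Dpos
A_nn, B_nn = np.abs(A) @ G_nn, np.linalg.inv(G_nn) @ np.abs(B)
assert np.allclose(A_nn @ B_nn, np.abs(A) @ np.abs(B))
assert np.all(A_nn >= 0)                           # non-negativity is preserved
```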

The situation is quite different for BMs and DBMs, and we remark that the use of decoherence operators in a DBM means that the gauge freedom of such models is different than for BMs. In particular, two BMs whose underlying TNs are related by gauge transformations will necessarily define identical distributions, whereas the corresponding DBMs resulting from decohering some gauge-transformed hidden edges may define different distributions. In this sense, the appropriate notion of gauge freedom for a DBM lies in between that of a BM and a UGM defined on the same graph, in a manner set by the pattern of decohered edges.

The choice of basis in which decoherence is performed can be treated as an additional parameter of the model, and we view the interaction between this basis-dependence of decoherence and basis-fixing procedures related to TN canonical forms as an interesting subject for future investigation.