# Uncertainty relations and fluctuation theorems for Bayes nets

The pioneering paper [Ito and Sagawa, 2013] analyzed the non-equilibrium statistical physics of a set of multiple interacting systems, S, whose joint discrete-time evolution is specified by a Bayesian network. The major result of [Ito and Sagawa, 2013] was an integral fluctuation theorem (IFT) governing the sum of two quantities: the entropy production (EP) of an arbitrary single system v in S, and the transfer entropy from v to the other systems. Here I extend the analysis in [Ito and Sagawa, 2013]. I derive several detailed fluctuation theorems (DFTs), concerning arbitrary subsets of all the systems (including the full set). I also derive several associated IFTs, concerning an arbitrary subset of the systems, thereby extending the IFT in [Ito and Sagawa, 2013]. In addition I derive "conditional" DFTs and IFTs, involving conditional probability distributions rather than (as in conventional fluctuation theorems) unconditioned distributions. I then derive thermodynamic uncertainty relations relating the total EP of the Bayes net to the set of all the precisions of probability currents within the individual systems. I end with an example of that uncertainty relation.


## I Introduction

There has been a lot of research in non-equilibrium statistical physics and stochastic thermodynamics that considers a single non-equilibrium system executing a specified discrete-time evolution. Examples include analyses of a system undergoing bit erasure parrondo2015thermodynamics ; sagawa2014thermodynamic , a system maintaining a non-equilibrium steady state seifert2012stochastic , or more generally a system undergoing an arbitrary discrete-time dynamics maroney2009generalizing ; wolpert_arxiv_beyond_bit_erasure_2015 ; owen_number_2018 . There has also been some research on the thermodynamics of a pair of interacting systems horowitz2014thermodynamics , in some cases where the first system measures the second one horowitz2011designing ; sagawa2008second , or performs a sequence of measurements and manipulations of the second one mandal2012work ; barato2014stochastic ; strasberg2017quantum . In particular, there has been research on fluctuation theorems for systems under the feedback control of another system sagawa_ueda_PRL_2012 ; horowitz_vaikuntanathan_PRE_2010 .

ito2013information extended this line of research to analyze the non-equilibrium statistical physics of an arbitrary number of interacting systems whose joint discrete-time evolution is specified by a Bayesian network (BN koller2009probabilistic ; neapolitan2004learning ). The major result of that paper was an integral fluctuation theorem (IFT) governing the sum of the entropy production (EP) of an arbitrary single one of the systems, v, and the transfer entropy from v to the rest of the systems.

In this paper I also consider the thermodynamics of (systems that implement) BNs. I derive detailed fluctuation theorems (DFTs) for the entire trajectory of all the systems evolving according to the BN (rather than for just a single system). One of these DFTs gives the ratio of the probability of a specified joint trajectory of all the systems under a forward protocol to the probability of the reverse trajectory under the reverse protocol. Another DFT gives the ratio of the probability of a specified vector of the entropy productions (EPs) of all the systems under a forward protocol to the probability of the negative of that vector under the reverse protocol. I also derive “conditional DFTs”. These relate the probability under the forward protocol of a specified vector of the EPs of some of the systems, conditioned on a specified vector of the EPs of the remaining systems, to the conditional probability of the negative of those vectors, under the reverse protocol. I also derive IFTs to go with all of these DFTs.

After analyzing the fluctuation theorems of BNs I derive thermodynamic uncertainty relations that (in some scenarios) relate the total EP incurred by running a BN to the precisions of currents defined separately for each of the systems in that BN. I then present a toy example to illustrate this uncertainty relation. I end by discussing extensions and directions for future research.

As a notational point, I write the Kronecker delta as δ_{a,b}, and the Dirac delta as δ(·). I assume the reader knows basic terminology concerning Bayesian networks, e.g., directed acyclic graphs (DAGs), their roots, leaves, children, parents, etc. ito2013information ; koller2009probabilistic . I indicate the entropy of a distribution p(X) as S(p(X)), or just S(X) for short cover_elements_2012 . I write the mutual information of a distribution p over two random variables as I_p(X_1; X_2), or just I(X_1; X_2) for short. More generally, I write the multi-information of a joint distribution p over a set of random variables X = (X_1, …, X_N) by

 I(p) := [∑_i S(p(X_i))] − S(p(X))  (1)

Mutual information is the special case of multi-information where there are exactly two random variables. More generally, multi-information is a telescoping sum of mutual informations:

 I(p) = [S(p(X_1)) + S(p(X_{>1})) − S(p(X))] + I(p(X_{>1})) = I(X_1; X_{>1}) + I(X_2; X_{>2}) + …  (2)

where I define X_{>i} := (X_{i+1}, …, X_N) for all i.
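These identities are straightforward to check numerically. The following sketch (my own illustration, not from the paper; plain numpy, natural-log entropies) verifies Eq. 1 and the telescoping decomposition in Eq. 2 for a random three-variable joint distribution:

```python
import numpy as np

def entropy(p):
    """Shannon entropy S(p) in nats, ignoring zero-probability entries."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def mutual_info(joint):
    """I(first axis; remaining axes) of a joint distribution array."""
    p1 = joint.sum(axis=tuple(range(1, joint.ndim)))
    p2 = joint.sum(axis=0)
    return entropy(p1) + entropy(p2) - entropy(joint)

rng = np.random.default_rng(0)

# Random joint distribution p(X1, X2, X3) over three binary variables.
p = rng.random((2, 2, 2))
p /= p.sum()
N = p.ndim

# Multi-information, Eq. (1): I(p) = sum_i S(p(X_i)) - S(p(X)).
marginals = [p.sum(axis=tuple(j for j in range(N) if j != i)) for i in range(N)]
multi_info = sum(entropy(m) for m in marginals) - entropy(p)

# Telescoping sum of mutual informations, Eq. (2):
#   I(p) = I(X1; X_{>1}) + I(X2; X_{>2}).
chain = sum(mutual_info(p.sum(axis=tuple(range(i)))) for i in range(N - 1))

assert np.isclose(multi_info, chain)
```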

I will use “path” and “trajectory” interchangeably, to mean a function from time into a state space. In the usual way, I use the argument list of a probability distribution to indicate what random variables have been marginalized, e.g., p(x_1) := ∑_{x_2} p(x_1, x_2). In addition, I write |A| for the cardinality of any set A.

Although many of the results below are more general, to fix thinking the reader can presume that the entire set of interacting systems evolves according to a continuous-time Markov chain (CTMC), while in contact with a single heat bath with constant temperature T, where I choose units so that k_B T = 1. I will sometimes use “(forward) protocol”, or “process”, to refer to a sequence of Hamiltonians and rate matrices in such a CTMC.

## II Stochastic thermodynamics of Multi-system Bayesian networks

The systems considered in ito2013information are extensions of Bayesian networks, with the added structure that subsets of the nodes in the Bayes net’s DAG are identified as states of the same physical system, evaluated at different moments in time. In order to analyze the joint thermodynamics of all the variables in such a system, we need to add yet more mathematical structure. (ito2013information considered the entropy production generated by running only one of the systems, ignoring the EP generated by the remaining systems. This allowed them to avoid specifying this extra mathematical structure.)

In the next subsection I formalize this extension of Bayesian networks, which I call “multi-system Bayesian networks” (MBNs). Then in the following two subsections I introduce two special types of physical process, which can be used to analyze the thermodynamics incurred by implementing the conditional distribution at any specific node in an MBN. I also calculate the EP generated by running those processes. In the last subsection I use these results to calculate the total EP generated by running an MBN.

### II.1 Multi-system Bayesian networks

Suppose we have a finite set S of N physical systems that evolve together, with a set of associated finite state spaces {X_i}. Write the joint state space of all the systems as X := ×_i X_i, with elements written as x. We also have a DAG with a set of nodes V, with non-root nodes V′, and root nodes R. We also have a function g that maps each v ∈ V to one of the N systems. I write p_0(x_{g(R)}) to indicate the joint distribution over the root nodes of the DAG. For convenience, I put no a priori restrictions on this joint distribution. (In contrast, conventional BNs require that the distribution over the root nodes be a product of a set of “prior” distributions, one for each of those nodes.)

In addition we have a set of conditional distributions, {π_v}, each of which specifies how the system corresponding to some non-root node v evolves when it is “run”, given the values of the (states of the systems corresponding to) its parent nodes pa(v). To make this more precise, for each node v, write Anc(v) for the ancestor nodes of v that are not root nodes. Then I will refer to the distribution over X_{g(v)} “after v is run” as shorthand for

 ∑_{x_{g(R)}} p_0(x_{g(R)}) [ ∑_{x_{g(Anc(v))}} ∏_{v′∈Anc(v)} π_{v′}(x_{g(v′)} | x_{pa(g(v′))}) ] π_v(x_{g(v)} | x_{pa(g(v))})  (3)

For this description of the dynamics over X to be both complete and self-consistent, there must be exactly one root node that corresponds to each subsystem, i.e., for all subsystems i, g(v) = i for exactly one v ∈ R. In addition, every non-root node in the DAG must represent an update of the state of one of the physical systems. Formally, this means that for each non-root node v, there is one (and only one) parent of v in the DAG, v′, such that g(v′) = g(v); we interpret x_{v′} as the initial state of subsystem g(v) at the beginning of a process updating it, while x_v is the state when that update has completed.

Except for relaxing the requirement that the distribution over the root nodes be a product distribution, this mathematical structure is the graphical model (implicitly) assumed in ito2013information . I refer to this structure as a multi-system Bayesian network (MBN). MBNs are similar to several graphical models in the literature, including non-stationary dynamic Bayesian networks robinson2009non , time-varying dynamic Bayesian networks song2009time , and non-homogeneous dynamic Bayesian networks dondelinger2013non , among others. (ito2013information informally refers to the full structure of an MBN as a “Bayesian network”, even though it has additional structure not specified in Bayesian networks. They also sometimes refer to such a structure as a “causal network”, a term which already has a precise meaning in the literature that is different from MBNs.) An important property that MBNs inherit from BNs is that any MBN can be implemented by executing the conditional distributions at the nodes of the associated DAG, in a sequence specified by a topological order of the underlying DAG. (This fact was central to the analysis in ito2013information .)
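As a minimal illustration of that property (my own toy example, with hypothetical node names r, a, b), the following sketch builds the joint distribution of a three-node DAG by multiplying the conditional distributions in a topological order, and checks that ancestral sampling in that same order agrees:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

# Toy DAG: root r, node a with parent r, node b with parents (r, a).
# Topological order: r, a, b.  All variables binary.
p_r = rng.random(2); p_r /= p_r.sum()                    # distribution at the root
pi_a = rng.random((2, 2)); pi_a /= pi_a.sum(axis=0)      # pi_a[a, r] = pi(a | r)
pi_b = rng.random((2, 2, 2)); pi_b /= pi_b.sum(axis=0)   # pi_b[b, r, a] = pi(b | r, a)

# Exact joint obtained by executing the conditional distributions
# one node at a time, in topological order.
joint = np.zeros((2, 2, 2))
for r, a, b in itertools.product(range(2), repeat=3):
    joint[r, a, b] = p_r[r] * pi_a[a, r] * pi_b[b, r, a]
assert np.isclose(joint.sum(), 1.0)

# Ancestral sampling in the same topological order reproduces the joint.
counts = np.zeros((2, 2, 2))
for _ in range(20_000):
    r = rng.choice(2, p=p_r)
    a = rng.choice(2, p=pi_a[:, r])
    b = rng.choice(2, p=pi_b[:, r, a])
    counts[r, a, b] += 1
assert np.abs(counts / counts.sum() - joint).max() < 0.02
```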

Below I will often simply write X_v rather than X_{g(v)} when it is understood that v is a node, with the elements of X_v written as x_v. I will also write the set of all trajectories through X (i.e., the set of all maps from time into X) as X, with elements written as x. Given any trajectory x, the associated trajectory of states of subsystem i is written as x_i, and its value at time t is x_i^t. (So for example, given any node v in the MBN, the trajectory of states of subsystem g(v) is written as x_{g(v)}.) I will also sometimes use phrases like “the process updating node v” as shorthand for “a stochastic thermodynamic process whose effect is to update the state of X_{g(v)} based on the values of the subsystems X_{g(pa(v))} according to the conditional distribution π_v”.

To avoid confusion, from now on I will most often refer to any one of the N elements as a “subsystem”, with the full set being the “full” or “joint” system. Accordingly, I will use the term subsystem variable to refer to the state of a subsystem as it evolves over time. I will write p_i^t for the marginal distribution of the subsystem variable X_i at time t. I will sometimes talk about “running” a node v, or “updating” it, as shorthand for implementing the process that changes X_{g(v)} according to the conditional distribution π_v.

### II.2 Path-wise subsystem processes

In order to analyze the thermodynamics of entire MBNs, we first need to understand the thermodynamics of systems whose subsystems evolve independently of one another. (For the moment, the discussion will not be limited to MBNs per se, but will apply to any such set of co-evolving subsystems.)

Define a (path-wise) subsystem process as any process governing evolution over the joint state space X during a time interval [t_0, t_1] where:

1. The subsystems evolve independently of one another, i.e., the discrete-time conditional distribution over the joint state space is

 π(x^{t_1} | x^{t_0}) = ∏_{i∈S} π_i(x_i^{t_1} | x_i^{t_0})  (4)

2. There are functions {Q_i} such that the entropy flow (EF) into the joint system during the process, if the full system follows trajectory x and the initial joint distribution is p^{t_0}, can be written as

 Q(x, p^{t_0}) = ∑_{i∈S} Q_i(x_i, p_i^{t_0})  (5)

for all trajectories x that have nonzero probability under the protocol for initial distribution p^{t_0}.

Intuitively, in a subsystem process the separate subsystems evolve in complete isolation from one another, with decoupled Hamiltonians and rate matrices. (See wolpert2018exact ; wolpert_thermo_comp_review_2019 for explicit examples of CTMCs that implement subsystem processes, for the special case where there are two subsystems.)

As is conventional seifert2012stochastic ; van2015ensemble , write the (path-wise, global) EP incurred if the system follows trajectory x as

 σ(x, p) := (ln[p^{t_0}(x^{t_0})] − ln[p^{t_1}(x^{t_1})]) − Q(x, p^{t_0})  (6)

Also define the (path-wise) subsystem EP for subsystem i as

 σ_i(x_i, p_i) := (ln[p_i^{t_0}(x_i^{t_0})] − ln[p_i^{t_1}(x_i^{t_1})]) − Q_i(x_i, p_i^{t_0})  (7)

I use the term (path-wise, subsystem) Landauer loss to refer to the extra EP generated by implementing the protocol due to the fact that we do so with a subsystem process:

 L(x, p) := σ(x, p) − ∑_{i=1}^{N} σ_i(x_i, p_i) = I_{p^{t_0}}(x^{t_0}) − I_{p^{t_1}}(x^{t_1}) := −ΔI_p(x)  (8)

where the second equality uses condition (2) of path-wise subsystem processes to cancel the EFs.

(Expected) subsystem EP is always non-negative. Therefore if the expected multi-information among the subsystems decreases in a subsystem process, the expected Landauer loss must be strictly positive, and so the expected joint system EP must be strictly positive. This is true no matter how thermodynamically efficiently the individual subsystems evolve. One way to understand this intuitively is to note that in general the Shannon information stored in the initial statistical coupling among the subsystems will diminish (and maybe disappear entirely) as the process runs. So the contribution to the joint entropy from the statistical coupling among the subsystems grows as the joint system evolves. However, for each subsystem i, the rate matrix governing how X_i evolves cannot depend on the states of the rest of the subsystems, X_{−i}, due to condition (2) of subsystem processes. So that rate matrix cannot exploit the information in the statistical coupling between the initial states of the subsystems to reduce the amount of entropy that is produced as that information dissipates. (See wolpert2018exact ; wolpert_thermo_comp_review_2019 .)
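A quick numerical illustration of this point (my own sketch, not from the paper; two subsystems evolving under independent discrete channels): by the data-processing inequality, the mutual information between the subsystems cannot increase, so the drop I^{t_0} − I^{t_1}, i.e., the expected Landauer loss, is non-negative:

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def mutual_info(joint):
    pa, pb = joint.sum(axis=1), joint.sum(axis=0)
    return entropy(pa) + entropy(pb) - entropy(joint)

rng = np.random.default_rng(2)

# Correlated initial joint distribution over two subsystems.
p0 = rng.random((4, 4)); p0 /= p0.sum()

# Each subsystem evolves under its own channel, independently (cf. Eq. 4).
T1 = rng.random((4, 4)); T1 /= T1.sum(axis=0)   # T1[x1', x1]
T2 = rng.random((4, 4)); T2 /= T2.sum(axis=0)   # T2[x2', x2]

# p1[x1', x2'] = sum_{x1, x2} T1[x1', x1] T2[x2', x2] p0[x1, x2]
p1 = T1 @ p0 @ T2.T

# Data processing: mutual information (the two-subsystem multi-information)
# cannot increase, so the expected Landauer loss is non-negative.
landauer_loss = mutual_info(p0) - mutual_info(p1)
assert landauer_loss >= -1e-12
```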

### II.3 Path-wise solitary processes

I will use the term semi-fixed process to refer to any process involving two subsystems, 1 and 2, where subsystem 2 never changes its state. Sometimes I will refer to subsystems 1 and 2 of a semi-fixed process as the evolving and fixed subsystems, respectively. Some of the first investigations of the thermodynamics of semi-fixed processes were sagawa2010generalized ; horowitz2011thermodynamic ; sagawa_ueda_PRL_2012 .

An important special type of a semi-fixed process is one which is also a subsystem process, and so obeys conditions (1) and (2) above. I refer to this kind of semi-fixed process as a (path-wise) solitary process. The system-wide EP in a solitary process is

 σ(x, p) = σ_1(x_1, p_1) − ΔI(x_1; x_2)  (9)

Due to the data-processing inequality cover_elements_2012 , in a solitary process the expected change in mutual information, ⟨ΔI(x_1; x_2)⟩, is non-positive.

The analysis in ito2013information was based on the fact that Eq. 9 holds for an arbitrary semi-fixed process, not just solitary processes, so long as one replaces σ_1 in that equation with

 σ̄_1(x, p_1) := (ln[p_1^{t_0}(x_1^{t_0})] − ln[p_1^{t_1}(x_1^{t_1})]) − Q(x, p^{t_0})  (10)

Intuitively, this replacement amounts to redefining the “entropy production” of subsystem 1 to involve the EF generated by running the entire system, not just the EF due to running subsystem 1. (Note that σ̄_1 depends on all of x, including the trajectory of the fixed subsystem, whereas σ_1 depends only on the trajectory of the evolving subsystem, x_1.)

Eq. 10 means that semi-fixed processes in general incur (a version of) Landauer loss, just like solitary processes. Nonetheless, there are important differences between semi-fixed processes and solitary processes. Consider a process involving three subsystems, A, B, and C. Only subsystem A changes its state in this process, and the dynamics of subsystem A depends on the state of subsystem B, but not on the state of subsystem C. We can formulate this process as a solitary process by identifying the joint system (A, B) as the evolving subsystem, and identifying the subsystem C as the fixed subsystem. Note that the expectation of the associated subsystem EP is non-negative, i.e., ⟨σ_{AB}⟩ ≥ 0.

However, in general we cannot instead identify subsystem A as the evolving subsystem of a solitary process, with the joint subsystem (B, C) being the fixed subsystem. (The reason is that because the evolution of subsystem A depends on the state of subsystem B, we cannot express the total EF as a function just of the trajectory and initial distribution of subsystem A, as required by condition (2).) On the other hand, we are free to identify subsystem A as the evolving subsystem of a semi-fixed process, with the joint subsystem (B, C) as the associated fixed subsystem. Crucially though, if we do this then the expectation of the associated subsystem EP can be negative, i.e., it may be that ⟨σ̄_A⟩ < 0.

To give physical meaning to these considerations, suppose we are interested in the minimal amount of work it would take to return the joint system from its ending distribution to its starting one. This minimal amount of extra work (sometimes called the “dissipated” work) is given by the subsystem EP if the dynamics is a solitary process. However, it does not equal the subsystem EP in semi-fixed processes, in general. Similarly, the expected EP of the evolving subsystem in a solitary process bounds the precision of any current defined over the state of the subsystem, in the usual way given by thermodynamic uncertainty relations falasco2019unifying . However, the expected EP of the evolving subsystem in a semi-fixed process does not have so simple a relationship with the current in that subsystem in general.

Since subsystem EP has these physical meanings in solitary processes but not in semi-fixed processes, I mostly focus on solitary processes in the analysis below.

### II.4 Entropy production in Multi-system Bayesian networks

As mentioned above, ito2013information assumes that any MBN under consideration represents a physical system that runs a discrete-time Markov chain over X that implements the conditional distributions of the nodes one at a time, in a sequence specified by a topological order of the DAG of the MBN. (Without loss of generality we can assume that all root nodes occur first in the topological order, and I make that assumption from now on.) This means that whenever some node in the MBN is being updated, none of the other nodes in the MBN are allowed to change their state. In addition, ito2013information implicitly assumes that the conditional distribution π_v for any node v with parents pa(v) is implemented by a rate matrix that only couples the variables corresponding to the nodes {v} ∪ pa(v). So using the terminology introduced above, ito2013information formulates the physical process implementing the MBN as a sequence of semi-fixed processes, one for each node in the MBN, where the evolving subsystem is X_{g(v)} when node v is run.

Here I modify this model by allowing the subsystems to have their initial states set in parallel, by sampling the joint distribution over the root nodes, rather than requiring that those nodes be sampled independently, one after the other. (Ultimately, this is just a modeling choice. An alternative would be to model the same system using a different DAG, which included an extra node that was a shared parent of what in my model is the set of root nodes.) Moreover, for the reasons given above, rather than formulate the dynamics of the non-root nodes as a sequence of semi-fixed processes, I formulate it as a sequence of solitary processes, where the evolving subsystem when node v is run is the joint system (X_{g(v)}, X_{g(pa(v))}).

Index the nodes by their (integer-valued) position in the topological order, v = 1, …, |V|. For any non-root node v, write the distribution over all systems after node v has run as p_v. (Note that p_0 must have been sampled before any such node is run, and that by definition of an MBN, this means that every subsystem has a well-defined state by the time node v has run.) Assume that the process sampling p_0 transpires in the time interval [−1, 0], and that each remaining node v is run during the time interval [v − 1, v]. Write x to indicate the trajectory of joint states of the subsystems starting at time 0, after the root nodes have been jointly sampled. Introduce the shorthand that for any v, x(v) is that segment of x corresponding to the time interval when node v is run. Given that in each of the successive solitary processes I identify the evolving subsystem as the combination of a node and its parents, it will be useful to introduce the shorthand that for any v, x_v(v) is the full trajectory of the components of x(v) specified by v and pa(v), i.e., x_{g(v) ∪ g(pa(v))}(v). Similarly, define x_{v−}(v) to be the trajectory of the remaining components of x(v).

Since EP is cumulative over time, by repeated application of Eq. 9, once for each node in the MBN, we see that the global EP incurred by running all nodes in the MBN if the joint system follows trajectory x is

 σ(x, π, p_0) = ∑_{v=1}^{|V|} [ σ_v(x_v(v), π_v, p_{v−1}) + ( I_{p_{v−1}}(x_v(v); x_{v−}(v)) − I_{p_v}(x_v(v); x_{v−}(v)) ) ] := ∑_{v=1}^{|V|} ( σ_v(x_v(v), π_v, p_{v−1}) − ΔI_{π_v, p_{v−1}}(x_v) )  (11)

Eq. 11 is the starting point for many of the results in this paper. I will sometimes shorten it to

 σ(x) = ∑_{v=1}^{|V|} ( σ_v(x_v) − ΔI_v(x) )  (12)

leaving the distributions π_v and p_{v−1} implicit.

As a final, technical note, in general it is not possible to implement an arbitrary conditional distribution with a CTMC, without introducing some “hidden” dynamics. However, this does not actually affect the applicability of the results in this paper; see Appendix A.

## III Fluctuation theorems for multi-system Bayesian networks

Let x̃ indicate the time-reversal of the trajectory x. (For simplicity, I restrict attention to state spaces whose elements are invariant under time-reversal.) Let P(x) indicate the probability (density) of x under the forward protocol running the entire MBN. Let P̃(x̃) indicate the probability of the same trajectory if we run the protocol in time-reversed order. So the ending distribution over X under P is the same as the starting distribution under P̃. Also write x̃(v) to indicate the time-reversal of the trajectory segment x(v).

In the next subsection, I derive fluctuation theorems concerning probabilities of trajectories, and in the following subsection, I derive fluctuation theorems concerning the joint probability that each of the subsystem EPs has some associated specified value.

### III.1 Fluctuation theorems for trajectories

Plugging Eq. 12 into the usual detailed fluctuation theorem (DFT) van2015ensemble gives the novel DFT,

 ln[ P(x) / P̃(x̃) ] = ∑_{v=1}^{|V|} ( σ_v(x_v) − ΔI_v(x) )  (13)

for all x with nonzero probability under P. Exponentiating both sides of Eq. 13 and then integrating results in a novel IFT:

 ⟨e^{−∑_{v=1}^{|V|} (σ_v − ΔI_v)}⟩ := ∫ dx P(x) e^{−∑_{v=1}^{|V|} (σ_v(x_v) − ΔI_v(x))} = 1  (14)

In addition to applying to the running of the entire MBN, the usual DFT applies separately to each successive time interval, i.e., to each successive interval in which exactly one node and its parents co-evolve as that node’s conditional distribution is executed. Therefore for all v,

 ln[ P(x(v)) / P̃(x̃(v)) ] = σ_v(x_v) − ΔI_v(x)  (15)

which results in an IFT analogous to Eq. 14.
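The mechanism behind all of these IFTs is the elementary identity ⟨e^{−σ}⟩_P = 1, whenever σ(x) := ln[P(x)/P̃(x̃)] for any two normalized path distributions P, P̃ with matching support and any bijective time-reversal map. A minimal numerical sketch of that identity (my own illustration, not specific to MBNs):

```python
import numpy as np

rng = np.random.default_rng(3)
n_paths = 10   # a small discrete set of trajectories

# Any normalized forward path distribution P and reverse path distribution
# P_rev with full support, plus any bijective time-reversal map on paths
# (here: an arbitrary permutation).
P = rng.random(n_paths); P /= P.sum()
P_rev = rng.random(n_paths); P_rev /= P_rev.sum()
reverse = rng.permutation(n_paths)   # x -> x~

# Trajectory-level "entropy production": sigma(x) = ln[ P(x) / P_rev(x~) ].
sigma = np.log(P / P_rev[reverse])

# IFT: <e^{-sigma}>_P = sum_x P(x) P_rev(x~)/P(x) = sum_{x~} P_rev(x~) = 1.
ift = np.sum(P * np.exp(-sigma))
assert np.isclose(ift, 1.0)
```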

We can also use Eq. 12 to establish fluctuation theorems based on summing up EP over precisely those intervals during which some subsystem evolves. For example, the DFT governing the composite dynamics of subsystem i over all intervals in which it evolves is

 ln[ P(x_{g^{−1}(i)}) / P̃(x̃_{g^{−1}(i)}) ] = ∑_{v∈g^{−1}(i)} ( σ_v(x_v) − ΔI_v(x) )  (16)

which results in the IFT

 ⟨e^{−∑_{v∈g^{−1}(i)} (σ_v − ΔI_v)}⟩ = 1  (17)

Note that by combining Eqs. 15 and 13 we get

 I(P(x)) = I(P̃(x̃))  (18)

where I define

 I(P(x)) := ln[ P(x) / ∏_{v=1}^{|V|} P(x(v)) ]  (19)

and similarly for I(P̃(x̃)). Intuitively, I(P(x)) is an extension of multi-information to concern probabilities of entire trajectories of the joint system. Note that Eq. 19 can be rewritten as

 P(x) / P̃(x̃) = ∏_v P(x(v)) / P̃(x̃(v))  (20)

which can be derived directly, without invoking DFTs. (To see this, expand P(x) into a product of conditional distributions of successive time-slices of x, and expand P̃(x̃) similarly. Then simply note that by construction, during the interval when node v runs, only the subsystems corresponding to v and its parents evolve.)

We can also combine Eqs. 15 and 13 to derive DFTs and IFTs involving conditional probabilities, in which the trajectories of one or more of the subsystems are fixed (and arbitrary). To illustrate this, pick any V′ ⊂ V, and plug Eq. 15 into Eq. 13 for all v ∈ V′. Define x_{V′} := {x(v) : v ∈ V′}, i.e., the “partial trajectory” given by all segments of the trajectory x picked out by the nodes in V′. Then after clearing terms we get the following “conditional DFT”, which must hold for all partial trajectories x_{V′} with nonzero probability under P:

 ln[ P(x | x_{V′}) / P̃(x̃ | x̃_{V′}) ] = ln[ P(x_{V∖V′} | x_{V′}) / P̃(x̃_{V∖V′} | x̃_{V′}) ] = −I(P(x_{V′})) + ∑_{v∈V∖V′} ( σ_v(x_v) − ΔI_v(x) )  (21)

(Note that the multi-information term on the RHS of Eq. 21 concerns only those densities P(x(v)) for v ∈ V′.) In turn, Eq. 21 gives the following “conditional IFT”, which must hold for each partial trajectory x_{V′} with nonzero probability under P:

 ⟨e^{I + ∑_{v∈V∖V′} (ΔI_v − σ_v)}⟩_{P(·|x_{V′})} = 1  (22)

As an example of these conditional fluctuation theorems, for the case of a single node in V′, i.e., V′ = {v}, the multi-information term in Eq. 21 disappears and we get

 ln[ P(x | x(v)) / P̃(x̃ | x̃(v)) ] = ∑_{v′≠v} ( σ_{v′}(x_{v′}) − ΔI_{v′}(x) )  (23)

while Eq. 22 becomes

 ⟨e^{∑_{v′≠v} (ΔI_{v′} − σ_{v′})}⟩_{P(·|x(v))} = 1  (24)

Note that in addition to these results, which hold when considering the entire system, since each subsystem process is a solitary process the usual DFT and IFT must hold for each subsystem considered in isolation, in the interval during which it is updated. So for example,

 ln[ P(x_v(v)) / P̃(x̃_v(v)) ] = σ_v(x_v)  (25)

(Compare to Eq. 15.) Eq. 25 gives us an additional set of conditional DFTs and IFTs. For example, it gives the following variant of Eq. 23

 ln[ P(x | x_v(v)) / P̃(x̃ | x̃_v(v)) ] = −ΔI_v(x) + ∑_{v′≠v} ( σ_{v′}(x_{v′}) − ΔI_{v′}(x) )  (26)

The numerator of the expression inside the logarithm on the LHS of Eq. 26 is a distribution conditioned on the joint trajectory of (the subsystems corresponding to) node v and its parents when node v runs. In contrast, the numerator inside the logarithm on the LHS of Eq. 23 is a distribution conditioned on the joint trajectory of all of the subsystems when node v runs (not just the joint trajectory of v and its parents).

###### Example 1.

Suppose we have a solitary process over a time interval with evolving subsystem A and fixed subsystem B, which are correlated when the process begins. We can represent this as an MBN with a 4-node DAG, where the evolving subsystem A is represented by root node 1 feeding into leaf node 3, and the fixed subsystem B is represented by root node 2 feeding into leaf node 4. Write the starting time for the process as t_0 and the ending time as t_1.

In the special case of solitary processes, there is no EP generated by subsystem B. So Eq. 23 gives the DFT,

 ln[ P(x_B | x_A) / P̃(x̃_B | x̃_A) ] = I^{t_0}(x_A; x_B) − I^{t_1}(x_A; x_B)  (27)

which must hold for all (x_A, x_B) with nonzero probability. (Note that the terms on the LHS of this equation involve entire trajectories, whereas those on the RHS involve only starting and ending states.) Similarly, Eq. 24 gives the IFT,

 ∫ dx_B P(x_B | x_A) e^{I^{t_1}(x_A; x_B) − I^{t_0}(x_A; x_B)} = 1  (28)

which must hold for all x_A with nonzero probability.
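A numerical sanity check of this conditional IFT (my own discrete toy model, not from the paper: A evolves by an arbitrary channel while B stays fixed, with the pointwise mutual informations computed from the joint distributions at t_0 and t_1). For every A-trajectory, averaging the exponentiated change in pointwise mutual information over P(x_B | x_A) gives exactly 1:

```python
import numpy as np

rng = np.random.default_rng(0)
nA, nB = 3, 4

# Correlated initial joint distribution p0(a, b) of evolving subsystem A
# and fixed subsystem B.
p0 = rng.random((nA, nB)); p0 /= p0.sum()

# Solitary process: A evolves by a channel pi(a1 | a0); B never changes.
pi = rng.random((nA, nA)); pi /= pi.sum(axis=0, keepdims=True)  # pi[a1, a0]

# Joint distribution at t1: p1(a1, b) = sum_a0 pi(a1 | a0) p0(a0, b).
p1 = pi @ p0

pA0, pB = p0.sum(axis=1), p0.sum(axis=0)   # B's marginal never changes
pA1 = p1.sum(axis=1)

# Pointwise mutual informations i_t(a; b) = ln[ p_t(a,b) / (p_t(a) p_t(b)) ].
i0 = np.log(p0 / np.outer(pA0, pB))
i1 = np.log(p1 / np.outer(pA1, pB))

# Conditional IFT: for every A-trajectory (a0, a1) with nonzero probability,
#   sum_b P(b | a0, a1) exp(i1(a1; b) - i0(a0; b)) = 1,
# where P(b | a0, a1) = p0(b | a0), since B is fixed and A evolves on its own.
for a0 in range(nA):
    for a1 in range(nA):
        ift = np.sum((p0[a0] / pA0[a0]) * np.exp(i1[a1] - i0[a0]))
        assert np.isclose(ift, 1.0)
```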

Eq. 13 through Eq. 24 all hold for general semi-fixed processes, not just solitary processes, if we replace σ_v throughout with σ̄_v, and also replace ΔI_v throughout with

 Δ̄I_v(x) := I_{p_{v−1}}(x_v^v; x_{v−}^v) − I_{p_v}(x_v^v; x_{v−}^v)  (29)

(Indeed, ito_information_2015 is an investigation of the variant of Eq. 17 that applies to semi-fixed processes, where we make these replacements.) However, Eqs. 26 and 25 along with the associated IFTs need not hold for general semi-fixed processes.

### III.2 Fluctuation theorems for EP

We can use the DFTs of the previous subsection, which concern probabilities of trajectories, to construct “joint DFTs”, which instead concern probabilities of vectors of the joint amounts of EP generated by all of the subsystems (see Sec. 6 in van2015ensemble ).

To begin, define σ_v(x(v)) := ln[ P(x_v(v)) / P̃(x̃_v(v)) ], in accord with Eq. 25. Similarly define

 σ̃_v(x̃) := ln[ P̃(x̃_v(v)) / P(x_v(v)) ]  (30)

In the special case that the distribution over X at the start of the interval when node v is run under the reverse protocol equals the ending distribution when node v is implemented going forward in time, we can rewrite this as the EP generated by running (the part of the protocol that implements) the conditional distribution at node v backwards in time, starting from that distribution. We cannot rewrite it that way in general though; see the discussion of Eq. 85 in van2015ensemble .

Using this notation, for any set of real numbers {α_v, γ_v : v ∈ V},

 P(x : {σ_v(x(v)) = α_v, ΔI_v(x) = γ_v : v ∈ V})
  = ∫ dx P(x) ∏_v δ(σ_v(x_v(v)) − α_v) δ(ΔI_v(x) − γ_v)
  = e^{∑_v (α_v − γ_v)} ∫ dx P̃(x̃) ∏_v δ(σ_v(x_v(v)) − α_v) δ(ΔI_v(x) − γ_v)
  = e^{∑_v (α_v − γ_v)} ∫ dx P̃(x̃) ∏_v δ(ln[ P(x_v(v)) / P̃(x̃_v(v)) ] − α_v) δ(ΔI_v(x) − γ_v)  (31)
  = e^{∑_v (α_v − γ_v)} P̃({σ̃_v(x̃(v)) = −α_v, Δ̃I_v(x̃) = −γ_v : v ∈ V})  (32)

We can write Eq. 32 more succinctly as

 ln[ P({σ_v = α_v, ΔI_v = γ_v : v ∈ V}) / P̃({σ̃_v = −α_v, Δ̃I_v = −γ_v : v ∈ V}) ] = ∑_{v′} (α_{v′} − γ_{v′})  (33)

or just

 ln[ P({σ_v, ΔI_v}) / P̃({−σ̃_v, −Δ̃I_v}) ] = ∑_{v′} ( σ_{v′} − ΔI_{v′} )  (34)

for short.

In addition to Eq. 34, which concerns the entire MBN, the conventional extension of the DFT must hold separately for the time interval when each evolving subsystem runs:

 ln[ P(σ_v, ΔI_v) / P̃(−σ̃_v, −Δ̃I_v) ] = σ_v − ΔI_v  (35)

Combining Eqs. 35 and 34 establishes that

 P({σ_v, ΔI_v : v ∈ V}) / P̃({−σ̃_v, −Δ̃I_v : v ∈ V}) = ∏_v P(σ_v, ΔI_v) / P̃(−σ̃_v, −Δ̃I_v)  (36)

(Note that it is not true that P({σ_v, ΔI_v : v ∈ V}) = ∏_v P(σ_v, ΔI_v) in general.) This should be compared to Eq. 18.
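These EP-level DFTs can also be checked numerically. The sketch below (my own generic illustration, not specific to MBNs) builds arbitrary forward and reverse path distributions, bins paths by their EP value, and verifies that P(σ = α) = e^{α} P̃(σ̃ = −α) for every realized value α:

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(4)
n = 8

# Arbitrary forward/reverse path distributions and a time-reversal bijection.
P = rng.random(n); P /= P.sum()
P_rev = rng.random(n); P_rev /= P_rev.sum()
reverse = rng.permutation(n)

sigma = np.log(P / P_rev[reverse])   # sigma(x); and sigma~(x~) = -sigma(x)

# Probability that the forward (reverse) process generates each EP value.
P_of, Prev_of = defaultdict(float), defaultdict(float)
for x in range(n):
    P_of[sigma[x]] += P[x]
    Prev_of[-sigma[x]] += P_rev[reverse[x]]

# DFT at the level of EP values: P(sigma = a) = e^a * P_rev(sigma~ = -a).
for a, prob in P_of.items():
    assert np.isclose(prob, np.exp(a) * Prev_of[-a])
```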

Combining Eqs. 35 and 34 also gives a set of conditional fluctuation theorems, analogous to Eqs. 22 and 21, only conditioning on values of EP and drops in mutual information rather than on components of a trajectory. For example, subtracting Eq. 35 from Eq. 34 gives the conditional DFT,

 ln[ P({σ_{v′}, ΔI_{v′} : v′ ≠ v} | σ_v, ΔI_v) / P̃({−σ̃_{v′}, −Δ̃I_{v′} : v′ ≠ v} | −σ̃_v, −Δ̃I_v) ] = ∑_{v′≠v} ( σ_{v′} − ΔI_{v′} )  (37)

which must hold for all pairs (σ_v, ΔI_v) that have nonzero probability under P. This in turn gives the conditional IFT,

 ⟨e^{∑_{v′≠v} (ΔI_{v′} − σ_{v′})}⟩_{P(·|σ_v, ΔI_v)} = 1  (38)

which must hold for all (σ_v, ΔI_v) with nonzero probability under P.

As usual, since each subsystem process is a solitary process the usual DFTs and IFTs must hold for each subsystem considered in isolation, in the interval during which it is updated. So for example,

 ln[ P(σ_v) / P̃(−σ̃_v) ] = σ_v  (39)

(Compare to Eq. 35.) Combining Eqs. 39 and 34 gives us an additional set of DFTs and IFTs. For example, it gives the following variant of Eq. 37:

 ln[ P({σ_{−v}, ΔI_{−v}} | σ_v) / P̃({−σ̃_{−v}, −Δ̃I_{−v}} | −σ̃_v) ] = −ΔI_v + ∑_{v′≠v} ( σ_{v′} − ΔI_{v′} )  (40)
###### Example 2.

Consider an arbitrary subsystem process with N simultaneously evolving subsystems. Suppose that each subsystem starts at a Boltzmann distribution for its own Hamiltonian, which is a function of only its own state. Suppose though that the joint initial distribution is not a product of Boltzmann distributions.

Under these circumstances, the EP of each subsystem equals the dissipated work done on that subsystem van2015stochastic; seifert2012stochastic. Accordingly, for each subsystem i, define ΔF_i to be the change in its (equilibrium) free energy during the process, w_i to be the work done on it during that process, and ΔI_i to be the change in the mutual information between its state and that of the other subsystems during the process. Then Eq. 34 gives a “multi-system” variant of Crooks’ fluctuation theorem crooks1998nonequilibrium; crooks1999entropy:

P({w_i, ΔI_i}) / P̃({−w_i, −ΔĨ_i}) = e^{∑_{i=1}^N (w_i − ΔF_i − ΔI_i)}    (41)

which results in a “multi-system” variant of the Jarzynski equality jarzynski1997nonequilibrium ; sagawa2010generalized ,

⟨ e^{−∑_{i=1}^N (w_i − ΔI_i)} ⟩ = e^{−∑_{i=1}^N ΔF_i}    (42)

In particular, in the special case where N = 2, the change in mutual information is the same for both subsystems (ΔI_1 = ΔI_2 =: ΔI), and we get

P(ΔI, {w_i}) / P̃(−ΔĨ, {−w_i}) = e^{−2ΔI + ∑_{i=1}^2 (w_i − ΔF_i)}    (43)
⟨ e^{2ΔI − ∑_{i=1}^2 w_i} ⟩ = e^{−∑_{i=1}^2 ΔF_i}    (44)
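For intuition, the N = 1, no-information special case of Eq. 42 is the ordinary Jarzynski equality, which can be verified exactly for an instantaneous quench of a discrete system. The energy levels below are arbitrary illustrative values, with β = 1:

```python
import numpy as np

# Energies before (H0) and after (H1) an instantaneous quench, beta = 1
H0 = np.array([0.0, 1.0, 2.5])
H1 = np.array([0.3, 2.0, 1.1])

Z0, Z1 = np.exp(-H0).sum(), np.exp(-H1).sum()
p0 = np.exp(-H0) / Z0           # initial Boltzmann distribution for H0
w = H1 - H0                     # work done on the system in each state

lhs = np.sum(p0 * np.exp(-w))   # <e^{-w}>
rhs = Z1 / Z0                   # e^{-dF}, with dF = -ln(Z1/Z0)
print(np.isclose(lhs, rhs))     # True
```

The check is exact because summing p0 · e^{−w} resums the Boltzmann weights of H1, just as the IFT mechanism above resums the reversed distribution.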

Eq. 34 through Eq. 38 also hold for semi-fixed processes in general, provided we make the usual replacements in those equations for the semi-fixed case. However, Eqs. 39 and 40, along with the associated IFTs, need not hold for semi-fixed processes in general.

## IV Thermodynamic uncertainty relations for multi-system Bayesian networks

In this section I show how to combine Eq. 12 with some recently derived thermodynamic uncertainty relations to derive a bound on the precision of time-integrated currents in MBNs. I then present a simple example of this bound.

### IV.1 The generalized uncertainty relation and MBNs

falasco2019unifying analyzes the relationship between an arbitrary stochastic process P generating trajectories y over some finite space (even a non-Markovian process) and an arbitrary associated time-antisymmetric real-valued function f, i.e., a function obeying f(ỹ) = −f(y). They derive the tight inequality,

⟨f⟩² / ⟨f²⟩ ≤ ⟨ tanh(σ̂[P, y]/2) ⟩    (45)

where for any real-valued function g,

⟨g⟩ := ∫dy P(y) g(y)    (46)
σ̂[P, y] := ln[ P(y) / P(ỹ) ]    (47)

(Note that the denominator of the logarithm in Eq. 47 is P(ỹ), not P̃(ỹ).) They then show that Eq. 45 implies the (weaker) upper bound on the precision of f under the process P,

⟨f⟩² / Var(f) := ⟨f⟩² / (⟨f²⟩ − ⟨f⟩²) ≤ (e^{σ̂[P]} − 1) / 2    (48)

where σ̂[P] := ⟨σ̂[P, y]⟩. Eq. 48 means that we cannot increase the precision of a current beyond a certain point without “paying for it” by increasing σ̂[P]. Alternatively, it means that if we can experimentally measure the precision of a current, then we can lower-bound the sum of all contributions to σ̂[P] that are not directly experimentally measurable.
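Both bounds are straightforward to probe numerically. The sketch below draws random distributions over time-reversed trajectory pairs, together with random antisymmetric currents, and confirms Eqs. 45 and 48 on every draw (the pairing and distributions are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
rev = np.arange(n) ^ 1                  # time-reversal pairing y <-> y~

for _ in range(200):
    P = rng.random(n); P /= P.sum()
    a = rng.normal(size=n)
    f = a - a[rev]                      # time-antisymmetric: f(y~) = -f(y)

    sig = np.log(P / P[rev])            # sigma_hat[P, y]
    Sig = np.sum(P * sig)               # sigma_hat[P] = <sigma_hat[P, y]>

    mean, mean_sq = np.sum(P * f), np.sum(P * f**2)
    assert mean**2 / mean_sq <= np.sum(P * np.tanh(sig / 2)) + 1e-12   # Eq. 45
    var = mean_sq - mean**2
    assert mean**2 / var <= (np.exp(Sig) - 1) / 2 + 1e-12              # Eq. 48
print("Eqs. 45 and 48 hold on all draws")
```

Note that under P the averages ⟨tanh(σ̂/2)⟩ and ⟨tanh²(σ̂/2)⟩ coincide, since tanh(σ̂/2) is itself time-antisymmetric; the code uses the former.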

As an example of Eq. 48, since any linear combination of currents is a current, setting f = ∑_i f_i for a set of currents {f_i} implies

(∑_i ⟨f_i⟩)² / ∑_{i,j} Cov(f_i, f_j) ≤ (e^{σ̂[P]} − 1) / 2    (49)

In particular, suppose we define a set of currents {f_v}, each a (time-antisymmetric) function of the trajectory of the associated subsystem v. Then we get

σ̂[P] ≥ ln[ 2(∑_v ⟨f_v⟩)² / ∑_{v,v′} Cov(f_v, f_{v′}) + 1 ]    (50)

This illustrates that for a fixed value of σ̂[P], there is a tradeoff among the precisions of all of the currents f_v and the correlations among different currents.

As falasco2019unifying emphasizes, while the inequalities Eqs. 45 and 48 always hold, σ̂[P] only has thermodynamic meaning when certain conditions are met. In particular, suppose that the process is a CTMC evolving over a finite time interval, LDB holds, and the starting and ending distributions over the state space are identical. Suppose as well that the driving protocol is time-symmetric, i.e., both the trajectory of Hamiltonians and the trajectory of rate matrices are invariant under reversal of time about the midpoint of the interval. Under such circumstances σ̂[P] equals the EP, and so can be identified with the dissipated work done on the system. This special case of Eq. 48 is known as the “generalized thermodynamic uncertainty relation” (GTUR), and was first derived in hasegawa2019generalized.

In light of these results consider an MBN formulated as a sequence of solitary processes, where the protocol of each separate solitary process is time-symmetric about the middle of the interval in which it takes place. Assume as well that the marginal distribution of every subsystem when it begins to run is the same as its marginal distribution after it finishes running. (Note that this assumption does not imply that the distribution over the states of each subsystem at the beginning of the running of the entire MBN is the same as its distribution at the end of running the entire MBN. Specifically, the state of a subsystem v can change at times outside of the interval when v gets updated.) So for each subsystem v, using obvious notation, σ_v = σ̂_v[P_v]. Therefore by using first Eq. 12 and then Eq. 48, rather than Eq. 50 we get

σ[P] = ∑_v (σ̂_v[P_v] − ΔI_v(P)) ≥ ∑_v ( ln[ 2⟨f_v⟩² / Var(f_v) + 1 ] − ΔI_v(P) )    (51)

where each random variable f_v is any time-antisymmetric function of the trajectory of subsystem v.

Eq. 51 illustrates a trade-off among the precisions of currents of the various subsystems, the sum of the drops in mutual information, and the total dissipated work of the joint system. In particular, it suggests that, without changing the conditional distributions at the nodes of the MBN and without incurring any additional global EP, if we can change the initial distribution in a way that reduces the mutual information changes ΔI_v(P), then it is possible to increase the precisions of the currents in the subsystems.

This suggestion must be treated with care though. The rate matrix of a solitary process only couples the subsystems in its evolving set. Therefore the precision of any current defined on the trajectory of an evolving subsystem is fully specified by the combination of that rate matrix and the initial distribution over the evolving subsystems. So as long as we leave both of those quantities alone, changing the statistical coupling between the evolving and the fixed subsystems at the start of that solitary process cannot affect the precision of that current.

On the other hand, suppose we did not restrict ourselves to using a solitary process to evolve a subsystem v, allowing ourselves to instead use a semi-fixed process to implement the same conditional distribution over v's states during the interval. In this case the rate matrix for the dynamics of v would be allowed to involve the states of both v and the other subsystems. So in theory at least, the rate matrix could be designed to exploit that coupling to reduce the system-wide EP generated in implementing that conditional distribution; alternatively, keeping the EP unchanged, the rate matrix could be designed to exploit that coupling to increase the precision of the current.

### IV.2 Example of the uncertainty relation for MBNs

Suppose we have three subsystems, A, B, and C, and that their joint evolution is given by a sequence of two solitary processes. In the first solitary process the evolving subsystem is the composite system (A, B), and C is the fixed subsystem. In the second solitary process the evolving subsystem is the composite system (B, C), and A is the fixed subsystem. Presume as well that subsystem B does not change its state from the beginning to the end of the full process. The associated MBN has six nodes, representing the initial and final states of the three subsystems. Three of those nodes are root nodes, and the remaining three are leaf nodes. Assume that both of the solitary processes take place during a time interval of length τ.

Take X_B to be binary. In addition assume that both X_A and X_C have at least three elements. (Recall that a system with two states cannot have a non-equilibrium steady state (NESS).) Then we can (for example) take f_A to be the net number of jumps from some specific state x_A to some specific other state x′_A, and similarly for f_C. (Since B does not change in the dynamics, f_B is irrelevant.)

Assume as well that while each solitary process runs, the rate matrix governing the system dynamics is time-homogeneous, i.e., that the matrix discontinuously changes when the first solitary process ends and the second begins, but other than that it never changes. Since B does not change, we can write the rate matrix governing the evolution of (A, B) during the first solitary process as W^{x_B}_{x_A → x′_A}, or just the matrix W^{x_B} for short. We will be interested in the case where both matrices W^0 and W^1 have a (unique) NESS over X_A, but that NESS differs for the two x_B values.
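A minimal sketch of this setup in code: build two hypothetical 3-state rate matrices, one per value of the binary fixed bit, and solve for their (distinct) stationary distributions. All rates below are arbitrary illustrative values, not taken from the paper:

```python
import numpy as np

def random_rate_matrix(rng, k=3):
    # W[i, j] = rate of jumping j -> i; each column sums to zero
    W = rng.random((k, k))
    np.fill_diagonal(W, 0.0)
    W -= np.diag(W.sum(axis=0))
    return W

def ness(W):
    # NESS pi solves W @ pi = 0 with pi >= 0 and sum(pi) = 1
    vals, vecs = np.linalg.eig(W)
    pi = np.real(vecs[:, np.argmin(np.abs(vals))])
    return pi / pi.sum()

rng = np.random.default_rng(3)
W = {0: random_rate_matrix(rng), 1: random_rate_matrix(rng)}  # the two matrices
pi = {b: ness(W[b]) for b in (0, 1)}
print(pi[0], pi[1])   # two distinct stationary distributions over the 3 states
```

With all rates strictly positive, each generator is irreducible, so each NESS is unique and strictly positive; generically the two NESSs differ, as the example requires.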

Using this notation, we can write the conditional distribution for how the first solitary process transforms the state as π^{x_B}(x′_A | x_A). This conditional distribution differs for the two values of x_B.