is concerned with the thermodynamic properties of arbitrarily far off-equilibrium systems that evolve according to a continuous-time Markov chain (CTMC). A central result in this field is an expression for the time derivative of the Shannon entropy of an evolving system as the sum of two terms. The first term is the “entropy flow” (EF) rate, capturing the transfer of entropy between the system and external reservoirs that it is coupled with. The second is the non-negative “entropy production” (EP) rate, capturing the net increase of entropy in the combination of the system and the external reservoirs.
Suppose we are given the initial distribution of states of a system, along with a desired discrete-time dynamics of the system. The minimal time-integrated EF that is generated by any CTMC that implements that dynamics is known as the “generalized Landauer’s bound” parrondo2015thermodynamics ; sagawa2014thermodynamic . It can only be achieved if the EP rate is exactly zero throughout the system’s evolution. A canonical example is when the system has two states, the initial distribution is uniform, and the desired discrete-time dynamics is the bit-erasure map, so that the generalized Landauer’s bound is just ln 2 (with entropy measured in units of Boltzmann’s constant).
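The bit-erasure example can be checked numerically. The sketch below assumes that, when the EP is zero, the minimal integrated EF equals the drop in Shannon entropy between the initial and final distributions, with entropy measured in nats. The helper names `shannon_entropy` and `push_forward` are illustrative and do not come from the paper.

```python
import math

def shannon_entropy(p):
    """Shannon entropy in nats of a distribution given as a list of probabilities."""
    return -sum(q * math.log(q) for q in p if q > 0)

def push_forward(p, f, n_states):
    """Distribution of f(x) when x is drawn from p, for a deterministic map f."""
    out = [0.0] * n_states
    for x, q in enumerate(p):
        out[f(x)] += q
    return out

# Two-state system, uniform initial distribution, bit-erasure map x -> 0.
p_initial = [0.5, 0.5]
p_final = push_forward(p_initial, lambda x: 0, 2)

# Assumed form of the bound: entropy drop S(initial) - S(final).
landauer_bound = shannon_entropy(p_initial) - shannon_entropy(p_final)
print(landauer_bound, math.log(2))
```

For a non-uniform initial distribution the same calculation gives a smaller value, since the initial entropy is then below ln 2.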
In general, if we are also provided with some constraints on the rate matrix that can be used to implement the desired dynamics, then the generalized Landauer bound cannot be achieved. The minimal integrated EP that must arise due to such constraints has been called the “Landauer loss” wolpert_thermo_comp_review_2019 ; wolpert_thermo_bayes_nets_2019 ; wolpert2018thermo_circuits . As an example, the Landauer loss of the evolution of a composite system is nonzero if its subsystems are required to evolve independently of one another wolpert2018thermo_circuits ; wolpert_thermo_comp_review_2019 . So there is no way for such a composite system to achieve the generalized Landauer’s bound. As another example, Landauer loss is also nonzero if the rate matrix of the composite system is constrained so that the subsystems jointly implement a specified Bayes net wolpert_thermo_bayes_nets_2019 ; ito2013information . (See also Boyd:2018aa , which considers similar issues that arise in the thermodynamics of information ratchets when there is only a single heat reservoir.)
Here I focus on continuous time rather than discrete time. So I consider the minimal EP rate of a composite system due to constraints on the form of the system’s rate matrix, which I call the Landauer loss rate.
To investigate the Landauer loss rate of composite systems, I model them as multipartite processes, in which each subsystem evolves according to its own rate matrix horowitz_multipartite_2015 . Importantly, often the rate matrix of any subsystem in a multipartite process will only depend on the states of a limited set of other subsystems, which are jointly called the “neighborhood” of horowitz_multipartite_2015 . Moreover, in general the neighborhoods of different subsystems will have non-empty overlaps, i.e., it may be that the rate matrices of two subsystems both depend on the state of the same, third subsystem, . Loosely speaking, the global structure of such overlaps among the neighborhoods of the subsystems can be viewed as a “continuous-time version” of a Bayes net nodelman2002continuous ; nodelman2012expectation ; el2012continuous .
In this paper I derive two lower bounds on the Landauer loss rate in terms of this global structure of the neighborhood overlaps. Both of these lower bounds are information-theoretic, in the sense that both of them only involve Shannon entropies and their time-derivatives. The first bound is based on applying the inclusion-exclusion principle to the overlaps of the neighborhoods. The second is based on constructing counterfactual rate matrices, in which all subsystems outside of a particular neighborhood are held fixed while those inside the neighborhood are allowed to evolve. This second bound involves quantities which are related both to the “learning rate” of stationary bipartite systems barato_efficiency_2014 ; Brittain_2017 ; hartich.barato.seifert.sensory.capacity.2016 and more generally to the “information flow” among subsystems horowitz2014thermodynamics ; horowitz_multipartite_2015 . Importantly, both of these lower bounds can be strictly greater than zero even if there is no sense in which the subsystems evolve “independently of one another”, or in a “modular” manner (as they do in all previous work on Landauer loss).
II Terminology and notation
I write for the cardinality of any set . I sometimes write the Kronecker delta as or even rather than as , for legibility.
Throughout this paper I will assume we have a specified set of subsystems, , with finite state spaces . I write
to indicate a vector in the joint space of all subsystems, . For any , indicates the vector of all components of other than those of the subsystems specified in .
I write a distribution over a set of values at time as , and write the simplex of all such distributions as . I write the associated Shannon entropy as , , or . I write the multi-information (sometimes called the “total correlation”) of a joint distribution as
Mutual information is the special case of multi-information where there are two random variables cover_elements_2012 .
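As a concrete illustration, the sketch below uses the standard definition of multi-information: the sum of the marginal entropies minus the joint entropy. Joint distributions are represented as dictionaries from state tuples to probabilities, which is a choice made here for illustration and is not notation from the paper.

```python
import math
from collections import defaultdict

def entropy(dist):
    """Shannon entropy in nats of a dict mapping outcomes to probabilities."""
    return -sum(q * math.log(q) for q in dist.values() if q > 0)

def marginal(joint, idxs):
    """Marginal distribution over the components listed in idxs."""
    out = defaultdict(float)
    for x, q in joint.items():
        out[tuple(x[i] for i in idxs)] += q
    return dict(out)

def multi_information(joint, n_vars):
    """Sum of single-variable entropies minus the joint entropy."""
    singles = sum(entropy(marginal(joint, [i])) for i in range(n_vars))
    return singles - entropy(joint)

# Two perfectly correlated bits: multi-information reduces to mutual information.
mutual_info = multi_information({(0, 0): 0.5, (1, 1): 0.5}, 2)

# Three identical bits: three marginal entropies of ln 2, joint entropy ln 2.
total_corr = multi_information({(0, 0, 0): 0.5, (1, 1, 1): 0.5}, 3)
print(mutual_info, total_corr)
```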
In this paper I consider the case where the joint state of the subsystems evolves according to a multipartite CTMC during time interval . So there is a set of time-varying stochastic rate matrices, where for all , if , and where the joint dynamics over is governed by the master equation
The marginal distribution of each subsystem evolves as
due to the multipartite nature of the process (see note 16 in the reference list). Eq. 5 shows that in general, the marginal distribution will not evolve according to a CTMC over the set of distributions defined on .
For any set , and any set of rate matrices , I define the rate matrix given by windowing onto as
Note that this is a properly normalized rate matrix.
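The following sketch builds a small multipartite rate matrix and windows it onto a subset of subsystems. It assumes that windowing onto a set means summing the per-subsystem rate matrices of the subsystems in that set, and it uses three hypothetical binary subsystems with made-up rates. Rate matrices are stored as dictionaries keyed by `(new_state, old_state)`.

```python
import itertools

# Joint states of three hypothetical binary subsystems.
STATES = list(itertools.product([0, 1], repeat=3))

def subsystem_rate_matrix(i, rate_fn):
    """Rate matrix of subsystem i: only component i can change in a transition."""
    K = {}
    for x_old in STATES:
        flipped = list(x_old)
        flipped[i] = 1 - x_old[i]
        x_new = tuple(flipped)
        r = rate_fn(x_old)
        K[(x_new, x_old)] = r    # off-diagonal entry
        K[(x_old, x_old)] = -r   # diagonal entry, so the column sums to zero
    return K

def window(K_list, A):
    """Assumed windowing operation: sum the rate matrices of the subsystems in A."""
    total = {}
    for i in A:
        for key, r in K_list[i].items():
            total[key] = total.get(key, 0.0) + r
    return total

def column_sums(K):
    sums = {x: 0.0 for x in STATES}
    for (x_new, x_old), r in K.items():
        sums[x_old] += r
    return sums

# Made-up rates: subsystem 0 depends on itself, 1 on {0, 1}, 2 on {1, 2}.
rate_fns = [
    lambda x: 1.0 + x[0],
    lambda x: 0.5 + x[0] + 2.0 * x[1],
    lambda x: 0.25 + x[1] + x[2],
]
K_list = [subsystem_rate_matrix(i, f) for i, f in enumerate(rate_fns)]
K_A = window(K_list, [0, 1])
print(max(abs(s) for s in column_sums(K_A).values()))
```

Every column of `K_A` sums to zero, and no transition with nonzero rate in `K_A` changes the state of subsystem 2, which is what normalization of the windowed matrix requires.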
In addition, for each subsystem , I write for any set of subsystems at time such that we can write
for an appropriate set of functions . (In the degenerate case where does not evolve under , no matter what value has, I take .) Following horowitz_multipartite_2015 , I refer to the elements of as neighbors of at time , and refer to the full set as a neighborhood of at . As shorthand, if is a neighborhood under rate matrix , I will write
so that .
Note that the neighbor relation is not symmetric. I will refer to a neighbor of a subsystem as a minimal neighbor if is a member of the minimal neighborhood of . (So if is a minimal neighbor of , then depends on the value of .) In addition, for any neighborhood of some , and any ,
where the last equality follows by normalization of each of the rate matrices . Therefore for any neighborhood of subsystem .
III Localized neighborhood sets
I say that a set of neighborhoods of a multipartite CTMC is localized if for all where subsystem is a neighbor of subsystem which in turn is a neighbor of , is also a neighbor of . I assume from now on that the set of neighborhoods being discussed is a localized set (see note 17 in the reference list). This implies that for all , at all , the neighborhood of contains the neighborhoods of all of ’s neighbors. So the intersection of any two neighborhoods is also a neighborhood. This means that two subsystems and can co-evolve in a statistically coupled manner, even though is independent of and vice-versa.
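The transitive-closure construction of a localized set of neighborhoods can be sketched as a graph search. The example assumes that each subsystem counts as a member of its own neighborhood, and the subsystem labels and dependency pattern are hypothetical.

```python
def localized_neighborhoods(minimal_neighbors):
    """Transitive closure of the minimal neighbor relation.

    minimal_neighbors maps each subsystem to the set of subsystems that its
    rate matrix depends on directly (assumed here to include itself).
    """
    closure = {}
    for i in minimal_neighbors:
        seen = {i}
        stack = [i]
        while stack:
            j = stack.pop()
            for k in minimal_neighbors[j]:
                if k not in seen:
                    seen.add(k)
                    stack.append(k)
        closure[i] = frozenset(seen)
    return closure

def is_localized(neighborhoods):
    """True if each neighborhood contains the neighborhoods of all its members."""
    return all(
        set(neighborhoods[j]) <= set(neighborhoods[i])
        for i in neighborhoods
        for j in neighborhoods[i]
    )

# Hypothetical chain: C depends directly on B, and B depends directly on A.
minimal = {"A": {"A"}, "B": {"A", "B"}, "C": {"B", "C"}}
localized = localized_neighborhoods(minimal)
print(is_localized(minimal), is_localized(localized))
```

In this chain the minimal neighborhoods are not localized, because A is a neighbor of B and B is a neighbor of C while A is not listed as a neighbor of C. The closure adds A to the neighborhood of C.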
I will use the term neighborhood structure to refer to any family of sets, , where each is a union of neighborhoods, and where . As an example, barato_efficiency_2014 considers entropy production in a special type of bipartite system. This system has an “internal” and an “external” subsystem, where the external subsystem is its own neighborhood, but by itself, the internal system is not a neighborhood. So the (localized) neighborhood structure is unique in their scenario. In general though there will be more than one neighborhood structure for any localized set of neighborhoods, since in general there will be more than one set of unions of neighborhoods that covers .
Any neighborhood structure specifies an associated set of sets,
where for all , is the set of all intersections of of the sets in . So writing them out explicitly,
and so on, up to . Note that every element of is itself a union of neighborhoods, due to the definition of neighborhood. Sometimes I will abuse terminology, and refer to the set of sets specified by some as a “neighborhood structure”. Also, when is obvious by context, I will sometimes just write rather than .
For any function , I define the associated inclusion-exclusion alternating sum (or just “in-ex sum” for short) as
In particular, given any neighborhood structure and any fixed distribution over , there is an associated real-valued function mapping any to the entropy of the marginal distribution over the (joint state of the) subsystems in . So using to indicate that function,
I will refer to as the in-ex information.
As an example, if consists of two subsets, , with no intersection, then the in-ex information is just
the mutual information . As another example, if consists of all singletons ,
then the in-ex information is the multi-information of the separate random variables.
However, in contrast to multi-information, but just like some other extensions of mutual information to more than two variables (e.g., multivariate information, interaction information, etc. mcgill1954multivariate ; ting1962amount ), in some situations the in-ex information can be negative (see note 18 in the reference list).
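The in-ex sum and in-ex information can be computed directly for small examples. The sketch below assumes that the in-ex sum runs over all collections of j distinct sets in the structure with sign (-1)^(j-1), and that the in-ex information is the in-ex sum of marginal entropies minus the entropy of the full joint distribution. The empty intersection is assigned zero entropy. The final case only illustrates that the arithmetic can produce a negative value and is not claimed to be the example given in the note.

```python
import math
from collections import defaultdict
from itertools import combinations

def marginal_entropy(joint, subset):
    """Entropy (nats) of the marginal over the subsystem indices in subset."""
    idxs = sorted(subset)
    marg = defaultdict(float)
    for x, q in joint.items():
        marg[tuple(x[i] for i in idxs)] += q
    return -sum(q * math.log(q) for q in marg.values() if q > 0)

def in_ex_sum(sets, f):
    """Alternating sum of f over intersections of j of the sets, j = 1, 2, ..."""
    total = 0.0
    for j in range(1, len(sets) + 1):
        sign = (-1) ** (j - 1)
        for combo in combinations(sets, j):
            total += sign * f(frozenset.intersection(*combo))
    return total

def in_ex_information(joint, structure):
    sets = [frozenset(s) for s in structure]
    f = lambda omega: marginal_entropy(joint, omega)
    return in_ex_sum(sets, f) - f(frozenset().union(*sets))

# Three bits: X1 copies X0, and X2 is an independent fair bit.
joint = {(a, a, c): 0.25 for a in (0, 1) for c in (0, 1)}
two_disjoint = in_ex_information(joint, [{0}, {1, 2}])     # mutual information
singletons = in_ex_information(joint, [{0}, {1}, {2}])     # multi-information
overlapping = in_ex_information(joint, [{0, 1}, {1, 2}])

# All three bits identical, with three pairwise-overlapping sets.
identical = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
negative_case = in_ex_information(identical, [{0, 1}, {0, 2}, {1, 2}])
print(two_disjoint, singletons, overlapping, negative_case)
```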
It is shown in Appendix A that at any time , for any union of neighborhoods , evolves as a CTMC with rate matrix :
So a union of neighborhoods evolves according to a self-contained CTMC, in contrast to the case of a single subsystem (cf. Eq. 5).
Given this, define the expected EF rate of any at time as
Make the associated definition that the expected EP rate of any at time is
which I will often write as for short.
I refer to as a local EP rate, in contrast to the global EP rate, , defined by setting in Eq. 19 to . For any , . (This follows from the fact that has the usual form of an EP rate of a single system.) In addition, that lower bound of is achievable, e.g., if at time for all . Note though that local EP rates differ from the EP-like quantities analyzed in horowitz2014thermodynamics ; horowitz_multipartite_2015 .
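The relation between the entropy derivative, the EF rate, and the EP rate can be checked numerically for a single system. The sketch below assumes a single heat reservoir, rate matrices with every transition reversible, and the sign convention in which EF counts entropy flowing out of the system, so that dS/dt = EP rate - EF rate. The rate matrix and distribution are made up for illustration.

```python
import math

def entropy_rate_terms(K, p):
    """Return (dS/dt, EF rate, EP rate) for rate matrix K and distribution p.

    K[x][y] is the rate of jumping from state y to state x; each column of K
    sums to zero. Assumes K[y][x] > 0 whenever K[x][y] > 0, and p[x] > 0.
    """
    n = len(p)
    dp = [sum(K[x][y] * p[y] for y in range(n)) for x in range(n)]
    dS = -sum(dp[x] * math.log(p[x]) for x in range(n))
    ef = 0.0
    ep = 0.0
    for x in range(n):
        for y in range(n):
            if x != y and K[x][y] > 0:
                flux = K[x][y] * p[y]
                ef += flux * math.log(K[x][y] / K[y][x])
                ep += flux * math.log(flux / (K[y][x] * p[x]))
    return dS, ef, ep

# Made-up three-state example; every column sums to zero.
K = [
    [-3.0, 1.0, 2.0],
    [1.0, -2.0, 0.5],
    [2.0, 1.0, -2.5],
]
p = [0.2, 0.5, 0.3]
dS, ef, ep = entropy_rate_terms(K, p)
print(dS, ef, ep)
```

Under these assumptions the EP rate is non-negative for any distribution, while the EF rate and dS/dt can take either sign.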
IV Thermodynamics of localized set of neighborhoods
IV.1 EP Bounds involving in-ex sums
Appendix B contains a proof that the global EP rate is
where all probability distributions in the expansions of the two ’s are implicitly evaluated at , and the neighborhood structure is fixed to the structure at time . Note that by expanding the term in Eq. 20, collecting sums, and combining with Eq. 17, we see that global EF equals the in-ex sum of EF rates:
Eq. 20 is the first major result of this paper. It is illustrated in the following two examples.
Suppose that every neighborhood in is a distinct subsystem in . So there are no overlaps among the neighborhoods. Physically, this models any scenario where a set of “subsystems” evolve in isolation from one another. This process is called a subsystem process wolpert_thermo_bayes_nets_2019 ; wolpert_thermo_comp_review_2019 . In a subsystem process, Eq. 20 takes the form
So the in-ex information is just the multi-information in this case. If we now integrate Eq. 23 over the time interval , we get a formula for cumulative EP of the joint system:
where is the expected change in the multi-information among the subsystems over the entire duration of the process. This formula for the total EP in a subsystem process was first derived in wolpert_thermo_bayes_nets_2019 . (See also wolpert_thermo_comp_review_2019 ; wolpert2018thermo_circuits .)
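The subsystem-process case can be verified numerically at the level of rates. The sketch below assumes two binary subsystems that evolve independently under made-up rates, starting from a correlated joint distribution, and a single reservoir so that the EP rate takes the usual flux-times-log-ratio form. It checks that the global EP rate equals the sum of the local EP rates minus the time derivative of the multi-information.

```python
import math
from itertools import product

def ep_and_ds(K, p, states):
    """Return (EP rate, dS/dt) for off-diagonal rates K and distribution p.

    K maps (new_state, old_state) to a positive rate; p maps states to
    probabilities. Assumes every transition in K has its reverse in K.
    """
    dp = {x: 0.0 for x in states}
    ep = 0.0
    for (x, y), r in K.items():
        flux = r * p[y]
        dp[x] += flux
        dp[y] -= flux
        ep += flux * math.log(flux / (K[(y, x)] * p[x]))
    ds = -sum(dp[x] * math.log(p[x]) for x in states)
    return ep, ds

# Made-up local rates, keyed by (new, old), for two independent bits.
W0 = {(1, 0): 2.0, (0, 1): 1.0}
W1 = {(1, 0): 0.5, (0, 1): 3.0}

# Joint rate matrix: exactly one subsystem changes per transition.
joint_states = list(product([0, 1], repeat=2))
K_joint = {}
for a, b in joint_states:
    for (a_new, a_old), r in W0.items():
        if a_old == a:
            K_joint[((a_new, b), (a, b))] = r
    for (b_new, b_old), r in W1.items():
        if b_old == b:
            K_joint[((a, b_new), (a, b))] = r

# Correlated joint distribution and its marginals.
p_joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
p0 = {a: sum(q for (x, _), q in p_joint.items() if x == a) for a in (0, 1)}
p1 = {b: sum(q for (_, y), q in p_joint.items() if y == b) for b in (0, 1)}

ep_global, ds_joint = ep_and_ds(K_joint, p_joint, joint_states)
ep_0, ds_0 = ep_and_ds(W0, p0, [0, 1])
ep_1, ds_1 = ep_and_ds(W1, p1, [0, 1])

# Time derivative of the multi-information S(X0) + S(X1) - S(X0, X1).
d_multi_info = ds_0 + ds_1 - ds_joint
print(ep_global, ep_0 + ep_1 - d_multi_info)
```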
As an example, Eq. 25 holds in a subsystem process, where in fact for every , for every , . Less trivially, Eq. 25 also holds if at a minimum, every union of neighborhoods for some has a positive EF rate, ejecting heat from the system into the environment.
When Eq. 25 holds, due to the non-negativity of EP rates of all , is bounded below by
Note that this lower bound on the global EP rate is a purely information-theoretic expression.
IV.2 EP Bounds involving windowed derivatives
We can construct other lower bounds on the global EP rate that are also purely information-theoretic, like Eqs. 26 and 23, but that hold unconditionally, and in addition are guaranteed to be non-negative.
To do this, it will be useful to introduce some shorthand. First, for any (not necessarily a union of neighborhoods), the -(windowed) derivative of the conditional entropy of given under rate matrix is defined as
I write this as just when and are obvious from the context (see note 19 in the reference list).
The expression on the RHS of Eq. 27 specifies the instantaneous dynamics of the conditional entropy of given , but under a counterfactual rate matrix that does not change the state of , only . So it captures how the statistical coupling between and diminishes with time, if is not allowed to change. More precisely, in Appendix C it is shown that if is a union of neighborhoods, then is the derivative of the negative mutual information between and , under the counterfactual rate matrix (see note 20 in the reference list). This in turn means that if is a union of neighborhoods, then is non-positive.
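The counterfactual dynamics described above can be illustrated with two subsystems. The sketch below assumes a subsystem A whose rates depend only on its own state (so that A on its own is a neighborhood) and a second subsystem B that is held fixed. Under that assumption the data processing inequality implies that the mutual information between the two cannot increase, and the code evaluates its instantaneous time derivative. The rates and distributions are made up.

```python
import math

def mutual_info_rate_under_window(W, p):
    """d/dt of I(X_A; X_B) when only X_A evolves, with rates independent of X_B.

    W[a_new][a_old] is the rate matrix of X_A alone (columns sum to zero);
    p[a][b] is the joint distribution, assumed strictly positive.
    """
    nA, nB = len(p), len(p[0])
    dp = [
        [sum(W[a][c] * p[c][b] for c in range(nA)) for b in range(nB)]
        for a in range(nA)
    ]
    pA = [sum(row) for row in p]
    dpA = [sum(row) for row in dp]
    dS_A = -sum(dpA[a] * math.log(pA[a]) for a in range(nA))
    dS_AB = -sum(
        dp[a][b] * math.log(p[a][b]) for a in range(nA) for b in range(nB)
    )
    # S(X_B) is constant because the counterfactual dynamics never changes X_B,
    # so dI/dt = dS_A/dt + 0 - dS_AB/dt.
    return dS_A - dS_AB

W = [[-2.0, 1.0], [2.0, -1.0]]

# Correlated joint distribution: the coupling is "forgotten" as X_A evolves.
p_correlated = [[0.4, 0.1], [0.2, 0.3]]
rate_correlated = mutual_info_rate_under_window(W, p_correlated)

# Product distribution: there is no coupling to lose.
p_independent = [[0.3, 0.3], [0.2, 0.2]]
rate_independent = mutual_info_rate_under_window(W, p_independent)
print(rate_correlated, rate_independent)
```

The negative of `rate_correlated` is the quantity that the accompanying note suggests calling a "forgetting rate".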
It will also be useful to define
which I will abbreviate as when the set of subsystems is clear from the context. Since is the global EP rate under the counterfactual rate matrix , it is non-negative. Note that even if is a union of neighborhoods, , in general differs from , since we can use Eq. 46 to establish that
In Appendix D it is shown that for any set of subsystems that are a union of neighborhoods under , and any union of neighborhoods ,
(where the second line uses the fact that windowing to is the same as windowing to ). This is the second major result of this paper. In particular, by taking and then rewriting as , we see that for any union of neighborhoods ,
As claimed above, the RHS of Eq. 33 provides a purely information-theoretic, non-negative lower bound on the Landauer loss rate, which applies unconditionally.
As a simple example of this result, consider again the analysis of a special type of bipartite process in barato_efficiency_2014 . Suppose we set to contain only what in that paper is called the “external” subsystem. Then if we also make the assumption of that paper that the full system is in a stationary state,
where the RHS is called the “learning rate” of the internal subsystem about the external subsystem Brittain_2017 . Given this, Eq. 33 above is equivalent to Eq. 7 of barato_efficiency_2014 . However, Eq. 33 bounds the global EP rate of the system considered in barato_efficiency_2014 even if the system is not in a stationary state, which need not be the case with the learning rate (see note 21 in the reference list).
Alternatively, suppose that again is a union of neighborhoods under , and that some set of subsystems is a union of neighborhoods under (a localized neighborhood structure of) . Then since the term in Eq. 32 is a global EP rate over under rate matrix , we can again feed Eq. 30 into Eq. 32 to get
Depending on the full neighborhood structure, we may be able to combine Eqs. 37 and 35 into an even larger information-theoretic lower bound on the global EP rate. This is illustrated in the following example.
Suppose is a set of four subsystems, labeled . Suppose as well that the (localized) neighborhood structure under is the three sets of subsystems, . Physically, this could correspond to a situation where there are three devices, , and , with internal state spaces and , respectively. Device is continually observing as evolves, in that the dynamics of is directly dependent on . At the same time, that observation is continually used to update the other variable in , , which contains an age-weighted average of all the observations that has made about . While all this is going on, is directly acting on , by modifying the value .
Take and . Note that the four sets form a localized neighborhood structure of
since is independent of both and . So is a member of a (localized) neighborhood structure of , which means that we can apply Eq. 36.
The first term in Eq. 36, , is the local EP rate that would be jointly generated by the set of three subsystems , if they evolved in isolation from the other subsystem, under the self-contained rate matrix
The third term in Eq. 36 is the local EP rate that would be jointly generated by the two subsystems , if they evolved in isolation from the other two subsystems, but rather than do so under the rate matrix , they did so under the rate matrix given in Eq. 38. (Note that if , unlike .)
The fourth term in Eq. 36 is the global EP rate that would be generated by evolving all four subsystems under the rate matrix given by windowing onto , i.e., the rate matrix in Eq. 38. This equals the third term.
All three of these terms are non-negative. However, none of them is information-theoretic, in the sense that all of them depend on more than just derivatives of Shannon entropies. In contrast, the remaining terms are also non-negative, but in addition depend only on derivatives of Shannon entropies. Specifically, the second term in Eq. 36 is the negative of the derivative of the mutual information between the joint random variable and , under the rate matrix
Next, since , the fifth term is the negative of the derivative of the mutual information between and , under the rate matrix given by windowing onto , i.e., under the rate matrix in Eq. 38.
Finally note that we also have a neighborhood which is a proper subset of both and . So, for example, we can plug this into Eq. 30 to expand the first term in Eq. 36, , replacing it with the sum of three terms. The first of these three new terms, , is the local EP rate generated by subsystem evolving in isolation from all the other subsystems. The second of these new terms, , is the EP rate that would be generated if the set of three subsystems evolved in isolation from the remaining subsystem, , but under the rate matrix
While non-negative, these two new terms are not information-theoretic. In contrast, the third new term is the negative of the derivative of the mutual information between and , under the rate matrix . This is both non-negative and information-theoretic.
It is important to realize that the neighborhood structure underlying these bounds on the Landauer loss rate can change with time. To illustrate this, return to the scenario from the beginning of this example, involving three devices. It may be that at some time stops acting on , and a new neighborhood structure forms, in which uses , the final value of its running average of observations of ’s state, to govern how it acts on , e.g., in an attempt to undo the earlier actions of on . (A new neighborhood structure capturing this could be , for example.)
There are many directions for future work. In particular, it was recently shown how to use the discrete-time Landauer loss of subsystem processes to derive novel fluctuation theorems and thermodynamic uncertainty relations for physical systems that implement any given Bayes net wolpert_thermo_bayes_nets_2019 . It will be interesting to see if the same basic idea can be used with the bounds on continuous-time Landauer loss rate derived above, to derive novel fluctuation theorems and thermodynamic uncertainty relations for physical systems that implement any given multipartite process.
I would like to thank Sosuke Ito, Artemy Kolchinsky, Kangqiao Liu, Alec Boyd, Paul Riechers, and especially Takahiro Sagawa for stimulating discussion. This work was supported by the Santa Fe Institute, Grant No. CHE-1648973 from the US National Science Foundation and Grant No. FQXi-RFP-IPW-1912 from the FQXi foundation. The opinions expressed in this paper are those of the author and do not necessarily reflect the view of the National Science Foundation.
- (1) Andre C. Barato, David Hartich, and Udo Seifert, Efficiency of cellular information processing, New Journal of Physics 16 (2014), no. 10, 103024.
- (2) Alexander B Boyd, Dibyendu Mandal, and James P Crutchfield, Thermodynamics of modularity: Structural costs beyond the landauer bound, Physical Review X 8 (2018), no. 3, 031036.
- (3) Rory A Brittain, Nick S Jones, and Thomas E Ouldridge, What we learn from the learning rate, Journal of Statistical Mechanics: Theory and Experiment 2017 (2017), no. 6, 063502.
- (4) Thomas M. Cover and Joy A. Thomas, Elements of information theory, John Wiley & Sons, 2012.
- (5) Giovanni Diana and Massimiliano Esposito, Mutual entropy production in bipartite systems, Journal of Statistical Mechanics: Theory and Experiment 2014 (2014), no. 4, P04010.
- (6) Tal El-Hay, Nir Friedman, Daphne Koller, and Raz Kupferman, Continuous time markov networks, arXiv preprint arXiv:1206.6838 (2012).
- (7) Massimiliano Esposito and Christian Van den Broeck, Three faces of the second law. i. master equation formulation, Physical Review E 82 (2010), no. 1, 011143.
- (8) D. Hartich, A. C. Barato, and U. Seifert, Stochastic thermodynamics of bipartite systems: transfer entropy inequalities and a Maxwell’s demon interpretation, Journal of Statistical Mechanics: Theory and Experiment 2014 (2014), no. 2, P02016.
- (9) David Hartich, Andre C. Barato, and Udo Seifert, Sensory capacity: An information theoretical measure of the performance of a sensor, Phys. Rev. E 93 (2016), 022116.
- (10) Jordan M. Horowitz, Multipartite information flow for multiple Maxwell demons, Journal of Statistical Mechanics: Theory and Experiment 2015 (2015), no. 3, P03006.
- (11) Jordan M Horowitz and Massimiliano Esposito, Thermodynamics with continuous information flow, Physical Review X 4 (2014), no. 3, 031015.
- (12) Sosuke Ito and Takahiro Sagawa, Information thermodynamics on causal networks, Physical review letters 111 (2013), no. 18, 180603.
- (13) William McGill, Multivariate information transmission, Transactions of the IRE Professional Group on Information Theory 4 (1954), no. 4, 93–111.
- (14) Uri Nodelman, Christian R Shelton, and Daphne Koller, Continuous time bayesian networks, Proceedings of the Eighteenth conference on Uncertainty in artificial intelligence, Morgan Kaufmann Publishers Inc., 2002, pp. 378–387.
- (15) Uri Nodelman, Christian R Shelton, and Daphne Koller, Expectation maximization and complex duration distributions for continuous time bayesian networks, arXiv preprint arXiv:1207.1402 (2012).
- (16) To see this, note that if , then the only way for to be nonzero is if and . If instead , can differ from . However, if then the sum over in Eq. 4 runs over all values of . By normalization of the rate matrix , that sum must equal zero.
- (17) We can always construct a localized set of neighborhoods for any given multipartite CTMC by taking the transitive closure of the minimal neighbor relation, i.e., by defining the neighborhood of any subsystem to be the set of all subsystems such that there is a sequence of subsystems where is a minimal neighbor of , is a minimal neighbor of , etc., ending with .
- (18) As an example, suppose , and label the subsystems as . Then take to have four elements, and . (So the first element consists of all subsystems whose label involves a , the second consists of all subsystems whose label involves a , etc.). Finally, suppose that with probability , the state of every subsystem is the same. Then if the probability distribution of that identical state is , the in-ex information is .
- (19) More generally, given any function , the -(windowed) derivative of is defined as . See Eq. 4 in horowitz_multipartite_2015 .
- (20) In horowitz2014thermodynamics ; horowitz_multipartite_2015 , the -derivative of the mutual information between and , , is interpreted as the “information flow” from to . However, in the scenarios considered below, will be a union of neighborhoods, and therefore none of the subsystems in will evolve in a way directly dependent on any subsystem in (nor vice-versa). So the fact that will not indicate that information in concerning “flows” from back to the subsystems in in any sense. (Note that reflects probability currents between joint states of all subsystems, not between states of separate subsystems.) In the current context, it would be more accurate to refer to as the “forgetting rate” than as the “negative information flow”.
- (21) Recall from the discussion above of the scenario considered in barato_efficiency_2014 that while the external subsystem is its own neighborhood, the internal subsystem is not. This means that in general, if it is not in a stationary state, then the learning rate of the internal subsystem about the external subsystem (as defined in barato_efficiency_2014 ) cannot be expressed as with being a neighborhood.
- (22) Juan MR Parrondo, Jordan M Horowitz, and Takahiro Sagawa, Thermodynamics of information, Nature Physics 11 (2015), no. 2, 131–139.
- (23) Takahiro Sagawa, Thermodynamic and logical reversibilities revisited, Journal of Statistical Mechanics: Theory and Experiment 2014 (2014), no. 3, P03025.
- (24) Udo Seifert, Stochastic thermodynamics, fluctuation theorems and molecular machines, Reports on Progress in Physics 75 (2012), no. 12, 126001.
- (25) N. Shiraishi and T. Sagawa, Fluctuation theorem for partially masked nonequilibrium dynamics, Physical Review E (2015).
- (26) HU Kuo Ting, On the amount of information, Theory of Probability & Its Applications 7 (1962), no. 4, 439–447.
- (27) Christian Van den Broeck and Massimiliano Esposito, Ensemble and trajectory thermodynamics: A brief introduction, Physica A: Statistical Mechanics and its Applications 418 (2015), 6–16.
- (28) David Hilton Wolpert, Thermodynamic uncertainty relations and fluctuation theorems for bayes nets, arXiv preprint arXiv:1911.0270 (2019).
- (29) David Hilton Wolpert and Artemy Kolchinsky, The thermodynamic costs of circuits, arXiv preprint arXiv:1806.04103v3 (2018).
- (30) D.H. Wolpert, The stochastic thermodynamics of computation, Journal of Physics A: Mathematical and Theoretical (2019).
Appendix A Proof of Eq. 15