I Introduction
Let $X_1, X_2, \ldots$ be a sequence of random variables forming a first-order Markov chain on a finite set $\mathcal{X}$ with transition probability matrix $P = (P(x,y))_{x,y \in \mathcal{X}}$, where $P(x,y) \ge 0$ and $\sum_y P(x,y) = 1$ for all $x, y \in \mathcal{X}$. We think of $P$ as the “uncontrolled” or “base” chain. If $P$ is irreducible and aperiodic, then there exists a unique stationary distribution $\pi$, viewed as a row vector:
(1) $\pi P = \pi$
Let the initial state $X_1$ have distribution $p_1$. Then, denoting $x_1^n = (x_1, \ldots, x_n)$, the distribution of $X_1^n$ is:
(2) $\Pr(X_1^n = x_1^n) = p_1(x_1) \prod_{i=1}^{n-1} P(x_i, x_{i+1})$
Let $\mu$ be some probability distribution on $\mathcal{X}$, viewed as a row vector. We study the nearest Markov chain transition matrix $Q^*$ to $P$ having $\mu$ as its stationary distribution:
(3) $Q^* = \operatorname{argmin}_{Q:\, \mu Q = \mu} D(Q \| P)$
where $D(Q\|P)$ is the Kullback–Leibler (KL) divergence rate between Markov chains with transition matrices $Q$ and $P$ [1]:
(4) $D(Q \| P) = \sum_x \mu(x) \sum_y Q(x,y) \log \frac{Q(x,y)}{P(x,y)}$
where $\mu$ is the stationary distribution of $Q$, and where we note that $D(Q\|P)$ is independent of the initial distribution (for aperiodic, irreducible $Q$).
We think of $Q$ as the “controlled” or “driven” chain and $D(Q\|P)$ as the cost of control per unit time – the power. We further consider the analogous question for the case of a continuous-time Markov chain $(X_t)_{t \ge 0}$.
This setting is inspired by the following thought experiment due to Feynman [2]: a person holds a heavy bag above the floor for an hour and gets tired. The net work done on the object is zero (since work is the product of force and distance displaced, the latter of which is zero), so why does she get tired? A table could hold the same bag indefinitely without an energy source, and so could the person if she were frozen solid, the bag hanging on her stiff, lifeless limb. The latter observation implicates the microscopic dynamics of muscles as key to this question. A toy model for the motion of striated muscle fibers – see [3, 4] for an extended discussion – is a random walk in a periodic energy potential (the myosin protein joining the fibers together is doing the random walking, the energy potential having periodically spaced minima corresponding to discrete steps along the fiber). Attachment of the heavy bag pulls on the muscle fibers, biasing the random walk in the direction of gravity, tilting the energy potential. The person must use chemical energy (the hydrolysis of ATP molecules) to debias the random walk in such a way that the bag is held at the desired height above the floor. If the person is frozen solid, then the underlying random walk stops (or slows down a lot, as lowering the temperature reduces the transition rate between the potential’s energy minima), and so chemical energy is no longer used to hold the bag. We ask: what is a lower bound on the power to hold the bag?
We recast the above story as optimization problem (3). The state space $\mathcal{X}$ corresponds to the possible configurations of the system (the position of myosin along a fiber and its internal state). The uncontrolled Markov chain $P$ corresponds to the underlying fluctuations of the myosin molecule along a filament and the controlled chain $Q$ corresponds to chemically driving the system.
The control goal is macroscopic: the net force the person exerts on the bag is the sum of the forces due to each microscopic subsystem (myosin protein). We do not get to control each subsystem separately, but can control them all in the same way, so that each subsystem corresponds to a trajectory drawn from Markov chain $Q$ independently of the other subsystems. This notion of macroscopic control is reflected in the KL divergence cost (4), which is stated in terms of the probability distribution (2) over microscopic trajectories, rather than in terms of a single trajectory.
Our choice of the Kullback–Leibler divergence as the control cost function is motivated by this quantity’s appearance in thermodynamics as proportional to the free energy difference from the equilibrium distribution over trajectories, which in turn lower bounds the work to prepare a nonequilibrium distribution over trajectories. KL divergence control (the microscopic per-trajectory setting) was introduced in the reinforcement learning literature by Todorov [5, 6] and has connections to data compression; we discuss this and other motivations for our work in section II. The problem of maintaining a target nonequilibrium distribution has been studied recently by [7, 8], using a different notion of cost. We discuss our work’s relation to prior works in section II-E. Minimizing the KL divergence with respect to the first argument – computing the I-projection – connects this setting to large deviations theory. Indeed, most of the minimum-cost controlled chains we compute first appeared in the computation of rate functions for large deviations of the empirical measure of both discrete and continuous-time Markov chains [9, 10, 1]. The novelty of our work lies in relating these results to the minimum-power control setting – showing the minimized KL divergence to be the minimum power to “hold” a nonequilibrium distribution; in aggregating related problem statements – in continuous and discrete time and for reversible base chains; in computing some of these minimizations more explicitly than previously available; and in computing these minimizations for a few common examples like the birth-and-death chain and the two-state chain.
This work is organized as follows. Section II motivates the use of KL divergence as energy cost by drawing from information theory and optimization settings and contains definitions of this cost function in discrete and continuous time. Section III shows how this energy cost of holding a given target distribution may be analytically minimized. Section IV contains several examples that apply our theory to calculate the minimum-power controlled chain, including a birth-and-death chain which serves as a toy model of the muscular fiber, addressing the motivating question of Feynman. We conclude with a summary and outlook in section V.
We release code for computing the minimum-cost chains in this work at https://github.com/dmitrip/controlledMC.
II Kullback–Leibler divergence rate as the cost of control of Markov chains
We motivate the KL divergence rate between Markov chains in both discrete and continuous time as the cost function lower bounding the power in the bag-holding thought experiment. We present a thermodynamics perspective in subsection II-A and an equivalent perspective due to Todorov [5, 6] of a Markov decision process with log ratio cost function in subsection II-C. We summarize known expressions for the KL divergence between Markov chains in discrete and continuous time in subsections II-B and II-D, respectively. Finally, II-E places this work in context with related work.
II-A KL divergence in thermodynamics
We summarize briefly the appearance of the KL divergence in measuring work in statistical mechanics. Below, let $D(q\|p) = \sum_x q(x) \log \frac{q(x)}{p(x)}$ denote the Kullback–Leibler (KL) divergence between distributions $q$ and $p$ on finite set $\mathcal{X}$ and let $H(q) = -\sum_x q(x)\log q(x)$ denote the entropy of distribution $q$. Let
(5) $p_B(x) = \frac{1}{Z} e^{-\beta E(x)}$
be the Boltzmann distribution on $\mathcal{X}$, where $E$ is the energy function (sometimes called the energy potential or internal energy), $\beta$ is the inverse temperature, and $Z = \sum_x e^{-\beta E(x)}$ is the partition function. Denote the free energy of distribution $q$ by [11]:
(6) $F(q) = \langle E(X) \rangle_q - \beta^{-1} H(q)$
(7) $\phantom{F(q)} = -\beta^{-1} \log Z + \beta^{-1} \sum_x q(x) \log \frac{q(x)}{p_B(x)}$
(8) $\phantom{F(q)} = F(p_B) + \beta^{-1} D(q \| p_B)$
where the expectation $\langle \cdot \rangle_q$ is over random variable $X$ with distribution $q$. Then $F$ is minimized at equilibrium $q = p_B$, so that $F(p_B) = -\beta^{-1}\log Z$.
In thermodynamics [12, 13] the work $W$ to prepare the “controlled” distribution $q$ starting from the “base”, equilibrium distribution $p_B$ (also known as the work on the system) is at least the free energy difference:
(9) $W \ge F(q) - F(p_B) = \beta^{-1} D(q \| p_B)$
As is customary in this setting, there is in the background a notion of a stochastic process transforming initial states into final states, and the work in (9) is an average over realizations of this process. In Appendix A we provide a physical example in the spirit of the Szilard engine thought experiment in thermodynamics, for which the KL divergence does emerge as the work done. In the bag-holding thought experiment, there is a large collection of independent myosin systems, and the total work is the sum of the works on each system. We imagine the number of microscopic systems to be large enough that fluctuations about the average work per subsystem are small, so it is this average work that is our object of study.
The KL divergence cost is familiar in data compression, where the “energy” of symbol $x$ drawn from distribution $p$ is $\log(1/p(x))$. Any compression scheme must use at least $H(p)$ bits to encode a sample from $p$ [14, 15] on average over draws from distribution $p$. If we use a compression scheme that instead uses $\log(1/q(x))$ bits to encode symbol $x$ – a mismatched code – then we would pay $D(p\|q)$ extra bits per symbol on average. Section II-B contains analogous remarks for compressing samples drawn from Markov chain distributions.
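As a quick numeric check of this mismatch penalty (a minimal sketch; the distributions $p$ and $q$ are our own illustrative choice):

```python
import numpy as np

p = np.array([0.5, 0.25, 0.25])  # source distribution
q = np.array([0.25, 0.5, 0.25])  # mismatched coding distribution

H_p = -(p * np.log2(p)).sum()      # optimal rate H(p), bits per symbol
rate = -(p * np.log2(q)).sum()     # rate achieved by the code built for q
D_pq = (p * np.log2(p / q)).sum()  # mismatch penalty D(p || q)

assert np.isclose(rate, H_p + D_pq)  # extra cost is exactly D(p || q)
```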
II-B Markov chains in discrete time
In the Markov chain control setting, we apply the preceding picture with alphabet $\mathcal{X}^n$ (trajectories of length $n$) and with Markov chain distributions on $\mathcal{X}^n$ with a desired marginal distribution $\mu$. We consider the continuous-time setting in section II-D.
A discrete-time Markov chain distribution on the set $\mathcal{X}^n$ is the Boltzmann distribution with energy function parametrized by the stochastic transition matrix $P$ and the initial distribution vector $p_1$ (obtained by taking the logarithm of the rightmost quantity in (2)):
(10) $E(x_1^n) = -\beta^{-1} \log \Big( p_1(x_1) \prod_{i=1}^{n-1} P(x_i, x_{i+1}) \Big)$
where $Z = 1$. Then $p_B(x_1^n) = \Pr(X_1^n = x_1^n)$, where $p_B$ is the Boltzmann distribution (5) on $\mathcal{X}^n$. Given another transition matrix $Q$ and initial distribution $q_1$, the work $W$ to prepare Markov chain distribution $q$ starting from the Boltzmann distribution $p_B$ is lower bounded by the free energy difference (9). In the limit $n \to \infty$, the work per time step – the power – is lower bounded by
(11) $\lim_{n\to\infty} \frac{W}{n} \ge \lim_{n\to\infty} \frac{1}{n} \big( F(q) - F(p_B) \big)$
(12) $\phantom{\lim_{n\to\infty} \frac{W}{n}} = \lim_{n\to\infty} \frac{\beta^{-1}}{n} D(q \| p_B)$
(13) $\phantom{\lim_{n\to\infty} \frac{W}{n}} = \lim_{n\to\infty} \frac{\beta^{-1}}{n} \sum_{x_1^n} q(x_1^n) \log \frac{q(x_1^n)}{p_B(x_1^n)}$
(14) $\phantom{\lim_{n\to\infty} \frac{W}{n}} = \beta^{-1} \sum_x \mu(x) \sum_y Q(x,y) \log \frac{Q(x,y)}{P(x,y)}$
(15) $\phantom{\lim_{n\to\infty} \frac{W}{n}} = \beta^{-1} D(Q \| P)$
where $\mu$ is the stationary distribution of transition matrix $Q$ and the last equality defines the KL divergence rate $D(Q\|P)$ [1] between Markov chains with transition matrices $Q$ and $P$.
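In code, the rate (15) can be evaluated by solving for the stationary distribution of $Q$ and averaging row-wise KL divergences (a minimal NumPy sketch of ours, not the released implementation; it assumes $P(x,y) > 0$ wherever $Q(x,y) > 0$):

```python
import numpy as np

def stationary(Q):
    """Stationary distribution of transition matrix Q: the left
    eigenvector of eigenvalue 1, normalized to sum to 1."""
    w, V = np.linalg.eig(Q.T)
    mu = np.real(V[:, np.argmin(np.abs(w - 1.0))])
    return mu / mu.sum()

def kl_rate(Q, P):
    """KL divergence rate (15): sum_x mu(x) D(Q(x,.) || P(x,.)),
    where mu is the stationary distribution of Q."""
    mu = stationary(Q)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(Q > 0, Q * np.log(Q / P), 0.0)
    return float(mu @ terms.sum(axis=1))
```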
In data compression, a sample $x_1^n$ can be compressed on average to at least $n H(P)$ bits [15], where $H(P)$ is the entropy rate of the Markov chain with transition matrix $P$:
(16) $H(P) = -\sum_x \pi(x) \sum_y P(x,y) \log P(x,y)$
Encoding samples from the Markov chain with transition matrix $Q$ with respect to a mismatched code based on the chain with transition matrix $P$ incurs an average cost per unit time of at least $D(Q\|P)$ extra bits. In this notation, optimization problem (3) reads:
(17) $\min_{Q:\, \mu Q = \mu} D(Q \| P)$
II-C Log loss action cost
Another path to the same optimization problem (17) (KL divergence as a lower bound on the work to maintain a nonequilibrium distribution) is in terms of a Markov decision process with log loss action cost, a setting introduced by Todorov [5, 6]. Let $P$ be the uncontrolled chain and let $Q$ be the controlled chain. Let $c(x,y)$ be the microscopic cost paid when a transition is made from $x$ to $y$ when the controller chooses transition probability matrix $Q$. KL divergence control amounts to using the log likelihood ratio:
(18) $c(x,y) = \log \frac{Q(x,y)}{P(x,y)}$
If we view the rows of $P$ and $Q$ as Boltzmann distributions with different energy potentials – that is, if we choose energy functions $E_P$, $E_Q$ such that $P(x,y) = e^{-E_P(x,y)}$ and $Q(x,y) = e^{-E_Q(x,y)}$ – then the microscopic cost is the difference in energies: $c(x,y) = E_P(x,y) - E_Q(x,y)$.
If $X_i \sim \rho_i$, then let the cost $C_i$ be the expected microscopic cost (the average cost paid per microscopic system):
(19) $C_i = \sum_x \rho_i(x) \sum_y Q(x,y) \log \frac{Q(x,y)}{P(x,y)}$
(20) $\phantom{C_i} = \sum_x \rho_i(x)\, D\big(Q(x,\cdot) \,\|\, P(x,\cdot)\big)$
Thus $C_i$ is a weighted KL divergence between the rows of transition matrices $Q$ and $P$. We are interested in macroscopic control of $\rho_i$, rather than microscopic control of $X_i$, so our setup differs from the setting introduced by [5, 6]: we average the control cost over $X_i$, so there is no randomness in our setting. Finally, for irreducible, aperiodic transition matrix $Q$, we have the identity
(21) $\lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^n C_i = \sum_x \mu(x)\, D\big(Q(x,\cdot)\,\|\,P(x,\cdot)\big) = D(Q\|P)$
where $\mu$ is the stationary distribution of $Q$. Minimizing the cost with respect to $Q$ such that $\mu Q = \mu$ is optimization problem (17).
II-D Markov chains in continuous time
The setup of section II-B has a natural counterpart for continuous-time Markov chains. Let $\bar{P}$ denote the transition rate matrix of the uncontrolled continuous-time Markov chain $(X_t)_{t\ge 0}$, where henceforth the overbar notation corresponds to rate matrices. Let $\bar{Q}$ be the controlled rate matrix and let $p_t$ denote the distribution of $X_t$. Then
(22) $p_t = p_0\, e^{t \bar{Q}}$
where $e^{t\bar{Q}}$ denotes the matrix exponential. Note that every rate matrix $\bar{Q}$ satisfies $\bar{Q}(x,y) \ge 0$ for $x \ne y$ and $\bar{Q}(x,x) = -\sum_{y \ne x} \bar{Q}(x,y)$, so the row sums of $\bar{Q}$ are 0. Conditioned on $X_t = x$, the time until the next jump is exponentially distributed with a mean of $1/(-\bar{Q}(x,x))$, and the probability to jump to $y$ is proportional to $\bar{Q}(x,y)$ for $y \ne x$. The natural notion of KL divergence rate between transition rate matrices $\bar{Q}$ and $\bar{P}$ is [16, 10, 17, 18] the limiting log likelihood ratio, analogous to (13):
(23) $D(\bar{Q} \| \bar{P}) = \lim_{T\to\infty} \frac{1}{T}\, \mathbb{E}_{\bar{Q}} \left[ \log \frac{\mathrm{d}\mathbb{P}_{\bar{Q}, p_0}}{\mathrm{d}\mathbb{P}_{\bar{P}, p_0}} (X_0^T) \right]$
(24) $\phantom{D(\bar{Q} \| \bar{P})} = \sum_x \bar\mu(x) \sum_{y \ne x} \left( \bar{Q}(x,y) \log \frac{\bar{Q}(x,y)}{\bar{P}(x,y)} - \bar{Q}(x,y) + \bar{P}(x,y) \right)$
where $\mathbb{P}_{\bar{Q}, p_0}$ denotes the likelihood under rate matrix $\bar{Q}$ and initial distribution $p_0$, and where $\bar\mu$ is the stationary distribution of rate matrix $\bar{Q}$ (that is, $\bar\mu\, e^{t\bar{Q}} = \bar\mu$ for all $t \ge 0$; equivalently, $\bar\mu \bar{Q} = 0$). The quantity in the second summation in (24) is the KL divergence between two Poisson distributions with means $\bar{Q}(x,y)$ and $\bar{P}(x,y)$. The optimization problem analogous to (17) in continuous time is:
(25) $\min_{\bar{Q}:\, \mu \bar{Q} = 0} D(\bar{Q} \| \bar{P})$
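Evaluating (24) directly (again a minimal sketch of ours; `mu`, the stationary distribution of `Qbar`, can be obtained as the normalized null vector of `Qbar.T`):

```python
import numpy as np

def kl_rate_ct(Qbar, Pbar, mu):
    """Continuous-time KL divergence rate (24). Each off-diagonal term
    is the KL divergence between Poisson(Qbar(x,y)) and Poisson(Pbar(x,y))."""
    n = Qbar.shape[0]
    off = ~np.eye(n, dtype=bool)
    Q = Qbar[off].reshape(n, n - 1)  # off-diagonal rates, row by row
    P = Pbar[off].reshape(n, n - 1)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(Q > 0, Q * np.log(Q / P), 0.0) - Q + P
    return float(mu @ terms.sum(axis=1))
```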
II-E Comparison to prior work
Recent work [7, 8] considers the question of the minimum power used to maintain a nonequilibrium state. Their setting uses a different notion of cost than we do and also makes some restrictions on the base and controlled chains (they work in continuous time, assume that the base chain is reversible, and only allow controlled chains $\bar{Q}$ with $\bar{Q}(x,y) \ge \bar{P}(x,y)$ for all $x \ne y$ – this corresponds to the biochemical mechanism of adding transitions with nonnegative rates). [7, 8] minimize the entropy production rate among all controlled chains with the desired target distribution $\mu$ and find that “fast control is optimal”: there is in general no optimally controlled chain, but given any chain $\bar{Q}$ that has the target distribution $\mu$, we can come arbitrarily close to the minimum entropy production bound by speeding up $\bar{Q}$ arbitrarily much (that is, using $c\bar{Q}$ as the controlled chain and letting $c \to \infty$; this corresponds to the statement in [7] that the “added edges (should) operate much faster than the equilibrium transitions”), while incurring an arbitrarily large KL divergence cost according to our metric.
The difference between the two notions of cost – KL divergence rate in this work and entropy production rate in [7, 8] – is the difference between total energy used and the efficiency with which that energy is used as measured by entropy production rate. The very fast controlled chain of [7, 8] uses a lot of energy efficiently, while our chain minimizes energy use by the controller, but not the efficiency. A consequence of this difference is that our optimal controlled chain (25) depends on the uncontrolled chain $\bar{P}$ (see section III), while the very fast close-to-optimal chain of [7, 8] does not, except in the requirement that it be much faster than $\bar{P}$.
[5, 6] introduced the KL divergence control setting, using a “microscopic” control cost that assigns a cost to a trajectory rather than to a distribution over trajectories. Similarly, the control goal in [5, 6] is microscopic (to reach a certain subset of the state space $\mathcal{X}$), rather than macroscopic (to maintain a target distribution over $\mathcal{X}$).
[19] considers the problem of erasing a bit of information encoded in the stationary distribution of a two-state continuous-time Markov chain and uses the KL divergence to measure the cost of control, as does our work. Whereas our control goal is to hold a target distribution and minimize the cost per unit time, [19]’s control goal is to drive the distribution $p_t$ (in the notation of section II-D) to a target by a fixed time and to minimize the total cost used to achieve this. Consequently, [19] uses a time-varying controlled chain, while ours is constant in time.
III Minimum-power controlled chains
In this section we minimize the power used to hold a desired nonequilibrium stationary distribution $\mu$.
Theorem 1.
(Minimum-power chain) Let $\mu$ be a distribution on finite set $\mathcal{X}$ with $\mu(x) > 0$ for all $x$. Let $\mu_Q$ and $\mu_{\bar{Q}}$ denote the stationary distributions of discrete- and continuous-time chains $Q$ and $\bar{Q}$, respectively.

1) (Discrete time) Let $P$ be an irreducible, aperiodic transition probability matrix (the uncontrolled discrete-time chain), and let $Q^*$ denote the minimum-power controlled chain with the desired stationary distribution $\mu$:
(26) $Q^* = \operatorname{argmin}_{Q:\, \mu_Q = \mu} D(Q \| P)$
where the minimum is over all transition probability matrices $Q$ with the desired stationary distribution $\mu$ and where $D(Q\|P)$ is as defined in (15). Then $Q^*$ exists, is unique, and satisfies for all $x, y \in \mathcal{X}$:
(27) $Q^*(x,y) = P(x,y)\, \frac{u(y)}{v(x)}$
where $u(x), v(x)$ are real-valued constants satisfying the recursive relations:
(28) $v(x) = \sum_y P(x,y)\, u(y)$
(29) $u(y) = \mu(y) \Big/ \sum_x \frac{\mu(x)}{v(x)}\, P(x,y)$

2) (Continuous time) Let $\bar{P}$ be a transition rate matrix (the uncontrolled continuous-time chain) with $\bar{P}$ irreducible, and let $\bar{Q}^*$ denote the minimum-power controlled chain with the desired stationary distribution $\mu$:
(30) $\bar{Q}^* = \operatorname{argmin}_{\bar{Q}:\, \mu_{\bar{Q}} = \mu} D(\bar{Q} \| \bar{P})$
where the minimum is over all transition rate matrices $\bar{Q}$ with the desired stationary distribution $\mu$ and where $D(\bar{Q}\|\bar{P})$ is as defined in (24). Then $\bar{Q}^*$ exists, is unique, and satisfies for $x \ne y$:
(31) $\bar{Q}^*(x,y) = \bar{P}(x,y)\, \frac{u(y)}{u(x)}$
where $u(x)$ are real-valued constants satisfying the recursive relation:
(32) $u(y) = \left( \mu(y) \sum_{x \ne y} \bar{P}(y,x)\, u(x) \Big/ \sum_{x \ne y} \frac{\mu(x)}{u(x)}\, \bar{P}(x,y) \right)^{1/2}$
Existence and uniqueness of (26) follow as a special case of Lemma 1 of [1]. Existence and uniqueness of (31) were shown in [10]. We prove expressions (27) and (31) for $Q^*$ and $\bar{Q}^*$ by setting up a Lagrange multiplier optimization problem, where the logarithms of the constants $u$ and $v$ play the role of Lagrange multipliers. See Appendix B for proof details.
The recursive relations (28), (29), and (32) enable an iterative computation of the chains (26) and (31). In the continuous-time case, for example, we initialize $u$ to some value $u^{(0)}$ and then use relation (32) to compute $u^{(k+1)}$ as a function of $u^{(k)}$ at the $k$th iteration until numerical convergence. The example in section IV is computed in this way.
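A minimal NumPy sketch of this fixed-point iteration (ours; the released repository may differ, and we fix the iteration count rather than test convergence):

```python
import numpy as np

def min_power_discrete(P, mu, iters=10_000):
    """Iterate relations (28)-(29), then assemble the minimizer (27):
    Q*(x,y) = P(x,y) u(y) / v(x)."""
    u = np.ones(len(mu))
    for _ in range(iters):
        v = P @ u                  # relation (28)
        u = mu / ((mu / v) @ P)    # relation (29)
    return P * u[None, :] / (P @ u)[:, None]

def min_power_continuous(Pbar, mu, iters=10_000):
    """Iterate relation (32), then assemble the minimizer (31):
    Qbar*(x,y) = Pbar(x,y) u(y) / u(x) for x != y."""
    R = Pbar - np.diag(np.diag(Pbar))  # off-diagonal rates only
    u = np.ones(len(mu))
    for _ in range(iters):
        u = np.sqrt(mu * (R @ u) / ((mu / u) @ R))  # relation (32)
    Q = R * u[None, :] / u[:, None]
    np.fill_diagonal(Q, -Q.sum(axis=1))  # rows of a rate matrix sum to 0
    return Q
```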
The chains $Q^*$ and $\bar{Q}^*$ are the I-projections of the chains $P$ and $\bar{P}$ on the set of discrete- and continuous-time Markov chains, respectively, with a fixed stationary distribution. The discrete-time case is presented in [9, 1] and the continuous-time case in [10], where $D(Q^*\|P)$ and $D(\bar{Q}^*\|\bar{P})$ arise as large deviations rate functions for the empirical marginal distribution. To our knowledge, Theorem 1 presents the most explicit characterization of the I-projection in terms of the Lagrange multipliers.
We next specialize our results to the case of reversible uncontrolled Markov chains, a case important in equilibrium thermodynamics. Let $\tilde{P}$ denote the time-reverse of a transition probability matrix $P$. That is:
(33) $\tilde{P}(x,y) = \frac{\pi(y)\, P(y,x)}{\pi(x)}$
where $\pi$ is the stationary distribution of $P$. Then $\pi$ is also the stationary distribution of $\tilde{P}$; if $X_1 \sim \pi$, then the reversed trajectory $(X_n, \ldots, X_1)$ is distributed as a Markov chain with transition matrix $\tilde{P}$ for all $n$. Analogously, in continuous time, the time-reverse of a transition rate matrix $\bar{P}$ satisfies $\tilde{\bar{P}}(x,y) = \bar\pi(y)\bar{P}(y,x)/\bar\pi(x)$ for all $x \ne y$. A chain is reversible if $\tilde{P} = P$ ($\tilde{\bar{P}} = \bar{P}$, analogously, in continuous time).
Theorem 2.
(Reversible uncontrolled chain) Let notation be as in the statement of Theorem 1, let the uncontrolled chains $P$ and $\bar{P}$ be reversible, and let $\bar\pi$ denote the stationary distribution of $\bar{P}$.

1) $Q^*$ and $\bar{Q}^*$ are reversible.

2) In continuous time, $\bar{Q}^*$ satisfies for $x \ne y$:
(34) $\bar{Q}^*(x,y) = \bar{P}(x,y) \sqrt{\frac{\mu(y)\,\bar\pi(x)}{\mu(x)\,\bar\pi(y)}} = \sqrt{\bar{P}(x,y)\, \bar{P}(y,x)\, \frac{\mu(y)}{\mu(x)}}$
and
(35) $D(\bar{Q}^* \| \bar{P}) = \sum_{x \ne y} \Big( \mu(x)\bar{P}(x,y) - \sqrt{\mu(x)\,\mu(y)\,\bar{P}(x,y)\,\bar{P}(y,x)} \Big) = \frac{1}{2} \sum_{x \ne y} \Big( \sqrt{\mu(x)\bar{P}(x,y)} - \sqrt{\mu(y)\bar{P}(y,x)} \Big)^2$
Proof.
1) We can check that if $\mu_Q = \mu$, then $\mu_{\tilde{Q}} = \mu$ for all $Q$. Suppose that $Q^*$ is not reversible. Let $Q' = \frac{1}{2}(Q^* + \tilde{Q}^*)$. Then $Q'$ is reversible and $\mu_{Q'} = \mu$. Since $D(\cdot\|P)$ is strictly convex on this set, we have $D(Q'\|P) < \frac{1}{2} D(Q^*\|P) + \frac{1}{2} D(\tilde{Q}^*\|P) = D(Q^*\|P)$, contradicting the optimality of $Q^*$. Another proof: suppose that $Q^*$ is not reversible; then $\tilde{Q}^*$ is a second minimizer with $D(\tilde{Q}^*\|P) = D(Q^*\|P)$, contradicting the uniqueness of $Q^*$ established in Theorem 1. Therefore $Q^*$ is reversible. An analogous argument proves $\bar{Q}^*$ is reversible. 2) follows by using time reversal (33) twice along with the reversibility of $\bar{P}$ and $\bar{Q}^*$, established in part 1): combining the form (31) with detailed balance for $\bar{Q}^*$ and for $\bar{P}$,
(36) $\mu(x)\,\bar{P}(x,y)\,\frac{u(y)}{u(x)} = \mu(y)\,\bar{P}(y,x)\,\frac{u(x)}{u(y)}$
(37) $\frac{u(y)^2\,\bar\pi(y)}{\mu(y)} = \frac{u(x)^2\,\bar\pi(x)}{\mu(x)}$
and collecting $x$- and $y$-dependent terms to separate sides of the equality to conclude that $u(x)^2\,\bar\pi(x)/\mu(x) = c$ for some constant $c$ for all $x$. Choosing $c = 1$, i.e., $u(x) = \sqrt{\mu(x)/\bar\pi(x)}$, yields the result. ∎
IV Examples
We conclude with numerical examples of minimum-power controlled Markov chains with a target stationary distribution. The first example is a two-state chain in discrete time and the second example is a birth-and-death chain in continuous and discrete time – a toy model of the muscle fiber thought experiment in the introduction (section I).
IV-A Two-state chain in discrete time
Let $P$ be a two-state discrete-time Markov chain on set $\mathcal{X} = \{0, 1\}$ with off-diagonal entries $P(0,1) = a$ and $P(1,0) = b$, and let $\mu$ be our desired nonequilibrium distribution with $\mu(x) > 0$ for both $x$. All two-state chains are reversible, so we apply Theorem 2 part 1) to compute the minimum-power controlled chain $Q^*$ with stationary distribution $\mu$. A computation shows the minimum-power transition matrix (26) has off-diagonal entries:
(38) $Q^*(x,y) = \mu(y)\, t, \quad x \ne y$
where the second factor $t$ is independent of $x$ and $y$:
(39) $t = \frac{-ab + \sqrt{(ab)^2 + 4\,a\,b\,\mu(0)\,\mu(1)\,(1 - a - b)}}{2\,\mu(0)\,\mu(1)\,(1 - a - b)}$
The diagonal terms of $Q^*$ are such that the row sums are 1.
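A numeric sanity check of (38)-(39) (parameters are our own illustrative choice; the formula assumes $a + b \ne 1$):

```python
import numpy as np

a, b = 0.3, 0.2             # base chain off-diagonals P(0,1), P(1,0)
mu = np.array([0.9, 0.1])   # target distribution

# closed form (39) for the common factor t
t = (-a * b + np.sqrt((a * b) ** 2 + 4 * a * b * mu[0] * mu[1] * (1 - a - b))) \
    / (2 * mu[0] * mu[1] * (1 - a - b))

# off-diagonals (38): Q*(x,y) = mu(y) t; diagonals make rows sum to 1
Q = np.array([[1 - t * mu[1], t * mu[1]],
              [t * mu[0], 1 - t * mu[0]]])

assert np.allclose(mu @ Q, mu)  # mu is stationary for Q*
# Q also agrees with the fixed-point iteration of section III:
# min_power_discrete(np.array([[1 - a, a], [b, 1 - b]]), mu)
```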
IV-B Birth-and-death chain
We next present the example of the birth-and-death chain as a toy model of Feynman’s muscle fiber thought experiment (see section I). For detailed models of molecular motors see [20] and [4].
IV-B1 Continuous time
Let $\bar{P}$ be the continuous-time birth-and-death chain on set $\mathcal{X} = \{0, 1, \ldots, n-1\}$ with parameters $p$ and $q$, depicted in Figure 1. The chain increments from state $x$ to $x+1$ (resp. decrements to $x-1$) with rate $p$ (resp. $q$). All other transitions, as well as decrementing from state $0$ and incrementing from state $n-1$, have rate $0$. $\bar{P}$ is reversible and its stationary distribution is, up to normalization [21]:
(40) $\bar\pi(x) \propto (p/q)^x$
Let our control objective be to maintain the target distribution $\mu$, a geometric distribution on $\mathcal{X}$:
(41) $\mu(x) \propto r^x$
where $r > 0$. Then applying Theorems 1 and 2 we find the minimum-power controlled chain (30) with stationary distribution $\mu$ to be another birth-and-death chain with increment and decrement rates $p^*, q^*$:
(42) $p^* = \sqrt{p\,q\,r}, \qquad q^* = \sqrt{p\,q/r}$
The cost per unit time of this birth-and-death chain is (35):
(43) $D(\bar{Q}^* \| \bar{P}) = \big( \sqrt{p} - \sqrt{q\,r} \big)^2 \big( 1 - \mu(n-1) \big)$
If $r < 1$, then as $n \to \infty$, we have $D(\bar{Q}^*\|\bar{P}) \to (\sqrt{p} - \sqrt{qr})^2$.
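In code (parameters are our own illustrative choice; `bd_rates` is a hypothetical helper, and `kl_rate_ct` is the section II-D sketch):

```python
import numpy as np

def bd_rates(up, down, n):
    """Birth-and-death rate matrix with constant rates up/down."""
    R = np.diag(np.full(n - 1, up), 1) + np.diag(np.full(n - 1, down), -1)
    np.fill_diagonal(R, -R.sum(axis=1))
    return R

n, p, q, r = 20, 1.0, 2.0, 0.8
mu = r ** np.arange(n)
mu /= mu.sum()                                            # target (41)

p_star, q_star = np.sqrt(p * q * r), np.sqrt(p * q / r)   # rates (42)
cost = (np.sqrt(p) - np.sqrt(q * r)) ** 2 * (1 - mu[-1])  # power (43)

# cross-check against the generic rate (24):
# kl_rate_ct(bd_rates(p_star, q_star, n), bd_rates(p, q, n), mu) matches cost
```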
Recalling the motivating example of section I, we can think of the birth-and-death chain as a biased random walk, where the random walker tends to spend more time at small values of $x$ if $p < q$. This is a toy model of a myosin protein moving on an actin filament in muscles – a random walk in a tilted energy potential with periodically spaced minima corresponding to discrete steps along the fiber. The energy potential is depicted in Figure 2: a transition from the $x$th energy minimum to the $(x+1)$th energy minimum must overcome activation energy $E_a + \Delta E$, and the reverse transition must overcome activation energy $E_a$, where $\Delta E$ is the energy difference between adjacent states. In the bag-holding thought experiment, $\Delta E$ is the gravitational potential energy difference between adjacent states, with state $0$ being closest to the ground.
In terms of these energies, the increment and decrement rates are:
(44) $p = C\, e^{-\beta (E_a + \Delta E)}$
(45) $q = C\, e^{-\beta E_a}$
for some constant $C > 0$, where $\beta$ is the inverse temperature. The stationary distribution of the uncontrolled chain is, up to normalization:
(46) $\bar\pi(x) \propto e^{-\beta x \Delta E}$
Let’s write the target nonequilibrium distribution as:
(47) $\mu(x) \propto e^{-\beta x \Delta E'}$
with energy difference $\Delta E'$ between adjacent states.
Since the optimal controlled chain is another birth-and-death chain, we can write its parameters (42) in terms of energies $E_a^*$ and $\Delta E'$:
(48) $p^* = C\, e^{-\beta (E_a^* + \Delta E')}$
(49) $q^* = C\, e^{-\beta E_a^*}$
where, using (42), we find the controlled activation energy $E_a^*$:
(50) $E_a^* = E_a + \frac{\Delta E - \Delta E'}{2}$
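As a one-line check of (50) (our own verification), substitute (44), (45), and $r = e^{-\beta \Delta E'}$ (from (41) and (47)) into (42):

$q^* = \sqrt{pq/r} = \sqrt{C e^{-\beta(E_a + \Delta E)} \cdot C e^{-\beta E_a} \cdot e^{\beta \Delta E'}} = C\, e^{-\beta \left( E_a + \frac{\Delta E - \Delta E'}{2} \right)} = C\, e^{-\beta E_a^*}$,

and similarly $p^* = \sqrt{pqr} = C\, e^{-\beta (E_a^* + \Delta E')}$.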
The cost per unit time of this birth-and-death chain is (43):
(51) $D(\bar{Q}^* \| \bar{P}) = C\, e^{-\beta E_a} \big( e^{-\beta \Delta E / 2} - e^{-\beta \Delta E' / 2} \big)^2 \big( 1 - \mu(n-1) \big)$
where if $\Delta E' > 0$, then the last factor tends to $1$ as $n \to \infty$.
In the muscle fiber thought experiment (where $\Delta E$ is the gravitational potential energy difference between adjacent states and state $0$ is closest to the ground), if $\Delta E' < \Delta E$, then target distribution (47) corresponds to imposing a constant force upwards (away from state $0$) on the random-walking myosin protein. The control objective is macroscopic: rather than control the microscopic trajectory of a single myosin protein, we imagine controlling a large collection of identical, independent myosin proteins in the same way by imposing the same controlled chain $\bar{Q}^*$ on all myosins; the bag-holder’s arm position is determined by an average over the positions of this collection of myosins.
IV-B2 Discrete time
Let $P$ be the discrete-time birth-and-death chain on set $\mathcal{X} = \{0, \ldots, n-1\}$ with transition probability $p$ (resp. $q$) to increment (resp. decrement) the state from $x$ to $x+1$ (resp. $x-1$). The stationary distribution of $P$ is as in (40), the same as in the continuous-time case with transition rates replaced by transition probabilities. Let our control objective be to maintain the target distribution $\mu$, a geometric distribution on $\mathcal{X}$ (41).
Then in contrast to the continuous-time case of section IV-B1, the minimum-power controlled chain (26) is not in general a birth-and-death chain. Consider this numerical example: let the target nonequilibrium stationary distribution be as in (41) with $r = \sqrt{q/p}$:
(52) $\mu(x) \propto (q/p)^{x/2}$
(52) is biased the other way from $\pi$, assigning most of its mass to large values of $x$ if $p < q$. The square root in (52) makes $\mu$ look more uniform than $\pi$.
Let $p < q$ be fixed, so that the stationary distribution of $P$ is $\pi(x) \propto (p/q)^x$, and $\mu(x) \propto (q/p)^{x/2}$. Figure 3 shows the nonzero off-diagonal elements of $Q^*$ with three-digit precision; the increment and decrement probabilities vary with state $x$, so $Q^*$ is not a birth-and-death chain.
Finally, Figure 4 depicts the time evolution of the distribution of $X_i$ under the controlled chain $Q^*$, and the cost converging to the minimum power to maintain the nonequilibrium distribution $\mu$.
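A numeric sketch of this example (parameters are our own illustrative choice; `min_power_discrete` is the section III sketch):

```python
import numpy as np

n, p, q = 6, 0.2, 0.4   # hypothetical base chain parameters, p < q

P = np.diag(np.full(n - 1, p), 1) + np.diag(np.full(n - 1, q), -1)
np.fill_diagonal(P, 1 - P.sum(axis=1))  # holding probabilities on diagonal

mu = (q / p) ** (np.arange(n) / 2)
mu /= mu.sum()                          # target (52)

Q = min_power_discrete(P, mu)           # section III fixed-point sketch
up, down = np.diag(Q, 1), np.diag(Q, -1)
# up and down vary with the state, so Q* is not a birth-and-death chain
```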
Returning to the molecular motor picture, the discrete-time chain differs from its continuous-time cousin of section IV-B1 in that $Q^*$ is not a birth-and-death chain. Thus an optimal discrete-time control policy modifies the base birth-and-death chain in a way that depends on state $x$, and so can’t be thought of as corresponding to a constant, state-independent force upwards (away from state $0$) as in the continuous-time case.
V Discussion
This work derives the minimum power required to maintain a target stationary distribution given uncontrolled Markov chain dynamics in both discrete and continuous time. We relate KL-divergence-like penalties from control theory [5, 6] to the power used to control a Markov process, using muscular molecular motors as a guiding example. The problem of minimizing a KL divergence subject to a constrained stationary distribution is familiar from large deviations theory [9, 10, 1]; the novelty of our work is in relating these large deviations results to the thermodynamics of “holding” a distribution, and in computing the minimum-cost chains in some important examples: the birth-and-death process in continuous time (a toy model for a muscular molecular motor) and two-state chains in discrete time.
To the best of our knowledge, this is the first time a lower bound on average power consumption has been studied in detail for control of the stationary distribution. [8] study a related quantity, the minimum entropy production rate associated with adding edges (allowing control to increase but not decrease transition rates) to a continuous-time Markov chain, but their notion of cost has the interpretation of energy efficiency, as opposed to ours, which is to be interpreted as total energy usage. Unsurprisingly, different notions of cost lead to different optimal controlled chains: the optimal controlled chain of [8] depends on the underlying uncontrolled chain only in the requirement that it be much faster, while our minimum-cost controlled chain is a function of the uncontrolled chain; this function is easy to compute (34) in the case of a continuous-time, reversible uncontrolled chain, an important case in modeling biological processes.
Acknowledgment
The authors gratefully acknowledge Hideo Mabuchi for suggestions and insightful discussions.
References
 [1] I. Csiszár, T. M. Cover, and B. S. Choi, “Conditional limit theorems under Markov conditioning,” IEEE Trans. Inf. Theory, vol. 33, pp. 788–801, 1987.
 [2] R. P. Feynman, The Feynman Lectures on Physics. Addison-Wesley, 1964, vol. I.
 [3] A. F. Huxley, “Muscle structure and theories of contraction,” Prog. Biophys. Biophys. Chem., vol. 7, pp. 257–318, 1957.
 [4] H. Qian, “The mathematical theory of molecular motor movement and chemomechanical energy transduction,” J. Math. Chem., vol. 27, pp. 219–234, 2000.
 [5] E. Todorov, “Linearlysolvable Markov decision problems,” in Advances in Neural Information Processing Systems 19, 2007, pp. 1369–1376.
 [6] ——, “Efficient computation of optimal actions,” Proceedings of the National Academy of Sciences, vol. 106, pp. 11478–11483, 2009.
 [7] J. M. Horowitz, K. Zhou, and J. L. England, “Minimum energetic cost to maintain a target nonequilibrium state,” Phys. Rev. E, vol. 95, p. 042102, 2017.
 [8] J. M. Horowitz and J. L. England, “Informationtheoretic bound on the entropy production to maintain a classical nonequilibrium distribution using ancillary control,” Entropy, vol. 19, p. 333, 2017.
 [9] A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications, 2nd ed. Springer-Verlag Berlin Heidelberg, 2010.
 [10] P. Baldi and M. Piccioni, “A representation formula for the large deviation rate function for the empirical law of a continuous time Markov chain,” Stat. & Prob. Letters, vol. 41, pp. 107–115, 1999.
 [11] M. Mézard and A. Montanari, Information, Physics, and Computation. Oxford University Press, 2009.
 [12] C. Kittel and H. Kroemer, Thermal Physics, 2nd ed. W. H. Freeman and Company, 1980.
 [13] L. D. Landau and E. M. Lifshitz, Statistical Physics. Elsevier, 1951, vol. 5.
 [14] C. E. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal, vol. 27, pp. 379–423, 623–656, 1948.
 [15] T. M. Cover and J. A. Thomas, Elements of Information Theory, Second Edition. Hoboken, NJ: John Wiley & Sons, 2006.
 [16] G. Kesidis and J. Walrand, “Relative entropy between Markov transition rate matrices,” IEEE Trans. Inf. Theory, vol. 39, pp. 1056–1057, 1993.
 [17] A. de La Fortelle, “Large deviation principle for Markov chains in continuous time,” Probl. Inf. Trans., vol. 37, pp. 120–139, 2001.
 [18] L. Bertini, A. Faggionato, and D. Gabrielli, “From level 2.5 to level 2 large deviations for continuous time Markov chains,” Markov processes and related fields, vol. 20, pp. 545–562, 2014.
 [19] M. Gopalkrishnan, “A cost/speed/reliability tradeoff to erasing,” Entropy, vol. 18, p. 165, 2016.
 [20] F. Jülicher, A. Ajdari, and J. Prost, “Modeling molecular motors,” Reviews of Modern Physics, vol. 69, pp. 1269–1281, 1997.
 [21] D. A. Levin, Y. Peres, and E. L. Wilmer, Markov chains and mixing times. American Mathematical Society, 2009.
 [22] L. Szilard, “On the decrease of entropy in a thermodynamic system by the intervention of intelligent beings,” Zeitschrift für Physik, vol. 53, pp. 840–856, 1929.
Appendix A Physical example of KL divergence as energy cost
We offer an example of the KL divergence as the cost of sampling from a target distribution $q$ given a “base” distribution $p$ (see discussion in section II).
Figure 5 presents a slight generalization of the Szilard information engine [22]: molecules of an ideal gas inhabit the space formed by movable pistons indexed by $x \in \mathcal{X}$. Let $v(x)$ denote the volume beneath the $x$th piston and $V = \sum_x v(x)$ be the total volume. A molecule of gas is equally likely to be found anywhere within the space beneath the pistons, corresponding to probability distribution $p(x) = v(x)/V$ on the pistons. Now imagine we add impermeable partitions between the pistons (vertical dashed lines) and move the pistons to new positions (green lines) at constant temperature (perhaps the bottom of the box is in thermal contact with a heat reservoir). The partitions prevent mixing between different pistons during compression; we remove them afterwards. Let $v'(x)$ be the new volume beneath the $x$th piston, $V' = \sum_x v'(x)$ be the new total volume, and $q(x) = v'(x)/V'$ be the new piston probability distribution after this deformation.
What is the work used to perform this deformation? The work to isothermally compress an ideal gas from volume $v$ to volume $v'$ is $k_B T \log(v/v')$ per molecule, where $k_B$ is Boltzmann’s constant and $T$ is the temperature. A gas molecule occupies the space beneath the $x$th piston with probability $p(x)$ before compression, so the expected work to move the pistons is
(53) $W = k_B T \sum_x p(x) \log \frac{v(x)}{v'(x)}$
per molecule of gas. If the new piston positions are such that the total volume is unchanged, then $V' = V$ and the work is proportional to the KL divergence: $W = k_B T\, D(p\|q)$. We can imagine a sequence of such gas boxes and deformations, where the pre-deformation volumes at time $i+1$ are determined by drawing a single molecule from the volumes at time $i$, forming a Markov chain with KL divergence control cost.
Appendix B Proof of Theorem 1
B-A Part 1
We wish to solve the following problem:
(54) $\min_{Q:\, \mu Q = \mu} \sum_x \mu(x) \sum_y Q(x,y) \log \frac{Q(x,y)}{P(x,y)}$
For ease of manipulation, we work with the empirical joint transition probability distribution
(55) $W(x,y) = \mu(x)\, Q(x,y)$
We can solve (54) by setting up the Lagrangian:
(56) $\mathcal{L}(W) = \sum_{x,y} W(x,y) \log \frac{W(x,y)}{\mu(x)\, P(x,y)} + \alpha \Big( \sum_{x,y} W(x,y) - 1 \Big)$
(57) $\phantom{\mathcal{L}(W) =} + \sum_x \nu_x \Big( \sum_y W(x,y) - \mu(x) \Big) + \sum_y \lambda_y \Big( \sum_x W(x,y) - \mu(y) \Big)$
where $\alpha$ is a Lagrange multiplier enforcing normalization of the joint transition probability distribution, $\sum_{x,y} W(x,y) = 1$, and $\nu_x$, $\lambda_y$ are Lagrange multipliers enforcing the marginal and stationary distribution conditions $\sum_y W(x,y) = \mu(x)$ and $\sum_x W(x,y) = \mu(y)$. Our solution is a stationary point of the Lagrangian with respect to $W$:
(58) $0 = \frac{\partial \mathcal{L}}{\partial W(x,y)} = \log \frac{W(x,y)}{\mu(x)\, P(x,y)} + 1 + \alpha + \nu_x + \lambda_y$
Since $Q^*(x,y) = W(x,y)/\mu(x)$, the solution has the form $Q^*(x,y) = P(x,y)\, u(y)/v(x)$ with $u(y) = e^{-\lambda_y}$ and $v(x) = e^{1 + \alpha + \nu_x}$; imposing the constraints $\sum_y Q^*(x,y) = 1$ and $\sum_x \mu(x) Q^*(x,y) = \mu(y)$ yields the recursive relations (28) and (29), establishing (27).