The Channel Capacity of Channelrhodopsin and Other Intensity-Driven Signal Transduction Receptors

Biological systems transduce signals from their surroundings through a myriad of pathways. In this paper, we describe signal transduction as a communication system: the signal transduction receptor acts as the receiver in this system, and can be modeled as a finite-state Markov chain with transition rates governed by the input signal. Using this general model, we give the mutual information under IID inputs in discrete time, and obtain the mutual information in the continuous-time limit. We show that the mutual information has a concise closed-form expression with clear physical significance. We also give a sufficient condition under which the Shannon capacity is achieved with IID inputs. We illustrate our results with three examples: the light-gated Channelrhodopsin-2 (ChR2) receptor; the ligand-gated nicotinic acetylcholine (ACh) receptor; and the ligand-gated Calmodulin (CaM) receptor. In particular, we show that the IID capacity of the ChR2 receptor is equal to its Shannon capacity. We finally discuss how the results change if only certain properties of each state can be observed, such as whether an ion channel is open or closed.


I Introduction

Living cells take in information from their surroundings through myriad signal transduction processes. Signal transduction takes many forms: the input signal can be carried by changes in chemical concentration, electrical potential, light intensity, mechanical forces, and temperature, inter alia. In many instances these extracellular stimuli trigger intracellular responses that can be represented as transitions among a discrete set of states [1]. Models of these processes are of great interest to mathematical and theoretical biologists [2].

The “transduction” of the signal occurs through the physical effect of the input signal on the transition rates among the various states describing the receptor. An early mathematical model of this type was the voltage-sensitive transitions among several open and closed ion channel states in Hodgkin and Huxley’s model for the conduction of sodium and potassium ions through the membranes of electrically excitable cells [3]. Presently, many such models are known for signal transduction systems, such as: the detection of calcium concentration signals by the calmodulin protein [4], binding of the acetylcholine (ACh) neurotransmitter to its receptor protein [5], and modulation of the channel opening transition by light intensity in the channelrhodopsin (ChR) protein [6]. In each of these examples the channel may be modeled as a weighted, directed graph, in which the vertices represent the discrete channel states, and the weighted edges represent per capita transition rates, some of which can be modulated by the input signals.

Mutual information, and Shannon capacity, arise in a variety of biological contexts. For example, mutual information may predict the differential growth rates of organisms learning about their environment [7], based on the Kelly criterion [8]. For biological communication systems, achieving a distortion criterion (expressed as mutual information) need not require complicated signal processing techniques; see [9, Example 2]. Moreover, the free energy cost of molecular communication (such as in signal transduction) has a mathematical form similar to mutual information [10], leading to thermodynamic bounds on capacity per unit energy cost (cf. [11]).

Stochastic modeling of signal transduction as a communication channel has considered the chemical reactions in terms of Markov chains [12] and in terms of the “noise” inherent in the binding process [13]. For simplified two-state Markov models, the Shannon capacity of signal transduction has been calculated for slowly changing inputs [14] and for populations of communicating bacteria [15]. Our own previous work has investigated the capacity of signal transduction: in [16], we obtained the Shannon capacity of two-state Markov signal transduction under arbitrary inputs, and showed that the capacity for multiple independent receptors has the same form [17]. Related channel models have been studied in the information-theoretic literature, such as the unit output memory channel [18] and the “previous output is the state” (POST) channel [19, 20]; capacity results for some channels in these classes were recently given in [21].

The present paper focuses on the mutual information and capacity of finite-state signal transduction channels. Generalizing previous results, we provide discrete-time, finite-state channel models for a wide class of signal transduction receptors, giving Channelrhodopsin-2 (ChR2), Acetylcholine (ACh), and Calmodulin (CaM) as specific examples. We also provide an explicit formula for the mutual information of this class of models under independent, identically distributed (IID) inputs (Theorem 1). Subsequently, we consider the continuous time limit as the interval between the discrete-time instants goes to zero, and find a simple closed-form expression for the mutual information (Theorem 2), with a natural physical interpretation. We further give conditions under which our formula gives the Shannon capacity of the channel, namely that there is exactly one transition in the Markov chain that is sensitive to the channel input (Theorem 3), and we use this result to (numerically) find the Shannon capacity of ChR.

The remainder of the paper is organized as follows: in Section II, we give a generalized model for discrete-time, finite-state signal transduction systems; in Section III, we discuss signal transduction as a communication system, deriving expressions for the mutual information and giving our main results; and in Section IV, we discuss the biological significance of the results, as well as the limitations of our analysis.

II Model

II-A Physical model

Signal transduction encompasses a wide variety of physical processes. For example, in a ligand-gated system, signals are transmitted using concentrations of signaling molecules, known as ligands, which bind to receptor proteins. As another example, in a light-gated system, signals are transmitted using light, where the receptor absorbs photons. Other possibilities exist, such as voltage-gated ion channels. The receptor, often located on the surface of the cell, forms the receiver in the signal transduction system, and conveys (or transduces) the signal across the cell membrane; the receptor is the focus of our analysis.

Signal transduction receptors share a mathematical model: they can be viewed as finite-state, intensity-modulated Markov chains, in which the transition rates between certain pairs of states are sensitive to the input (though other transitions may be independent of the input). Our main examples in this paper focus on ligand- and light-gated receptors. For example, in a ligand-gated system, the binding of the ligand results in a change in the receptor, which then produces second messengers (normally a different species than the ligand) to convey the message to the cell interior. In a light-gated system, the incident photon causes a similar change in the receptor, which may open to allow an ion current to pass to the interior of the cell. In either case, there may be a relaxation process which returns the receptor to the “ready” state, and this process may be independent of the signal; or other processes that are either sensitive to or independent of the signal, depending on the purpose of the receptor.

In the next two sections, we describe the Markov chain model for receptors, both in continuous and in discrete time. Although we focus on ligand- and light-gated receptors, we emphasize that our framework is general enough to include other kinds of receptors.

II-B Continuous time: Master equation kinetics

Receptors are finite-state Markov chains. For a receptor with k discrete states, there exists a k-dimensional vector of state occupancy probabilities p(t), given by

 p(t) = [p_1(t), p_2(t), …, p_k(t)],  (1)

where p_j(t) represents the probability of a given receptor occupying state j at time t. The environmental conditions at the receptor, such as light level or ligand concentration, are known as the input x(t).

The chemical kinetics of the receptor are captured by a differential equation known as the master equation [22]. Let Q(x(t)) represent a k × k matrix of per capita transition rates, where q_{ij} represents the instantaneous rate at which receptors starting in state i enter state j. It is helpful to visualize the matrix Q using a graph:

• There are k vertices, representing the states; and

• A directed edge is drawn from vertex i to vertex j if and only if q_{ij} > 0 for some input x.

Changing from one state to another is called a transition, so the graph corresponding to Q depicts the possible transitions. A transition may be sensitive, i.e. q_{ij} varies as a function of the input x(t), or insensitive, i.e. q_{ij} is constant with respect to x(t).

Using Q, the master equation is given by

 dp(t)/dt = p(t) Q(x(t)).  (2)
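As a concrete illustration, the master equation (2) can be integrated numerically by forward Euler stepping. The sketch below assumes a hypothetical two-state receptor; the rate constants q12 and q21 and the constant input x = 2.0 are illustrative placeholders, not values from any table in this paper.

```python
import numpy as np

def master_step(p, Q, dt):
    """One forward-Euler step of the master equation dp/dt = p Q
    (row-vector convention, as in (2))."""
    return p + dt * (p @ Q)

def Q_two_state(x, q12=1.0, q21=0.5):
    """Hypothetical 2-state rate matrix: the 1 -> 2 rate is sensitive
    (proportional to the input x); the 2 -> 1 rate is insensitive."""
    return np.array([[-q12 * x, q12 * x],
                     [q21,     -q21]])

p = np.array([1.0, 0.0])   # receptor starts in state 1
dt = 1e-3
for _ in range(10_000):    # integrate to t = 10 under constant input x = 2
    p = master_step(p, Q_two_state(2.0), dt)
```

With these illustrative rates, the stationary distribution solves πQ = 0, giving π = (0.2, 0.8); the integration converges to it, and each Euler step preserves the total probability because the rows of Q sum to zero.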

We use the notation from [23]:

• States take a compound label, consisting of a state property and a state number. The state number is unique to each state, but the state property may be shared by multiple states. For example, in each state the receptor’s ion channel might be either open (O) or closed (C); the state label C_1 means that in state 1 the channel is closed, and O_2 means that in state 2 the channel is open. In this paper we use the state number rather than the state property. (Since we show that the state numbers form a Markov chain, in general the state properties form a hidden Markov chain; we discuss this further in Section IV.)

• We assume that rates which are sensitive to the input are directly proportional to the input x(t). For example, q_{12} x(t) is the transition rate from state 1 to state 2, which is sensitive, while q_{23} is the transition rate from state 2 to state 3, which is insensitive.

• The jth diagonal element of Q is written R_j, and is set so that the jth row sums to zero (so, if x(t) appears in the jth row, R_j may depend on x(t)).

Taking sensitive rates to be proportional to the signal is a key modeling assumption; it is satisfied for the examples we consider, but there exist systems in which the signal acts nonlinearly on the rate.

The following three examples illustrate the use of our notation, and give practical examples of receptors along with their transition graphs and rate constants.

Example 1 (Channelrhodopsin-2). The Channelrhodopsin-2 (ChR2) receptor is a light-gated ion channel. The receptor has three states, named Closed (C_1), Open (O_2), and Desensitized (D_3). The channel-open state, O_2, is the only state in which the ion channel is open, passing an ion current. The channel-closed states, C_1 and D_3, are distinct in that the receptor is light-sensitive in state C_1, and insensitive in state D_3 [6]. The rate matrix for ChR2 is

 Q = [ R_1     q_{12} x(t)   0
       0       R_2           q_{23}
       q_{31}  0             R_3   ],  (3)

where x(t) is the relative light intensity. To keep the row sums equal to zero, we set R_1 = −q_{12} x(t), R_2 = −q_{23}, and R_3 = −q_{31}. Fig. 1 shows state labels and allowed state transitions.

Parameter values from the literature are given in Table I.
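A small sketch of how the ChR2 rate matrix (3) can be assembled, with the diagonal entries R_j chosen so that each row sums to zero. The numerical rate constants below are placeholders for illustration, not the Table I values.

```python
import numpy as np

def chr2_Q(x, q12=0.5, q23=0.05, q31=0.02):
    """Three-state ChR2 rate matrix in the shape of (3).
    q12, q23, q31 are illustrative placeholder values; only the
    closed -> open transition (rate q12 * x) is sensitive to the
    light level x."""
    Q = np.array([[0.0,  q12 * x, 0.0],
                  [0.0,  0.0,     q23],
                  [q31,  0.0,     0.0]])
    np.fill_diagonal(Q, -Q.sum(axis=1))   # sets R_1, R_2, R_3
    return Q

Q = chr2_Q(x=1.0)
```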

Example 2 (Acetylcholine). The Acetylcholine (ACh) receptor is a ligand-gated ion channel. Following [5], we model the receptor as a conditional Markov process on five states, with rate matrix

 Q = [ R_1      q_{12} x(t)   0             q_{14}        0
       q_{21}   R_2           q_{23}        0             0
       0        q_{32}        R_3           q_{34}        0
       q_{41}   0             q_{43} x(t)   R_4           q_{45}
       0        0             0             q_{54} x(t)   R_5    ].  (4)

There are three sensitive transitions: 1 → 2, 4 → 3, and 5 → 4, whose rates are proportional to the ligand concentration x(t). For the purposes of our analysis, we use a bounded range of concentrations. Fig. 2 shows the allowed state transitions.

The states in ACh correspond to the binding of a ligand to one of two binding sites on the receptor: in one state, neither site is occupied; in two states, one site is occupied; and in the remaining two states, both sites are occupied.

Table II gives parameter values; the concentration of ACh, x(t), is measured in mol/L.

The same state-naming convention is used in the figure as with ChR2: two of the states have an open ion channel, and the remaining three have a closed ion channel.

Example 3 (Calmodulin). The Calmodulin (CaM) receptor is a ligand-gated receptor. The CaM receptor consists of four binding sites, two on the C-terminus of the CaM protein and two on the N-terminus [24, 25, 26]. Each end of the protein can bind 0, 1, or 2 calcium ions, leading to nine possible states. For CaM, rather than an ion channel, it is important whether the C or N end of the receptor is completely bound (i.e., has both binding sites occupied by ligands). This property is represented by four symbols, indicating whether neither end is completely bound; only the C end is completely bound; only the N end is completely bound; or both ends are completely bound.

State configuration and allowed transitions are depicted in Figure 3. The rate matrix is given in (5), with values given in Table III, and where the input x(t) is the molar concentration of calcium.

For each of the preceding examples, the rate constants depend on environmental conditions, and thus can be reported differently in different sources (see, e.g., [27] for different rate constants for ChR2).

II-C From the master equation to discrete-time Markov chains

The continuous-time master equation for the receptor dynamics (2) describes the evolution of a conditional probability Pr[Y(t) = j | F_t], where Y(t) is the continuous-time, discrete-state càdlàg process giving the channel state, F_t is the filtration generated by the input process x(t), and the probability is understood as a conditional expectation [28]. Establishing the appropriate ensemble of input processes and analyzing mutual information and capacity involve technical issues that do not shed light on the nature of biological signal transduction. Therefore we do not undertake a rigorous analysis of the continuous-time communications channels described by (2) in this paper. Rather, we introduce a discrete-time, discrete-state channel, motivated by the continuous-time channel, which can be rigorously analyzed, and study its properties both with a fixed timestep Δt, and later in the limit Δt → 0. The discrete-time Markov chain model allows us to rely on capacity results for discrete-time Markov channels.

We obtain a discrete-time approximation to the master equation by writing

 dp(t)/dt = p(t)Q = [p(t + Δt) − p(t)]/Δt + o(Δt),  as Δt → 0,  (6)

where we simplify the notation by writing Q(x(t)) as simply Q. Manipulating the middle and right expressions in (6) gives

 p(t + Δt) = Δt p(t)Q + p(t) + o(Δt)  (7)
           = Δt p(t)Q + p(t)I + o(Δt)  (8)
           = p(t)(I + ΔtQ) + o(Δt),  as Δt → 0,  (9)

where I is the k × k identity matrix.

In order to arrive at a discrete-time model, we introduce the approximation p_i satisfying

 p_i = p(iΔt) + o(Δt),  as Δt → 0,  (10)

and arrive at a discrete-time approximation to (6),

 p_{i+1} = p_i (I + ΔtQ).  (11)

Thus, we have a discrete-time Markov chain with transition probability matrix

 P=I+ΔtQ. (12)

The matrix P satisfies the conditions of a Markov chain transition probability matrix (nonnegative entries, rows summing to one) as long as Δt is small enough. However, note that Q (and thus P) depend on x(t), so the Markov chain is not generally time-homogeneous if x(t) is known (cf. (24)).
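The row-stochasticity condition on P = I + ΔtQ can be checked mechanically. The sketch below uses a hypothetical 3-state generator and shows that a sufficiently small Δt yields a valid transition matrix while a coarse Δt does not; the rate values are illustrative.

```python
import numpy as np

def discretize(Q, dt):
    """Discrete-time transition matrix P = I + dt * Q, as in (12)."""
    return np.eye(Q.shape[0]) + dt * Q

def is_stochastic(P, tol=1e-12):
    """Valid transition matrix: nonnegative entries, rows summing to one."""
    return bool(np.all(P >= -tol) and np.allclose(P.sum(axis=1), 1.0))

Q = np.array([[-1.0,  1.0,  0.0],   # hypothetical generator; rows sum to 0
              [ 0.0, -0.2,  0.2],
              [ 0.1,  0.0, -0.1]])

P_fine = discretize(Q, 0.5)    # 0.5 <= 1 / max |R_j|, so P stays valid
P_coarse = discretize(Q, 2.0)  # too coarse: a negative entry appears
```

The largest admissible Δt is set by the fastest-leaving state: Δt must not exceed 1 / max_j |R_j|.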

III Signal transduction as a communications system

In this section we give our main results, in which we describe and analyze signal transduction as a communication system. A brief roadmap to our results is given as follows: we first define the communication system in terms of input, output, and channel; we give the mutual information of the general discrete-time model under IID inputs (Theorem 1 and equation (39)); we take the continuous-time limit of the mutual information rate, showing that the expression for mutual information has a simple factorization (Theorem 2 and equation (81)); we give a physical interpretation of the factorization in (81); we give general conditions under which the Shannon capacity is satisfied by IID inputs (Theorem 3); and finally, we give an example calculation using ChR2 (Example 4).

III-A Communication model of receptors

We now discuss how the receptors can be described as information-theoretic communication systems: that is, in terms of input, output, and conditional input-output PMF.

Input: As discussed in Section II, the receptor is sensitive to given properties of the environment; previous examples included light intensity and ligand concentration. The receptor input is the value of this property at the surface of the receptor. The input is discretized in time: for integers i, the input at time iΔt is x(iΔt); we will write x_i := x(iΔt). We will also discretize the amplitude, so that for every i, x_i ∈ X := {x_1, x_2, …, x_k}. We will assume that the amplitude values x_1, …, x_k are distinct and increasing; further, we assign the lowest and highest values special symbols:

 x_L := x_1,  (13)
 x_H := x_k.  (14)

In Section II, we gave the concentrations or intensities over a range of values (such as the relative intensity range for ChR2). Thus, we select x_L and x_H as the minimum and maximum values of this range, respectively.

Output: In this paper, the output of the communication system is the receptor state number, given by the subscript of the state label: for example, if the state is O_2, then the output is 2. The output is discretized in time as y_i := y(iΔt). The discrete channel inputs and outputs form vectors: in terms of notation, we write X = [x_1, x_2, …, x_n] and Y = [y_1, y_2, …, y_n].

Conditional input-output PMF: From (6)–(12), the output sequence Y forms a Markov chain given the input sequence X, so

 p(y|x) = ∏_{i=1}^n p(y_i | x_i, y_{i−1}),  (15)

where p(y_i | x_i, y_{i−1}) is given by the appropriate entry in the matrix P, and where y_0 is null. (Notation: first, we will drop subscripts if it is unambiguous to do so, i.e., p(y_i) normally signifies p_{Y_i}(y_i); second, we say a variable is “null” if it vanishes under conditioning, i.e., if y_0 is null, then p(y_1 | y_0) = p(y_1).) The following diagram (16) indicates the conditional dependencies:

  X_1     X_2     X_3     X_4     X_5   ⋯
   ↓       ↓       ↓       ↓       ↓
 (Y_0) ⟶ Y_1 ⟶ Y_2 ⟶ Y_3 ⟶ Y_4   ⋯          (16)

As an example, consider ACh: given Δt, an input x_i, and a previous state y_{i−1}, the transition probability p(y_i | x_i, y_{i−1}) follows from (12) and Table II. From (16) and the definition of P, p(y_i | x_i, y_{i−1}) does not depend on i; that is, the channel’s input-output structure is time-invariant.

For a discrete-time Markov chain, the receptor states form a graph with vertex set V = {1, 2, …, k} and directed edge set E, with pair (j, j′) ∈ E if p(y_i = j′ | x_i, y_{i−1} = j) > 0, that is, if for at least some input value there is a direct transition from j to j′. Notice that, under this definition, self-transitions are included in E, even though (for convenience) they are not depicted in the state-transition diagrams.

We say the transition from state j to j′ is insensitive to the input, or just insensitive, if, for all inputs x, x′ ∈ X, we have p(j′ | x, j) = p(j′ | x′, j) (see Section II-B). Otherwise, the transition is sensitive. We let S ⊆ E denote the subset of sensitive edges. (If state j is the origin for a sensitive transition, i.e., there is at least one (j, j′) ∈ S, then the self-transition (j, j) is normally sensitive as well, but this condition is not required for our analysis.)
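The edge set E and the sensitive subset S can be recovered mechanically from a rate-matrix function by probing it at different input levels. The sketch below uses 0-indexed states and a placeholder ChR2-style rate matrix; the function name and rate constants are illustrative assumptions.

```python
import numpy as np

def edges_and_sensitive(Qfun, xs, tol=1e-12):
    """Recover the edge set E and the sensitive subset S from a
    rate-matrix function Qfun(x), by probing it at each input in xs.
    Self-transitions are always included in E, per the definition."""
    Qs = [Qfun(x) for x in xs]
    k = Qs[0].shape[0]
    E, S = set(), set()
    for i in range(k):
        for j in range(k):
            if i == j or any(abs(Q[i, j]) > tol for Q in Qs):
                E.add((i, j))
                # sensitive if the rate varies with the input level
                if max(Q[i, j] for Q in Qs) - min(Q[i, j] for Q in Qs) > tol:
                    S.add((i, j))
    return E, S

def chr2_Q(x, q12=0.5, q23=0.05, q31=0.02):
    """Placeholder ChR2-style rate matrix; only the 0 -> 1 rate is
    proportional to x (illustrative values)."""
    Q = np.array([[0.0, q12 * x, 0.0], [0.0, 0.0, q23], [q31, 0.0, 0.0]])
    np.fill_diagonal(Q, -Q.sum(axis=1))
    return Q

E, S = edges_and_sensitive(chr2_Q, xs=[0.5, 1.0])
```

For this matrix, the sensitive set contains the 0 → 1 edge and the 0 → 0 self-transition, matching the remark above that the self-transition at the origin of a sensitive edge is normally sensitive as well.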

For a channel with inputs X and outputs Y (both of length n), the mutual information I(X;Y) gives the maximum information rate that may be transmitted reliably over the channel for a given input distribution. Mutual information is given by

 I(X;Y) = ∑_{x,y} p(x) p(y|x) log [ p(y|x) / p(y) ],  (17)

where p(y|x) is the conditional probability mass function (PMF) of Y given X.

As n → ∞, generally I(X;Y) → ∞ as well; in this case, it is more useful to calculate the mutual information rate, which we introduce in the next section.
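Equation (17) can be evaluated directly for any finite input alphabet and conditional PMF. A minimal sketch in nats, with an illustrative noiseless binary channel and an illustrative input-independent channel as sanity checks:

```python
import numpy as np

def mutual_information(p_x, p_y_given_x):
    """I(X;Y) in nats via (17): p_x is a length-m input PMF,
    p_y_given_x is an m-by-k row-stochastic conditional PMF matrix."""
    p_y = p_x @ p_y_given_x          # output marginal
    I = 0.0
    for i, px in enumerate(p_x):
        for j, pyx in enumerate(p_y_given_x[i]):
            if px > 0 and pyx > 0:   # 0 log 0 terms contribute nothing
                I += px * pyx * np.log(pyx / p_y[j])
    return I

# noiseless binary channel: one bit (log 2 nats) per use
I_noiseless = mutual_information(np.array([0.5, 0.5]), np.eye(2))
# output independent of input: zero information
I_indep = mutual_information(np.array([0.5, 0.5]),
                             np.array([[0.5, 0.5], [0.5, 0.5]]))
```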

III-B Receptor IID capacity

Our focus in the remainder of this paper is on IID input distributions. Although IID inputs may not be realistic for chemical diffusion channels, such as for ligand-gated receptors (as concentration may persist for long periods of time), they can be capacity-achieving in these channels (see, e.g., [16]); moreover, IID input distributions may be physically realistic for light-gated channels.

Starting with (17), where X and Y are both of fixed and finite length n, the Shannon capacity is found by maximizing I(X;Y) with respect to the input distribution p(x), i.e.,

 C(n) = max_{p(x)} I(X;Y),  (18)

where the maximum is taken over all possible length-n input distributions (not necessarily IID).

If the input is restricted to the set of IID input distributions, so that p(x) = ∏_{i=1}^n p(x_i), then I(X;Y) is well defined for each n (see (17)). Furthermore, for each n we have the IID capacity, written C_iid(n):

 C_iid(n) = max_{p(x_i)} I(X;Y),  (19)

where the maximum is taken over all possible settings of the marginal input distribution p(x_i).

We can use (17) and (19) to obtain information rates per channel use. For a given IID input distribution p(x_i), the IID mutual information rate is given (with a slight abuse of notation) by

 I(X;Y) = lim_{n→∞} (1/n) I(X;Y).  (20)

Furthermore, the maximum IID information rate is given by

 C_iid = lim_{n→∞} (1/n) C_iid(n).  (21)

We derive these quantities in the remainder of the section, in which it will be clear that the limits in (20)–(21) exist. We start by deriving I(X;Y) under IID inputs, and showing how it is calculated using quantities introduced in Section II. Finally, in Theorem 1, we give an expression for the IID mutual information rate, and show that maximizing it over p(x_i) yields C_iid.

Recall p(y|x) from (15). Under IID inputs, it can be shown (see [18, 16]) that the receptor states form a time-homogeneous Markov chain, that is,

 p(y) = ∏_{i=1}^n p(y_i | y_{i−1}),  (22)

where y_0 is again null, and where

 p(y_i | y_{i−1}) = ∑_{x_i} p(y_i | x_i, y_{i−1}) p(x_i).  (23)

Furthermore, let P̄ represent the transition probability matrix of the chain (22). Recall (12), in which P was dependent on the input x; using (23), we can write

 P̄ = E[P] = I + Δt E[Q],  (24)

and since the sensitive terms in Q and P are assumed to be linear in x, we replace x in these terms with E[x] to form Q̄ and P̄, respectively.
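Because the sensitive entries are linear in the input, averaging P over the input distribution coincides with substituting E[x], which is the claim behind (24). A small numeric check, using a placeholder ChR2-style rate matrix and a hypothetical binary input alphabet (all values illustrative):

```python
import numpy as np

def chr2_Q(x, q12=0.5, q23=0.05, q31=0.02):
    """Placeholder ChR2-style rate matrix, linear in the input x."""
    Q = np.array([[0.0, q12 * x, 0.0],
                  [0.0, 0.0,     q23],
                  [q31, 0.0,     0.0]])
    np.fill_diagonal(Q, -Q.sum(axis=1))
    return Q

dt = 0.01
xs = np.array([0.2, 1.0])   # hypothetical binary input alphabet
px = np.array([0.3, 0.7])   # hypothetical IID input distribution

# E[P]: average the discretized matrix over the input distribution
P_bar_avg = sum(p * (np.eye(3) + dt * chr2_Q(x)) for p, x in zip(px, xs))
# substitute E[x] into the (linear) sensitive entries, as in (24)
P_bar_sub = np.eye(3) + dt * chr2_Q(px @ xs)
```

The two constructions agree entry by entry, and the averaged matrix remains row-stochastic.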

Using (15) and (22), (17) reduces to

 I(X;Y) = ∑_{i=1}^n ∑_{y_i} ∑_{y_{i−1}} ∑_{x_i} p(y_i, x_i, y_{i−1}) log [ p(y_i | x_i, y_{i−1}) / p(y_i | y_{i−1}) ].  (25)

Recall that a transition may be sensitive ((y_{i−1}, y_i) ∈ S) or insensitive ((y_{i−1}, y_i) ∉ S). For terms in (25), consider the insensitive transitions:

 p(y_i, x_i, y_{i−1}) log [ p(y_i | x_i, y_{i−1}) / p(y_i | y_{i−1}) ]
   = p(y_i, x_i, y_{i−1}) log [ p(y_i | y_{i−1}) / p(y_i | y_{i−1}) ]  (27)
   = p(y_i, x_i, y_{i−1}) log 1  (28)
   = 0,  (29)

where (27) follows since the transition is insensitive, so p(y_i | x_i, y_{i−1}) is not a function of x_i; cf. (23). Thus for IID inputs, the mutual information (25) is calculated using the sensitive transitions only, i.e., those transitions in S. With this in mind, we can rewrite (25) as

 I(X;Y) = ∑_{i=1}^n ∑_{(y_{i−1}, y_i) ∈ S} ∑_{x_i} p(y_i, x_i, y_{i−1}) log [ p(y_i | x_i, y_{i−1}) / p(y_i | y_{i−1}) ]  (31)
        = ∑_{i=1}^n ∑_{A_i} p(y_i | x_i, y_{i−1}) p(y_{i−1}) p(x_i) log [ p(y_i | x_i, y_{i−1}) / p(y_i | y_{i−1}) ],  (32)

where we let ∑_{A_i} denote the sum over the same terms as in (31), i.e., over (y_{i−1}, y_i) ∈ S and x_i ∈ X, for the sake of brevity. Also note that (32) follows from (31) because the input is IID.

Now consider the individual PMFs in (32), starting with p(y_i | x_i, y_{i−1}). All transitions in S are dependent on the input x_i, and throughout this paper we assume that the sensitive transition rates depend linearly on the input signal intensity. Thus (recall (12)), for non-self-transitions (i.e., y_{i−1} ≠ y_i),

 p(y_i | x_i, y_{i−1}) = q_{y_{i−1} y_i} x_i Δt.  (33)

For self-transitions in S (i.e., y_{i−1} = y_i = y) we have

 p_{Y_i|X_i,Y_{i−1}}(y | x_i, y) = 1 − ( ∑_{y′ ≠ y, (y,y′) ∈ S} q_{y y′} x_i + ∑_{y′ ≠ y, (y,y′) ∉ S} q_{y y′} ) Δt,  (35)

as seen in the diagonal entries of (12). Similarly, the p(y_i | y_{i−1}) terms can be obtained using (23)–(24); we replace x_i in (33)–(35) with E[x].

The p(y_{i−1}) terms represent the steady-state marginal probability that the receptor is in state y_{i−1}; for compact notation, let π_y := p(y). If the input is IID, as we assume throughout this paper, then π_y exists if the Markov chain is irreducible, aperiodic, and positive recurrent; these conditions hold for all the examples we consider (recall (22)–(24)). (For clarity: although π may be written with a time-indexing subscript, e.g. π_{y_{i−1}}, this refers to the steady-state probability of state y_{i−1}, and does not imply that π changes with time.)

Define the partial entropy function

 ϕ(p) = { 0,        p = 0
          p log p,  p ≠ 0 }  (36)

and let

 H(p)=−ϕ(p)−ϕ(1−p) (37)

represent the binary entropy function. Then we have the following result.

Theorem 1

For an IID input distribution p(x_i), the mutual information rate is given by

 I(X;Y) = ∑_{(y_{i−1}, y_i) ∈ S} π_{y_{i−1}} ( ∑_{x_i ∈ X} p(x_i) ϕ(p(y_i | x_i, y_{i−1})) − ϕ( ∑_{x_i ∈ X} p(x_i) p(y_i | x_i, y_{i−1}) ) ).  (39)

Furthermore, C_iid = max_{p(x_i)} I(X;Y).
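The rate formula (39) can be evaluated numerically. The sketch below applies it to a hypothetical two-state receptor whose only sensitive rate is q12 · x, so the sensitive edges are the 1 → 2 transition and the corresponding self-transition; all parameter values are illustrative, not drawn from the paper's tables.

```python
import numpy as np

def phi(p):
    """Partial entropy function (36), in nats."""
    return 0.0 if p <= 0 else p * np.log(p)

def iid_rate(xs, px, q12, q21, dt):
    """Evaluate (39) for a hypothetical 2-state receptor: the 1 -> 2
    rate q12 * x is sensitive, the 2 -> 1 rate q21 is insensitive."""
    Ex = float(np.dot(px, xs))
    # stationary distribution of the averaged chain (pi Q-bar = 0)
    pi1 = q21 / (q12 * Ex + q21)
    rate = 0.0
    for trans in (lambda x: x * q12 * dt,          # p(2 | x, 1), per (33)
                  lambda x: 1.0 - x * q12 * dt):   # p(1 | x, 1), per (35)
        probs = np.array([trans(x) for x in xs])
        rate += pi1 * (np.dot(px, [phi(p) for p in probs])
                       - phi(float(np.dot(px, probs))))
    return rate

r = iid_rate(xs=np.array([0.0, 1.0]), px=np.array([0.5, 0.5]),
             q12=1.0, q21=1.0, dt=0.01)
```

With a small Δt the per-use rate is small but strictly positive, consistent with the discussion of the Δt → 0 limit in the next subsection.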

Proof: Divide the terms in (32) into the i = 1 term, and all the remaining terms. Let T_1(p(x_i)) represent the i = 1 term, emphasizing its dependence on the IID input distribution p(x_i), so that

 T_1(p(x_i)) = ∑_{A_1} p(y_1 | x_1, y_0) p(y_0) p(x_1) log [ p(y_1 | x_1, y_0) / p(y_1 | y_0) ]  (40)
             = ∑_{A_1} p(y_1 | x_1) p(x_1) log [ p(y_1 | x_1) / p(y_1) ],  (41)

where (41) follows since y_0 is null. Let T_2(p(x_i), n) represent the remaining terms, again dependent on p(x_i) but also on n, so that

 T_2(p(x_i), n) = ∑_{i=2}^n ∑_{A_i} p(y_i | x_i, y_{i−1}) p(y_{i−1}) p(x_i) log [ p(y_i | x_i, y_{i−1}) / p(y_i | y_{i−1}) ]  (43)
               = (n − 1) ∑_{A_i} p(y_i | x_i, y_{i−1}) p(y_{i−1}) p(x_i) log [ p(y_i | x_i, y_{i−1}) / p(y_i | y_{i−1}) ],  (44)

recalling the definition of ∑_{A_i} from the discussion after (32). Using (20),

 I(X;Y) = lim_{n→∞} T_1(p(x_i))/n + lim_{n→∞} T_2(p(x_i), n)/n  (46)
        = ∑_{A_i} p(y_i | x_i, y_{i−1}) p(y_{i−1}) p(x_i) log [ p(y_i | x_i, y_{i−1}) / p(y_i | y_{i−1}) ],  (47)

and (39) follows after some manipulation.

To show that C_iid = max_{p(x_i)} I(X;Y), recall the definitions of C_iid(n) and C_iid in (19) and (21), respectively. Referring to p(x_i) as p for brevity,

 C_iid(n) = max_p ( T_1(p) + T_2(p, n) ).  (48)

Let p_1 represent the IID input distribution maximizing the term T_1, and let p_2 represent the IID input distribution maximizing the term T_2. From (44), p_2 is independent of n. Furthermore,

 T_1(p_2)/n + T_2(p_2, n)/n ≤ (1/n) C_iid(n) ≤ T_1(p_1)/n + T_2(p_2, n)/n.  (49)

Taking the limit throughout (49) as n → ∞, the T_1 terms vanish, as they are constant with respect to n. Comparing (44) and (47), p_2 also maximizes (47). The result follows. ∎

III-C Limit of I(X;Y)/Δt as Δt → 0

In this section we consider the continuous-time limit of I(X;Y)/Δt as Δt → 0, and give our second main result (Theorem 2): that in the continuous-time limit, the mutual information rate is expressed simply as a product of the average flux through sensitive edges and the relative entropy between the prior distribution on the input and the posterior given a transition. While we do not claim to derive the mutual information rate of the continuous-time channel, the continuous-time limit of the discrete-time mutual information rate is a quantity of interest in its own right.

First, we show that the steady-state distribution π is independent of Δt:

Lemma 1

Suppose π is the normalized left eigenvector of E[Q] with eigenvalue 0 (see (24)). Define the set D so that Δt ∈ D if P = I + ΔtQ from (12) is a valid transition probability matrix for all x ∈ X. Then π is the normalized left eigenvector of P̄ with eigenvalue 1, for all Δt ∈ D.

Proof: The proof is given in the appendix.

Note that D contains all “sufficiently small” Δt. It follows from the lemma that the steady-state distribution is the same for both continuous and discrete time.
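Lemma 1 is easy to confirm numerically: the normalized left null vector of an averaged generator E[Q] is also a left eigenvector of P̄ = I + Δt E[Q] with eigenvalue 1. A sketch with a hypothetical 3-state generator (the rate values are illustrative):

```python
import numpy as np

Q = np.array([[-1.0,  1.0,  0.0],   # hypothetical averaged generator E[Q]
              [ 0.0, -0.2,  0.2],
              [ 0.1,  0.0, -0.1]])

# normalized left eigenvector of Q with eigenvalue 0: eigenvector of Q^T
w, V = np.linalg.eig(Q.T)
pi = np.real(V[:, np.argmin(np.abs(w))])
pi /= pi.sum()                      # normalize (fixes sign too)

dt = 0.5                            # any dt keeping P stochastic works
P = np.eye(3) + dt * Q
```

Since πQ = 0, it follows immediately that πP = π(I + ΔtQ) = π for every admissible Δt, which is the content of the lemma.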

Note that the mutual information rate in (39) has units of nats per channel use, and that channel uses have duration Δt. Moreover, the transition probabilities in (33)–(35) are linear functions of Δt. Substituting the discrete-time transition probabilities (12) into (39), the non-self-transition probabilities go to zero while the self-transition probabilities go to 1, so I(X;Y) → 0 as Δt → 0. This should not be surprising: intuitively, as the time step shrinks, less information can be expressed per time step. However, dividing by Δt (and obtaining nats per second), the information rate per second is finite. It is then useful to consider how this rate behaves as Δt → 0.

Let S′ represent the set of sensitive transitions excluding self-transitions, i.e.,

 S′ = { (y_{i−1}, y_i) : (y_{i−1}, y_i) ∈ S, y_{i−1} ≠ y_i }.  (50)

Also let S ∖ S′ represent the components of S excluding S′ (i.e., only the sensitive self-transitions).

For any edge (y, y′) ∈ S, define the limiting value of that edge’s contribution to the mutual information rate, as Δt → 0, as

 ι(y, y′) = lim_{Δt→0} (1/Δt) π_y ( ∑_{x ∈ X} p(x) ϕ(p(y′ | x, y)) − ϕ( ∑_{x ∈ X} p(x) p(y′ | x, y) ) ).  (52)

The limit calculation depends on whether y = y′. In the case y ≠ y′, we have p(y′ | x, y) = q x Δt, writing q := q_{y y′} (see (33)), and

 ∑_x p(x) ϕ(p(y′|x,y)) − ϕ( ∑_x p(x) p(y′|x,y) )
   = Δt { ( ∑_x q x p(x) ) log q + ∑_x q p(x) x log x − ( ∑_x q x p(x) ) log( ∑_x q x p(x) ) } + o(Δt),  as Δt → 0+  (53)
   = qΔt ( E(x log x) − E(x) log(E(x)) ) + o(Δt),  as Δt → 0+  (54)
   = qΔt ( Eϕ(x) − ϕ(Ex) ) + o(Δt),  as Δt → 0+.  (55)

On the other hand, in the case when y = y′, we have p(y | x, y) → 1 as Δt → 0, and the bracketed difference in (52) is o(Δt). Therefore, these terms do not contribute to the mutual information rate in the limit.
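For a non-self sensitive edge, the exact bracketed difference in (52) can be checked against the limiting expression q(Eϕ(x) − ϕ(Ex)) from (55). In the sketch below, the rate q and the binary input distribution are hypothetical; the exact difference divided by Δt matches the limit closely for small Δt.

```python
import numpy as np

def phi(p):
    """Partial entropy function (36), in nats."""
    return 0.0 if p <= 0 else p * np.log(p)

def edge_contrib(q, xs, px, dt):
    """Exact bracketed difference in (52) for a non-self edge with
    p(y'|x,y) = q * x * dt, per (33)."""
    probs = q * xs * dt
    return np.dot(px, [phi(p) for p in probs]) - phi(float(np.dot(px, probs)))

q = 2.0                        # hypothetical sensitive rate constant
xs = np.array([0.5, 1.5])      # hypothetical binary input alphabet
px = np.array([0.4, 0.6])      # hypothetical input distribution

Ex = px @ xs
limit = q * (px @ [phi(x) for x in xs] - phi(Ex))   # q (E phi(x) - phi(E x))
exact_over_dt = edge_contrib(q, xs, px, 1e-6) / 1e-6
```

Because p(y′|x,y) is exactly linear in Δt for a non-self edge, the log q and log Δt terms cancel identically, so the agreement is limited only by floating-point rounding.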

Using these results, we can rewrite (39) as

 lim_{Δt→0} I(X;Y)/Δt = ∑_{(y_{i−1}, y_i) ∈ S′} ι(y_{i−1}, y_i) + ∑_{(y_{i−1}, y_i) ∈ S∖S′} ι(y_{i−1}, y_i).  (57)

Using (33)–(35), we consider the two additive terms in (57) separately. For the first term (summing over S′), we use l’Hôpital’s rule: in the denominator we have (trivially)

 d(Δt)/dΔt = 1,  (58)

and from the numerator, we have

 limΔt→0ddΔt∑(yi−