I. Introduction
Living cells take in information from their surroundings through myriad signal transduction processes. Signal transduction takes many forms: the input signal can be carried by changes in chemical concentration, electrical potential, light intensity, mechanical forces, and temperature, inter alia. In many instances these extracellular stimuli trigger intracellular responses that can be represented as transitions among a discrete set of states [1]. Models of these processes are of great interest to mathematical and theoretical biologists [2].
The “transduction” of the signal occurs through the physical effect of the input signal on the transition rates among the various states describing the receptor. An early mathematical model of this type described the voltage-sensitive transitions among several open and closed ion channel states in Hodgkin and Huxley’s model for the conduction of sodium and potassium ions through the membranes of electrically excitable cells [3]. Presently, many such models are known for signal transduction systems, such as: the detection of calcium concentration signals by the calmodulin protein [4], binding of the acetylcholine (ACh) neurotransmitter to its receptor protein [5], and modulation of the channel opening transition by light intensity in the channelrhodopsin (ChR) protein [6]. In each of these examples the channel may be modeled as a weighted, directed graph, in which the vertices represent the discrete channel states, and the weighted edges represent per capita transition rates, some of which can be modulated by the input signals.
Mutual information, and Shannon capacity, arise in a variety of biological contexts. For example, mutual information may predict the differential growth rates of organisms learning about their environment [7], based on the Kelly criterion [8]. For biological communication systems, achieving a distortion criterion (expressed as mutual information) need not require complicated signal processing techniques; see [9, Example 2]. Moreover, the free energy cost of molecular communication (such as in signal transduction) has a mathematical form similar to mutual information [10], leading to thermodynamic bounds on capacity per unit energy cost (cf. [11]).
Stochastic modeling of signal transduction as a communication channel has considered the chemical reactions in terms of Markov chains [12] and in terms of the “noise” inherent in the binding process [13]. For simplified two-state Markov models, the Shannon capacity of signal transduction has been calculated for slowly changing inputs [14] and for populations of communicating bacteria [15]. Our own previous work has investigated the capacity of signal transduction: in [16], we obtained the Shannon capacity of two-state Markov signal transduction under arbitrary inputs, and showed that the capacity for multiple independent receptors has the same form [17]. Related channel models have been studied in the information-theoretic literature, such as the unit output memory channel [18] and the “previous output is the state” (POST) channel [19, 20]; capacity results for some channels in these classes were recently given in [21].

The present paper focuses on the mutual information and capacity of finite-state signal transduction channels. Generalizing previous results, we provide discrete-time, finite-state channel models for a wide class of signal transduction receptors, giving channelrhodopsin-2 (ChR2), acetylcholine (ACh), and calmodulin (CaM) as specific examples. We also provide an explicit formula for the mutual information of this class of models under independent, identically distributed (IID) inputs (Theorem 1). Subsequently, we consider the continuous-time limit as the interval between the discrete-time instants goes to zero, and find a simple closed-form expression for the mutual information (Theorem 2), with a natural physical interpretation. We further give conditions under which our formula gives the Shannon capacity of the channel, namely that there is exactly one transition in the Markov chain that is sensitive to the channel input (Theorem 3), and we use this result to (numerically) find the Shannon capacity of ChR2.
The remainder of the paper is organized as follows: in Section II, we give a generalized model for discrete-time, finite-state signal transduction systems; in Section III, we discuss signal transduction as a communication system, deriving expressions for the mutual information and giving our main results; and in Section IV, we discuss the biological significance of the results, as well as the limitations of our analysis.
II. Model
II-A. Physical model
Signal transduction encompasses a wide variety of physical processes. For example, in a ligand-gated system, signals are transmitted using concentrations of signaling molecules, known as ligands, which bind to receptor proteins. As another example, in a light-gated system, signals are transmitted using light, where the receptor absorbs photons. Other possibilities exist, such as voltage-gated ion channels. The receptor, often located on the surface of the cell, forms the receiver in the signal transduction system, and conveys (or transduces) the signal across the cell membrane; the receptor is the focus of our analysis.
Signal transduction receptors share a mathematical model: they can be viewed as finite-state, intensity-modulated Markov chains, in which the transition rates between certain pairs of states are sensitive to the input (though other transitions may be independent of the input). Our main examples in this paper focus on ligand- and light-gated receptors. For example, in a ligand-gated system, the binding of the ligand results in a change in the receptor, which then produces second messengers (normally a different species than the ligand) to convey the message to the cell interior. In a light-gated system, the incident photon causes a similar change in the receptor, which may open to allow an ion current to pass to the interior of the cell. In either case, there may be a relaxation process which returns the receptor to the “ready” state, and this process may be independent of the signal; or other processes that are either sensitive to or independent of the signal, depending on the purpose of the receptor.
In the next two sections, we describe the Markov chain model for receptors, both in continuous and in discrete time. Although we focus on ligand- and light-gated receptors, we emphasize that our framework is general enough to include other kinds of receptors.
II-B. Continuous time: Master equation kinetics
Receptors are finite-state Markov chains. For a receptor with $n$ discrete states, there exists an $n$-dimensional vector of state occupancy probabilities $\mathbf{p}(t)$, given by

$$\mathbf{p}(t) = [p_1(t),\; p_2(t),\; \ldots,\; p_n(t)], \quad (1)$$

where $p_i(t)$ represents the probability of a given receptor occupying state $i$ at time $t$. The environmental conditions at the receptor, such as light level or ligand concentration, are known as the input $x(t)$.
The chemical kinetics of the receptor are captured by a differential equation known as the master equation [22]. Let $Q(x)$ represent an $n \times n$ matrix of per capita transition rates, where the entry $q_{ij}(x)$, $i \neq j$, represents the instantaneous rate at which receptors starting in state $i$ enter state $j$. It is helpful to visualize the matrix $Q(x)$ using a graph:

- There are $n$ vertices, representing the $n$ states; and
- A directed edge is drawn from vertex $i$ to vertex $j$ if and only if $q_{ij}(x) > 0$ for some input $x$.

Changing from one state to another is called a transition, so the graph corresponding to $Q(x)$ depicts the possible transitions. A transition may be sensitive, i.e., $q_{ij}(x)$ varies as a function of the input $x$, or insensitive, i.e., $q_{ij}$ is constant with respect to $x$.

Using $Q(x)$, the master equation is given by

$$\frac{d\mathbf{p}(t)}{dt} = \mathbf{p}(t)\, Q(x(t)). \quad (2)$$
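To make the kinetics concrete, the following minimal sketch integrates the master equation (2) for a hypothetical three-state receptor held at a constant input. All rate constants, the input value, and the state topology are placeholders chosen for illustration, not values from the paper; the forward-Euler step anticipates the discretization of Section II-C.

```python
import numpy as np

def Q_of_x(x, q12=1.0, q23=50.0, q31=17.0):
    """Rate matrix for a hypothetical 3-state cycle; q12*x is the only
    input-sensitive rate (all constants are illustrative placeholders)."""
    return np.array([
        [-q12 * x,  q12 * x,   0.0],
        [ 0.0,     -q23,       q23],
        [ q31,      0.0,      -q31],
    ])

x, dt = 2.0, 1e-4                   # constant input and small time step
Q = Q_of_x(x)
p = np.array([1.0, 0.0, 0.0])       # receptor starts in state 1
for _ in range(int(0.5 / dt)):      # integrate dp/dt = p Q up to t = 0.5 s
    p = p + dt * (p @ Q)            # forward-Euler step
print(p, p.sum())                   # occupancy probabilities; sum stays 1
```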
We use the notation from [23]:

- States take a compound label, consisting of a state property and a state number. The state number is unique to each state, but the state property may be shared by multiple states. For example, in each state the receptor’s ion channel might be either open ($\mathsf{o}$) or closed ($\mathsf{c}$); the state label $\mathsf{c}_1$ means that in state 1 the channel is closed, and $\mathsf{o}_2$ means that in state 2 the channel is open. In this paper we use the state number rather than the state property. (Since we show that the state numbers form a Markov chain, in general the state properties form a hidden Markov chain; we discuss this further in Section IV.)

- We assume that rates which are sensitive to the input are directly proportional to the input $x$. For example, in (3) below, $q_{12}\,x$ is the transition rate from $\mathsf{c}_1$ to $\mathsf{o}_2$, which is sensitive, while $q_{23}$ is the transition rate from $\mathsf{o}_2$ to $\mathsf{d}_3$, which is insensitive.

- The $i$th diagonal element of $Q(x)$ is written $q_{ii}(x)$, and is set so that the $i$th row sums to zero (so, if $x$ appears in the $i$th row, $q_{ii}$ may depend on $x$).
Taking sensitive rates to be proportional to the signal is a key modeling assumption; it is satisfied for the examples we consider, but there exist systems in which the signal acts nonlinearly on the rate.
The following three examples illustrate the use of our notation, and give practical examples of receptors along with their transition graphs and rate constants.
Example 1 (Channelrhodopsin-2). The channelrhodopsin-2 (ChR2) receptor is a light-gated ion channel. The receptor has three states, named Closed ($\mathsf{c}_1$), Open ($\mathsf{o}_2$), and Desensitized ($\mathsf{d}_3$). The channel-open state ($\mathsf{o}_2$) is the only state in which the ion channel is open, passing an ion current. The channel-closed states, $\mathsf{c}_1$ and $\mathsf{d}_3$, are distinct in that the receptor is light-sensitive in state $\mathsf{c}_1$, and insensitive in state $\mathsf{d}_3$ [6]. The rate matrix for ChR2 is

$$Q(x) = \begin{bmatrix} -q_{12}\,x & q_{12}\,x & 0 \\ 0 & -q_{23} & q_{23} \\ q_{31} & 0 & -q_{31} \end{bmatrix}, \quad (3)$$

where $x$ is the relative light intensity. To keep the row sums equal to zero, we set $q_{11}(x) = -q_{12}\,x$, $q_{22} = -q_{23}$, and $q_{33} = -q_{31}$. Fig. 1 shows state labels and allowed state transitions. Parameter values from the literature are given in Table I.
TABLE I
Parameter    Value from [6]    Units
$q_{12}$                       s$^{-1}$ (per unit relative intensity)
$q_{23}$     50                s$^{-1}$
$q_{31}$     17                s$^{-1}$
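As a concrete illustration of the model in Example 1, the following sketch simulates a single ChR2-like receptor as a continuous-time Markov chain (Gillespie-style). The value of the sensitive rate constant $q_{12}$ and the input level are assumptions for illustration; the two insensitive rates use the tabulated values.

```python
import numpy as np

rng = np.random.default_rng(1)
q12, q23, q31 = 1.0, 50.0, 17.0    # q12 is an assumed placeholder value

def exit_edge(state, x):
    """(next_state, rate) for each state of the 3-state cycle;
    only the c1 -> o2 transition senses the input x."""
    return {1: (2, q12 * x), 2: (3, q23), 3: (1, q31)}[state]

state, t, x = 1, 0.0, 2.0          # start closed; constant input (assumed)
events = []
while t < 0.5:                     # simulate 0.5 s
    nxt, rate = exit_edge(state, x)
    t += rng.exponential(1.0 / rate)   # exponential waiting time
    state = nxt
    events.append((round(t, 4), state))
print(events[:10])                 # first few (time, new state) jumps
```

Because each state of this cycle has exactly one outgoing edge, the Gillespie step reduces to drawing one exponential waiting time; with branching states, one would draw the minimum over all outgoing edges.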
Example 2 (Acetylcholine). The acetylcholine (ACh) receptor is a ligand-gated ion channel. Following [5], we model the receptor as a conditional Markov process on five states, with rate matrix

$$Q(x) = \begin{bmatrix}
q_{11}(x) & q_{12}\,x & 0 & 0 & 0 \\
q_{21} & q_{22}(x) & q_{23}\,x & q_{24} & 0 \\
0 & q_{32} & q_{33} & 0 & q_{35} \\
0 & q_{42} & 0 & q_{44}(x) & q_{45}\,x \\
0 & 0 & q_{53} & q_{54} & q_{55}
\end{bmatrix}, \quad (4)$$

where the diagonal entries are set so that each row sums to zero. There are three sensitive transitions: $\mathsf{c}_1 \to \mathsf{c}_2$, $\mathsf{c}_2 \to \mathsf{c}_3$, and $\mathsf{o}_4 \to \mathsf{o}_5$, whose rates are proportional to the ligand concentration $x$. For the purposes of our analysis, we use a bounded range of concentrations. Fig. 2 shows the allowed state transitions.
The states in ACh correspond to the binding of a ligand to one of two binding sites on the receptor. In state $\mathsf{c}_1$, neither site is occupied; in states $\mathsf{c}_2$ and $\mathsf{o}_4$, one site is occupied; and in states $\mathsf{c}_3$ and $\mathsf{o}_5$, both sites are occupied. Table II gives parameter values; the concentration of ACh, $x$, is measured in mol/L. The same state-naming convention is used in the figure as with ChR2: states with an open ion channel are $\mathsf{o}_4$ and $\mathsf{o}_5$; states with a closed ion channel are $\mathsf{c}_1$, $\mathsf{c}_2$, and $\mathsf{c}_3$.
TABLE II
[Rate constants for the five-state ACh model: the ten parameters $q_{ij}$ of (4), with names, values, and ranges as given in [5]; all rates are in s$^{-1}$, with the sensitive rates in s$^{-1}$ per unit concentration.]
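The sketch below assembles a rate matrix with the connectivity described in Example 2, to show how a multi-edge, multi-sensitive-transition receptor fits the same framework as ChR2. Every numeric rate here is a placeholder (the actual values are those of [5] in Table II), and the state indices are 0-based.

```python
import numpy as np

def Q_ach(x):
    """Five-state ACh-like rate matrix: states 0..4 = c1, c2, c3, o4, o5.
    The three x-proportional (sensitive) edges are c1->c2, c2->c3, o4->o5.
    All numeric constants are illustrative placeholders."""
    q = np.zeros((5, 5))
    q[0, 1] = 1.0 * x    # c1 -> c2  (binding, sensitive)
    q[1, 0] = 2.0        # c2 -> c1  (unbinding)        placeholder
    q[1, 2] = 1.0 * x    # c2 -> c3  (binding, sensitive)
    q[2, 1] = 2.0        # c3 -> c2                     placeholder
    q[1, 3] = 5.0        # c2 -> o4  (channel opening)  placeholder
    q[3, 1] = 15.0       # o4 -> c2  (channel closing)  placeholder
    q[2, 4] = 5.0        # c3 -> o5                     placeholder
    q[4, 2] = 15.0       # o5 -> c3                     placeholder
    q[3, 4] = 1.0 * x    # o4 -> o5  (binding, sensitive)
    q[4, 3] = 2.0        # o5 -> o4                     placeholder
    np.fill_diagonal(q, -q.sum(axis=1))   # make each row sum to zero
    return q

print(Q_ach(1.0))
```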
Example 3 (Calmodulin). The calmodulin (CaM) receptor is a ligand-gated receptor. The CaM receptor consists of four binding sites, two on the C-terminus of the CaM protein and two on the N-terminus [24, 25, 26]. Each end of the protein can bind 0, 1, or 2 calcium ions, leading to nine possible states. For CaM, rather than an ion channel, the state property of interest is whether the C or N end of the receptor is completely bound (i.e., has both binding sites occupied by ligands). This property is represented by four symbols, indicating respectively that neither end is completely bound; that the C end is completely bound; that the N end is completely bound; and that both ends are completely bound.
(5) 
State configuration and allowed transitions are depicted in Figure 3. The rate matrix is given in (5), with values given in Table III, and where the input $x$ is the molar concentration of calcium.
TABLE III
[Rate constants for the nine-state CaM model, grouped by transition class, with names, values, and ranges as given in [4]; all rates are in s$^{-1}$, with the sensitive rates in s$^{-1}$ per unit concentration.]
For each of the preceding examples, the rate constants depend on environmental conditions, and thus can be reported differently in different sources (see, e.g., [27] for different rate constants for ChR2).
II-C. From the master equation to discrete-time Markov chains
The continuous-time master equation for the receptor dynamics (2) describes the evolution of a conditional probability $p_i(t) = \Pr[S(t) = i \mid \mathcal{F}_t]$, where $S(t)$ is the continuous-time, discrete-state càdlàg process giving the channel state, $\mathcal{F}_t$ is the filtration generated by the input process $\{x(s),\, s \le t\}$, and $\Pr[\,\cdot \mid \mathcal{F}_t]$ denotes conditioning with respect to this filtration [28]. Establishing the appropriate ensemble of input processes and analyzing mutual information and capacity involve technical issues that do not shed light on the nature of biological signal transduction. Therefore we do not undertake a rigorous analysis of the continuous-time communications channels described by (2) in this paper. Rather, we introduce a discrete-time, discrete-state channel, motivated by the continuous-time channel, which can be rigorously analyzed, and study its properties both with a fixed time step $\Delta t$, and later in the limit $\Delta t \to 0$. The discrete-time Markov chain model allows us to rely on capacity results for discrete-time Markov channels.
We obtain a discrete-time approximation to the master equation by writing

$$\frac{d\mathbf{p}(t)}{dt} = \lim_{\Delta t \to 0} \frac{\mathbf{p}(t + \Delta t) - \mathbf{p}(t)}{\Delta t} = \mathbf{p}(t)\, Q(x(t)), \quad (6)$$

where we simplify the notation by writing $Q(x(t))$ as simply $Q(x)$. Manipulating the middle and right expressions in (6) gives

$$\mathbf{p}(t + \Delta t) - \mathbf{p}(t) \approx \Delta t\, \mathbf{p}(t)\, Q(x), \quad (7)$$
$$\mathbf{p}(t + \Delta t) \approx \mathbf{p}(t) + \Delta t\, \mathbf{p}(t)\, Q(x) \quad (8)$$
$$= \mathbf{p}(t)\left( I + \Delta t\, Q(x) \right), \quad (9)$$

where $I$ is the $n \times n$ identity matrix. In order to arrive at a discrete-time model, we introduce the approximation $\hat{\mathbf{p}}(t)$ satisfying

$$\hat{\mathbf{p}}(t + \Delta t) = \hat{\mathbf{p}}(t)\left( I + \Delta t\, Q(x) \right), \quad (10)$$

and arrive at a discrete-time approximation to (6),

$$\hat{\mathbf{p}}(t + \Delta t) = \hat{\mathbf{p}}(t)\, P(x). \quad (11)$$

Thus, we have a discrete-time Markov chain with transition probability matrix

$$P(x) = I + \Delta t\, Q(x). \quad (12)$$
The matrix $P(x)$ satisfies the conditions of a Markov chain transition probability matrix (nonnegative entries, rows summing to one) as long as $\Delta t$ is small enough. However, note that $Q(x)$ (and therefore $P(x)$) depend on the input $x$, so the Markov chain is not generally time-homogeneous if $x$ is known (cf. (24)).
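A minimal sketch of the discretization (12), using the ChR2-like placeholder rates from before and a hypothetical two-point input alphabet: it checks, for several candidate time steps, whether $P(x)$ is a valid (nonnegative, row-stochastic) transition matrix for every input level.

```python
import numpy as np

def Q_of_x(x, q12=1.0, q23=50.0, q31=17.0):   # assumed placeholder rates
    return np.array([[-q12*x, q12*x,  0.0],
                     [ 0.0,  -q23,    q23],
                     [ q31,   0.0,   -q31]])

def P_of_x(x, dt):
    return np.eye(3) + dt * Q_of_x(x)         # equation (12)

X = [0.5, 2.0]                                # hypothetical input alphabet
for dt in [1e-3, 1e-2, 0.1]:
    ok = all((P_of_x(x, dt) >= 0).all() and
             np.allclose(P_of_x(x, dt).sum(axis=1), 1.0) for x in X)
    print(f"dt={dt}: valid transition matrix for all x? {ok}")
```

Row sums are automatically one (the rows of $Q$ sum to zero), so validity fails only when some $\Delta t\, q_{ij}$ exceeds 1; here the $dt = 0.1$ case fails because $\Delta t\, q_{23} = 5$.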
III. Signal transduction as a communications system
In this section we give our main results, in which we describe and analyze signal transduction as a communication system. A brief roadmap to our results is given as follows: we first define the communication system in terms of input, output, and channel; we give the mutual information of the general discrete-time model under IID inputs (Theorem 1 and equation (39)); we take the continuous-time limit of the mutual information rate, showing that the expression for mutual information has a simple factorization (Theorem 2 and equation (81)); we give a physical interpretation of the factorization in (81); we give general conditions under which the Shannon capacity is achieved by IID inputs (Theorem 3); and finally, we give an example calculation using ChR2 (Example 4).
III-A. Communication model of receptors
We now discuss how the receptors can be described as information-theoretic communication systems: that is, in terms of input, output, and conditional input-output PMF.
Input: As discussed in Section II, the receptor is sensitive to given properties of the environment; previous examples included light intensity and ligand concentration. The receptor input is the value of this property at the surface of the receptor. The input is discretized in time: for integers $k$, the input at time $k \Delta t$ is $x(k \Delta t)$, which we will write as $x_k$. We will also discretize the amplitude, so that for every $k$, $x_k \in \mathcal{X} = \{x^{(1)}, x^{(2)}, \ldots, x^{(\ell)}\}$. We will assume that the $x^{(i)}$ are distinct and increasing; further, we assign the lowest and highest values special symbols:

$$x_{\min} = x^{(1)}, \quad (13)$$
$$x_{\max} = x^{(\ell)}. \quad (14)$$

In Section II, we gave the concentrations or intensities over a range of values (such as the range of relative intensities for ChR2). Thus, we select $x_{\min}$ and $x_{\max}$ as the minimum and maximum values of this range, respectively.
Output: In this paper, the output of the communication system is the receptor state number, given by the subscript of the state label: for example, if the state is $\mathsf{o}_2$, then the output is 2. The output is likewise discretized in time: the output at time $k \Delta t$ is written $y_k$. The discrete channel inputs and outputs form vectors: in terms of notation, we write $\mathbf{x} = [x_1, x_2, \ldots, x_m]$ and $\mathbf{y} = [y_1, y_2, \ldots, y_m]$.
Conditional input-output PMF: From (6)–(12), $Y_k$ forms a Markov chain given the input, so

$$p_{\mathbf{Y}|\mathbf{X}}(\mathbf{y} \mid \mathbf{x}) = \prod_{k=1}^{m} p(y_k \mid y_{k-1}, x_k), \quad (15)$$

where $p(y_k \mid y_{k-1}, x_k)$ is given by the appropriate entry $P_{y_{k-1} y_k}(x_k)$ of the matrix $P(x_k)$, and where $y_0$ is null.¹ The following diagram (16) indicates the conditional dependencies:

$$\begin{array}{ccccccc}
x_1 & & x_2 & & x_3 & & \\
\downarrow & & \downarrow & & \downarrow & & \\
y_1 & \rightarrow & y_2 & \rightarrow & y_3 & \rightarrow & \cdots
\end{array} \quad (16)$$

¹ Notation: (1) We will drop subscripts on PMFs if it is unambiguous to do so, i.e., $p(y \mid x)$ normally signifies $p_{Y|X}(y \mid x)$; (2) we say a variable is “null” if it vanishes under conditioning, i.e., if $y_0$ is null, then $p(y_1 \mid y_0, x_1) = p(y_1 \mid x_1)$.

As an example, consider ACh: suppose the receptor is in state $\mathsf{c}_1$ at step $k-1$, so $y_{k-1} = 1$. Then from (4) and (12), the probability of a transition to $\mathsf{c}_2$ is $p(y_k = 2 \mid y_{k-1} = 1, x_k) = \Delta t\, q_{12}\, x_k$, with $q_{12}$ as given in Table II. From (16) and the definition of $P(x)$, $p(y_k \mid y_{k-1}, x_k)$ does not depend on $k$; that is, the channel’s input-output structure is time-invariant.
For a discrete-time Markov chain, the receptor states form a graph with vertex set $\{1, 2, \ldots, n\}$ and directed edge set $\mathcal{E}$, with pair $(i,j) \in \mathcal{E}$ if $P_{ij}(x) > 0$ for some $x \in \mathcal{X}$; that is, if for at least some input value there is a direct transition from $i$ to $j$. Notice that, under this definition, self-transitions are included in $\mathcal{E}$, even though (for convenience) they are not depicted in the state-transition diagrams.
We say the transition from state $i$ to $j$ is insensitive to the input, or just insensitive, if $P_{ij}(x)$ takes the same value for all $x \in \mathcal{X}$ (see Section II-B). Otherwise, the transition is sensitive. We let $\mathcal{E}_s \subseteq \mathcal{E}$ denote the subset of sensitive edges. (If state $i$ is the origin for a sensitive transition, i.e., there is at least one $j$ with $(i,j) \in \mathcal{E}_s$, then the self-transition $(i,i)$ is normally sensitive as well, but this condition is not required for our analysis.)
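As an illustration of these definitions, the sketch below recovers the edge set $\mathcal{E}$ and the sensitive subset $\mathcal{E}_s$ by probing $P(x)$ over a hypothetical input alphabet, for the ChR2-like placeholder model (state indices are 0-based here).

```python
import numpy as np

def P_of_x(x, dt=1e-3, q12=1.0, q23=50.0, q31=17.0):  # assumed rates
    Q = np.array([[-q12*x, q12*x,  0.0],
                  [ 0.0,  -q23,    q23],
                  [ q31,   0.0,   -q31]])
    return np.eye(3) + dt * Q

X  = [0.5, 1.0, 2.0]              # hypothetical input alphabet
Ps = [P_of_x(x) for x in X]
E  = {(i, j) for i in range(3) for j in range(3)
      if any(P[i, j] > 0 for P in Ps)}                  # edge set
Es = {(i, j) for (i, j) in E
      if not all(np.isclose(P[i, j], Ps[0][i, j]) for P in Ps)}  # sensitive
print("edges:", sorted(E))
print("sensitive edges:", sorted(Es))
```

For this model the sensitive set comes out as the edge $\mathsf{c}_1 \to \mathsf{o}_2$ together with the self-transition at $\mathsf{c}_1$, consistent with the remark above that the self-transition at the origin of a sensitive edge is normally sensitive too.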
For a channel with input $\mathbf{X}$ and output $\mathbf{Y}$ (both of length $m$), the mutual information $I(\mathbf{X};\mathbf{Y})$ gives the maximum information rate that may be transmitted reliably over the channel for a given input distribution. Mutual information is given by

$$I(\mathbf{X};\mathbf{Y}) = \mathrm{E}\!\left[ \log \frac{p_{\mathbf{Y}|\mathbf{X}}(\mathbf{Y} \mid \mathbf{X})}{p_{\mathbf{Y}}(\mathbf{Y})} \right], \quad (17)$$

where $p_{\mathbf{Y}|\mathbf{X}}(\mathbf{y} \mid \mathbf{x})$ is the conditional probability mass function (PMF) of $\mathbf{Y}$ given $\mathbf{X}$. As $m \to \infty$, generally $I(\mathbf{X};\mathbf{Y}) \to \infty$ as well; in this case, it is more useful to calculate the mutual information rate, which we introduce in the next section.
III-B. Receptor IID capacity
Our focus in the remainder of this paper is on IID input distributions. Although IID inputs may not be realistic for chemical diffusion channels, such as for ligand-gated receptors (as concentration may persist for long periods of time), they can be capacity-achieving in these channels (see, e.g., [16]); moreover, IID input distributions may be physically realistic for light-gated channels.
Starting with (17), where $\mathbf{X}$ and $\mathbf{Y}$ are both of fixed and finite length $m$, the Shannon capacity is found by maximizing with respect to the input distribution $p_{\mathbf{X}}(\mathbf{x})$, i.e.,

$$C = \lim_{m \to \infty} \max_{p_{\mathbf{X}}(\mathbf{x})} \frac{1}{m} I(\mathbf{X};\mathbf{Y}), \quad (18)$$

where the maximum is taken over all possible length-$m$ input distributions (not necessarily IID).

If the input is restricted to the set of IID input distributions, which is well defined for each $m$ (i.e., $p_{\mathbf{X}}(\mathbf{x}) = \prod_{k=1}^{m} p_X(x_k)$), then $I(\mathbf{X};\mathbf{Y})$ is also well defined for each $m$ (see (17)). Furthermore, for each $m$ we have the IID capacity, written $C_{\mathrm{IID}}(m)$:

$$C_{\mathrm{IID}}(m) = \max_{p_X(x)} I(\mathbf{X};\mathbf{Y}), \quad (19)$$

where the maximum is taken over all possible settings of the marginal PMF $p_X(x)$.
We can use (17) and (19) to obtain information rates per channel use. For a given IID input distribution $p_X(x)$, the IID mutual information rate is given by

$$\mathcal{I}(p_X) = \lim_{m \to \infty} \frac{1}{m} I(\mathbf{X};\mathbf{Y}). \quad (20)$$

Furthermore, the maximum IID information rate is given by

$$C_{\mathrm{IID}} = \lim_{m \to \infty} \frac{1}{m} C_{\mathrm{IID}}(m). \quad (21)$$

We derive these quantities in the remainder of the section, in which it will be clear that the limits in (20)–(21) exist. We start by deriving $I(\mathbf{X};\mathbf{Y})$ under IID inputs, and showing how it is calculated using quantities introduced in Section II. Finally, in Theorem 1, we give an expression for $\mathcal{I}(p_X)$, and show that $C_{\mathrm{IID}} = \max_{p_X(x)} \mathcal{I}(p_X)$.
Recall $p_{\mathbf{Y}|\mathbf{X}}(\mathbf{y} \mid \mathbf{x})$ from (15). Under IID inputs, it can be shown (see [18, 16]) that the receptor states form a time-homogeneous Markov chain, that is,

$$p(y_k \mid y_{k-1}, y_{k-2}, \ldots, y_1) = p(y_k \mid y_{k-1}), \quad (22)$$

where $y_0$ is again null, and where

$$p(y_k \mid y_{k-1}) = \sum_{x \in \mathcal{X}} p_X(x)\, p(y_k \mid y_{k-1}, x). \quad (23)$$

Furthermore, let $\bar{P}$ represent the transition probability matrix of this Markov chain. Recall (12), in which $P(x)$ was dependent on the input $x$; using (23), we can write

$$\bar{P} = \mathrm{E}_X[P(X)] = I + \Delta t\, \mathrm{E}_X[Q(X)], \quad (24)$$

and since the sensitive terms in $Q(x)$ and $P(x)$ are assumed to be linear in $x$, we replace $x$ in these terms with $\bar{x} = \mathrm{E}_X[X]$ to form $\bar{Q}$ and $\bar{P}$, respectively.
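A short numerical check of (23)–(24), under the same placeholder ChR2-like model and an assumed two-point IID input PMF: since the sensitive entries are linear in $x$, the averaged matrix $\bar{P} = \mathrm{E}_X[P(X)]$ coincides with $P$ evaluated at $\bar{x}$.

```python
import numpy as np

def P_of_x(x, dt=1e-3, q12=1.0, q23=50.0, q31=17.0):  # assumed rates
    Q = np.array([[-q12*x, q12*x,  0.0],
                  [ 0.0,  -q23,    q23],
                  [ q31,   0.0,   -q31]])
    return np.eye(3) + dt * Q

X  = np.array([0.5, 2.0])          # hypothetical two-point input alphabet
pX = np.array([0.7, 0.3])          # assumed IID input distribution

P_bar = sum(p * P_of_x(x) for p, x in zip(pX, X))   # E_X[P(X)], eq. (24)
x_bar = pX @ X                                      # mean input
print(np.allclose(P_bar, P_of_x(x_bar)))            # True, by linearity in x
```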
Under IID inputs, using (15), (22), and the chain rule, the mutual information (17) decomposes over time steps as

$$I(\mathbf{X};\mathbf{Y}) = \sum_{k=1}^{m} I(X_k; Y_k \mid Y_{k-1}) \quad (25)$$
$$= \sum_{k=1}^{m} \sum_{(i,j) \in \mathcal{E}} \Pr(Y_{k-1} = i) \sum_{x \in \mathcal{X}} p_X(x)\, P_{ij}(x) \log \frac{P_{ij}(x)}{\bar{P}_{ij}}. \quad (26)$$

Recall that a transition may be sensitive ($(i,j) \in \mathcal{E}_s$) or insensitive ($(i,j) \notin \mathcal{E}_s$). For the terms in (25), consider the insensitive transitions:

$$p(y_k \mid y_{k-1}, x_k) = P_{y_{k-1} y_k}, \quad (27)$$
$$p(y_k \mid y_{k-1}) = \sum_{x \in \mathcal{X}} p_X(x)\, P_{y_{k-1} y_k} = P_{y_{k-1} y_k}, \quad (28)$$
$$\log \frac{p(y_k \mid y_{k-1}, x_k)}{p(y_k \mid y_{k-1})} = \log 1 = 0, \quad (29)$$

where (27) follows since the transition is insensitive, so $P_{y_{k-1} y_k}$ is not a function of $x_k$; cf. (23). Thus for IID inputs, the mutual information (25) is calculated using the sensitive transitions only, i.e., those transitions in $\mathcal{E}_s$. With this in mind, we can rewrite (25) as

$$I(\mathbf{X};\mathbf{Y}) = \sum_{k=1}^{m} \sum_{(i,j) \in \mathcal{E}_s} \Pr(Y_{k-1} = i) \sum_{x \in \mathcal{X}} p_X(x)\, P_{ij}(x) \log \frac{P_{ij}(x)}{\bar{P}_{ij}} \quad (31)$$
$$= \sum_{k=1}^{m} \gamma_k, \quad (32)$$

where we let $\gamma_k$ represent the inner double sum in (31), for the sake of brevity. Also note that, because the input is IID, (32) follows from (31) with $\gamma_k$ depending on $k$ only through $\Pr(Y_{k-1} = i)$.
Now consider the individual PMFs in (32), starting with $P_{ij}(x)$. All transitions in $\mathcal{E}_s$ are dependent on the input $x$, and throughout this paper we assume that the sensitive transition rates depend linearly on the input signal intensity. Thus (recall (12)) for non-self-transitions (i.e., $i \neq j$),

$$P_{ij}(x) = \Delta t\, q_{ij}\, x. \quad (33)$$

For self-transitions in $\mathcal{E}_s$ (i.e., $i = j$) we have

$$P_{ii}(x) = 1 + \Delta t\, q_{ii}(x), \quad (35)$$

as seen in the diagonal entries of (12). Similarly, the $\bar{P}_{ij}$ terms can be obtained using (23)–(24); we replace $x$ in (33)–(35) with $\bar{x}$.
The terms $\Pr(Y_{k-1} = i)$ converge to the steady-state marginal probability that the receptor is in state $i$; for compact notation, let $\pi_i = \lim_{k \to \infty} \Pr(Y_k = i)$. If the input is IID, as we assume throughout this paper, then $\pi_i$ exists if the Markov chain with transition matrix $\bar{P}$ is irreducible, aperiodic, and positive recurrent; these conditions hold for all the examples we consider (recall (22)–(24)).²

² For clarity, although $\pi_i$ carries a subscript, the subscript indexes the state $i$; $\pi_i$ refers to the steady-state distribution of state $i$, and does not change with time.
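A minimal sketch of computing $\pi$ as the normalized left eigenvector of $\bar{P}$ with eigenvalue 1, again for the placeholder ChR2-like model (the value $\bar{x} = 1.55$ is an assumed mean input):

```python
import numpy as np

def P_of_x(x, dt=1e-3, q12=1.0, q23=50.0, q31=17.0):  # assumed rates
    Q = np.array([[-q12*x, q12*x,  0.0],
                  [ 0.0,  -q23,    q23],
                  [ q31,   0.0,   -q31]])
    return np.eye(3) + dt * Q

P_bar = P_of_x(1.55)               # P evaluated at an assumed mean input
w, V = np.linalg.eig(P_bar.T)      # left eigenvectors of P_bar
k = np.argmin(np.abs(w - 1.0))     # eigenvalue closest to 1
pi = np.real(V[:, k]); pi /= pi.sum()
print("pi =", pi, " check:", np.allclose(pi @ P_bar, pi))
```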
Define the partial entropy function

$$\eta(p) = -p \log p, \quad (36)$$

and let

$$H(p) = \eta(p) + \eta(1-p) = -p \log p - (1-p) \log (1-p) \quad (37)$$

represent the binary entropy function. Then we have the following result.
Theorem 1

For an IID input distribution $p_X(x)$, the mutual information rate is given by

$$\mathcal{I}(p_X) = \sum_{(i,j) \in \mathcal{E}_s} \pi_i \left( \eta(\bar{P}_{ij}) - \mathrm{E}_X\!\left[ \eta\!\left( P_{ij}(X) \right) \right] \right). \quad (39)$$

Furthermore, $C_{\mathrm{IID}} = \max_{p_X(x)} \mathcal{I}(p_X)$.
Proof: Divide the terms in (32) into the $k = 1$ term, and all the remaining terms. Let $A(p_X)$ represent the $k = 1$ term, emphasizing its dependence on the IID input distribution $p_X$, so that

$$A(p_X) = I(X_1; Y_1 \mid Y_0) \quad (40)$$
$$= I(X_1; Y_1), \quad (41)$$

where (41) follows since $Y_0$ is null. Let $B(p_X, m)$ represent the remaining terms, again dependent on $p_X$ but also on $m$, so that

$$B(p_X, m) = \sum_{k=2}^{m} \gamma_k \quad (43)$$
$$= \sum_{k=2}^{m} \sum_{(i,j) \in \mathcal{E}_s} \Pr(Y_{k-1} = i)\, \mathrm{E}_X\!\left[ P_{ij}(X) \log \frac{P_{ij}(X)}{\bar{P}_{ij}} \right], \quad (44)$$

recalling the definition of $\gamma_k$ from the discussion after (32). Using (20),

$$\mathcal{I}(p_X) = \lim_{m \to \infty} \frac{1}{m} \left( A(p_X) + B(p_X, m) \right) \quad (46)$$
$$= \sum_{(i,j) \in \mathcal{E}_s} \pi_i\, \mathrm{E}_X\!\left[ P_{ij}(X) \log \frac{P_{ij}(X)}{\bar{P}_{ij}} \right], \quad (47)$$

where (47) uses the convergence of $\Pr(Y_{k-1} = i)$ to $\pi_i$, and (39) follows after some manipulation (note that $\mathrm{E}_X[P_{ij}(X)] \log \bar{P}_{ij} = \bar{P}_{ij} \log \bar{P}_{ij} = -\eta(\bar{P}_{ij})$).

To show that $C_{\mathrm{IID}} = \max_{p_X} \mathcal{I}(p_X)$, recall the definitions of $C_{\mathrm{IID}}(m)$ and $C_{\mathrm{IID}}$ in (19) and (21), respectively. Referring to $A(p_X) + B(p_X, m)$ as $I_m(p_X)$ for brevity,

$$C_{\mathrm{IID}} = \lim_{m \to \infty} \frac{1}{m} \max_{p_X} I_m(p_X). \quad (48)$$

Let $p_A$ represent the IID input distribution maximizing the term $A(p_X)$, and let $p_B$ represent the IID input distribution maximizing the term $B(p_X, m)$. From (44), $p_B$ is independent of $m$. Furthermore,

$$\frac{1}{m} I_m(p_B) \;\le\; \frac{1}{m} \max_{p_X} I_m(p_X) \;\le\; \frac{1}{m} \left( A(p_A) + B(p_B, m) \right). \quad (49)$$

Taking the limit throughout (49) as $m \to \infty$, the $A(\cdot)$ terms vanish as they are constant with respect to $m$, and both bounds converge to $\lim_{m \to \infty} \frac{1}{m} B(p_B, m)$. Comparing (44) and (47), $p_B$ also maximizes $\mathcal{I}(p_X)$. The result follows.
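To illustrate Theorem 1, the sketch below evaluates the rate formula (39) for the placeholder ChR2-like model and maximizes it over IID distributions on an assumed two-point alphabet; the result is the IID capacity $C_{\mathrm{IID}}$ restricted to that alphabet. The sensitive edge set is hard-coded for this toy model, and natural logarithms are used, so the rate is in nats per channel use.

```python
import numpy as np

def P_of_x(x, dt=1e-3, q12=1.0, q23=50.0, q31=17.0):  # assumed rates
    Q = np.array([[-q12*x, q12*x,  0.0],
                  [ 0.0,  -q23,    q23],
                  [ q31,   0.0,   -q31]])
    return np.eye(3) + dt * Q

def eta(p):                         # partial entropy (36), with eta(0) = 0
    return -p * np.log(p) if p > 0 else 0.0

def stationary(P):                  # normalized left 1-eigenvector
    w, V = np.linalg.eig(P.T)
    v = np.real(V[:, np.argmin(np.abs(w - 1.0))])
    return v / v.sum()

def info_rate(pX, X, Es=((0, 0), (0, 1))):   # sensitive edges of this model
    Ps    = [P_of_x(x) for x in X]
    P_bar = sum(p * P for p, P in zip(pX, Ps))
    pi    = stationary(P_bar)
    return sum(pi[i] * (eta(P_bar[i, j]) -
                        sum(p * eta(P[i, j]) for p, P in zip(pX, Ps)))
               for i, j in Es)      # equation (39)

X = np.array([0.5, 2.0])            # hypothetical {x_min, x_max}
best = max(((a, info_rate(np.array([1 - a, a]), X))
            for a in np.linspace(0.01, 0.99, 99)), key=lambda t: t[1])
print("best P(X = x_max) = %.2f, rate = %.3g nats/use" % best)
```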
III-C. Limit of $\mathcal{I}(p_X)$ as $\Delta t \to 0$
In this section we consider the continuous-time limit of $\mathcal{I}(p_X)$ as $\Delta t \to 0$, and give our second main result (Theorem 2): in the continuous-time limit, the mutual information rate is expressed simply as a product of the average flux through sensitive edges, and the relative entropy between the prior distribution on the input $X$ and the posterior distribution given a transition. While we do not claim to derive the mutual information rate of the continuous-time channel, the continuous-time limit of the discrete-time mutual information rate is a quantity of interest in its own right.
First, we show that the steady-state distribution $\pi$ is independent of $\Delta t$:

Lemma 1

Suppose $\pi$ is the normalized left eigenvector of $\bar{Q}$ with eigenvalue 0 (see (24)). Define the set $\mathcal{T}$ so that $\Delta t \in \mathcal{T}$ if $P(x) = I + \Delta t\, Q(x)$ from (12) is a valid transition probability matrix for all $x \in \mathcal{X}$. Then $\pi$ is the normalized left eigenvector of $\bar{P}$ with eigenvalue 1, for all $\Delta t \in \mathcal{T}$.

Proof: The proof is given in the appendix.
Note that $\mathcal{T}$ contains all “sufficiently small” $\Delta t > 0$. It follows from the lemma that the steady-state distribution is the same for both continuous and discrete time.
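A quick numerical check of Lemma 1, using an assumed $\bar{Q}$ (the placeholder ChR2-like rates evaluated at an assumed mean input): the left 0-eigenvector of $\bar{Q}$ and the left 1-eigenvector of $\bar{P} = I + \Delta t\, \bar{Q}$ coincide for every valid $\Delta t$.

```python
import numpy as np

Q_bar = np.array([[-1.55,  1.55,   0.0],   # assumed Q(x_bar), x_bar = 1.55
                  [ 0.0,  -50.0,  50.0],
                  [17.0,    0.0, -17.0]])

def left_eigvec(M, lam):            # normalized left eigenvector for lam
    w, V = np.linalg.eig(M.T)
    v = np.real(V[:, np.argmin(np.abs(w - lam))])
    return v / v.sum()

pi_ct = left_eigvec(Q_bar, 0.0)     # continuous-time steady state
for dt in [1e-4, 1e-3, 1e-2]:
    pi_dt = left_eigvec(np.eye(3) + dt * Q_bar, 1.0)
    print(dt, np.allclose(pi_ct, pi_dt))
```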
Note that the mutual information rate $\mathcal{I}(p_X)$ in (39) has units of nats per channel use, and that channel uses have duration $\Delta t$ seconds. Moreover, the transition probabilities in (33)–(35) are linear functions of $\Delta t$. Substituting the discrete-time transition probabilities (12) into (39), the non-self-transition probabilities go to zero while the self-transition probabilities go to 1, so $\mathcal{I}(p_X) \to 0$ as $\Delta t \to 0$. This should not be surprising: intuitively, as the time step shrinks, less information can be expressed per time step. However, dividing by $\Delta t$ (and obtaining a rate in nats per second), the information rate per second remains finite. It is then useful to consider how this rate behaves as $\Delta t \to 0$.
Let $\mathcal{E}_s'$ represent the set of sensitive transitions excluding self-transitions, i.e.,

$$\mathcal{E}_s' = \{ (i,j) \in \mathcal{E}_s : i \neq j \}. \quad (50)$$

Also let $\mathcal{E}_s''$ represent the components of $\mathcal{E}_s$ excluding $\mathcal{E}_s'$ (i.e., only the sensitive self-transitions).
For any edge $(i,j) \in \mathcal{E}_s$, define the limiting value of that edge’s contribution to the mutual information rate, as $\Delta t \to 0$, as

$$\iota_{ij} = \lim_{\Delta t \to 0} \frac{\pi_i}{\Delta t} \left( \eta(\bar{P}_{ij}) - \mathrm{E}_X\!\left[ \eta\!\left( P_{ij}(X) \right) \right] \right). \quad (52)$$

The limit calculation depends on whether $i = j$. In case $i \neq j$, we have $P_{ij}(x) = \Delta t\, q_{ij}\, x$ (see (33)) and

$$\iota_{ij} = \lim_{\Delta t \to 0} \frac{\pi_i}{\Delta t} \left( -\Delta t\, q_{ij}\, \bar{x} \log (\Delta t\, q_{ij}\, \bar{x}) + \mathrm{E}_X\!\left[ \Delta t\, q_{ij}\, X \log (\Delta t\, q_{ij}\, X) \right] \right) \quad (53)$$
$$= \lim_{\Delta t \to 0} \pi_i\, q_{ij}\, \mathrm{E}_X\!\left[ X \log \frac{\Delta t\, q_{ij}\, X}{\Delta t\, q_{ij}\, \bar{x}} \right] \quad (54)$$
$$= \pi_i\, q_{ij}\, \mathrm{E}_X\!\left[ X \log \frac{X}{\bar{x}} \right], \quad (55)$$

where (54) uses $\mathrm{E}_X[X] = \bar{x}$ to combine the two terms. On the other hand, in the case when $i = j$, $P_{ii}(x) \to 1$ as $\Delta t \to 0$, and the first-order terms in $\Delta t$ cancel, so $\iota_{ii} = 0$. Therefore, these terms do not contribute to the mutual information.
Using these results, we can rewrite the limit of (39) as

$$\lim_{\Delta t \to 0} \frac{\mathcal{I}(p_X)}{\Delta t} = \sum_{(i,j) \in \mathcal{E}_s'} \pi_i\, q_{ij}\, \mathrm{E}_X\!\left[ X \log \frac{X}{\bar{x}} \right]. \quad (57)$$