
# Upper Bounds on the Feedback Error Exponent of Channels With States and Memory

As a class of state-dependent channels, Markov channels have long been studied in information theory for characterizing the feedback capacity and error exponent. This paper studies a more general variant of such channels where the state evolves via a general stochastic process, not necessarily Markov or ergodic. The states are assumed to be unknown to the transmitter and the receiver, but the underlying probability distributions are known. For this setup, we derive an upper bound on the feedback error exponent and the feedback capacity with variable-length codes. The bounds are expressed in terms of the directed mutual information and directed relative entropy. The bounds on the error exponent simplify to Burnashev's expression for discrete memoryless channels. Our method relies on tools from the theory of martingales to analyze a stochastic process defined based on the entropy of the message given the past channel outputs.


## I Introduction

Communication over channels with feedback has been a longstanding problem in the information theory literature. Early works on discrete memoryless channels (DMCs) gave a negative answer as to whether feedback can increase the capacity [1]. Feedback, though, improves the channel's error exponent — the maximum attainable exponential rate of decay of the error probability. The improvements are obtained using variable-length codes (VLCs), where the communication length depends on the channel's realizations. In a seminal work, Burnashev [2] completely characterized the error exponent of DMCs with noiseless and causal feedback. This characterization has a simple, yet intuitive, form:

 E(R) = C_1\left(1 - \frac{R}{C}\right), \qquad (1)

where R is the (average) rate of transmission, C is the capacity of the channel, and C_1 is the maximum exponent for binary hypothesis testing over the channel. It is equal to the maximal relative entropy between conditional output distributions. Burnashev's exponent can significantly exceed the sphere-packing exponent for communication without feedback, as it approaches capacity with nonzero slope. The use of VLCs is shown to be essential to establish these results, as no improvement is gained using fixed-length codes [3, 4, 5].
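As a numerical illustration of (1) — our sketch, not code from [2] — the following script evaluates Burnashev's exponent for a binary symmetric channel (BSC) with crossover probability p, for which C = 1 − h₂(p) and C₁ = D(p‖1−p) = (1−2p) log₂((1−p)/p):

```python
import math

def bsc_capacity(p: float) -> float:
    """Capacity of a BSC in bits/use: C = 1 - h2(p)."""
    h2 = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return 1.0 - h2

def bsc_c1(p: float) -> float:
    """Burnashev's C1 for a BSC: the KL divergence between the two
    conditional output distributions, D((1-p, p) || (p, 1-p))."""
    return (1 - 2 * p) * math.log2((1 - p) / p)

def burnashev_exponent(rate: float, p: float) -> float:
    """E(R) = C1 * (1 - R / C), valid for 0 <= R <= C."""
    C, C1 = bsc_capacity(p), bsc_c1(p)
    return C1 * (1.0 - rate / C)

p = 0.1
C = bsc_capacity(p)
print(f"C  = {C:.4f} bits/use")                       # ~0.5310
print(f"C1 = {bsc_c1(p):.4f} bits/use")               # ~2.5359
print(f"E(C/2) = {burnashev_exponent(C / 2, p):.4f}") # C1/2 ~ 1.2680
```

Note that C₁ far exceeds C here, and E(R) decreases linearly to zero at R = C with slope −C₁/C, which is the nonzero slope mentioned above.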

This result led to the question of whether feedback improves the capacity or error exponent of more general channels, modeling non-traditional communication involving memory and intersymbol interference (ISI). Among such models are channels with states, where the transition probability of the channel varies depending on its state, which itself evolves based on the past inputs and state realizations. Depending on the variants of this formulation, the agents may have no knowledge about the state (e.g., arbitrarily varying channels) or they may know the state exactly [6]. When the state is known at the transmitter and the receiver, feedback can improve the error exponent. In particular, Como et al. [7] extended the Burnashev-type exponent to finite-state ergodic Markov channels with known state and derived a form similar to (1), under some ergodicity assumptions. The error exponent for channels with more general state evolution is still unknown. Only the feedback capacity of such channels restricted to fixed-length codes is known [8].

This paper studies the feedback error exponent for channels with more general state evolution, allowing VLCs. More precisely, we study discrete channels with states where the state evolves as an arbitrary stochastic process (not necessarily ergodic or Markov) depending on the past realizations. Furthermore, the realizations of the states are assumed to be unknown, but the transmitter and the receiver know the underlying probability distribution governing the evolution of the state. Moreover, the noiseless output is available at the transmitter with one unit of delay. The main contributions are twofold. First, we prove an upper bound on the error exponent of such channels, which has the familiar form

 E(R) \le \sup_{N>0}\, \sup_{P^N \in \mathcal{P}^N} D(P^N)\left(1 - \frac{R}{I(P^N)}\right),

where D(P^N) is the directed relative entropy, I(P^N) is the directed mutual information, and \mathcal{P}^N is a collection of "feasible" probability distributions. As a special case, the bound simplifies to Burnashev's expression when the channel is a DMC. Second, we introduce an upper bound on the feedback capacity of VLCs for communication over these channels with stochastic states. This upper bound generalizes the results of Tatikonda and Mitter [8] and Permuter et al. [9], where fixed-length codes are studied. Our approach relies on analyzing a stochastic process defined based on the entropy of the message given the past channel outputs. We analyze the drift of the entropy via tools from the theory of martingales.

Related works on the capacity and error exponent of channels with feedback are extensive. Starting with DMCs with feedback, Yamamoto and Itoh [10] introduced a two-phase iterative scheme for achieving the Burnashev exponent. The error exponent of DMCs with feedback and cost constraints is studied in [11]. Channels with state and feedback have also been studied under various frameworks, depending on the evolution model of the states and on whether they are known at the transmitter or the receiver. On one extreme of such models are arbitrarily varying channels [12]. The feedback capacity of these channels for fixed-length codes is derived in [8]. Tchamkerten and Telatar [13] studied the universality of the Burnashev error exponent. They considered communication setups where the parties have no exact knowledge of the statistics of the channel but know it belongs to a certain class of DMCs. The authors proved that no zero-rate coding scheme achieves Burnashev's exponent simultaneously for all the DMCs in the class. However, they showed positive results for two families of such channels (e.g., binary symmetric and Z) [14]. Another class of channels with state are Markov channels, which have been studied extensively for deriving their capacity [6, 15, 16] and error exponent using fixed-length codes [8]. A lower bound on the error exponent of unifilar channels is derived in [17], where the state is a deterministic function of the previous ones. Other variants of this problem have been studied, including continuous-alphabet channels [18, 19] and multi-user channels [20, 21].

## II Problem Formulation and Definitions

The formal definitions are presented in this section. For shorthand, we use x^n to denote the sequence (x_1, x_2, \ldots, x_n).

A discrete channel with stochastic state consists of three finite sets \mathcal{X}, \mathcal{Y}, and \mathcal{S} representing the input, output, and state of the channel, respectively. Consider a collection of channels \{Q(\cdot \mid \cdot, s)\}_{s \in \mathcal{S}}, indexed by the state s, where each element is the transition probability of the channel at state s. The states evolve according to a conditional probability distribution P_{t,S}(s_t \mid s^{t-1}, x^{t-1}) depending on the past inputs and state realizations. As a result, after t uses of the channel with x_t, s_t, and y_t being the channel's input, state, and output, we have

 P(s_t, y_t \mid x^t, s^{t-1}, y^{t-1}) = P_{t,S}(s_t \mid s^{t-1}, x^{t-1})\, Q(y_t \mid x_t, s_t).

Such evolution of the states induces memory over time, as the state depends on the past inputs.
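To make the factorization above concrete, the following toy simulator (our illustration; the two-state kernel P_S and the state-dependent BSC Q below are hypothetical, not from the paper) draws s_t from a kernel depending on the previous state and input, then y_t from Q(·|x_t, s_t):

```python
import random

# Hypothetical state kernel P_S(s_t | s_{t-1}, x_{t-1}):
# the previous input changes how "sticky" the state is.
def sample_state(prev_s: int, prev_x: int, rng: random.Random) -> int:
    p_stay = 0.9 if prev_x == 0 else 0.6
    return prev_s if rng.random() < p_stay else 1 - prev_s

# Hypothetical Q(y_t | x_t, s_t): a BSC whose crossover depends on the state.
def sample_output(x: int, s: int, rng: random.Random) -> int:
    crossover = 0.05 if s == 0 else 0.3
    return x ^ (1 if rng.random() < crossover else 0)

def run_channel(inputs, s0=0, seed=0):
    """Feed a binary input sequence; return the outputs (states stay hidden,
    as in the setup: neither party observes s_t)."""
    rng = random.Random(seed)
    s, prev_x, outputs = s0, 0, []
    for x in inputs:
        s = sample_state(s, prev_x, rng)          # state evolves first
        outputs.append(sample_output(x, s, rng))  # then the output is drawn
        prev_x = x
    return outputs

print(run_channel([0, 1, 1, 0, 1], seed=1))
```

Because sending x = 1 makes the state more likely to flip in this toy kernel, the output statistics depend on the entire input history, which is exactly the memory mechanism described above.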

After each use of the channel, the output of the channel is available at the transmitter with one unit of delay. Moreover, we allow VLCs for communication, where both the transmitter and the receiver do not know the state of the channel. More precisely, the setup is defined as follows.

###### Definition 1.

A VLC for communication over a channel with states and feedback is defined by

• A message W with uniform distribution over \{1, 2, \ldots, M\}.

• Encoding functions e_t : \{1, \ldots, M\} \times \mathcal{Y}^{t-1} \to \mathcal{X}, for t \in \mathbb{N}.

• Decoding functions d_t : \mathcal{Y}^{t} \to \{1, \ldots, M\}, for t \in \mathbb{N}.

• A stopping time T with respect to (w.r.t.) the filtration \{\mathcal{F}_t\} defined as the \sigma-algebra of Y^t for t \in \mathbb{N}. Furthermore, it is assumed that T is almost surely bounded as T \le N.

For technical reasons, we study a class of VLCs for which the parameter N grows sub-exponentially with the expected decoding time, that is, \log N \le \gamma\, \mathbb{E}[T] for some fixed number \gamma. An example is a sequence of VLCs with fixed parameters.

In what follows, for any VLC, we define the average rate, error probability, and error exponent. Given a message W, the t-th output of the transmitter is denoted by X_t = e_t(W, Y^{t-1}), where Y^{t-1} is the noiseless feedback up to time t. Let \hat{W}_t = d_t(Y^t) represent the estimate of the decoder about the message. Then, at the end of the stopping time T, the decoder declares \hat{W}_T as the decoded message. The average rate and (average) probability of error for a VLC are defined as

 R \triangleq \frac{\log_2 M}{\mathbb{E}[T]}, \qquad P_e \triangleq \mathbb{P}\{\hat{W}_T \neq W\}.
###### Definition 2.

A rate R is achievable for a given channel with stochastic states if there exists a sequence of VLCs such that

 \limsup_{n\to\infty} P_e^{(n)} = 0, \qquad \limsup_{n\to\infty} \frac{\log M^{(n)}}{\mathbb{E}[T^{(n)}]} \ge R,

and the sub-exponential growth condition on N^{(n)} holds, where the constant \gamma is fixed. The feedback capacity, C^{VLC}_F, is the convex closure of all achievable rates.

Naturally, the error exponent of a VLC with probability of error P_e and stopping time T is defined as -\log P_e / \mathbb{E}[T]. The following definition formalizes this notion.

###### Definition 3.

An error exponent function E(R) is said to be achievable for a given channel if, for any rate R, there exists a sequence of VLCs such that

 \liminf_{n\to\infty} \frac{-\log P_e^{(n)}}{\mathbb{E}[T^{(n)}]} \ge E(R), \qquad \limsup_{n\to\infty} \frac{\log M^{(n)}}{\mathbb{E}[T^{(n)}]} \ge R,

and the sub-exponential growth condition on N^{(n)} holds, where the constant \gamma is fixed. The reliability function is the supremum of all achievable error exponent functions E(R).

## III Main Results

We start by deriving an upper bound on the feedback capacity of channels with stochastic states, allowing VLCs. The expressions are based on the directed information, as introduced in [22] and defined as

 I(X^n \to Y^n) \triangleq \sum_{i=1}^{n} I(X^i; Y_i \mid Y^{i-1}). \qquad (2)
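For a concrete reading of (2), the following sketch (ours, not code from the paper) computes the n = 2 case, I(X_1; Y_1) + I(X_1, X_2; Y_2 | Y_1), directly from a joint pmf; for two independent uses of a BSC with i.i.d. uniform inputs it reduces to 2 I(X; Y), as expected for a memoryless channel without feedback:

```python
import numpy as np

def mi_from_joint(pxy):
    """Mutual information I(X; Y) in bits, from a joint pmf array p[x, y]."""
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float((pxy[mask] * np.log2(pxy[mask] / (px * py)[mask])).sum())

def directed_info_2(p):
    """I(X^2 -> Y^2) = I(X1; Y1) + I(X1, X2; Y2 | Y1), for a joint
    pmf p[x1, x2, y1, y2]; this is definition (2) with n = 2."""
    term1 = mi_from_joint(p.sum(axis=(1, 3)))      # I(X1; Y1)
    term2 = 0.0
    for y1 in range(p.shape[2]):
        p_y1 = p[:, :, y1, :].sum()
        if p_y1 > 0:
            cond = p[:, :, y1, :] / p_y1           # p(x1, x2, y2 | y1)
            # flatten (x1, x2) into one variable and take its MI with y2
            term2 += p_y1 * mi_from_joint(cond.reshape(-1, p.shape[3]))
    return term1 + term2

# Two independent uses of a BSC(0.1) with uniform i.i.d. inputs.
eps = 0.1
q = np.array([[1 - eps, eps], [eps, 1 - eps]])     # Q(y|x)
p = np.zeros((2, 2, 2, 2))
for x1 in range(2):
    for x2 in range(2):
        for y1 in range(2):
            for y2 in range(2):
                p[x1, x2, y1, y2] = 0.25 * q[x1, y1] * q[x2, y2]
print(directed_info_2(p))  # ~1.0620 = 2 * (1 - h2(0.1))
```

With feedback, the input distribution would instead factor as P(x_t | x^{t-1}, y^{t-1}), and the directed information (unlike the ordinary mutual information) remains the relevant quantity.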

We further extend this notion to variable-length sequences. Consider a stochastic process \{(X_t, Y_t)\}_{t\in\mathbb{N}} and let T be a (bounded) stopping time w.r.t. an induced filtration \{\mathcal{F}_t\}. Then, the directed mutual information is defined as

 I(X^T \to Y^T) \triangleq \mathbb{E}\left[\sum_{t=1}^{T} I(X^t; Y_t \mid \mathcal{F}_{t-1})\right]. \qquad (3)

Now, we are ready for an upper bound on the feedback capacity. For any integer N > 0, let \mathcal{P}^N be the set of all N-letter distributions on the inputs, states, and outputs that factor as

 \prod_{\ell=1}^{N} P_{\ell,X}(x_\ell \mid x^{\ell-1}, y^{\ell-1})\, P_{\ell,S}(s_\ell \mid s^{\ell-1}, x^{\ell-1})\, Q(y_\ell \mid x_\ell, s_\ell). \qquad (4)

Next, we have the following result on the capacity; the proof is given in Appendix A.

###### Theorem 1.

The feedback capacity of a channel with stochastic states is bounded as

 C^{VLC}_F \le \sup_{N>0}\, \sup_{P^N \in \mathcal{P}^N}\, \sup_{T : T \le N} \frac{1}{\mathbb{E}[T]}\, I(X^T \to Y^T),

where T is a stopping time with respect to \{\mathcal{F}_t\}.

Observe that for the trivial stopping time T = N, the bound simplifies to that for fixed-length codes as given in [8].

### III-A Upper Bound on the Error Exponent

We need additional notation to proceed. Consider a pair of random sequences (X^n, Y^n). Let X^*_r be the MAP estimate of X_r from the observation Y^{r-1}, that is, X^*_r = \arg\max_{x} \mathbb{P}(X_r = x \mid Y^{r-1}). Also, let \bar{Q}_r(\cdot \mid x_r, Y^{r-1}) denote the effective channel (averaged over the possible states) from the transmitter's perspective at time r. With this notation, we define the directed KL-divergence as

 D(X^n \to Y^n) \triangleq \max_{x^n} \sum_{r=1}^{n} D_{KL}\big( \bar{Q}_r(\cdot \mid X^*_r, Y^{r-1}) \,\big\|\, \bar{Q}_r(\cdot \mid x_r, Y^{r-1}) \,\big|\, Y^{r-1} \big).

Intuitively, D(X^n \to Y^n) measures the sum of the expected "distances" between the channel's output distributions conditioned on the MAP symbol versus the worst symbol, across different times r.
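Loosely, for a single-state memoryless channel the effective channel \bar{Q}_r is just Q, and the per-step term reduces to the largest KL divergence between two conditional output distributions, i.e., Burnashev's C_1. A small check of that special case (our sketch, not code from the paper):

```python
import itertools, math

def kl(p, q):
    """KL divergence D(p || q) in bits; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def max_pairwise_kl(Q):
    """For a memoryless channel Q[x] = output pmf given input x, the
    per-step directed-KL term becomes max_{x*, x} D(Q[x*] || Q[x]),
    which is Burnashev's C1."""
    return max(kl(Q[a], Q[b])
               for a, b in itertools.permutations(range(len(Q)), 2))

p = 0.1
bsc = [[1 - p, p], [p, 1 - p]]
print(max_pairwise_kl(bsc))  # (1 - 2p) * log2((1-p)/p) ~ 2.5359
```

In the general stateful case the maximization additionally runs over input sequences and the averaging over the state posterior, which is what the directed notation above captures.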

###### Theorem 2.

The error exponent of a channel with stochastic states is bounded as

 E(R) \le \sup_{N\in\mathbb{N}}\, \sup_{P^N\in\mathcal{P}^N}\, \sup_{T: T\le N}\, \sup_{T_1: T_1\le T} D(P^N)\left(1-\frac{R}{I(P^N)}\right),

where T and T_1 are stopping times, and

 I(P^N) = \frac{1}{\mathbb{E}[T_1]}\, I(X^{T_1}\to Y^{T_1}), \qquad D(P^N) = \frac{1}{\mathbb{E}[T-T_1]}\, D(X_{T_1+1}^{T}\to Y_{T_1+1}^{T}).

In the next section, we present our proof techniques.

## IV Proof of Theorem 2

The proof follows by a careful study of the drift of the entropy of the message conditioned on the channel outputs at each time t. Define the following random process:

 H_t = H(W \mid \mathcal{F}_t), \quad t > 0, \qquad (5)

where \mathcal{F}_t is the \sigma-algebra of Y^t. We show that H_t drifts in three phases: (i) a linear drift (data phase) until reaching a small value (\epsilon); (ii) a fluctuation phase with values around \epsilon; and (iii) a logarithmic drift (hypothesis-testing phase) till the end. We derive bounds on the expected slope of the drifts and prove that the length of the fluctuation phase is asymptotically negligible compared to the overall communication length (Fig. 1).

More precisely, we have the following argument by defining a pruned-time random process t_n. First, for any \epsilon > 0 and N, define the following random variables

 \underline{\tau}_\epsilon \triangleq \inf\{t>0 : H_t \le \epsilon\} \wedge N, \qquad (6)
 \overline{\tau}_\epsilon \triangleq \sup\{t>0 : H_{t-1} \ge \epsilon\} \wedge N. \qquad (7)

Then the pruned time process is defined as

 t_n \triangleq \begin{cases} n & \text{if } n < \underline{\tau}_\epsilon, \\ n \vee \overline{\tau}_\epsilon & \text{if } \underline{\tau}_\epsilon \le n \le N, \\ N & \text{if } n > N. \end{cases} \qquad (8)

Note that \underline{\tau}_\epsilon is a stopping time with respect to \{\mathcal{F}_t\}, but this is not the case for \overline{\tau}_\epsilon.
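A small helper (our illustration, not code from the paper) makes (6)–(8) concrete on a sample entropy path; note how the pruned clock jumps over the whole fluctuation window between \underline{\tau}_\epsilon and \overline{\tau}_\epsilon in a single step:

```python
def pruned_times(H, eps, N):
    """Compute tau_low = inf{t>0 : H_t <= eps} ∧ N,
    tau_high = sup{t>0 : H_{t-1} >= eps} ∧ N, and the pruned time t_n
    of (8), for a sample path H = [H_0, H_1, ..., H_T]."""
    T = len(H) - 1
    tau_low = min(next((t for t in range(1, T + 1) if H[t] <= eps), N), N)
    above = [t for t in range(1, T + 1) if H[t - 1] >= eps]
    tau_high = min(max(above), N) if above else 0

    def t_n(n):
        if n < tau_low:
            return n                  # data phase: time passes normally
        if n <= N:
            return max(n, tau_high)   # fluctuation window skipped in one jump
        return N
    return tau_low, tau_high, [t_n(n) for n in range(1, N + 1)]

# Entropy path that dips below eps = 0.5 at t = 2, pops back above, settles.
H = [2.0, 1.5, 0.4, 0.7, 0.2, 0.1]
print(pruned_times(H, eps=0.5, N=5))  # (2, 4, [1, 4, 4, 4, 5])
```

This is why \overline{\tau}_\epsilon is not a stopping time: deciding it requires looking at the future of the path, which is exactly what the pruning construction works around.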

###### Lemma 1.

Suppose a non-negative random process \{H_t\} has the following properties w.r.t. a filtration \{\mathcal{F}_t\}:

 \mathbb{E}[H_{r+1}-H_r \mid \mathcal{F}_r] \ge -k_{1,r+1}, \quad \text{if } H_r \ge \epsilon, \qquad (9a)
 \mathbb{E}[\log H_{r+1}-\log H_r \mid \mathcal{F}_r] \ge -k_{2,r+1}, \quad \text{if } H_r < \epsilon, \qquad (9b)
 |\log H_{r+1}-\log H_r| \le k_3, \qquad (9c)
 |H_{r+1}-H_r| \le k_4, \qquad (9d)

where k_{1,r}, k_{2,r}, k_3, and k_4 are non-negative numbers for all r. Given \epsilon, I, and D > 0, let

 Z_t \triangleq \frac{H_t-\epsilon}{I}\, 1\{H_t \ge \epsilon\} + \left(\frac{\log(H_t/\epsilon)}{D} + f\!\left(\log\frac{H_t}{\epsilon}\right)\right) 1\{H_t < \epsilon\},

where f(y) \triangleq -\frac{1}{\lambda I} e^{\lambda y} with \lambda > 0 a sufficiently small constant. Further, define S_t as

 S_t \triangleq \sum_{r=1}^{t\wedge\underline{\tau}_\epsilon} \frac{k_{1,r}}{I} + \sum_{r=t\wedge\underline{\tau}_\epsilon+1}^{t\wedge\overline{\tau}_\epsilon} \frac{k_4}{I}\, 1\{H_{r-1}\ge\sqrt{\epsilon}\} + \sum_{r=t\wedge\overline{\tau}_\epsilon+1}^{t} \frac{k_{2,r}}{D} + \frac{\sqrt{\epsilon}\,N}{I}\, 1\{t \ge \overline{\tau}_\epsilon\}.

Let t_n be as in (8) but w.r.t. \underline{\tau}_\epsilon and \overline{\tau}_\epsilon. Lastly, define the random process L_n as L_n \triangleq Z_{t_n} + S_{t_n}. Then, for small enough \epsilon, the process \{L_n\} is a sub-martingale with respect to the time-pruned filtration \{\mathcal{F}_{t_n}\}.

###### Proof:

The objective is to prove that \mathbb{E}[L_{n+1}-L_n \mid \mathcal{F}_{t_n}] \ge 0 almost surely for all n and small enough \epsilon. We prove the lemma by considering three cases depending on the position of n relative to \underline{\tau}_\epsilon.

Case (a), n < \underline{\tau}_\epsilon - 1: From the definition of t_n in (8), in this case t_n = n and t_{n+1} = n+1. Also, as the time has not reached \underline{\tau}_\epsilon, then H_n \ge \epsilon and H_{n+1} \ge \epsilon. Therefore, in this case, the random process of interest equals

 L_n = Z_{t_n} + S_{t_n} = Z_n + S_n = \frac{H_n-\epsilon}{I} + \sum_{r=1}^{n}\frac{k_{1,r}}{I},
 L_{n+1} = Z_{t_{n+1}} + S_{t_{n+1}} = Z_{n+1} + S_{n+1} = \frac{H_{n+1}-\epsilon}{I} + \sum_{r=1}^{n+1}\frac{k_{1,r}}{I}. \qquad (10)

As a result, the difference between and satisfies the following

 \mathbb{E}[(L_{n+1}-L_n)\, 1\{n<\underline{\tau}_\epsilon-1\} \mid y^{t_n}] = \mathbb{E}[(L_{n+1}-L_n)\, 1\{n<\underline{\tau}_\epsilon-1\} \mid y^{n}] = \mathbb{E}[L_{n+1}-L_n \mid y^{n}]\, 1\{n<\underline{\tau}_\epsilon-1\},

where the first equality holds as t_n = n, and the second equality holds as \underline{\tau}_\epsilon is a stopping time, which implies that 1\{n<\underline{\tau}_\epsilon-1\} is a function of y^n. Next, from (10), the difference term above is bounded as

 \mathbb{E}[L_{n+1}-L_n \mid y^n] = \mathbb{E}\left[\frac{H_{n+1}-H_n}{I} + \frac{k_{1,n+1}}{I} \,\middle|\, y^n\right] = \frac{\mathbb{E}[H_{n+1}-H_n \mid y^n]}{I} + \frac{k_{1,n+1}}{I} \ge 0,

where the last inequality follows from (9a). As a result, we proved that \mathbb{E}[(L_{n+1}-L_n)\, 1\{n<\underline{\tau}_\epsilon-1\} \mid y^{t_n}] \ge 0.

Case (b), n = \underline{\tau}_\epsilon - 1: In this case, t_n = n, implying that Z_{t_n} = Z_n and S_{t_n} = S_n. Furthermore, since n+1 = \underline{\tau}_\epsilon, then t_{n+1} = \overline{\tau}_\epsilon. Consequently, the random process equals

 L_n = Z_n + S_n = \frac{H_n-\epsilon}{I} + \sum_{r=1}^{n}\frac{k_{1,r}}{I},
 L_{n+1} = Z_{\overline{\tau}_\epsilon} + S_{\overline{\tau}_\epsilon} = \frac{H_{\overline{\tau}_\epsilon}-\epsilon}{I}\, 1\{H_{\overline{\tau}_\epsilon}\ge\epsilon\} + \left(\frac{\log H_{\overline{\tau}_\epsilon}-\log\epsilon}{D} + f\!\left(\log\frac{H_{\overline{\tau}_\epsilon}}{\epsilon}\right)\right) 1\{H_{\overline{\tau}_\epsilon}<\epsilon\} + \sum_{r=1}^{\underline{\tau}_\epsilon}\frac{k_{1,r}}{I} + \sum_{r=\underline{\tau}_\epsilon+1}^{\overline{\tau}_\epsilon}\frac{k_4}{I}\, 1\{H_{r-1}\ge\sqrt{\epsilon}\} + \frac{\sqrt{\epsilon}\,N}{I}.

Note that Z_{\overline{\tau}_\epsilon} does not necessarily equal the logarithmic part. The reason is that \overline{\tau}_\epsilon is pruned by N as in (7). Thus, H_{\overline{\tau}_\epsilon} can be greater than \epsilon when \overline{\tau}_\epsilon = N. We proceed by bounding Z_{\overline{\tau}_\epsilon}. Note that, for small enough \epsilon, the following inequality holds

 \frac{\epsilon}{I}\left(e^{y}-1\right) < \frac{y}{D} + f(y), \qquad \forall\, y < 0. \qquad (11)

Applying inequality (11) with y = \log(H_{\overline{\tau}_\epsilon}/\epsilon), we can write that

 Z_{\overline{\tau}_\epsilon} > \frac{H_{\overline{\tau}_\epsilon}-\epsilon}{I}\, 1\{H_{\overline{\tau}_\epsilon}\ge\epsilon\} + \frac{H_{\overline{\tau}_\epsilon}-\epsilon}{I}\, 1\{H_{\overline{\tau}_\epsilon}<\epsilon\} = \frac{H_{\overline{\tau}_\epsilon}-\epsilon}{I}. \qquad (12)

Consequently, the difference satisfies the following

 \mathbb{E}[(L_{n+1}-L_n)\, 1\{n=\underline{\tau}_\epsilon-1\} \mid y^{t_n}] = \mathbb{E}[L_{n+1}-L_n \mid y^{n}]\, 1\{n=\underline{\tau}_\epsilon-1\} \ge \mathbb{E}\left[\frac{H_{\overline{\tau}_\epsilon}-H_n}{I} + \frac{k_{1,\underline{\tau}_\epsilon}}{I} + \sum_{r=\underline{\tau}_\epsilon+1}^{\overline{\tau}_\epsilon}\frac{k_4}{I}\, 1\{H_{r-1}\ge\sqrt{\epsilon}\} + \frac{\sqrt{\epsilon}\,N}{I} \,\middle|\, y^{n}\right] 1\{n=\underline{\tau}_\epsilon-1\}. \qquad (13)

Next, we bound the first term above as

 H_{\overline{\tau}_\epsilon}-H_n = H_{n+1}-H_n + \sum_{r=n+2}^{\overline{\tau}_\epsilon}(H_r-H_{r-1}),

where in the first equality we add and subtract the intermediate terms H_r for n+1 < r < \overline{\tau}_\epsilon. Next, we substitute the above terms in the right-hand side of (13). As n+1 = \underline{\tau}_\epsilon, we obtain that

 (13) \ge \mathbb{E}\left[\sum_{r=\underline{\tau}_\epsilon+1}^{\overline{\tau}_\epsilon}\left(\frac{H_r-H_{r-1}}{I} + \frac{k_4}{I}\, 1\{H_{r-1}\ge\sqrt{\epsilon}\}\right) + \frac{\sqrt{\epsilon}\,N}{I} \,\middle|\, y^{n}\right] 1\{n=\underline{\tau}_\epsilon-1\}, \qquad (14)

where the inequality holds from (9a) and the fact that H_n \ge \epsilon. Next, by factoring 1/I and the indicator function inside the expectation, we have the following chain of inequalities

 (14) = \frac{1}{I}\,\mathbb{E}\left[\sum_{r=\underline{\tau}_\epsilon+1}^{\overline{\tau}_\epsilon}\Big(\big((H_r-H_{r-1})+k_4\big)\, 1\{H_{r-1}\ge\sqrt{\epsilon}\} + (H_r-H_{r-1})\, 1\{H_{r-1}<\sqrt{\epsilon}\}\Big) + \sqrt{\epsilon}\,N \,\middle|\, y^{n}\right] 1\{n=\underline{\tau}_\epsilon-1\}
 \overset{(a)}{\ge} \frac{1}{I}\,\mathbb{E}\left[\sum_{r=\underline{\tau}_\epsilon+1}^{\overline{\tau}_\epsilon}(H_r-H_{r-1})\, 1\{H_{r-1}<\sqrt{\epsilon}\} + \sqrt{\epsilon}\,N \,\middle|\, y^{n}\right] 1\{n=\underline{\tau}_\epsilon-1\}
 \overset{(b)}{\ge} \frac{1}{I}\,\mathbb{E}\left[-\sum_{r=\underline{\tau}_\epsilon+1}^{\overline{\tau}_\epsilon} H_{r-1}\, 1\{H_{r-1}<\sqrt{\epsilon}\} + \sqrt{\epsilon}\,N \,\middle|\, y^{n}\right] 1\{n=\underline{\tau}_\epsilon-1\}
 \overset{(c)}{\ge} \frac{1}{I}\,\mathbb{E}\left[-\sum_{r=\underline{\tau}_\epsilon+1}^{\overline{\tau}_\epsilon}\sqrt{\epsilon} + \sqrt{\epsilon}\,N \,\middle|\, y^{n}\right] 1\{n=\underline{\tau}_\epsilon-1\}
 \overset{(d)}{\ge} \frac{1}{I}\,\mathbb{E}\left[-N\sqrt{\epsilon} + \sqrt{\epsilon}\,N \,\middle|\, y^{n}\right] 1\{n=\underline{\tau}_\epsilon-1\} \ge 0,

where (a) is due to (9d), inequality (b) holds as H_r \ge 0, inequality (c) holds as H_{r-1}\, 1\{H_{r-1}<\sqrt{\epsilon}\} \le \sqrt{\epsilon}, and lastly (d) holds as \overline{\tau}_\epsilon - \underline{\tau}_\epsilon \le N. To sum up, we proved that

 \mathbb{E}[(L_{n+1}-L_n)\, 1\{n=\underline{\tau}_\epsilon-1\} \mid y^{t_n}] \ge 0.

Case (c), n \ge \underline{\tau}_\epsilon: This is the last case. Note that if n \ge N, then t_{n+1} = t_n = N. Thus, immediately, L_{n+1} = L_n almost surely. Otherwise, if n < N and n < \overline{\tau}_\epsilon, then t_{n+1} = t_n = \overline{\tau}_\epsilon and hence L_{n+1} = L_n. Therefore, it remains to consider the case that \overline{\tau}_\epsilon \le n < N. Then, t_n = n and t_{n+1} = n+1. Furthermore, as n \ge \overline{\tau}_\epsilon and n+1 > \overline{\tau}_\epsilon, then H_n < \epsilon and H_{n+1} < \epsilon, implying that we are in the logarithmic drift. Therefore, we have that

 L_n = Z_n + S_n = \frac{\log(H_n/\epsilon)}{D} + f\!\left(\log\frac{H_n}{\epsilon}\right) + S_n,
 L_{n+1} = Z_{n+1} + S_{n+1} = \frac{\log(H_{n+1}/\epsilon)}{D} + f\!\left(\log\frac{H_{n+1}}{\epsilon}\right) + S_{n+1}.

Hence, to sum up the above sub-cases, we conclude that when \overline{\tau}_\epsilon \le n < N, then

 L_{n+1}-L_n = \frac{\log H_{t_{n+1}} - \log H_{t_n}}{D} + f\!\left(\log\frac{H_{t_{n+1}}}{\epsilon}\right) - f\!\left(\log\frac{H_{t_n}}{\epsilon}\right) + S_{t_{n+1}} - S_{t_n}.

Note that from (9b), the following inequality holds

 \mathbb{E}\left[\frac{\log H_{t_{n+1}} - \log H_{t_n}}{D} + S_{t_{n+1}} - S_{t_n} \,\middle|\, y^{t_n}\right] \ge 0.

Therefore, the difference satisfies the following

 \mathbb{E}[(L_{n+1}-L_n)\, 1\{n\ge\overline{\tau}_\epsilon\} \mid y^{t_n}] = \mathbb{E}[L_{n+1}-L_n \mid y^{t_n}]\, 1\{n\ge\overline{\tau}_\epsilon\} \ge \mathbb{E}\left[f\!\left(\log\frac{H_{t_{n+1}}}{\epsilon}\right) - f\!\left(\log\frac{H_{t_n}}{\epsilon}\right) \,\middle|\, y^{t_n}\right] 1\{n\ge\overline{\tau}_\epsilon\}.

Next, we provide an argument similar to the point-to-point (PtP) case; that is, we use Taylor's theorem for f. We only need to consider the case that H_n < \epsilon and H_{n+1} < \epsilon, implying that t_n = n and t_{n+1} = n+1. Using Taylor's theorem, we can write

 f\!\left(\log\frac{H_{n+1}}{\epsilon}\right) = f\!\left(\log\frac{H_n}{\epsilon}\right) + \frac{\partial f}{\partial y}\bigg|_{y=\log\frac{H_n}{\epsilon}}\left(\log H_{n+1}-\log H_n\right) + \frac{\partial^2 f}{\partial y^2}\bigg|_{y=\zeta}\left(\log\frac{H_{n+1}}{H_n}\right)^2,

where \zeta is between \log(H_n/\epsilon) and \log(H_{n+1}/\epsilon), and

 \frac{\partial f}{\partial y}\bigg|_{y=\log\frac{H_n}{\epsilon}} = -\frac{1}{I}\, e^{\lambda \log\frac{H_n}{\epsilon}}, \qquad \frac{\partial^2 f}{\partial y^2}\bigg|_{y=\zeta} = -\frac{\lambda}{I}\, e^{\lambda\zeta}.

As a result, we have that

 \mathbb{E}\left[f\!\left(\log\frac{H_{n+1}}{\epsilon}\right)-f\!\left(\log\frac{H_n}{\epsilon}\right) \,\middle|\, y^{n}\right] = \mathbb{E}\left[-\frac{e^{\lambda\log\frac{H_n}{\epsilon}}}{I}\log\frac{H_{n+1}}{H_n} - \frac{\lambda}{I}\, e^{\lambda\zeta}\left(\log\frac{H_{n+1}}{H_n}\right)^2 \,\middle|\, y^{n}\right]
 = \mathbb{E}\left[-\frac{e^{\lambda\log\frac{H_n}{\epsilon}}}{I}\log\frac{H_{n+1}}{H_n} - \frac{\lambda}{I}\, e^{\lambda\left(\zeta-\log\frac{H_n}{\epsilon}\right)} e^{\lambda\log\frac{H_n}{\epsilon}}\left(\log\frac{H_{n+1}}{H_n}\right)^2 \,\middle|\, y^{n}\right]
 \overset{(a)}{\ge} \mathbb{E}\left[-\frac{e^{\lambda\log\frac{H_n}{\epsilon}}}{I}\log\frac{H_{n+1}}{H_n} - \frac{\lambda}{I}\, e^{\lambda\left(k_3+\log\frac{H_n}{\epsilon}\right)}\left(\log\frac{H_{n+1}}{H_n}\right)^2 \,\middle|\, y^{n}\right]
 \ge \mathbb{E}\left[-\frac{e^{\lambda\log\frac{H_n}{\epsilon}}}{I}\log\frac{H_{n+1}}{H_n} - \frac{\lambda k_3^2}{I}\, e^{\lambda\left(k_3+\log\frac{H_n}{\epsilon}\right)} \,\middle|\, y^{n}\right]
 = -\frac{e^{\lambda\log\frac{H_n}{\epsilon}}}{I}\, \mathbb{E}\left[\log\frac{H_{n+1}}{H_n} \,\middle|\, y^{n}\right] - \frac{\lambda k_3^2\, e^{\lambda k_3}}{I}\, e^{\lambda\log\frac{H_n}{\epsilon}}
 \ge \frac{e^{\lambda\log\frac{H_n}{\epsilon}}}{I}\, k_3 - \frac{\lambda k_3^2\, e^{\lambda k_3}}{I}\, e^{\lambda\log\frac{H_n}{\epsilon}} = \left(\frac{k_3}{I} - \frac{\lambda k_3^2\, e^{\lambda k_3}}{I}\right) e^{\lambda\log\frac{H_n}{\epsilon}} \ge 0,

where inequality (a) holds as \zeta - \log(H_n/\epsilon) \le k_3 by (9c), and the penultimate inequality uses \mathbb{E}[\log(H_{n+1}/H_n) \mid y^n] \le -k_3. The last inequality holds for sufficiently small \lambda.

Lastly, combining all cases from (a) to (c), we prove that \mathbb{E}[L_{n+1}-L_n \mid \mathcal{F}_{t_n}] \ge 0, which completes the proof. ∎

Now, we show that H_t as in (5) satisfies the conditions in Lemma 1. First, (9a) holds because of the following lemma.

###### Lemma 2.

Given any VLC, the following holds almost surely for all r \le T:

 \mathbb{E}[H_r - H_{r-1} \mid \mathcal{F}_{r-1}] = -J_r, \qquad (15)

where J_r \triangleq I(W, X_r; Y_r \mid \mathcal{F}_{r-1}) with the induced probability distribution.

###### Proof:

For any r, we have that

 \mathbb{E}[H_r - H_{r-1} \mid y^{r-1}] = H(W \mid Y_r, y^{r-1}) - H(W \mid y^{r-1}) = -I(W; Y_r \mid y^{r-1}) = -I(W, X_r; Y_r \mid y^{r-1}) = -H(Y_r \mid y^{r-1}) +