## I Introduction

Communication is ubiquitous in our lives, be it wireless devices such as cellphones, IoT devices, and satellite links, or wired communication such as Ethernet. The problem of communicating a message reliably and efficiently, with the least delay and energy, is a fundamental one that has been attacked with many different engineering tools. The foundations of digital communication were laid down by Shannon in his pioneering work [1]. Since then there has been significant effort on finding coding schemes that minimize the probability of error and achieve capacity. Much of this focus has been on the point-to-point discrete memoryless channel, for which efficient codes such as Turbo codes and LDPC codes have been formulated. For a point-to-point channel with feedback, it is known that feedback does not increase the capacity [2], but it can significantly improve the error exponent from exponential to double exponential [3].

There have been multiple works proposing transmission schemes for several instances of channels with *noiseless* feedback, such as Horstein's scheme [4] for the binary symmetric channel (BSC) and the Schalkwijk–Kailath scheme [3] for the additive white Gaussian noise (AWGN) channel, all of which were generalized by the posterior matching scheme (PMS) [5] for an arbitrary channel. However, it is known that these schemes perform rather poorly when the feedback is even slightly noisy [6]. Finding optimum transmission schemes under noisy feedback has remained an important open problem. In this problem, the sender and the receiver receive different observations whose domains grow exponentially in time, and the set of possible strategies grows doubly exponentially in time. Because of the asymmetry of information and the lack of any common information, there is no known (dynamic-programming-like) methodology that decomposes this problem in time, reducing the complexity to linear in time. Despite the lack of a proper mathematical treatment, it was recently shown in [7] that a scheme using recurrent neural networks (RNNs) improves the best previously known scheme by three orders of magnitude.

In this paper, we present a sequential decomposition framework that provides a notion of state and allows us to decompose this problem across time, finding optimal Markovian policies (with respect to that state) that are not necessarily globally optimum. By doing so, it reduces the time complexity from exponential to linear. To the best of our knowledge, this is the first instance of a decentralized stochastic control problem without any common information that admits a sequential decomposition.

We consider policies of the sender such that the current transmission is a function of the sender's message and a controlled Markov process that the sender updates on observing the feedback. The receiver maintains a belief on the message and on the sender's controlled Markov process in order to decode the message. This belief is updated using the sender's policy function, and the receiver performs ML decoding on this belief at the last stage to obtain an estimate of the message. With this reformulation, the receiver's role is removed from the problem.

Equivalently, only the sender remains: it observes the message and the noisy feedback, based on which it maintains a controlled Markov process. It has a cost function that depends on the receiver's belief, which it does not observe perfectly. The sender therefore maintains a belief on this state conditioned on its own information, and this belief is a state of the system that the sender perfectly observes. This state is controlled by the sender's policy function at each time. Based on this, we formulate a dynamic program with the sender's belief as the state and its policy function as the action.

In the following, we denote random variables with capital letters (e.g., $X$), their realizations with lowercase letters ($x$), and alphabets with calligraphic letters ($\mathcal{X}$). A sequence $(X_1, \ldots, X_t)$ is denoted by $X_{1:t}$. The space of probability distributions (or equivalently, probability mass functions) on a finite alphabet $\mathcal{X}$ is denoted by $\Delta(\mathcal{X})$.

## II Channel Model

We consider a point-to-point (PTP) discrete memoryless channel (DMC) with noisy feedback. The channel input, channel output, and feedback symbols $X_t$, $Y_t$, and $Z_t$ take values in the finite alphabets $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{Z}$, respectively.

Consider the problem of transmitting a message $M$, drawn uniformly from a finite set $\mathcal{M}$, over the PTP DMC with noisy feedback using fixed-length codes of length $T$. The encoder generates its channel input at time $t$ based on its private message $M$ and the past noisy feedback $Z_{1:t-1}$. Thus

$$X_t = e_t(M, Z_{1:t-1}). \qquad (1)$$

The decoder estimates the message based on the channel outputs, as

$$\hat{M} = d(Y_{1:T}). \qquad (2)$$

The channel is memoryless in the sense that the current channel output, given the current channel input, is independent of all past channel inputs and outputs, i.e.,

$$P(y_t \mid x_{1:t}, y_{1:t-1}) = Q(y_t \mid x_t). \qquad (3)$$

Finally, after each transmission, the sender receives a noisy feedback of the channel output as

$$P(z_t \mid x_{1:t}, y_{1:t}, z_{1:t-1}) = Q_{fb}(z_t \mid y_t). \qquad (4)$$

A fixed-length transmission scheme for this channel is a pair $(e_{1:T}, d)$, consisting of the encoding functions $e_{1:T}$ and the decoding function $d$. The error probability associated with the transmission scheme is defined as

$$P_e(e_{1:T}, d) = P(\hat{M} \neq M). \qquad (5)$$
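As a sanity check, the model in (1)–(5) is straightforward to simulate. The sketch below is an illustrative assumption, not a scheme from the paper: it uses a BSC forward channel and a BSC feedback link, a naive repetition encoder that ignores the feedback, and an ML decoder (majority vote, which is ML for a BSC with crossover probability below 1/2) to estimate $P_e$ by Monte Carlo:

```python
import random

def simulate(p_fwd=0.1, p_fb=0.2, T=5, trials=20000, seed=0):
    """Monte-Carlo estimate of P_e for a 1-bit message sent over a
    BSC(p_fwd) with BSC(p_fb) feedback, using repetition encoding.
    The feedback Z_t is generated but unused by this naive encoder."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        m = rng.randrange(2)                # message M, uniform on {0, 1}
        ys = []
        for _ in range(T):
            x = m                           # X_t = e_t(M, Z_{1:t-1}) = M here
            y = x ^ (rng.random() < p_fwd)  # channel output Y_t, per (3)
            z = y ^ (rng.random() < p_fb)   # noisy feedback Z_t, per (4)
            ys.append(y)
        # majority vote = ML decoding for a BSC with p_fwd < 1/2
        m_hat = int(sum(ys) > T / 2)
        errors += (m_hat != m)
    return errors / trials

print(simulate())
```

Any feedback-aware encoder can be dropped in by replacing the line `x = m` with a function of `m` and the past `z` values, which is exactly the class of encoders in (1).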

## III Decentralized control of PTP DMC with noisy feedback

One may pose the following optimization problem: given the alphabets $\mathcal{M}$, $\mathcal{X}$, $\mathcal{Y}$, $\mathcal{Z}$, the channel kernels $Q$ and $Q_{fb}$, and a fixed length $T$, design the transmission scheme that minimizes the error probability $P_e$,

$$\min_{(e_{1:T},\, d)} P_e(e_{1:T}, d). \qquad \text{(P1)}$$

For any tuple of encoding functions, the optimal decoder (assuming equally likely hypotheses) is the ML decoder, denoted $d^{ML}$. Thus we can reformulate problem (P1) as

$$\min_{e_{1:T}} P_e(e_{1:T}, d^{ML}), \qquad \text{(P2)}$$

where, with a slight abuse of notation, the error probability is defined based on the above equivalence between encoding functions and the induced mappings, together with the use of ML decoding.

In the following, we provide a sequential decomposition methodology to find optimal policies within the class of policies in which the channel input is a function of the message and a controlled Markov process that the sender updates on observing the feedback; for any given policy function, this process evolves as

(6)

Let

(7)

(8)

(9)

Then we can easily show that

(10)

(11)

(12)

where

(13)

(14)

We also assume that the state process is defined so that there is a one-to-one correspondence between it and the sender's information, so that each state leads to a single optimal action.

The ML decoder can now be expressed based on this belief as

(15)

and the resulting error probability is

(16)

where we defined the terminal cost function as

(17)

and the expectation is taken with respect to the corresponding random variable.

We now show that the beliefs can be updated using Bayes' rule in a policy-independent way as

(18a)

(18b)

(18c)

(18d)

Thus

(18e)

(19a)

(19b)

(19c)

Thus

(19d)

(20a)

(20b)

(20c)

Thus

(20d)

We summarize the above result in the following lemma.

###### Lemma 1

The posterior beliefs on the message and on the sender's controlled Markov process can be updated in a policy-independent way, as shown in (18)–(20).
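To make the Bayes-rule update concrete, the sketch below is a simplified illustration (it updates only a posterior over messages for a known encoding rule, whereas the paper's update also tracks the sender's controlled Markov process; the function names are ours, not the paper's):

```python
def bayes_update(prior, enc, y, Q):
    """One step of the receiver's Bayes-rule belief update.

    prior : dict mapping message -> probability
    enc   : dict mapping message -> channel input chosen this step
    y     : observed channel output
    Q     : Q[x][y] = P(Y = y | X = x), the DMC kernel
    """
    unnorm = {m: p * Q[enc[m]][y] for m, p in prior.items()}
    total = sum(unnorm.values())
    if total == 0:
        raise ValueError("observation has zero likelihood under all messages")
    return {m: w / total for m, w in unnorm.items()}

# BSC(0.1) kernel; two messages mapped to inputs 0 and 1
Q = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}
belief = {0: 0.5, 1: 0.5}
belief = bayes_update(belief, {0: 0, 1: 1}, y=1, Q=Q)
print(belief)  # posterior tilts toward message 1
```

Note that the update uses only the encoding rule and the channel kernel, which is the sense in which the update in Lemma 1 is policy-independent once the policy function is fixed.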

Based on the above, we present a dynamic program for the sender as follows.

For all time steps,

(21a)

(21b)

(21c)

All the above results can be summarized in the following theorem.

###### Theorem 1

For the optimization problem (P1), one can find optimal Markovian policies with the sender's belief as the state, its policy function as the action, zero instantaneous costs at intermediate times, and the terminal cost given in (17). Consequently, the optimal encoders have a Markovian form, and the corresponding policy functions can be found through backward dynamic programming as in (21).

###### Proof:

Please see Appendix A.
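The backward recursion in (21) has the usual finite-horizon structure: zero running cost, a terminal cost, and a value function propagated backward over states and actions. The sketch below shows that generic structure for a finite state space; the paper's actual state is a continuous belief, which would require discretization or function approximation, so everything here (states, transition kernel, costs) is an illustrative assumption:

```python
def backward_dp(states, actions, transition, terminal_cost, T):
    """Finite-horizon backward DP with zero instantaneous cost.

    transition(s, a) -> list of (prob, next_state) pairs
    terminal_cost(s) -> cost incurred at the horizon T
    Returns value functions V[0..T] and a policy per time step.
    """
    V = {T: {s: terminal_cost(s) for s in states}}
    policy = {}
    for t in range(T - 1, -1, -1):
        V[t], policy[t] = {}, {}
        for s in states:
            # choose the action minimizing the expected cost-to-go
            best = min(
                actions,
                key=lambda a: sum(p * V[t + 1][s2] for p, s2 in transition(s, a)),
            )
            policy[t][s] = best
            V[t][s] = sum(p * V[t + 1][s2] for p, s2 in transition(s, best))
    return V, policy

# Toy example: two states; action 1 steers toward the low-cost state 0
def trans(s, a):
    return [(0.9, 0), (0.1, 1)] if a == 1 else [(0.5, 0), (0.5, 1)]

V, pol = backward_dp([0, 1], [0, 1], trans, lambda s: float(s), T=3)
print(V[0], pol[0])
```

In the paper's setting, `states` would be (a discretization of) the sender's belief, `actions` the partial policy functions, and `terminal_cost` the function in (17).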

### III-A Conjecture

In this section, we conjecture that there exists a PMS-like [5] scheme of the form

(22)

(23)

that achieves capacity, where the transmissions are matched to the capacity-achieving input distribution.

We note that in the case of noiseless feedback, i.e., when the feedback equals the channel output with probability 1, the sender perfectly observes the receiver's belief, and the above conjectured scheme reduces to the PMS scheme.
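For intuition on the noiseless-feedback special case, here is a minimal posterior-matching sketch for a BSC with noiseless feedback, in the spirit of Horstein's scheme [4]: the sender transmits 1 if its message lies above the median of the shared posterior, and both sides update that posterior by Bayes' rule. This is a toy illustration of PMS only, not the conjectured noisy-feedback scheme:

```python
import random

def horstein_bsc(m, n_msgs=16, p=0.1, T=40, seed=1):
    """Toy Horstein/posterior-matching scheme over a BSC(p) with
    noiseless feedback; returns the receiver's MAP estimate of m."""
    rng = random.Random(seed)
    post = [1.0 / n_msgs] * n_msgs      # shared posterior (feedback is noiseless)
    for _ in range(T):
        # find the median split of the current posterior
        acc, median = 0.0, 0
        for i, q in enumerate(post):
            acc += q
            if acc >= 0.5:
                median = i
                break
        x = 1 if m > median else 0      # posterior matching: signal m's half
        y = x ^ (rng.random() < p)      # BSC output, fed back noiselessly
        # Bayes update: messages consistent with y get likelihood (1 - p)
        lik = [(1 - p) if ((1 if i > median else 0) == y) else p
               for i in range(n_msgs)]
        post = [q * l for q, l in zip(post, lik)]
        s = sum(post)
        post = [q / s for q in post]
    return max(range(n_msgs), key=lambda i: post[i])

print(horstein_bsc(m=11))
```

With noisy feedback the sender and receiver posteriors would diverge, which is precisely the difficulty the belief-on-belief state in this paper is designed to handle.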

## IV Conclusion

In this paper, we considered a point-to-point discrete memoryless channel with noisy feedback. This falls within the purview of decentralized stochastic control in which the controllers have no common information, so the standard tools in the literature do not apply directly. We showed that, despite there being no common information, there exists a dynamic programming methodology to compute optimum Markovian policies of the sender involving a belief on the belief state. We also conjectured a transmission scheme, inspired by the PMS, that minimizes the probability of error. We ask whether, based on the above framework, it is possible to design (possibly suboptimal) schemes that are easy to implement.

## V Acknowledgement

The author would sincerely like to thank Achilleas Anastasopoulos for valuable comments and (noiseless) feedback.

## Appendix A (Proof of Theorem 1)

###### Proof:

We will prove

(24)

The above implies that

(25)

We prove (25) by induction, using the results in Lemmas 2 and 4 proved in Appendix B.

With a slight abuse of notation, let the belief function map the sender's information to a belief that is consistent (via Bayes' rule) with the given past history of policies. For the base case,

(26a)

(26b)

where (26a) follows from Lemma 4 and (26b) follows from Lemma 2 in Appendix B.

## Appendix B

###### Lemma 2

where

(29)

###### Proof:

We prove this lemma by contradiction. Suppose the claim is not true. Then there exist policies and resulting quantities such that

(30)

We will show that this contradicts the definition in (21b).

We construct a modified policy accordingly.

###### Lemma 3

Suppose for some , and . Then
