A note on the expected minimum error probability in equientropic channels

While the channel capacity reflects a theoretical upper bound on the achievable information transmission rate in the limit of infinitely many bits, it does not characterise the information transfer of a given encoding routine with finitely many bits. In this note, we characterise the quality of a code (i.e., a given encoding routine) by an upper bound on the expected minimum error probability that can be achieved when using this code. We show that for equientropic channels this upper bound is minimal for codes with maximal marginal entropy. As an instructive example, we show for the additive white Gaussian noise (AWGN) channel that random coding, which is also capacity achieving, indeed maximises the marginal entropy in the limit of infinitely many messages.


1 Introduction

While the channel capacity reflects a theoretical upper bound on the achievable information transmission rate in the limit of infinitely many bits, it does not characterise the information transfer of a given encoding routine with finitely many bits. In this note, we characterise the quality of a code (i.e., a given encoding routine) by an upper bound on the expected minimum error probability that can be achieved when using this code. We show that for equientropic channels this upper bound is minimal for codes with maximal marginal entropy. As an instructive example, we show for the additive white Gaussian noise (AWGN) channel that random coding, which is also capacity achieving, indeed maximises the marginal entropy in the limit of infinitely many messages.

2 Upper bounding the expected minimum error probability

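Consider communication over noisy memoryless channels

$$M \xrightarrow{\text{encoder}} X^n \xrightarrow{\text{channel}} Y^n \xrightarrow{\text{decoder}} \hat{M}$$

where the sender node $M$ is a random variable taking discrete values $m \in \mathcal{M} = \{m_1, \dots, m_{|\mathcal{M}|}\}$ according to $p_M$; the values of the sender bits $X^n = (X_1, \dots, X_n)$ are determined by the encoder function $f_{enc}: \mathcal{M} \to \mathcal{X}^n$ assigning codewords to messages; noise corruption of the received bits $Y^n = (Y_1, \dots, Y_n)$ is governed by the conditional distribution $p_{Y|X}$ as $p_{Y^n|X^n}(y^n|x^n) = \prod_{i=1}^{n} p_{Y|X}(y_i|x_i)$;¹ and the decoder $f_{dec}$ reconstructs a message $\hat{M} = f_{dec}(Y^n)$ from the received bit values or declares an error. The message distribution $p_M$, the encoder $f_{enc}$, the channel $p_{Y|X}$, and the decoder $f_{dec}$ fully determine the distribution of the receiver node $\hat{M}$ and as such the probability of error $P(\hat{M} \neq M)$.

¹ To ease notation we assume $p_{Y_i|X_i} = p_{Y|X}$ for all $i$. The results presented in this manuscript only require a memoryless channel and still hold true if noise corruption is bit-specific, i.e., $p_{Y^n|X^n}(y^n|x^n) = \prod_{i=1}^{n} p_{Y_i|X_i}(y_i|x_i)$.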

Hence, for a given message distribution $p_M$ and channel $p_{Y|X}$, the code, that is the choice of $f_{enc}$ (and corresponding $f_{dec}$), fully determines the behaviour of information transmission. The minimum probability of error is attained by choosing the maximum a posteriori (MAP) decoder $f_{dec}(y^n) = \arg\max_{m \in \mathcal{M}} p_{M|Y^n}(m|y^n)$. Thus, for any code $f_{enc}$, the expected minimum error probability is the MAP error $E(f_{enc}) = 1 - \mathbb{E}_{Y^n}\big[\max_{m \in \mathcal{M}} p_{M|Y^n}(m|Y^n)\big]$. We characterise the quality of a code by the following proposition.
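To make the MAP decoder and the MAP error $E(f_{enc})$ concrete, the following Python sketch evaluates both by brute force over all outputs $y^n$ for a small discrete channel. The binary symmetric channel (BSC) with crossover probability $p$, the helper names `bsc_likelihood`, `map_decode` and `map_error`, and the toy repetition code are illustrative assumptions, not part of the original note; enumerating all $2^n$ outputs is only feasible for very small $n$.

```python
import itertools


def bsc_likelihood(y, x, p):
    """p_{Y^n|X^n}(y | x) for a memoryless binary symmetric channel with crossover p."""
    flips = sum(yi != xi for yi, xi in zip(y, x))
    return p ** flips * (1 - p) ** (len(x) - flips)


def map_decode(y, codebook, p_m, p):
    """Index of the message maximising p_M(m) p(y | f_enc(m)); ties go to the smallest index."""
    joint = [pm * bsc_likelihood(y, x, p) for pm, x in zip(p_m, codebook)]
    return max(range(len(joint)), key=joint.__getitem__)


def map_error(codebook, p_m, p):
    """MAP error E(f_enc) = 1 - sum_y max_m p_M(m) p(y | f_enc(m))."""
    n = len(codebook[0])
    correct = 0.0
    for y in itertools.product((0, 1), repeat=n):
        # For each output y the MAP decoder is correct with probability max_m p(m, y).
        correct += max(pm * bsc_likelihood(y, x, p) for pm, x in zip(p_m, codebook))
    return 1.0 - correct


repetition_code = [(0, 0, 0), (1, 1, 1)]
print(map_error(repetition_code, [0.5, 0.5], 0.1))
print(map_decode((0, 1, 0), repetition_code, [0.5, 0.5], 0.1))
```

For the length-3 repetition code with equiprobable messages and $p = 0.1$, the MAP error is the probability of at least two bit flips, $3p^2(1-p) + p^3 = 0.028$.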

Proposition 1.

For communication of a message $M$ with finite range $\mathcal{M}$ over a noisy memoryless channel using $n$ bits, the MAP error can be bounded in terms of the mutual information $I(Y^n;M)$ as

$$\gamma\big(-I(Y^n;M)\big) \le E(f_{enc}) \le \Gamma\big(-I(Y^n;M)\big)$$

where $\gamma$ and $\Gamma$ are strictly monotonically increasing functions.

Proof.

[FM94, Theorem 1] establishes the following relation (notation adapted)

$$\Phi\big(E(f_{enc})\big) \ge H(M|Y^n) \ge \phi^*\big(E(f_{enc})\big)$$

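where $\Phi$ and $\phi^*$ are continuous and strictly monotonically increasing, hence invertible, functions (cf. [FM94] for their definitions). Recall $H(M|Y^n) = H(M) - I(Y^n;M)$ and note that $H(M)$ is fixed for fixed $p_M$. The inequality follows for $\gamma(x) = \Phi^{-1}\big(H(M) + x\big)$ and $\Gamma(x) = (\phi^*)^{-1}\big(H(M) + x\big)$, which are strictly monotonically increasing functions in $x$. ∎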
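Spelled out, the inversion step uses $H(M|Y^n) = H(M) - I(Y^n;M)$ and applies the inverse of each strictly increasing function to the corresponding inequality (within the ranges on which these inverses are defined):

```latex
\begin{aligned}
H(M|Y^n) \le \Phi\big(E(f_{enc})\big)
  &\;\Longrightarrow\; E(f_{enc}) \ge \Phi^{-1}\big(H(M) - I(Y^n;M)\big) = \gamma\big(-I(Y^n;M)\big),\\
\phi^*\big(E(f_{enc})\big) \le H(M|Y^n)
  &\;\Longrightarrow\; E(f_{enc}) \le (\phi^*)^{-1}\big(H(M) - I(Y^n;M)\big) = \Gamma\big(-I(Y^n;M)\big).
\end{aligned}
```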

That is, codes that result in high mutual information $I(Y^n;M)$ result in a low upper bound on the MAP error. In particular, since $I(Y^n;M) = H(Y^n) - H(Y^n|M)$, of all codes resulting in the same conditional entropy $H(Y^n|M)$ a code with maximal entropy $H(Y^n)$ has the lowest upper bound on the MAP error. The following propositions specialise this result to equientropic channels and independent additive noise channels: there, the lowest upper bound on the MAP error is achieved by codes that maximise the entropy $H(Y^n)$ of the receiver bits and the entropy $H(X^n)$ of the sender bits, respectively.

Definition 2.

A noisy memoryless channel with $H(Y|X=x) = H(Y|X=x')$ for all $x, x' \in \mathcal{X}$ is an equientropic channel.

Proposition 3.

For equientropic channels the conditional entropy $H(Y^n|M)$ is independent of the choice of $f_{enc}$.

Proof.

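The channel is memoryless such that $H(Y^n|M) = \sum_{j=1}^{n} H(Y_j|M)$. For any $j \in \mathbb{N}_{1:n}$ and $x \in \mathcal{X}$

$$H(Y_j|M) = \sum_{i=1}^{|\mathcal{M}|} p_M(m_i)\, H\big(Y_j \mid X_j = f_{enc}(m_i)_j\big) = \sum_{i=1}^{|\mathcal{M}|} p_M(m_i)\, H(Y_j \mid X_j = x)$$

which shows that $H(Y_j|M) = H(Y|X=x)$ and hence $H(Y^n|M) = n\, H(Y|X=x)$ is independent of the choice of $f_{enc}$. ∎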
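Proposition 3 can be checked numerically on a small example. The Python sketch below assumes a binary symmetric channel with crossover probability $p$, which is equientropic because $H(Y|X=0) = H(Y|X=1) = h(p)$ with $h$ the binary entropy; the helper names and the three toy codes (including a degenerate one that maps both messages to the same codeword) are hypothetical. In each case $H(Y^n|M)$ equals $n\,h(p)$.

```python
import itertools
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(q * math.log2(q) for q in probs if q > 0)


def bsc_likelihood(y, x, p):
    """p_{Y^n|X^n}(y | x) for a memoryless binary symmetric channel with crossover p."""
    flips = sum(yi != xi for yi, xi in zip(y, x))
    return p ** flips * (1 - p) ** (len(x) - flips)


def conditional_entropy_y_given_m(codebook, p_m, p):
    """H(Y^n|M) = sum_m p_M(m) H(Y^n | X^n = f_enc(m))."""
    n = len(codebook[0])
    ys = list(itertools.product((0, 1), repeat=n))
    return sum(pm * entropy([bsc_likelihood(y, x, p) for y in ys])
               for pm, x in zip(p_m, codebook))


p = 0.1
h_p = entropy([p, 1 - p])
codes = [
    [(0, 0, 0), (1, 1, 1)],  # repetition code
    [(0, 0, 0), (0, 0, 1)],  # codewords differing in one bit
    [(1, 0, 1), (1, 0, 1)],  # degenerate code: both messages share a codeword
]
for code in codes:
    print(conditional_entropy_y_given_m(code, [0.5, 0.5], p), 3 * h_p)
```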

Definition 4.

A noisy memoryless channel with $Y_i = X_i + Z_i$ for mutually independent noise variables $Z_1, \dots, Z_n$ that are independent of $X^n$ is an independent additive noise channel. Independent additive noise channels are equientropic channels.
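The claim that independent additive noise channels are equientropic rests on the fact that, for a fixed input $x$, the output $Y = x + Z$ is a shifted copy of $Z$, and shifting does not change entropy. A minimal Python sketch with a hypothetical integer-valued noise pmf (discrete entropy; for continuous noise the same translation argument applies to differential entropy):

```python
import math
from collections import defaultdict


def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {value: probability}."""
    return -sum(q * math.log2(q) for q in pmf.values() if q > 0)


def output_distribution(x, noise):
    """pmf of Y = x + Z for a fixed input x and noise pmf {z: p(z)}."""
    out = defaultdict(float)
    for z, q in noise.items():
        out[x + z] += q  # shifting by x is a bijection, so no probability mass merges
    return dict(out)


noise = {-1: 0.2, 0: 0.5, 1: 0.3}  # hypothetical noise pmf
for x in (-3, 0, 2, 7):
    # H(Y | X = x) is the same for every input x and equals H(Z)
    print(x, entropy(output_distribution(x, noise)), entropy(noise))
```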

Proposition 5.

For independent additive noise channels with noise variables $Z_1, \dots, Z_n$ the entropy $H(Y^n)$ of the receiver bits only depends on the choice of $f_{enc}$ via the entropy $H(X^n)$ of the sender bits.

In conclusion, optimality of a code for communication over a noisy memoryless channel with message distribution $p_M$ can be characterised by the upper bound on the MAP error that results from this code. The respective bounds for different channels are summarised in Table 1. Importantly, without knowing specific details about the channel and decoder, maximising entropy turns out to be a sensible heuristic for learning a robust coding routine. Intuitively, high-entropy distributed codes are more robust against independent noise.
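The heuristic can be illustrated on the smallest possible example. The Python sketch below (hypothetical helper names; binary symmetric channel with $p = 0.1$, two equiprobable messages, $n = 2$) compares a repetition code with a degenerate code that assigns the same codeword to both messages. Both codes have the same $H(Y^n|M) = 2h(p)$, as Proposition 3 predicts for this equientropic channel, but the repetition code has the larger $H(Y^n)$, hence the larger $I(Y^n;M)$, and a MAP error of $0.1$ compared with $0.5$. A single toy comparison only illustrates, and does not establish, the ordering of MAP errors.

```python
import itertools
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(q * math.log2(q) for q in probs if q > 0)


def bsc_likelihood(y, x, p):
    """p_{Y^n|X^n}(y | x) for a memoryless binary symmetric channel with crossover p."""
    flips = sum(yi != xi for yi, xi in zip(y, x))
    return p ** flips * (1 - p) ** (len(x) - flips)


def code_summary(codebook, p_m, p):
    """Return (H(Y^n), H(Y^n|M), I(Y^n;M), MAP error) in bits for a code over the BSC."""
    n = len(codebook[0])
    ys = list(itertools.product((0, 1), repeat=n))
    # joint[k][i] = p_M(m_i) * p(y_k | f_enc(m_i))
    joint = [[pm * bsc_likelihood(y, x, p) for pm, x in zip(p_m, codebook)] for y in ys]
    h_y = entropy([sum(row) for row in joint])
    h_y_given_m = sum(pm * entropy([bsc_likelihood(y, x, p) for y in ys])
                      for pm, x in zip(p_m, codebook))
    map_err = 1.0 - sum(max(row) for row in joint)
    return h_y, h_y_given_m, h_y - h_y_given_m, map_err


repetition = code_summary([(0, 0), (1, 1)], [0.5, 0.5], 0.1)
collapsed = code_summary([(0, 0), (0, 0)], [0.5, 0.5], 0.1)
print("repetition", repetition)
print("collapsed ", collapsed)
```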

3 AWGN random coding example

The AWGN channel is a ubiquitous and well-understood channel model. Here it serves as an instructive example for the concept introduced in the previous section.

The AWGN channel is an independent additive noise channel and is described by

$$Z^n \sim \mathcal{N}(0, N I_{n \times n}), \qquad Y_i = g X_i + Z_i \quad \text{for } i \in \mathbb{N}_{1:n}$$

where $g$ is the channel gain and $N$ the noise level. We employ the power constraint that each codeword $x^n = (x_1, \dots, x_n)$ has to satisfy

$$\frac{1}{n} \sum_{i=1}^{n} x_i^2 \le P$$

and without loss of generality assume $N = 1$ such that the received power is $S = g^2 P$. The Shannon-Hartley theorem establishes the channel capacity

$$C = \max_{p_X:\, \mathbb{E}_X[X^2] \le P} I(X;Y) = \frac{1}{2} \log(1 + S)$$

Achievability of this upper bound on the rate is commonly proven by random coding, i.e., for any rate $R < C$ the error probability tends to zero as $n \to \infty$ when using random coding.
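For reference, the Shannon-Hartley capacity above can be evaluated directly. The sketch assumes logarithms to base 2 (capacity in bits per channel use) and the signal-to-noise ratio $S = g^2 P / N$, which reduces to $g^2 P$ for $N = 1$; the function names are illustrative. Any rate strictly below the returned value is achievable in the sense just described.

```python
import math


def snr(gain, power, noise_level=1.0):
    """Signal-to-noise ratio S = g^2 P / N of the AWGN channel Y = g X + Z, Z ~ N(0, N)."""
    return gain ** 2 * power / noise_level


def awgn_capacity(s):
    """Shannon-Hartley capacity C = 1/2 log2(1 + S) in bits per channel use."""
    if s < 0:
        raise ValueError("signal-to-noise ratio must be non-negative")
    return 0.5 * math.log2(1.0 + s)


print(awgn_capacity(snr(gain=2.0, power=0.75)))  # S = 3, so C = 1 bit per channel use
```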

Here we show that random coding not only achieves the optimal rate but also the lowest upper bound on the MAP error in Proposition 1, since $H(Y^n) = \sum_{i=1}^{n} H(Y_i)$ (and the $Y_i$ are Gaussian, maximising the individual entropies) in the limit $n \to \infty$.

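In random coding the encoder function $f_{enc}$ is defined by a random codebook, i.e., an independent sample with i.i.d. Gaussian entries is assigned to each message $m_i$ as codeword $f_{enc}(m_i)$. For unit noise level $N = 1$ and with the gain $g$ absorbed into the codebook, let $c_{ij}$ denote the noise-free value received at bit $j$ for message $m_i$. Once a codebook is fixed and we observe samples of the system, each receiver bit $Y_j$ is a mixture of Gaussians with probability density function (pdf) $p_{Y_j}(y) = \sum_{i=1}^{|\mathcal{M}|} p_M(m_i)\, \varphi(y - c_{ij})$, where $\varphi$ denotes the pdf of the standard Gaussian distribution $\mathcal{N}(0, 1)$, here evaluated at $y - c_{ij}$. For this setup we prove the following.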

Proposition 6.

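Using random coding in the AWGN channel with uniformly distributed messages, $p_M(m_i) = 1/|\mathcal{M}|$, the joint entropy satisfies $H(Y_{j_1}, \dots, Y_{j_k}) \to \sum_{l=1}^{k} H(Y_{j_l})$ as $n \to \infty$ for any number $k$ of pairwise different receiver bits $Y_{j_1}, \dots, Y_{j_k}$. Furthermore, the distribution of each $Y_j$ approaches a Gaussian distribution as $n \to \infty$.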

Proof.

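In random coding the random codebook is generated by drawing each $c_{ij}$ from independent and identically distributed Gaussian random variables $C_{ij}$, which then defines the joint pdf

$$p_{Y_{j_1}, \dots, Y_{j_k}}(y_{j_1}, \dots, y_{j_k}) = \sum_{i=1}^{|\mathcal{M}|} p_M(m_i)\, (2\pi)^{-\frac{k}{2}}\, e^{-\frac{1}{2} \sum_{l=1}^{k} (y_{j_l} - c_{i j_l})^2}$$

and marginal pdfs

$$p_{Y_{j_l}}(y_{j_l}) = \sum_{i=1}^{|\mathcal{M}|} p_M(m_i)\, (2\pi)^{-\frac{1}{2}}\, e^{-\frac{1}{2} (y_{j_l} - c_{i j_l})^2}$$

for $l \in \mathbb{N}_{1:k}$ and $y_{j_l} \in \mathbb{R}$. In general $p_{Y_{j_1}, \dots, Y_{j_k}} \neq \prod_{l=1}^{k} p_{Y_{j_l}}$.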

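For all $y_{j_1}, \dots, y_{j_k} \in \mathbb{R}$ and $l \in \mathbb{N}_{1:k}$ define the random variables

$$\mathring{p}_{Y_{j_1}, \dots, Y_{j_k}}(y_{j_1}, \dots, y_{j_k}) = \frac{1}{|\mathcal{M}|} \sum_{i=1}^{|\mathcal{M}|} (2\pi)^{-\frac{k}{2}}\, e^{-\frac{1}{2} \sum_{l=1}^{k} (y_{j_l} - C_{i j_l})^2}$$

and

$$\mathring{p}_{Y_{j_l}}(y_{j_l}) = \frac{1}{|\mathcal{M}|} \sum_{i=1}^{|\mathcal{M}|} (2\pi)^{-\frac{1}{2}}\, e^{-\frac{1}{2} (y_{j_l} - C_{i j_l})^2}$$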

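By the law of large numbers, as the number of messages $|\mathcal{M}|$ grows with $n$,

$$\mathring{p}_{Y_{j_1}, \dots, Y_{j_k}}(y_{j_1}, \dots, y_{j_k}) \xrightarrow[n \to \infty]{\text{almost surely}} \mathbb{E}_{C_{1 j_1}, \dots, C_{1 j_k}}\Big[(2\pi)^{-\frac{k}{2}}\, e^{-\frac{1}{2} \sum_{l=1}^{k} (y_{j_l} - C_{1 j_l})^2}\Big]$$

$$\mathring{p}_{Y_{j_l}}(y_{j_l}) \xrightarrow[n \to \infty]{\text{almost surely}} \mathbb{E}_{C_{1 j_l}}\Big[(2\pi)^{-\frac{1}{2}}\, e^{-\frac{1}{2} (y_{j_l} - C_{1 j_l})^2}\Big]$$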

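where the first expectation factorises since $C_{1 j_1}, \dots, C_{1 j_k}$ are mutually independent. It follows that for all $y_{j_1}, \dots, y_{j_k} \in \mathbb{R}$

$$\mathring{p}_{Y_{j_1}, \dots, Y_{j_k}}(y_{j_1}, \dots, y_{j_k}) - \prod_{l=1}^{k} \mathring{p}_{Y_{j_l}}(y_{j_l}) \xrightarrow[n \to \infty]{\text{almost surely}} 0$$

such that in the limit $n \to \infty$ the pdf indeed factorises. Evaluating the expectation above, we find that for each $l$ and $y_{j_l}$ the limit of $\mathring{p}_{Y_{j_l}}(y_{j_l})$ is the pdf of the sum of the Gaussian $C_{1 j_l}$ and independent standard Gaussian noise, i.e., a Gaussian pdf, which concludes the proof. ∎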
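The factorisation and Gaussianity in Proposition 6 can be illustrated by Monte Carlo. The Python/NumPy sketch below assumes uniformly distributed messages, unit noise level, the gain absorbed into the codebook, and codebook entries $C_{ij} \sim \mathcal{N}(0, S)$ with an arbitrarily chosen $S = 3$; the number of messages, the evaluation points and the helper names are hypothetical. For a large codebook, the joint mixture pdf of two receiver bits is close to the product of the marginal mixture pdfs, and each marginal is close to the $\mathcal{N}(0, 1 + S)$ density obtained by adding unit-variance noise to $\mathcal{N}(0, S)$. This illustrates the limit at a finite number of messages; it does not replace the proof.

```python
import numpy as np


def gaussian_pdf(y, var=1.0):
    """Density of N(0, var) evaluated elementwise at y."""
    return np.exp(-0.5 * np.square(y) / var) / np.sqrt(2.0 * np.pi * var)


def random_codebook(num_messages, n, s, rng):
    """Hypothetical random codebook: entries drawn i.i.d. from N(0, S) (gain absorbed)."""
    return rng.normal(0.0, np.sqrt(s), size=(num_messages, n))


def marginal_pdf(y, codebook, j):
    """Mixture pdf of receiver bit j with uniform messages: mean_i phi(y - c_ij)."""
    return float(np.mean(gaussian_pdf(y - codebook[:, j])))


def joint_pdf(ys, codebook, js):
    """Mixture pdf of the receiver bits js at the point ys (unit noise variance)."""
    diffs = np.asarray(ys)[None, :] - codebook[:, js]  # shape (|M|, k)
    return float(np.mean(np.prod(gaussian_pdf(diffs), axis=1)))


rng = np.random.default_rng(0)
s = 3.0
codebook = random_codebook(num_messages=20_000, n=4, s=s, rng=rng)

ys, js = [0.5, -1.0], [0, 2]
joint = joint_pdf(ys, codebook, js)
product = marginal_pdf(ys[0], codebook, js[0]) * marginal_pdf(ys[1], codebook, js[1])
print(joint, product)
print(marginal_pdf(ys[0], codebook, 0), gaussian_pdf(ys[0], 1.0 + s))
```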

It is instructive to consider the analogous statement for any pairwise different sender bits $X_{j_1}, \dots, X_{j_k}$. The proof follows analogous arguments and is another illustration of the fact that in independent additive noise channels the bound on the MAP error is fully determined by the entropy $H(X^n)$ of the sender bits.

4 Further thoughts

According to the efficient coding hypothesis, the brain implements an efficient code for representing sensory input by neuronal spiking [Bar61]. Observed dependencies between neurons, and hence redundancies, are sometimes viewed as contradicting the efficient coding hypothesis [Bar61, Sim03]. The results presented in Section 2 clarify, however, that an optimal code should maximise the joint entropy of receiver (or sender) bits. For fixed marginal entropies $H(Y_1), \dots, H(Y_n)$, the maximum of the joint entropy $H(Y^n) \le \sum_{i=1}^{n} H(Y_i)$ is indeed achieved if all units are mutually independent. However, since the marginal entropies are not fixed, there can in general be configurations that have higher joint entropy while the units are not mutually independent. This also clarifies the intuition expressed in Shannon's early work that the transmitted signals should approximate white noise in order to approach the maximum information rate [Sha48, Section 25].
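A two-unit toy example makes the last point concrete. The Python sketch below compares (i) two independent binary units that are each active with probability 0.1 and (ii) two dependent units where the first is uniform and the second copies it with probability 0.9; both distributions are hypothetical. The dependent pair has the higher joint entropy ($1 + h(0.1) \approx 1.47$ bits versus $2h(0.1) \approx 0.94$ bits), although only the independent pair attains the bound $H(Y_1, Y_2) \le H(Y_1) + H(Y_2)$ with equality.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(q * math.log2(q) for q in probs if q > 0)


def marginal_entropy_sum(joint):
    """H(Y1) + H(Y2) for a joint pmf {(y1, y2): prob} over two binary units."""
    p1 = [sum(q for (a, _), q in joint.items() if a == v) for v in (0, 1)]
    p2 = [sum(q for (_, b), q in joint.items() if b == v) for v in (0, 1)]
    return entropy(p1) + entropy(p2)


q = 0.1
# (i) independent units with skewed marginals
independent = {(a, b): (q if a else 1 - q) * (q if b else 1 - q) for a in (0, 1) for b in (0, 1)}
# (ii) dependent units: Y1 uniform, Y2 equals Y1 with probability 0.9
dependent = {(a, b): 0.5 * (0.9 if a == b else 0.1) for a in (0, 1) for b in (0, 1)}

h_independent = entropy(independent.values())
h_dependent = entropy(dependent.values())
print(h_independent, marginal_entropy_sum(independent))
print(h_dependent, marginal_entropy_sum(dependent))
```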

References

• [Bar61] H. Barlow. Possible Principles Underlying the Transformations of Sensory Messages. In W. Rosenblith, editor, Sensory Communication, chapter 13, pages 217–234. MIT Press, 1961.
• [FM94] M. Feder and N. Merhav. Relations between entropy and error probability. IEEE Transactions on Information Theory, 40(1):259–266, 1994.
• [Sha48] C. E. Shannon. A Mathematical Theory of Communication. Bell System Technical Journal, 27(3):379–423, 1948.
• [Sim03] E. Simoncelli. Vision and the statistics of the visual environment. Current Opinion in Neurobiology, 13(2):144–149, 2003.