# Single-bit Quantization Capacity of Binary-input Continuous-output Channels

We consider a channel with a discrete binary input X that is corrupted by a given continuous noise to produce a continuous-valued output Y. A quantizer is then used to quantize the continuous-valued output Y to the final binary output Z. The goal is to design an optimal quantizer Q* and also to find the optimal input distribution p*(X) that together maximize the mutual information I(X; Z) between the binary input and the binary quantized output. A linear-time search procedure is proposed. Based on the properties of the optimal quantizer and the optimal input distribution, we reduce the search range, which results in a faster algorithm. Both theoretical and numerical results are provided to illustrate our method.


## I Introduction and Related Work

A communication system can be modeled by an abstract channel with a set of inputs at the transmitter and a set of corresponding outputs at the receiver. Oftentimes the transmitted symbols (inputs) differ from the received symbols (outputs), i.e., errors occur due to many factors such as the physics of signal propagation through a medium or thermal noise. Thus, the goal of a communication system is to transmit information reliably at the fastest possible rate. The fastest achievable rate with vanishing error for a given channel is its channel capacity, which is the maximum mutual information between the input and output random variables. For an arbitrary discrete memoryless channel (DMC) specified by a given channel matrix, the mutual information is a concave function of the input probability mass function [cover2012elements]. Thus, many efficient algorithms exist to find the channel capacity of a DMC [blahut1972computation]. Moreover, under some special conditions on the channel matrix, closed-form expressions for the channel capacity can be constructed [nguyen2018closed]. It is worth noting that, due to the simplicity of binary channels, a closed-form expression for the capacity of a binary channel can always be found as a function of the diagonal entries of the channel matrix [moskowitz2010approximations], [moskowitz2009approximation], [silverman1955binary].
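As a concrete illustration of such algorithms, the following is a minimal sketch of the classical Blahut–Arimoto iteration for a DMC. It is not part of this paper's method; the function name and the example channel are ours.

```python
import numpy as np

def blahut_arimoto(W, tol=1e-9, max_iter=10_000):
    """Capacity (in bits) of a DMC with row-stochastic channel matrix W."""
    m = W.shape[0]
    p = np.full(m, 1.0 / m)                      # start from the uniform input pmf
    for _ in range(max_iter):
        q = p @ W                                # induced output distribution
        # KL divergence D(W(.|x) || q) for each input symbol x
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio = np.where(W > 0, W / q, 1.0)  # 1.0 -> zero contribution in the sum
        d = np.sum(W * np.log2(ratio), axis=1)
        c_low, c_up = p @ d, d.max()             # lower/upper bounds on capacity
        if c_up - c_low < tol:
            break
        p = p * np.exp2(d)                       # multiplicative update of the pmf
        p /= p.sum()
    return c_low

# Binary symmetric channel with crossover 0.1: capacity = 1 - H(0.1) ≈ 0.531
W = np.array([[0.9, 0.1], [0.1, 0.9]])
print(blahut_arimoto(W))
```

The iteration converges because the capacity problem is concave in the input pmf, which is exactly the property of DMCs discussed above.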

On the other hand, in many real-world scenarios the input distribution is given, and one instead has to design the channel matrix under many constraints such as power consumption, encoding/decoding speed, and so on. As a result, the mutual information is no longer a concave function of the input distribution alone but is a possibly non-concave/non-convex function of both the input distribution and the parameters of the channel matrix. Many advanced quantization algorithms have been proposed over the past decade [kurkoski2014quantization], [winkelbauer2013channel], [iwata2014quantizer], [he2019dynamic], [sakai2014suboptimal], [koch2013low] to find the optimal quantizer under the assumption that the input distribution is given. These algorithms play an important role in designing Polar code and LDPC code decoders [romero2015decoding], [tal2011construct].

Recently, there have been many works on designing quantizers together with finding the optimal input distribution such that the mutual information is maximized over both the quantization parameters and the input probability mass function. Although the mutual information is a concave function of the input pmf, it is not a convex/concave function of the quantization parameters, i.e., the thresholds. Therefore, many well-known convex optimization techniques and algorithms for finding the globally optimal solution are not applicable. To the best of our knowledge, this problem remains hard and not well studied [mathar2013threshold], [nguyen2018capacities], [alirezaei2015optimum], [singh2009limits], [kurkoski2012finding]. In [singh2009limits], Singh et al. provided an algorithm for multilevel quantization which gave near-optimal results. In [nguyen2018capacities], the author proposed a heuristic near-optimal quantization algorithm; however, this algorithm only works well when the SNR of the channel is high. For 1-bit quantization of general additive channels, Alirezaei and Mathar showed that capacity can be achieved by using an input distribution with only two support points [mathar2013threshold]. In [kurkoski2012finding], the author gave a near-optimal algorithm to maximize the mutual information over both the input distribution and the quantizer variables for a binary input and an arbitrary number of quantized outputs; however, this algorithm may declare a failure outcome.

In this paper, we provide a linear-time search procedure to find the global maximum of the mutual information between the input and the quantized output over both the input distribution and the quantizer variables. Based on the properties of the optimal quantizer and the optimal input distribution, the search range is reduced, which results in a faster implementation. Both theoretical and numerical results are provided to justify our contributions.

## II Problem description

We consider the channel shown in Fig. 1, where binary signals $x_1, x_2$ are transmitted and corrupted by a continuous noise source to produce a continuous-valued output $Y$ at the receiver. Specifically, the channel is specified by the conditional densities $\phi_0(y) = p(y|x_1)$ and $\phi_1(y) = p(y|x_2)$, which model the distortion caused by the noise. The receiver recovers the original binary signal using a quantizer $Q$ that quantizes the received continuous-valued signal $Y$ to $Z \in \{0, 1\}$. Since $Z$ is binary, the quantization parameters can be specified by a thresholding vector

$$h = (h_1, h_2, \dots, h_n) \in \mathbb{R}^n,$$

with $h_1 < h_2 < \dots < h_n$, where $n$ is assumed to be finite. Theoretically, it is possible to construct conditional densities such that the optimal quantizer consists of an infinite number of thresholds. However, for a practical implementation, especially when the quantizer is implemented using a lookup table, a finite number of thresholds must be used. To that end, in this paper we assume that the quantizer uses a finite number of thresholds. Now, $h$ induces $n + 1$ disjoint partitions:

$$H_1 = (-\infty, h_1), \ H_2 = [h_1, h_2), \ \dots, \ H_{n+1} = [h_n, \infty).$$

Let $H = H_1 \cup H_3 \cup \dots$ and $\bar{H} = H_2 \cup H_4 \cup \dots$; then $H \cup \bar{H} = \mathbb{R}$ and $H \cap \bar{H} = \emptyset$. Thus, $h$ divides $\mathbb{R}$ into $n + 1$ contiguous disjoint segments, each of which maps to either 0 or 1 alternately. Without loss of generality, we suppose that the receiver uses a quantizer that quantizes $Y$ to $Z$ as:

$$Z = \begin{cases} 0 & \text{if } Y \in H, \\ 1 & \text{if } Y \in \bar{H}. \end{cases} \quad (1)$$

Our goal is to design an optimal quantizer $Q^*$, specified by $h^*$, and also to find the optimal input distribution $p^*_X$ that maximizes the mutual information $I(X; Z)$ between the input $X$ and the quantized output $Z$:

$$h^*, p^*_X = \arg\max_{h,\, p_X} I(X; Z). \quad (2)$$

We note that the values of the thresholds $h_i$, the number of thresholds $n$, and the input distribution $p_X$ are the optimization variables. The maximization in (2) only assumes that the channel conditional densities $\phi_0(y)$ and $\phi_1(y)$ are given.
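As a minimal sketch of the alternating mapping in (1), a bit can be produced by counting how many thresholds lie at or below $y$; the threshold values below are hypothetical, chosen only for illustration.

```python
import bisect

def quantize(y, h):
    """Map a continuous output y to a bit via a sorted thresholding vector h.

    Segments H_1 = (-inf, h_1), H_2 = [h_1, h_2), ... map alternately to 0
    and 1: y falls in segment H_{k+1}, where k is the number of thresholds
    <= y, so Z = k mod 2.
    """
    return bisect.bisect_right(h, y) % 2

h = [-1.0, 0.5, 2.0]     # hypothetical thresholds h_1 < h_2 < h_3
print([quantize(y, h) for y in (-2.0, 0.0, 1.0, 3.0)])  # -> [0, 1, 0, 1]
```

`bisect_right` counts thresholds `<= y`, which matches the half-open segments $[h_i, h_{i+1})$ above.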

## III Optimality conditions

For convenience, we use the following notations:

1. $p_X = (p_0, p_1)$ denotes the probability mass function of the input $X$, with $p_0 = p(X = x_1)$ and $p_1 = p(X = x_2)$.

2. $q = (q_0, q_1)$ denotes the probability mass function of the output $Z$, with $q_0 = p(Z = 0)$ and $q_1 = p(Z = 1)$.

3. $\phi_0(y)$ and $\phi_1(y)$ denote the conditional density functions of the received signal $Y$ given the input signals $x_1$ and $x_2$, respectively.

The $2 \times 2$ channel matrix associated with a discrete memoryless channel (DMC) with input $X$ and output $Z$ is:

$$A = \begin{bmatrix} A_{11} & 1 - A_{11} \\ 1 - A_{22} & A_{22} \end{bmatrix},$$

where

$$A_{11} = \int_{y \in H} \phi_0(y)\, dy, \quad (3)$$
$$A_{22} = \int_{y \in \bar{H}} \phi_1(y)\, dy. \quad (4)$$

### III-A Optimal quantizer structure for a given input distribution

Our first contribution is to show that, for a given input distribution $p_X$, an optimal binary quantizer with multiple thresholds, specified by a thresholding vector $h^* = (h^*_1, \dots, h^*_n)$ with $h^*_1 < h^*_2 < \dots < h^*_n$, must satisfy the conditions stated in Theorem 1 below.

###### Theorem 1.

Let $h^* = (h^*_1, h^*_2, \dots, h^*_n)$ be the thresholding vector of an optimal quantizer $Q^*$; then:

$$\frac{\phi_0(h^*_i)}{\phi_1(h^*_i)} = \frac{\phi_0(h^*_j)}{\phi_1(h^*_j)} = r^*, \quad (5)$$

for all $i, j \in \{1, 2, \dots, n\}$ and some optimal constant $r^*$.

###### Proof.

We note that, using the optimal thresholding vector $h^*$, the quantization mapping follows (1). $h^*$ divides $\mathbb{R}$ into $n + 1$ contiguous disjoint segments, each of which maps to either 0 or 1 alternately. The discrete memoryless channel in Fig. 1 has the channel matrix

$$A^* = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix},$$

and the mutual information can be written as a function of $h$ as:

$$I(h) = H(Z) - H(Z|X) = H(q_0) - [p_0 H(A_{11}) + p_1 H(A_{22})], \quad (6)$$

where $H(x) = -x \log x - (1 - x) \log(1 - x)$ for any $x \in [0, 1]$, and $q_0 = p_0 A_{11} + p_1 (1 - A_{22})$.

This is an optimization problem that maximizes $I(h)$ subject to $h_i \le h_{i+1}$. The theory of optimization requires that an optimal point must satisfy the KKT conditions [boyd2004convex]. In particular, define the Lagrangian function as:

$$L(h, \lambda) = I(h) + \sum_{i=1}^{n-1} \lambda_i (h_i - h_{i+1}), \quad (7)$$

then the KKT conditions [boyd2004convex] state that an optimal point must satisfy:

$$\begin{cases} \left. \dfrac{dL(h, \lambda)}{dh} \right|_{h = h^*,\, \lambda = \lambda^*} = 0, \\ \lambda^*_i (h^*_i - h^*_{i+1}) = 0, \quad i = 1, 2, \dots, n - 1, \\ \lambda^*_i \ge 0, \quad i = 1, 2, \dots, n - 1. \end{cases} \quad (8)$$

Since the structure of the quantizer requires that $h^*_i < h^*_{i+1}$, the second and third conditions in (8) together imply that $\lambda^*_i = 0$. Consequently, from (7) and the first condition in (8), we have:

$$\left. \frac{dL(h, \lambda)}{dh} \right|_{h = h^*,\, \lambda = \lambda^*} = \left. \frac{dI(h)}{dh} \right|_{h = h^*} = 0.$$

By setting the partial derivative of $I(h)$ with respect to each $h_i$ to zero, we have

$$\frac{\partial I(h)}{\partial h_i} = \left( \log\frac{1 - q_0}{q_0} \right) \frac{\partial q_0}{\partial h_i} - p_0 \left( \log\frac{1 - A_{11}}{A_{11}} \right) \frac{\partial A_{11}}{\partial h_i} - p_1 \left( \log\frac{1 - A_{22}}{A_{22}} \right) \frac{\partial A_{22}}{\partial h_i} \quad (9)$$
$$= \left( \log\frac{1 - q_0}{q_0} \right) \left( p_0 \frac{\partial A_{11}}{\partial h_i} - p_1 \frac{\partial A_{22}}{\partial h_i} \right) - p_0 \left( \log\frac{1 - A_{11}}{A_{11}} \right) \frac{\partial A_{11}}{\partial h_i} - p_1 \left( \log\frac{1 - A_{22}}{A_{22}} \right) \frac{\partial A_{22}}{\partial h_i}$$
$$= p_0 \frac{\partial A_{11}}{\partial h_i} \left( \log\frac{1 - q_0}{q_0} - \log\frac{1 - A_{11}}{A_{11}} \right) - p_1 \frac{\partial A_{22}}{\partial h_i} \left( \log\frac{1 - q_0}{q_0} + \log\frac{1 - A_{22}}{A_{22}} \right) = 0, \quad (10)$$

with (9) due to $q_0 = p_0 A_{11} + p_1 (1 - A_{22})$.

Since $\partial A_{11} / \partial h_i = \pm \phi_0(h_i)$ and $\partial A_{22} / \partial h_i = \mp \phi_1(h_i)$ (the boundary $h_i$ enters $A_{11}$ and $A_{22}$ with opposite signs), from (10), we have:

$$\frac{\phi_0(h^*_i)}{\phi_1(h^*_i)} = -\frac{p_1}{p_0} \cdot \frac{\log\frac{1 - q_0}{q_0} + \log\frac{1 - A_{22}}{A_{22}}}{\log\frac{1 - q_0}{q_0} - \log\frac{1 - A_{11}}{A_{11}}} = r^*. \quad (11)$$

Since (11) holds for all $i = 1, 2, \dots, n$, and the RHS of (11) equals a constant for a given quantizer $Q^*$, Theorem 1 follows. ∎

Suppose the optimal value $r^*$ is given and the equation $\phi_0(y)/\phi_1(y) = r^*$ has $n$ solutions $h_1, h_2, \dots, h_n$. Then, Theorem 1 says that the optimal quantizer must have a thresholding vector equal to $(h_1, \dots, h_n)$ or to one of its ordered subsets. In Theorem 2 below, we show that the quantizer whose thresholding vector contains all the solutions of $\phi_0(y)/\phi_1(y) = r^*$ is at least as good as any quantizer whose thresholding vector is an ordered subset of the set of all solutions.

###### Theorem 2.

Let $h_1, h_2, \dots, h_n$ be the solutions of $\phi_0(y)/\phi_1(y) = r^*$ for the optimal constant $r^*$. Let $Q^*$ be the quantizer whose thresholding vector contains all the solutions, i.e., $h^* = (h_1, h_2, \dots, h_n)$; then $Q^*$ is at least as good as any quantizer whose thresholding vector is an ordered subset of $\{h_1, h_2, \dots, h_n\}$.

###### Proof.

Due to the limited space, we do not present the detailed proof of Theorem 2. However, we refer the reader to Theorem 1 in [kurkoski2017single]. Indeed, Theorem 1 in [kurkoski2017single] showed that an optimal quantizer is equivalent to hyper-plane cuts in the space of posterior conditional distributions, and it guarantees that at least one of the globally optimal quantizers has this structure. Since the channel is binary, a hyper-plane in the posterior distribution space is a point. That said, the optimal threshold vector should consist of the solutions of

$$p_{x_1|y} = \frac{\phi_0(y)}{\phi_0(y) + \phi_1(y)} = a^*, \quad 0 \le a^* \le 1,$$

which, in turn, is equivalent to $\phi_0(y)/\phi_1(y) = r^*$ where $r^* = \frac{a^*}{1 - a^*}$. ∎

Theorem 2 is important in the sense that it provides a concrete approach to finding the globally optimal quantizer: exhaustively search over the optimal value $r^*$ and use all the solutions of $\phi_0(y)/\phi_1(y) = r^*$ to construct the optimal threshold vector $h$. We also note that Theorem 2 can stand alone without the proof of Theorem 1; however, Theorem 1 is useful in the sense that it provides an important connection between the optimal thresholds, which produce the optimal channel matrix, and the optimal input distribution. For example, the relationship in (11) will be used in Theorem 3 below.
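As an illustration of this search over $r$, consider an assumed example that is not from the paper: antipodal signaling with Gaussian noise, $\phi_0 = \mathcal{N}(-1, \sigma^2)$ and $\phi_1 = \mathcal{N}(+1, \sigma^2)$. Here the likelihood ratio is monotone, so each $r$ yields exactly one threshold in closed form:

```python
import math

def threshold_for_r(r, sigma):
    """Solve phi_0(y)/phi_1(y) = r for phi_0 = N(-1, sigma^2), phi_1 = N(+1, sigma^2).

    The likelihood ratio is exp(-2*y / sigma**2), so the unique solution is
    y = -(sigma**2) * ln(r) / 2.
    """
    return -(sigma ** 2) * math.log(r) / 2.0

# r = 1 recovers the symmetric maximum-likelihood threshold y = 0
print(threshold_for_r(1.0, sigma=0.8))
```

For multimodal densities (e.g., Gaussian mixtures) the same equation can have several solutions, which is exactly the multi-threshold case covered by Theorem 2.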

###### Theorem 3.

Consider a binary channel with a given input distribution $p_X = (p_0, p_1)$. Corresponding to an optimal quantizer $Q^*$, the optimal channel matrix has diagonal entries $A_{11}$ and $A_{22}$ such that $A_{11} \ge p_0$ and $A_{22} \ge p_1$.

### III-B Optimal input distribution for a given channel matrix

For a binary channel having a given channel matrix $A$, the optimal input distribution can be determined in closed form [moskowitz2010approximations], [moskowitz2009approximation]. Moreover, the maximum of the mutual information at the optimal distribution can be written as a function of the channel matrix entries [moskowitz2010approximations], [moskowitz2009approximation], [silverman1955binary], [nguyen2018closed]. This result is summarized in the following theorem.

###### Theorem 4.

For a given quantizer, which corresponds to a given channel matrix $A$ with diagonal entries $A_{11}$ and $A_{22}$, the maximum of the mutual information can be written in the following closed form:

$$I(X;Z)_{p^*_X} = \log_2\left[ 2^{-\frac{A_{22} H(A_{11}) + (A_{11} - 1) H(A_{22})}{A_{11} + A_{22} - 1}} + 2^{-\frac{(A_{22} - 1) H(A_{11}) + A_{11} H(A_{22})}{A_{11} + A_{22} - 1}} \right], \quad (12)$$

where $H(\cdot)$ is the binary entropy function with logarithms taken base 2.

###### Proof.

Please see the detailed proof in [moskowitz2010approximations], [moskowitz2009approximation], [silverman1955binary], [nguyen2018closed]. ∎
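A direct transcription of this closed form is sketched below; the helper names and the sanity check are ours. In the symmetric case $A_{11} = A_{22}$ the expression should recover the BSC capacity $1 - H(A_{11})$.

```python
import math

def H2(x):
    """Binary entropy in bits."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def capacity_closed_form(a11, a22):
    """Max of I(X;Z) over the input pmf for channel [[a11, 1-a11], [1-a22, a22]], per Eq. (12)."""
    d = a11 + a22 - 1.0      # assumed nonzero, i.e., a non-degenerate channel
    e1 = -(a22 * H2(a11) + (a11 - 1.0) * H2(a22)) / d
    e2 = -((a22 - 1.0) * H2(a11) + a11 * H2(a22)) / d
    return math.log2(2.0 ** e1 + 2.0 ** e2)

# Sanity check: a symmetric channel recovers the BSC capacity 1 - H(0.9)
print(capacity_closed_form(0.9, 0.9))   # ≈ 0.5310
```

Setting $A_{11} = 1$ likewise recovers the familiar Z-channel capacity $\log_2\!\big(1 + 2^{-H(A_{22})/A_{22}}\big)$, which is a useful check on the sign conventions in (12).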

###### Theorem 5.

For a given quantizer which corresponds to a given channel matrix $A$, the optimal input probabilities $p^*_0$ and $p^*_1$ are bounded by:

$$0.3679 \approx \frac{1}{e} \le p^*_0,\ p^*_1 \le 1 - \frac{1}{e} \approx 0.6321. \quad (13)$$
###### Proof.

Please see Theorem 1 in [majani1991two]. ∎

## IV Finding channel capacity over both input distribution and threshold vector variables

Theorem 1 and Theorem 2 state that an optimal quantizer can be found by exhaustively searching over the value $r$ and using all the solutions of $\phi_0(y)/\phi_1(y) = r$ to construct the thresholding vector $h$. The mutual information, therefore, becomes a function of the single variable $r$. Now, for a given $r$, let $h_1 < h_2 < \dots < h_n$ denote the solutions of $\phi_0(y)/\phi_1(y) = r$ and define

$$H_r = (-\infty, h_1) \cup [h_2, h_3) \cup \dots \cup [h_n, +\infty).$$

Similarly, let

$$\bar{H}_r = \mathbb{R} \setminus H_r = [h_1, h_2) \cup [h_3, h_4) \cup \dots \cup [h_{n-1}, h_n).$$

The sets $H_r$ and $\bar{H}_r$ together specify a binary quantizer that maps $Y$ to $Z \in \{0, 1\}$, depending on whether $y$ belongs to $H_r$ or $\bar{H}_r$. Without loss of generality, suppose we use the following quantizer:

$$z = \begin{cases} 0, & y \in H_r, \\ 1, & y \in \bar{H}_r, \end{cases} \quad (14)$$

then the channel matrix of the overall DMC is:

$$A = \begin{bmatrix} f(r) & 1 - f(r) \\ 1 - g(r) & g(r) \end{bmatrix},$$

where $f(r) = A_{11}$ and $g(r) = A_{22}$. $f(r)$ and $g(r)$ can be written in terms of $\phi_0(y)$ and $\phi_1(y)$ as:

$$f(r) = \int_{y \in H_r} \phi_0(y)\, dy = \int_{-\infty}^{h_1} \phi_0(y)\, dy + \int_{h_2}^{h_3} \phi_0(y)\, dy + \dots + \int_{h_n}^{+\infty} \phi_0(y)\, dy, \quad (15)$$
$$g(r) = \int_{y \in \bar{H}_r} \phi_1(y)\, dy = \int_{h_1}^{h_2} \phi_1(y)\, dy + \int_{h_3}^{h_4} \phi_1(y)\, dy + \dots + \int_{h_{n-1}}^{h_n} \phi_1(y)\, dy. \quad (16)$$

Using Theorem 4, the maximum of the mutual information in Eq. (12) becomes:

$$I(X;Z)_{(p^*_X, r)} = \log_2\left[ 2^{-\frac{g(r) H(f(r)) + (f(r) - 1) H(g(r))}{f(r) + g(r) - 1}} + 2^{-\frac{(g(r) - 1) H(f(r)) + f(r) H(g(r))}{f(r) + g(r) - 1}} \right]. \quad (17)$$

Linear time complexity algorithm: using (17), an exhaustive search over $r$ can be applied to find the maximum of the mutual information over both the input distribution and the quantization thresholds. We note that $f(r)$ and $g(r)$ can be computed using (15) and (16), where $h_1, \dots, h_n$ are well defined as the solutions of $\phi_0(y)/\phi_1(y) = r$.
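The whole procedure can be sketched for the assumed antipodal Gaussian example ($\phi_0 = \mathcal{N}(-1, \sigma^2)$, $\phi_1 = \mathcal{N}(+1, \sigma^2)$, not from the paper), where each $r$ induces a single threshold and $f$, $g$ reduce to Gaussian tail integrals; the grid range and resolution below are illustrative.

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def H2(x):
    """Binary entropy in bits."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def capacity_at_r(r, sigma):
    """I(X;Z) maximized over the input pmf, for the single threshold induced by r."""
    t = -(sigma ** 2) * math.log(r) / 2.0     # solution of phi_0/phi_1 = r
    f = Phi((t + 1.0) / sigma)                # f(r): mass of N(-1, s^2) on (-inf, t)
    g = 1.0 - Phi((t - 1.0) / sigma)          # g(r): mass of N(+1, s^2) on [t, inf)
    d = f + g - 1.0
    e1 = -(g * H2(f) + (f - 1.0) * H2(g)) / d
    e2 = -((g - 1.0) * H2(f) + f * H2(g)) / d
    return math.log2(2.0 ** e1 + 2.0 ** e2)

# Exhaustive grid search over r, as in the linear-time procedure
sigma = 1.0
rs = [math.exp(i / 1000.0 - 2.0) for i in range(4001)]   # grid over ln r in [-2, 2]
best_r = max(rs, key=lambda r: capacity_at_r(r, sigma))
print(best_r, capacity_at_r(best_r, sigma))
```

By the symmetry of this example the search lands on $r = 1$, i.e., the single maximum-likelihood threshold at $y = 0$.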

Narrowing down the search range:

###### Theorem 6.

For an arbitrary binary channel, suppose that the maximum of the mutual information over both the input distribution and the quantizer is achieved at the optimal quantizer $Q^*$, which generates the optimal channel matrix with diagonal entries $f(r^*)$ and $g(r^*)$; then $f(r^*) \ge 1/e$ and $g(r^*) \ge 1/e$.

###### Proof.

Combining Theorem 3 and Theorem 5, for an optimal quantizer we have $A_{11} \ge p^*_0 \ge 1/e$ and $A_{22} \ge p^*_1 \ge 1/e$. ∎

###### Theorem 7.

$f(r)$ in (15) is a monotonically decreasing function and $g(r)$ in (16) is a monotonically increasing function of the variable $r$.

###### Proof.

From Theorem 6, the entries $A_{11} = f(r)$ and $A_{22} = g(r)$ should satisfy $f(r) \ge 1/e$ and $g(r) \ge 1/e$. Thus, we can narrow down the search range by limiting $r$ such that $f(r) \ge 1/e$ and $g(r) \ge 1/e$. Due to the monotonic decrease of $f(r)$ and the monotonic increase of $g(r)$, we can find the upper bound and the lower bound of $r$ by solving the two equations $f(r) = 1/e$ and $g(r) = 1/e$. Using bisection search, finding the solutions of $f(r) = 1/e$ and $g(r) = 1/e$ takes time complexity $O(\log(1/\epsilon))$, where $\epsilon$ is the resolution/accuracy of the solution.
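The narrowing step can be sketched with a generic bisection search; the bracket and tolerance below are illustrative, and `f` is the single-threshold $f(r)$ of the assumed antipodal Gaussian example ($\phi_0 = \mathcal{N}(-1, \sigma^2)$, $\phi_1 = \mathcal{N}(+1, \sigma^2)$), not a quantity from the paper.

```python
import math

def bisect_root(func, lo, hi, eps=1e-9):
    """Find x in [lo, hi] with func(x) = 0, assuming func is monotone and changes sign."""
    f_lo = func(lo)
    while hi - lo > eps:
        mid = 0.5 * (lo + hi)
        if (func(mid) > 0) == (f_lo > 0):
            lo, f_lo = mid, func(mid)   # root lies above mid
        else:
            hi = mid                    # root lies below mid
    return 0.5 * (lo + hi)

# Upper bound on r: solve f(r) = 1/e, where f(r) = Phi((t(r) + 1)/sigma)
# and t(r) = -sigma^2 * ln(r) / 2 in our assumed Gaussian example.
sigma = 1.0
Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
f = lambda r: Phi((-(sigma ** 2) * math.log(r) / 2.0 + 1.0) / sigma)
r_max = bisect_root(lambda r: f(r) - 1.0 / math.e, 1e-6, 1e6)
print(r_max)   # beyond r_max the constraint f(r) >= 1/e is violated
```

Since $f$ is monotonically decreasing in $r$, any $r > r_{\max}$ violates $f(r) \ge 1/e$ and can be excluded from the grid search; the bound from $g(r) = 1/e$ is obtained the same way.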

## V Numerical Results

In this section, we find the maximum of the mutual information for a channel with given conditional densities $\phi_0(y)$ and $\phi_1(y)$. Due to the constraints $f(r) \ge 1/e$ and $g(r) \ge 1/e$, we can limit the search range of $r$. Next, an exhaustive search over $r$ at a fixed resolution is performed. Fig. 2 illustrates the function in Eq. (17) as a function of the variable $r$. From our simulation, the maximum of the mutual information over both the input distribution and the threshold variable is achieved at the resulting optimal $r^*$.

## VI Conclusion

In this paper, we provided a linear-time search procedure to find the global maximum of the mutual information between the input and the quantized output over both the input distribution and the quantizer variables. Based on the properties of the optimal quantizer and the optimal input distribution, we reduced the search range, which results in a faster implementation. Both theoretical and numerical results are provided to justify our method.

### -a Proof of Theorem 2

Since both density functions $\phi_0(h^*_i)$ and $\phi_1(h^*_i)$ are positive, $r^* = \phi_0(h^*_i)/\phi_1(h^*_i) \ge 0$. From (11), we have:

$$-\frac{p_1}{p_0} \cdot \frac{\log\frac{1 - q_0}{q_0} + \log\frac{1 - A_{22}}{A_{22}}}{\log\frac{1 - q_0}{q_0} - \log\frac{1 - A_{11}}{A_{11}}} \ge 0. \quad (18)$$

With a little algebra, (18) is equivalent to

$$(A_{11} - p_0)(A_{22} - p_1) \ge 0. \quad (19)$$

Next, we show that $A_{11} + A_{22} \ge 1$. Indeed, $A_{11}$ and $A_{22}$ represent the probabilities of the quantized bits “0" and “1", which correspond to the areas of $\phi_0(y)$ over $H_r$ and of $\phi_1(y)$ over $\bar{H}_r$, respectively. Let $f(r) = A_{11}$ and $g(r) = A_{22}$.

We consider two possible cases: $r \le 1$ and $r > 1$. In both cases, we will show that $f(r) + g(r) \ge 1$.

If $r \le 1$ then $\phi_1(y) \ge \phi_0(y)$ for $y \in \bar{H}_r$. Therefore,

$$f(r) + g(r) = \int_{y \in H_r} \phi_0(y)\, dy + \int_{y \in \bar{H}_r} \phi_1(y)\, dy \quad (20)$$
$$\ge \int_{y \in H_r} \phi_0(y)\, dy + \int_{y \in \bar{H}_r} \phi_0(y)\, dy \quad (21)$$
$$= 1. \quad (22)$$

If $r > 1$ then $\phi_0(y) > \phi_1(y)$ for $y \in H_r$. Therefore,

$$f(r) + g(r) = \int_{y \in H_r} \phi_0(y)\, dy + \int_{y \in \bar{H}_r} \phi_1(y)\, dy \quad (23)$$
$$> \int_{y \in H_r} \phi_1(y)\, dy + \int_{y \in \bar{H}_r} \phi_1(y)\, dy \quad (24)$$
$$= 1. \quad (25)$$

Therefore, $A_{11} + A_{22} = f(r) + g(r) \ge 1$, so $A_{11} - p_0$ and $A_{22} - p_1$ cannot both be negative. Thus, (19) is equivalent to $A_{11} \ge p_0$ and $A_{22} \ge p_1$.

### -B Proof of Theorem 7

$f(r)$ represents the probability of the quantized bit “0", which is the area of $\phi_0(y)$ over $H_r$, where $\phi_0(y)/\phi_1(y) \ge r$ on $H_r$. Therefore, as $r$ increases, $H_r$ shrinks, so $f(r)$ is decreasing. A similar proof can be established for $g(r)$.