Quantization plays a critical role in digital signal processing systems. Quantizers are typically designed to obtain an accurate digital representation of the input signal, operating independently of the system task, and are commonly implemented using serial scalar analog-to-digital converters (ADCs). In this work, we study hardware-limited task-based quantization, where a system utilizing a serial scalar ADC is designed to provide a suitable representation in order to allow the recovery of a parameter vector underlying the input signal. We propose hardware-limited task-based quantization systems for a fixed and finite quantization resolution, and characterize their achievable distortion. We then apply the analysis to the practical setups of channel estimation and eigen-spectrum recovery from quantized measurements. Our results illustrate that properly designed hardware-limited systems can approach the optimal performance achievable with vector quantizers, and that by taking the underlying task into account, the quantization error can be made negligible with a relatively small number of bits.

## Authors

Nir Shlezinger, et al. ∙ 08/01/2019


## I Introduction

Quantization refers to the representation of a continuous-amplitude signal using a finite dictionary, or equivalently, a finite number of bits [1]. Quantizers are implemented in digital signal processing systems using analog-to-digital converters (ADCs), which typically operate in a serial scalar manner due to hardware limitations. In such systems, each incoming continuous-amplitude sample is represented in digital form using the same mechanism [2]. The quantized representation is commonly selected to accurately match the original signal, in the sense of minimizing some distortion measure, such that the signal can be recovered with minimal error from the quantized measurements [3, Ch. 10], [4].

Quantization design is typically performed regardless of the system task. However, in many signal processing applications, the goal is not to recover the actual signal, but to capture certain parameters, such as an underlying model or unknown channel, from the quantized signal [5]. We refer to systems where one wishes to extract some information from the quantized signal, rather than recovering the signal itself, as task-based quantization, and to such systems operating with serial scalar ADCs as hardware-limited task-based quantization systems.

Hardware-limited quantization with low resolution has been the focus of growing interest in recent years due to the increasing complexity and bitrate demands of modern signal processing and communications systems. Common tasks considered with low-resolution hardware-limited quantization include multiple-input multiple-output (MIMO) communications [6, 7, 8, 9, 10, 11], channel estimation [12, 15, 13, 14, 10, 11], subspace estimation [16], time difference of arrival estimation [17], and direction of arrival (DOA) estimation [18, 19]. In these works it is assumed that quantization is carried out separately from the system task, typically using fixed uniform low-precision quantizers, e.g., one-bit quantization of a scalar value is implemented using the $\mathrm{sign}(\cdot)$ function [6]. Accordingly, these works do not provide guidelines for designing quantization systems with a small and finite number of bits by acknowledging the task of the system.

When hardware limitations are not present, task-based quantization systems can take advantage of joint vector quantization, which is known to be superior to serial scalar quantization [20, Ch. 22.2]. Previous works on task-based quantization without hardware limitations can be divided according to whether the parameter vector is modeled as a random vector, namely, a Bayesian setup, or as a deterministic unknown parameter. When the signal parameter is random, task-based quantization can be viewed as an indirect lossy source coding problem¹ [1, Sec. V-G] (¹Direct lossy source coding typically refers to the standard quantization setup where the task of the system is to recover the quantized signal, while indirect source coding refers to task-based quantization [23]). For this setup with a stationary source that is related to the observation vector via a stationary memoryless channel, Witsenhausen proved in [21] that the rate-distortion function, namely, the minimal number of bits required to obtain a given representation accuracy determined by the distortion measure, is asymptotically equivalent to the rate-distortion function for representing the observed signal – instead of the signal parameter – with a surrogate distortion measure. Under MSE distortion, Wolf and Ziv proved in [22] that this equivalence also holds for finite signal size, and the work [5] provided guidelines for the optimal joint quantization and estimation scheme. Recently, Kostina and Verdú characterized nonasymptotic bounds on the rate-distortion functions for indirect as well as direct lossy source coding with arbitrary distortion measures [23, 24], by considering single-shot quantization, and specialized the bounds for i.i.d. signals with separable distortion. The indirect source coding framework was also used to study the conversion of continuous-time signals into quantized discrete-time signals in [25, 26].
The focus in the works [21, 22, 24, 23, 25, 26] is on the optimal tradeoff between quantization rate and achievable distortion. Consequently, their results cannot be applied to quantify the achievable performance of practical hardware-limited systems utilizing serial scalar ADCs.

For a signal parameter modeled as deterministic and unknown, [27] studied detection from quantized observations, i.e., recovering a scalar binary parameter, while [28] treated detection from quantized prior probabilities. Quantization for the recovery of a scalar parameter taking values on a discrete finite set was studied in [29]. The design of quantizers for the recovery of a vector parameter taking values on a continuous set was considered in [30], which proposed an adaptive algorithm for tuning the quantizer. In all the works above, i.e., [27, 30, 29, 28], as well as in [5], the analysis assumes vector quantizers with high resolution, where the number of bits used for representing the quantized signal can be made arbitrarily large. They do not consider practical systems that utilize serial scalar quantizers with a fixed and finite number of bits.

### Main Contributions

In this work we study quantization for the task of acquiring a random parameter vector taking values on a continuous set, from a statistically dependent observations vector, using practical serial scalar ADCs operating with a fixed number of bits. We first consider the case where the observations and the desired vector are related such that the minimum MSE (MMSE) estimate is a linear function of the observations. Such relationships are commonly encountered in channel estimation and signal recovery problems, e.g., [5, 8, 9, 10, 11, 12, 15, 13, 14]. We focus on practical systems implementing uniform quantization with linear processing, allowing analog combining prior to digital processing. This approach was previously studied in the context of MIMO communications [6, 31, 32, 33]. For this setup, we derive the optimal hardware-limited task-based quantization system, and characterize the achievable distortion.

The optimal system accounts for the task by reducing the number of quantized samples via an appropriate linear transformation to be not larger than the size of the desired signal. It then rotates the resulting samples to have identical variance. Quantization is performed based on a waterfilling-type expression, accounting for the serial operation and the limited dynamic range of practical ADCs.

In addition, we characterize the minimal achievable distortion of two suboptimal approaches: We first discuss systems in which processing is carried out only in the digital domain, which is the structure considered in the majority of the literature on tasks performed with low-resolution quantization, e.g., [12, 15, 17, 13, 14, 18, 16]. Then, we study systems which quantize the MMSE estimate, an approach which is known to be optimal when using vector quantizers [22], and was also proposed for compressed sensing with quantized measurements [34]. Surprisingly, we show that, unlike when vector quantizers are employed, in the presence of serial scalar ADCs, quantizing the MMSE estimate is generally suboptimal. We provide a necessary and sufficient condition for this approach to coincide with the optimal design.

Next, we extend the proposed system to scenarios where the observations and the desired vector are related via an arbitrary stochastic model. In particular, we identify the main design guidelines associated with the case where the MMSE estimate is a linear function of observations, and discuss how they can be applied for arbitrary models. Then, we explicitly show how these guidelines can be used to construct a hardware-limited task-based quantization system for scenarios in which the desired vector can be recovered from the empirical covariance of the observations, as in [5, 16, 17, 18, 19].

Finally, we apply our results to two practical setups: Channel estimation from quantized measurements [12, 15, 13, 14, 10, 11] and eigen-spectrum estimation from quantized measurements [5]. We demonstrate that, by properly accounting for the presence of serial scalar ADCs, practical hardware-limited systems operating with a relatively small number of bits can approach the optimal performance, achievable with vector quantizers, in practical and relevant scenarios. Furthermore, we show that hardware-limited quantizers designed accounting for the task of the system can substantially outperform task-ignorant systems utilizing vector quantizers. This gain is mainly achieved by applying task-based linear analog processing, in addition to the digital processing.

### Organization and Notations

The rest of this paper is organized as follows: Section II briefly reviews some preliminaries in quantization theory, and formulates the hardware-limited task-based quantization setup. Section III discusses task-based quantization with vector quantizers. Section IV studies hardware-limited task-based quantization when the MMSE estimate is linear, and Section V extends the proposed design to arbitrary setups. Section VI presents the application of the results in a numerical study. Section VII provides some concluding remarks. Detailed proofs of the results are given in the appendix.

Throughout the paper, we use boldface lower-case letters for vectors, e.g., $\mathbf{x}$; the $i$th element of $\mathbf{x}$ is written as $(\mathbf{x})_i$. Matrices are denoted with boldface upper-case letters, e.g., $\mathbf{M}$, and $(\mathbf{M})_{i,j}$ is its $(i,j)$th element. Sets are denoted with calligraphic letters, e.g., $\mathcal{X}$, and $\mathcal{X}^n$ is the $n$th order Cartesian power of $\mathcal{X}$. Transpose, Euclidean norm, trace, stochastic expectation, sign, and mutual information are written as $(\cdot)^T$, $\|\cdot\|$, $\mathrm{Tr}(\cdot)$, $E\{\cdot\}$, $\mathrm{sign}(\cdot)$, and $I(\cdot\,;\cdot)$, respectively, and $\mathbb{R}$ is the set of real numbers. We use $(a)^+$ to denote $\max(a,0)$, and $\mathbf{I}_n$ is the $n \times n$ identity matrix. All logarithms are taken to basis 2.

## Ii Preliminaries and Problem Statement

### Ii-a Preliminaries in Quantization Theory

To formulate the hardware-limited task-based quantization problem, we first review standard quantization notations, after which we discuss task-based quantization. To that aim, we recall the definition of a quantizer:

###### Definition 1 (Quantizer).

A quantizer $Q^{n,k}_M$ with $\log M$ bits, input size $n$, input alphabet $\mathcal{X}$, output size $k$, and output alphabet $\hat{\mathcal{X}}$, consists of: 1) An encoding function $g^{\rm e}_n: \mathcal{X}^n \to \{1,2,\ldots,M\}$ which maps the input into a discrete index. 2) A decoding function $g^{\rm d}_k: \{1,2,\ldots,M\} \to \hat{\mathcal{X}}^k$ which maps each index $j$ into a codeword $\mathbf{q}_j \in \hat{\mathcal{X}}^k$.

We write the output of the quantizer with input $\mathbf{x} \in \mathcal{X}^n$ as $Q^{n,k}_M(\mathbf{x}) = g^{\rm d}_k\big(g^{\rm e}_n(\mathbf{x})\big)$. Scalar quantizers operate on a scalar input, i.e., $n = 1$ and $\mathcal{X}$ is a scalar space, while vector quantizers have a multivariate input. The set of codewords $\{\mathbf{q}_j\}_{j=1}^{M}$ is referred to as the quantization codebook. When the input size and output size are equal, namely, $n = k$, we write $Q^{n}_M(\cdot)$. An illustration is given in Figure 1.
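As a concrete illustration of Definition 1, the following Python sketch builds a scalar ($n = k = 1$) quantizer with a nearest-neighbor encoder and a table-lookup decoder; the function names and the example codebook are our own illustrative choices, not part of the paper:

```python
import numpy as np

def make_scalar_quantizer(codebook):
    """Nearest-neighbor encoder and table-lookup decoder for a given codebook."""
    codebook = np.asarray(codebook, dtype=float)

    def encode(x):
        # Encoding function: map the input to the index of the closest codeword.
        return int(np.argmin(np.abs(codebook - x)))

    def decode(j):
        # Decoding function: map the index back to its codeword.
        return float(codebook[j])

    return encode, decode

# A quantizer with M = 4 codewords, i.e., log 4 = 2 bits.
encode, decode = make_scalar_quantizer([-1.5, -0.5, 0.5, 1.5])
q_of_x = decode(encode(0.3))   # nearest codeword to 0.3 is 0.5
```

The composition `decode(encode(x))` is exactly the quantizer output $Q_M^{1}(x)$ of Definition 1.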

#### Ii-A1 Standard Quantization

In the standard quantization problem, a quantizer is designed to minimize some distortion measure $d(\mathbf{x}, \hat{\mathbf{x}})$ between its input $\mathbf{x}$ and its output $\hat{\mathbf{x}}$. The performance of a quantizer is therefore characterized using two measures: the quantization rate, defined as $R \triangleq \frac{1}{n}\log M$, and the expected distortion $E\{d(\mathbf{x}, Q^{n,k}_M(\mathbf{x}))\}$. For a fixed input size $n$ and codebook size $M$, the optimal quantizer is thus given by

 $Q^{n,k,\rm o}_M(\cdot) = \mathop{\arg\min}\limits_{Q^{n,k}_M(\cdot)} E\left\{d\left(\mathbf{x}, Q^{n,k}_M(\mathbf{x})\right)\right\}.$ (1)

Characterizing the optimal quantizer via (1) and the optimal tradeoff between distortion and quantization rate is in general a very difficult task. Consequently, optimal quantizers are typically studied assuming either high quantization rate, i.e., $R \to \infty$, see, e.g., [35], or asymptotically large input size, namely, $n \to \infty$, typically with i.i.d. inputs² (²Rate-distortion theory can also be used for non-i.i.d. signals, see, e.g., [36, Ch. 5]. However, the simple classical expression, as given by the distortion-rate function in Def. 2, requires the observed signal to have i.i.d. entries.), via rate-distortion theory [3, Ch. 10]. For example, when the quantizer input consists of i.i.d. random variables with probability measure $f_x$, and the distortion measure can be written as $d(\mathbf{x},\hat{\mathbf{x}}) = \frac{1}{n}\sum_{i=1}^{n} d\big((\mathbf{x})_i, (\hat{\mathbf{x}})_i\big)$ for some scalar distortion $d(\cdot,\cdot)$, then the optimal distortion in the limit $n \to \infty$ for a given rate $R$ is given by the distortion-rate function:

###### Definition 2 (Distortion-rate function).

The distortion-rate function for input with respect to the distortion measure is defined as

 $D_x(R) = \min\limits_{f_{\hat{x}|x}:\, I(\hat{x};x) \le R} E\{d(\hat{x}, x)\}.$ (2)

The conditional distribution which obtains the minimum in (2), denoted $f^{\rm o}_{\hat{x}|x}$, is referred to as the optimal distortion-rate distribution, and the corresponding marginal distribution $f^{\rm o}_{\hat{x}}$ is referred to henceforth as the optimal marginal distortion-rate distribution.

Comparing high quantization rate analysis for scalar quantizers and rate-distortion theory for vector quantizers demonstrates the sub-optimality of serial scalar quantization. For example, for large $R$, even for i.i.d. inputs, vector quantization outperforms serial scalar quantization, with a distortion gap of 4.35 dB for Gaussian inputs with the MSE distortion [20, Ch. 23.2].
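The scalar-vs-vector gap for Gaussian inputs can be traced to the Panter-Dite high-rate approximation for the optimal scalar quantizer, $D_{\rm scalar} \approx \frac{\sqrt{3}\pi}{2}\sigma^2 2^{-2R}$, versus the distortion-rate function $D_x(R) = \sigma^2 2^{-2R}$; the ratio of the two, in dB, gives the gap. A quick sanity check (our own sketch):

```python
import math

# Panter-Dite constant for the high-rate optimal scalar quantizer of a
# Gaussian source: D_scalar ~ (sqrt(3)*pi/2) * sigma^2 * 2^(-2R), while the
# distortion-rate function is D(R) = sigma^2 * 2^(-2R).
panter_dite_const = math.sqrt(3) * math.pi / 2

# Gap between serial scalar quantization and vector quantization, in dB.
gap_db = 10 * math.log10(panter_dite_const)
```

The constant evaluates to roughly 2.72, i.e., a gap of about 4.35 dB, independent of the rate $R$.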

In task-based quantization the design objective of the quantizer is some task other than minimizing the distortion between its input and output. In the following, we focus on the generic task of acquiring a zero-mean random vector $\mathbf{s} \in \mathbb{R}^k$ from a measured zero-mean random vector $\mathbf{x} \in \mathbb{R}^n$, where $\mathbf{s}$ and $\mathbf{x}$ are related via a conditional probability measure, and $k \le n$. This formulation accommodates a broad range of tasks, including channel estimation, covariance estimation, and source localization. A natural distortion measure for such setups is the MSE, which we consider throughout the paper. An illustration of a task-based quantization system is depicted in Figure 2.

### Ii-B Problem Formulation

In this work we study task-based quantization with hardware limitations. As discussed in the introduction, practical digital signal processing systems typically obtain a digital representation of physical analog signals using serial scalar ADCs. We refer to task-based quantization with serial scalar ADCs as hardware-limited task-based quantization. Since in such systems, each continuous-amplitude sample is converted into a discrete representation using a single quantization rule, this operation can be modeled using identical scalar quantizers. Consequently, the system we consider is modeled using the setup depicted in Fig. 3.

The observed signal $\mathbf{x} \in \mathbb{R}^n$ is projected into $\mathbb{R}^p$, $p \le n$, using some mapping $h_{\rm a}: \mathbb{R}^n \to \mathbb{R}^p$, which represents the pre-quantization processing carried out in the analog domain. Since general mappings may be difficult to implement in analog, we henceforth restrict $h_{\rm a}$ to be a linear function, namely, we only allow analog combining, as in, e.g., [6, 31]. In this case, $h_{\rm a}(\mathbf{x}) = \mathbf{A}\mathbf{x}$ for some $\mathbf{A} \in \mathbb{R}^{p \times n}$.

Each entry of $h_{\rm a}(\mathbf{x})$ is quantized using the same scalar quantizer with resolution $\tilde{M}_p$, denoted $Q^1_{\tilde{M}_p}(\cdot)$. The overall number of quantization levels is thus $M = \tilde{M}_p^p$. We note that $\log M$, which represents the memory requirement of the system, is also directly related to the ADC power consumption. However, for the same overall number of quantization levels $M$, different selections of $p$ and $\tilde{M}_p$ may result in different power consumptions, depending on the physical implementation of the ADC, see, e.g., [33]. It is emphasized that in the following we keep the value of $M$ fixed and finite, i.e., the memory requirement, which is independent of the specific implementation of the ADC, is the same for all considered systems.

The representation of $\mathbf{s}$, denoted $\hat{\mathbf{s}}$, is obtained as the output of some post-quantization mapping $h_{\rm d}: \mathbb{R}^p \to \mathbb{R}^k$, applied to the output of the identical scalar quantizers. The mapping $h_{\rm d}$ represents the joint processing carried out in the digital domain. The quantized representation can be written as

 $\hat{\mathbf{s}} = h_{\rm d}\Big(Q^1_{\tilde{M}_p}\big((h_{\rm a}(\mathbf{x}))_1\big), \ldots, Q^1_{\tilde{M}_p}\big((h_{\rm a}(\mathbf{x}))_p\big)\Big).$ (3)

The novelty of the model in Fig. 3, compared to previous works on quantization for specific tasks with serial scalar ADCs, e.g., [8, 9, 10, 11, 12, 15, 13, 14, 16, 17, 18, 19], is in the introduction of the additional linear processing carried out in the analog domain, represented by the mapping $h_{\rm a}$. The concept of using analog combining prior to digital processing was previously studied in the context of MIMO communications in [6, 7, 31, 32, 33]. The motivation for introducing $h_{\rm a}$ is to reduce the dimensionality of the input to the ADC, thus facilitating a more accurate quantization without increasing the overall number of bits, $\log M$. As shown in the following sections, by properly designing $h_{\rm a}$, this approach can substantially improve the performance of task-based quantizers operating with serial scalar ADCs.
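The pipeline of Fig. 3 — analog combining, $p$ identical scalar uniform quantizers, then digital processing — can be sketched end to end as follows. The dimensions, the random combining matrices, and the mid-rise quantizer grid are illustrative assumptions of ours, not the optimized design derived later in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def uniform_quantizer(y, gamma, levels):
    """Serial scalar ADC model: the same uniform mid-rise quantizer is applied
    to every entry, with dynamic range [-gamma, gamma] and `levels` levels."""
    delta = 2 * gamma / levels
    # saturate inputs outside the dynamic range
    y_clipped = np.clip(y, -gamma + delta / 2, gamma - delta / 2)
    # map each sample to the mid-rise grid {-gamma + delta*(l + 1/2)}
    l = np.floor((y_clipped + gamma) / delta)
    return -gamma + delta * (l + 0.5)

# Hypothetical dimensions: n observations, p ADC channels, k task parameters.
n, p, k, levels = 8, 4, 2, 16
A = rng.standard_normal((p, n)) / np.sqrt(n)   # analog combining (pre-quantization)
B = rng.standard_normal((k, p))                # digital processing (post-quantization)

x = rng.standard_normal(n)                     # observed analog signal
z = A @ x                                      # reduced-dimension input to the ADC
s_hat = B @ uniform_quantizer(z, gamma=3.0, levels=levels)   # task estimate
```

With `levels = 16` per channel and `p = 4` channels, the overall number of quantization levels is $M = 16^4$, while the hardware consists of a single scalar quantization rule replicated four times.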

Our analysis of hardware-limited task-based quantization, focusing on the MSE distortion, consists of three parts:

1. As a preliminary step, in Section III, we discuss task-based quantization systems that are not hardware-limited, namely, systems implementing task-based quantization using optimal vector quantizers instead of serial scalar ADCs. The purpose of this analysis is to serve as a basis for comparing the performance of hardware-limited task-based quantizers to that of vector quantizers.

2. Next, in Section IV, we focus on the case where $\mathbf{x}$ and $\mathbf{s}$ are related such that the MMSE estimate of $\mathbf{s}$ from $\mathbf{x}$ is a linear function of $\mathbf{x}$. Such relationships arise in various channel estimation and signal recovery setups, e.g., [5, 8, 9, 10, 11, 12, 15, 13, 14]. For this setting, we propose a hardware-limited quantization system design, and characterize its achievable distortion. We also characterize the minimal achievable distortion when no pre-quantization processing is carried out, as well as when the analog combiner is designed to recover the MMSE estimate.

3. Then, in Section V, we use the characterization of the mappings $h_{\rm a}$ and $h_{\rm d}$ given in Section IV for linear models, to provide guidelines for designing $h_{\rm a}$ and $h_{\rm d}$ under arbitrary relationships between $\mathbf{x}$ and $\mathbf{s}$. We suggest a concrete design for cases in which $\mathbf{s}$ can be estimated from the second-order statistical moments of $\mathbf{x}$, as in [5, 16, 17, 18, 19].

Our analysis shows that, unlike when vector quantizers are applied, the optimal strategy for systems utilizing serial scalar ADCs is not to quantize the MMSE estimate. Instead, the input to the ADC is rotated to account for the identical quantization rule of serial scalar ADCs, and its gains follow a waterfilling-type expression to account for the limited dynamic range. Furthermore, our numerical comparison presented in Section VI demonstrates that the proposed system, which uses simple hardware, can approach the performance of the optimal vector quantizer.

## Iii Task-Based Quantization With Vector Quantizers

As a preliminary step towards our study of hardware-limited task-based quantizers, we consider task-based quantization which utilizes vector quantizers without hardware limitations. We focus on two approaches for task-based quantization: In the first approach, referred to as optimal task-based quantization, the quantizer in Figure 2 is designed to recover the desired vector $\mathbf{s}$. In the second strategy, described in Figure 4 and referred to as task-ignorant quantization, the quantizer is designed to recover the observed vector $\mathbf{x}$ separately from the task, and $\mathbf{s}$ is estimated from the quantized representation. The optimal task-based quantizer obtains the minimal achievable distortion for a given quantization rate, while the task-ignorant quantizer represents the best system one can construct when the quantizer is designed separately from the task.

The approaches we discuss below are based on joint vector quantization, and thus cannot be implemented using practical serial scalar ADCs.

For the MSE distortion, the optimal quantizer is constructed by first obtaining the MMSE estimate of $\mathbf{s}$ from $\mathbf{x}$, $\tilde{\mathbf{s}} \triangleq E\{\mathbf{s}|\mathbf{x}\}$, and then quantizing the estimate [22]. This leads to a minimal distortion given by

 $\min\limits_{Q^{n,k}_M(\cdot)} E\left\{\left\|\mathbf{s} - Q^{n,k}_M(\mathbf{x})\right\|^2\right\} = E\left\{\|\mathbf{s} - \tilde{\mathbf{s}}\|^2\right\} + \min\limits_{Q^{k}_M(\cdot)} E\left\{\left\|\tilde{\mathbf{s}} - Q^{k}_M(\tilde{\mathbf{s}})\right\|^2\right\}.$ (4)

It follows from (4) that the minimal distortion is the sum of the minimal estimation error of $\mathbf{s}$ from $\mathbf{x}$, and the minimal distortion in quantizing the MMSE estimate $\tilde{\mathbf{s}}$. The latter can be obtained explicitly under a high quantization rate assumption, i.e., $R \to \infty$, using fine quantization analysis, as was done in [5], or alternatively, when $\tilde{\mathbf{s}}$ has i.i.d. entries and $k$ tends to infinity, using rate-distortion theory. For finite $M$, $n$, and $k$, the minimal distortion in quantizing the MMSE estimate may be bounded as stated in the following proposition:
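The additive decomposition underlying (4) — estimation error of the MMSE estimate plus distortion with respect to it — follows from the orthogonality principle and can be checked numerically. In the toy sketch below (our own choices: a scalar Gaussian model and a crude rounding "quantizer" standing in for any fixed function of the observation), the two sides agree up to Monte Carlo noise:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 500_000

s = rng.standard_normal(N)
x = s + rng.standard_normal(N)     # observation: x = s + independent noise
s_tilde = 0.5 * x                  # MMSE estimate E{s|x} for this Gaussian model

g = np.round(x)                    # any fixed function of x (here: crude rounding)

# Orthogonality: E||s - g(x)||^2 = E||s - s_tilde||^2 + E||s_tilde - g(x)||^2,
# since E{s - s_tilde | x} = 0 and s_tilde - g(x) is a function of x.
lhs = np.mean((s - g) ** 2)
rhs = np.mean((s - s_tilde) ** 2) + np.mean((s_tilde - g) ** 2)
```

Minimizing over the quantizer therefore only affects the second summand, which is exactly the structure exploited in (4).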

###### Proposition 1.

For any random vector $\tilde{\mathbf{c}}$ with probability measure independent of $\tilde{\mathbf{s}}$, the minimal MSE in quantizing the MMSE estimate $\tilde{\mathbf{s}}$ using a quantizer $Q^k_M(\cdot)$ satisfies

 $D_{\tilde{s}}(\log M) \le \min\limits_{Q^k_M(\cdot)} E\left\{\left\|\tilde{\mathbf{s}} - Q^k_M(\tilde{\mathbf{s}})\right\|^2\right\} \le E\left\{\int_0^{\infty}\left[\Pr\left(\left.\|\tilde{\mathbf{c}} - \tilde{\mathbf{s}}\|^2 > t\,\right|\tilde{\mathbf{s}}\right)\right]^M dt\right\}.$ (5)

Proof: See Appendix -A.

The bounds in (5) are used in the sequel for comparing the performance of hardware-limited task-based quantization to the optimal performance achievable using vector quantizers. The upper bound in (5) is the exact performance of random coding, which is known to provide a relatively tight bound for fixed blocklengths [24], and to asymptotically achieve the distortion-rate curve [20, Ch. 23.2]. A reasonable assignment for the distribution of $\tilde{\mathbf{c}}$ in (5) is the optimal marginal distortion-rate distribution; with this distribution, the distortion of quantizers with i.i.d. random codewords coincides with the distortion-rate function for sources generating an asymptotically large number of i.i.d. realizations of $\tilde{\mathbf{s}}$ [20, Ch. 24.2]. In general, the distortion-rate function and the optimal marginal distribution can be obtained using iterative algorithms, e.g., the Blahut-Arimoto algorithm [3, Ch. 10.8] and its extensions to continuous-valued RVs [37].
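For a discrete source, one point on the distortion-rate curve can be computed with a compact Blahut-Arimoto iteration. The sketch below is our own minimal implementation, parameterized by a Lagrange slope `beta` trading rate against distortion; it is checked against the binary uniform source with Hamming distortion, for which the curve is known in closed form, $R(D) = 1 - h_2(D)$:

```python
import numpy as np

def blahut_arimoto_rd(p_x, d, beta, iters=500):
    """Blahut-Arimoto iterations for one point on the distortion-rate curve.

    p_x  : source distribution over a finite alphabet
    d    : distortion matrix, d[i, j] = d(x_i, xhat_j)
    beta : Lagrange slope (larger beta -> lower distortion, higher rate)
    Returns (rate_in_bits, distortion)."""
    m = d.shape[1]
    q = np.full(m, 1.0 / m)                   # marginal over reproduction alphabet
    for _ in range(iters):
        w = q[None, :] * np.exp(-beta * d)    # unnormalized test channel P(xhat|x)
        P = w / w.sum(axis=1, keepdims=True)
        q = p_x @ P                           # updated reproduction marginal
    D = float(np.sum(p_x[:, None] * P * d))
    R = float(np.sum(p_x[:, None] * P * np.log2(P / q[None, :])))
    return R, D

# Binary uniform source with Hamming distortion.
p_x = np.array([0.5, 0.5])
d = np.array([[0.0, 1.0], [1.0, 0.0]])
R, D = blahut_arimoto_rd(p_x, d, beta=2.0)
```

Sweeping `beta` traces out the whole curve; the fixed point for each slope also yields the optimal marginal distribution `q` mentioned above.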

When the quantizer operates independently of the task, the desired vector must be estimated directly from the quantized observations. For the optimal quantizer and estimator for this setup, $Q^n_M(\cdot)$ minimizes the MSE between its output and $\mathbf{x}$, and $\mathbf{s}$ is estimated from the output of the quantizer using the MMSE estimator. From the orthogonality principle, the resulting MSE in estimating $\mathbf{s}$ is

 $E\left\{\left\|\mathbf{s} - E\{\mathbf{s}|Q^n_M(\mathbf{x})\}\right\|^2\right\} = E\left\{\|\mathbf{s} - \tilde{\mathbf{s}}\|^2\right\} + E\left\{\left\|\tilde{\mathbf{s}} - E\{\mathbf{s}|Q^n_M(\mathbf{x})\}\right\|^2\right\} \stackrel{(a)}{=} E\left\{\|\mathbf{s} - \tilde{\mathbf{s}}\|^2\right\} + E\left\{\left\|\tilde{\mathbf{s}} - E\{\tilde{\mathbf{s}}|Q^n_M(\mathbf{x})\}\right\|^2\right\},$ (6)

where $(a)$ follows since $\mathbf{s} \to \mathbf{x} \to Q^n_M(\mathbf{x})$ form a Markov chain, thus, by [38, Prop. 4], $E\{\mathbf{s}|Q^n_M(\mathbf{x})\} = E\{\tilde{\mathbf{s}}|Q^n_M(\mathbf{x})\}$. The relation in (6) shows that the distortion of the task-ignorant quantizer is given by the sum of the estimation error of the MMSE estimate and the estimation error of the MMSE estimate of $\tilde{\mathbf{s}}$ from the quantizer output $Q^n_M(\mathbf{x})$. The main difference between (6) and the optimal estimation error in (4) is that in (6) the quantizer is fixed, while in (4) it can be set to minimize the estimation error.

In order to compute (6), the distribution of $\tilde{\mathbf{s}}$ given $Q^n_M(\mathbf{x})$ is required, which may be difficult to characterize. One scenario in which this requirement can be relaxed is when $\tilde{\mathbf{s}}$ is a linear function of $\mathbf{x}$, i.e., $\tilde{\mathbf{s}} = \mathbf{\Gamma}\mathbf{x}$ for some $\mathbf{\Gamma} \in \mathbb{R}^{k \times n}$. To formulate the resulting distortion, let $\mathbf{\Sigma}_x$ and $\mathbf{\Sigma}_{Q^n_M(x)}$ be the covariance matrices of $\mathbf{x}$ and of $Q^n_M(\mathbf{x})$, respectively. The MSE in this case is stated in the following proposition.

###### Proposition 2.

When $\tilde{\mathbf{s}} = \mathbf{\Gamma}\mathbf{x}$, $\mathbf{x}$ is zero-mean, and $Q^n_M(\cdot)$ is the optimal quantizer of $\mathbf{x}$, then

 $E\left\{\left\|\tilde{\mathbf{s}} - E\{\tilde{\mathbf{s}}|Q^n_M(\mathbf{x})\}\right\|^2\right\} = \mathrm{Tr}\left(\mathbf{\Gamma}^T\mathbf{\Gamma}\left(\mathbf{\Sigma}_x - \mathbf{\Sigma}_{Q^n_M(x)}\right)\right).$ (7)
###### Proof:

Since $\tilde{\mathbf{s}} = \mathbf{\Gamma}\mathbf{x}$, and thus $E\{\tilde{\mathbf{s}}|Q^n_M(\mathbf{x})\} = \mathbf{\Gamma}\,E\{\mathbf{x}|Q^n_M(\mathbf{x})\}$, the second summand in (6) can be written as $E\{\|\mathbf{\Gamma}(\mathbf{x} - E\{\mathbf{x}|Q^n_M(\mathbf{x})\})\|^2\}$. Therefore,

 $E\left\{\left\|\tilde{\mathbf{s}} - E\{\tilde{\mathbf{s}}|Q^n_M(\mathbf{x})\}\right\|^2\right\} \stackrel{(a)}{=} E\left\{\left\|\mathbf{\Gamma}\left(\mathbf{x} - Q^n_M(\mathbf{x})\right)\right\|^2\right\} = \mathrm{Tr}\left(\mathbf{\Gamma}^T\mathbf{\Gamma}\,E\left\{\left(\mathbf{x} - Q^n_M(\mathbf{x})\right)\left(\mathbf{x} - Q^n_M(\mathbf{x})\right)^T\right\}\right) \stackrel{(b)}{=} \mathrm{Tr}\left(\mathbf{\Gamma}^T\mathbf{\Gamma}\left(\mathbf{\Sigma}_x - \mathbf{\Sigma}_{Q^n_M(x)}\right)\right),$ (8)

where $(a)$ follows since $Q^n_M(\cdot)$ is the optimal quantizer of $\mathbf{x}$ in the MSE sense, hence $E\{\mathbf{x}|Q^n_M(\mathbf{x})\} = Q^n_M(\mathbf{x})$; and $(b)$ is a result of the fact that the output of the optimal quantizer is uncorrelated with the quantization error [1, Sec. III]. ∎

Proposition 2 suggests that, when $\tilde{\mathbf{s}}$ is a linear function of $\mathbf{x}$, the distortion can be evaluated using only the covariance matrix of the output of the task-ignorant quantizer $Q^n_M(\cdot)$. Nonetheless, the covariance of the quantizer which minimizes the distortion with respect to $\mathbf{x}$ is typically difficult to compute for finite $n$. Since the distortion of the optimal quantizer approaches the distortion-rate function as $n$ increases [20, Thm. 23.2], a possible approach to approximate the distortion is to evaluate Proposition 2 with the covariance matrix of the output distribution which obtains the distortion-rate function, instead of $\mathbf{\Sigma}_{Q^n_M(x)}$. This replacement can provide a reliable characterization of the performance of random codes distributed via the optimal marginal distortion-rate distribution for large $n$. In the numerical study in Section VI we illustrate that (7) approaches the performance of the optimal quantizer designed to recover $\mathbf{x}$.
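In the scalar case, the identity of Proposition 2 can be verified directly: for an MSE-optimal (Lloyd-type) quantizer satisfying the centroid condition, the distortion with respect to the conditional mean equals $c^2(\mathrm{Var}\,x - \mathrm{Var}\,Q(x))$, the scalar instance of (7) with $\tilde{s} = c\,x$. The sample-based sketch below (our own toy instance, with an illustrative constant $c$) makes this concrete:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(200_000)

# 1-D Lloyd iterations (nearest-level assignment + centroid condition) for a
# 4-level MSE-optimal scalar quantizer of x.
levels = np.array([-1.5, -0.5, 0.5, 1.5])
for _ in range(50):
    idx = np.argmin(np.abs(x[:, None] - levels[None, :]), axis=1)
    levels = np.array([x[idx == j].mean() for j in range(len(levels))])
Qx = levels[idx]

c = 0.7                                 # s_tilde = c * x (linear MMSE estimate)
lhs = np.mean((c * x - c * Qx) ** 2)    # distortion w.r.t. E{s_tilde | Q(x)} = c*Q(x)
rhs = c**2 * (np.var(x) - np.var(Qx))   # Tr(Gamma^T Gamma (Sigma_x - Sigma_Q)) here
```

Because the centroid condition makes `Qx` the conditional mean of `x` within each cell, the two quantities agree up to floating-point precision on the same samples.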

## Iv Hardware-Limited Task-Based Quantization Systems Design

### Iv-a Model Assumptions

We now study the design of hardware-limited task-based quantization systems illustrated in Fig. 3. As stated in the problem formulation, we consider the case where $n$, $p$, and $\tilde{M}_p$ are fixed and finite, namely, we do not assume high quantization rate or arbitrarily large inputs. In such cases, explicitly characterizing the optimal quantization system and the minimal achievable distortion is a very difficult task, just as characterizing the minimal achievable distortion in lossy source coding with fixed blocklengths is difficult [24, 23]. Consequently, in the following we focus on scenarios in which the stochastic relationship between the vector of interest $\mathbf{s}$ and the observations vector $\mathbf{x}$ is such that the MMSE estimate of $\mathbf{s}$ from $\mathbf{x}$, $\tilde{\mathbf{s}} = E\{\mathbf{s}|\mathbf{x}\}$, is a linear function of $\mathbf{x}$. By focusing on these setups, we are able to explicitly derive the achievable distortion and to characterize the system which achieves minimal distortion. Furthermore, as detailed in Section V, this analysis provides guidelines for designing hardware-limited task-based quantization systems, which can be used for any relationship between $\mathbf{x}$ and $\mathbf{s}$.

In our analysis we restrict the digital mapping $h_{\rm d}$ to be linear, namely, $h_{\rm d}(\mathbf{y}) = \mathbf{B}\mathbf{y}$ for some $\mathbf{B} \in \mathbb{R}^{k \times p}$. This constraint leads to practical systems, and is not expected to have a notable effect on the overall performance, especially when the error due to quantization is small, since the MMSE estimator here is linear.

To design a system which operates with simple scalar uniform quantizers, we carry out our analysis assuming dithered quantization [39]. Using dithered quantizers results in some favorable properties of the quantized signal, elaborated on in the sequel, which facilitate the analysis. These properties are also approximately satisfied without dithering for many input distributions [40]. Therefore, by considering dithered quantization, we are able to rigorously derive the optimal system, where in practice the resulting system can approach the optimal performance using standard uniform quantizers without dithering.

More specifically, we assume the identical scalar quantizers implement non-subtractive uniform dithered quantization [39]. Unlike subtractive dithered quantization, considered in, e.g., [42], non-subtractive quantizers do not require the realization of the dithered signal to be subtracted from the quantizer output in the digital domain, resulting in a practical structure [39]. An illustration is depicted in Figure 5.

To formulate the input-output relationship of the serial scalar ADC, let $\gamma$ denote the dynamic range of the quantizer, and define $\Delta_p \triangleq \frac{2\gamma}{\tilde{M}_p}$ as the quantization spacing. The uniform quantizer is designed to operate within the dynamic range, namely, the amplitude of the input is not larger than $\gamma$ with sufficiently large probability. To guarantee this, we fix $\gamma$ to be some multiple $\eta$ of the maximal standard deviation of the input. By Chebyshev's inequality [3, Pg. 64], the amplitude of the input is then smaller than the dynamic range with probability of over $1 - \eta^{-2}$. We assume that $\eta$ and $\tilde{M}_p$ are such that the variable $\kappa_p$, used in the sequel, is strictly positive; quantizers with $\tilde{M}_p \ge 2$ satisfy this requirement, i.e., the ADC is implemented using scalar quantizers with at least one bit. The output of the serial scalar ADC with input $y$ can be written as $q_p(y + d)$, where the dither signals $d$ are i.i.d. RVs uniformly distributed over $\left[-\frac{\Delta_p}{2}, \frac{\Delta_p}{2}\right]$, mutually independent of the input. The function $q_p(\cdot)$, which implements the uniform quantization, is given by

 $q_p(y) = \begin{cases} -\gamma + \Delta_p\left(l + \frac{1}{2}\right), & y + \gamma - \Delta_p\left(l + \frac{1}{2}\right) \in \left[-\frac{\Delta_p}{2}, \frac{\Delta_p}{2}\right],\ l \in \{0, 1, \ldots, \tilde{M}_p - 1\}, \\ \mathrm{sign}(y)\left(\gamma - \frac{\Delta_p}{2}\right), & |y| > \gamma. \end{cases}$

Note that when $\tilde{M}_p = 2$, the resulting quantizer is a standard one-bit sign quantizer of the form $q_p(y) = c \cdot \mathrm{sign}(y)$, where the constant $c = \gamma - \frac{\Delta_p}{2} = \frac{\gamma}{2}$ is determined by the dynamic range $\gamma$.

Dithered quantizers significantly facilitate the analysis, due to the following favorable properties: the output can be written as the sum of the input and an additive zero-mean white quantization noise signal, and the quantization noise is uncorrelated with the input. The drawback of adding dither is that it increases the energy of the quantization noise, namely, it results in increased distortion [39]. Nonetheless, the favorable properties of dithered quantization are also satisfied in uniform quantization without dithering for inputs with a bandlimited characteristic function, and are approximately satisfied for various families of input distributions, including the Gaussian distribution [40]. Consequently, while in the following analysis we assume dithered quantization, exploiting the fact that the resulting quantization noise is white and uncorrelated with the input, the proposed system can also be applied without dithering. Furthermore, as demonstrated in Section VI, applying the proposed system without dithering yields improved performance, due to the reduced energy of the quantization noise.

### Iv-B Optimal Hardware-Limited Task-Based Quantizer

We now characterize the optimal hardware-limited task-based quantizer under the system model detailed in the previous subsection. Our characterization yields the optimal analog combining matrix and digital processing matrix, denoted $\mathbf{A}^{\rm o}$ and $\mathbf{B}^{\rm o}$, respectively, and the corresponding dynamic range $\gamma$. Since the quantized representation $\hat{\mathbf{s}}$ is a function of $\mathbf{x}$, it follows from the orthogonality principle that the MSE, $E\{\|\mathbf{s} - \hat{\mathbf{s}}\|^2\}$, equals the sum of the estimation error of the MMSE estimate, $E\{\|\mathbf{s} - \tilde{\mathbf{s}}\|^2\}$, and the distortion with respect to the MMSE estimate, $E\{\|\tilde{\mathbf{s}} - \hat{\mathbf{s}}\|^2\}$. In the following we therefore characterize the performance of the proposed systems via the distortion with respect to $\tilde{\mathbf{s}}$.

Let ~s = Γx be the MSE optimal transformation of x, namely, the MMSE estimate of s, and let Σx be the covariance matrix of x, assumed to be non-singular. Before we derive the optimal hardware-limited task-based quantization system, we first derive the optimal digital processing matrix for a given analog combining matrix and the resulting MSE, which is stated in the following lemma:

###### Lemma 1.

For any analog combining matrix A and dynamic range γ such that the inputs to the quantizers lie within [−γ, γ], the optimal digital processing matrix is

 B^o(A)=\Gamma\Sigma_x A^T\left(A\Sigma_x A^T+\frac{2\gamma^2}{3\tilde{M}_p^2}I_p\right)^{-1},

and the minimal achievable MSE is given by

 \mathrm{MSE}(A)=\min_B \mathbb{E}\left\{\|\tilde{s}-\hat{s}\|^2\right\}=\mathrm{Tr}\left(\Gamma\Sigma_x\Gamma^T-\Gamma\Sigma_x A^T\left(A\Sigma_x A^T+\frac{2\gamma^2}{3\tilde{M}_p^2}I_p\right)^{-1}A\Sigma_x\Gamma^T\right).

Proof: See Appendix -B.

The optimal digital processing matrix in Lemma 1 is the linear MMSE estimator of ~s from the vector Ax + e, where e represents the quantization noise, which is white and uncorrelated with Ax. This stochastic representation is a result of the usage of dithered quantizers. Additionally, it is assumed in Lemma 1 that the input to the quantizers is in the dynamic range of the quantizers, namely, |(Ax)i| ≤ γ for each i ∈ {1, …, p}. When this requirement is not satisfied, by the law of total expectation, the resulting MSE includes an additional weighted term which accounts for working outside the dynamic range. However, as explained in Subsection IV-A, we require the uniform quantizers to operate within their dynamic range, and set the value of γ accordingly.
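The expressions in Lemma 1 are straightforward to evaluate numerically. Below is a sketch with our own naming, where noise_var stands for the additive quantization noise variance 2γ²/(3~Mp²):

```python
import numpy as np

def lemma1_mse(Gamma, Sigma_x, A, gamma, M_tilde):
    """Optimal digital matrix B^o(A) and resulting MSE from Lemma 1."""
    p = A.shape[0]
    noise_var = 2.0 * gamma**2 / (3.0 * M_tilde**2)      # dithered quantization noise
    S_inv = np.linalg.inv(A @ Sigma_x @ A.T + noise_var * np.eye(p))
    B = Gamma @ Sigma_x @ A.T @ S_inv                    # linear MMSE estimator of ~s from Ax + e
    mse = np.trace(Gamma @ Sigma_x @ Gamma.T
                   - Gamma @ Sigma_x @ A.T @ S_inv @ A @ Sigma_x @ Gamma.T)
    return B, mse

# increasing the resolution M_tilde shrinks the noise term and reduces the MSE
rng = np.random.default_rng(1)
Gamma = rng.standard_normal((2, 4))
C = rng.standard_normal((4, 4))
Sigma_x = C @ C.T + np.eye(4)                            # non-singular input covariance
_, mse_coarse = lemma1_mse(Gamma, Sigma_x, np.eye(4), gamma=3.0, M_tilde=2)
_, mse_fine = lemma1_mse(Gamma, Sigma_x, np.eye(4), gamma=3.0, M_tilde=16)
```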

We now use Lemma 1 to obtain the optimal analog combining matrix and the resulting system. Define the matrix ~Γ ≜ ΓΣx^{1/2}, and let λ~Γ,1 ≥ λ~Γ,2 ≥ … be its singular values arranged in a descending order. Note that for i greater than the rank of ~Γ, λ~Γ,i = 0. The optimal hardware-limited task-based quantization system is given in the following theorem:

###### Theorem 1.
 For the optimal quantization system based on the model detailed in Subsection IV-A, the analog combining matrix Ao is given by Ao = UAΛAV^T_AΣx^{−1/2}, where
• VA is the right singular vectors matrix of ~Γ.

• ΛA is a p × p diagonal matrix with diagonal entries

 (\Lambda_A)_{i,i}^2=\frac{2\kappa_p}{3\tilde{M}_p^2}\cdot p\left(\zeta\cdot\lambda_{\tilde{\Gamma},i}-1\right)^+, (9a)

where the waterfilling level ζ is set such that the diagonal entries of ΛA are consistent with the dynamic range γ in (9b).

• UA is a unitary matrix which guarantees that UAΛAΛ^T_AU^T_A, the covariance of the input to the quantizers, has identical diagonal entries, namely, its diagonal is weakly majorized by that of all possible rotations of ΛAΛ^T_A [43, Cor. 2.1]. The matrix UA can be obtained via [43, Alg. 2.2]; its existence is guaranteed by [43, Cor. 2.1], though it is not unique as, e.g., both UA and −UA result in a rotation of ΛAΛ^T_A having identical diagonal entries.

The dynamic range of the ADC is given by

 \gamma^2=\kappa_p\, p=\eta^2 p\left(1-\frac{\eta^2}{3\tilde{M}_p^2}\right)^{-1}, (9b)

and the digital processing matrix is equal to

 B^o(A^o)=\tilde{\Gamma}V_A\Lambda_A^T\left(\Lambda_A\Lambda_A^T+\frac{2\gamma^2}{3\tilde{M}_p^2}I_p\right)^{-1}U_A^T. (9c)

The resulting minimal achievable distortion is

 \mathbb{E}\left\{\|\tilde{s}-\hat{s}\|^2\right\}=\begin{cases}\sum_{i=1}^{k}\frac{\lambda_{\tilde{\Gamma},i}^{2}}{\left(\zeta\cdot\lambda_{\tilde{\Gamma},i}-1\right)^{+}+1}, & p\geq k,\\ \sum_{i=1}^{p}\frac{\lambda_{\tilde{\Gamma},i}^{2}}{\left(\zeta\cdot\lambda_{\tilde{\Gamma},i}-1\right)^{+}+1}+\sum_{i=p+1}^{k}\lambda_{\tilde{\Gamma},i}^{2}, & p<k.\end{cases}

Proof: See Appendix -C.

We note that, unlike task-based vector quantizers, the optimal hardware-limited system does not recover the MMSE estimate in the analog domain. Since the quantization is carried out using a serial scalar ADC, the optimal analog combining rotates the input to the ADC such that each entry has identical variance, accounting for the fact that the same quantization rule is applied to each entry. Furthermore, the optimal analog combiner includes a waterfilling-type expression over its singular values, which accounts for the finite dynamic range of the ADC. In particular, the waterfilling allows the optimal system to balance the estimation and quantization errors. To see this, we note from Appendix -C that the matrix ΛA determines the dynamic range γ. Consequently, by potentially nulling the diagonal entries corresponding to the less dominant singular values λ~Γ,i, the optimal quantizer can reduce the dynamic range. This yields a more precise quantization and reduces the quantization error, at the cost of a small estimation error.
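The structure described above, an SVD of ~Γ = ΓΣx^{1/2}, waterfilling over its singular values, and a rotation equalizing the variances at the quantizer inputs, can be sketched numerically. This is an illustrative implementation under simplified normalization (the power budget is fixed to p rather than the exact constants of (9a)-(9b)), and it uses a normalized Hadamard matrix as the equalizing rotation, which applies when p is a power of two:

```python
import numpy as np

def waterfilling_combiner(Gamma, Sigma_x, p):
    """Illustrative Theorem-1-style analog combiner (simplified constants)."""
    # Gamma_tilde = Gamma @ Sigma_x^{1/2}
    w, V = np.linalg.eigh(Sigma_x)
    Sigma_half = V @ np.diag(np.sqrt(w)) @ V.T
    Gt = Gamma @ Sigma_half
    _, lam, Vt = np.linalg.svd(Gt)                 # singular values in descending order
    lam = np.concatenate([lam, np.zeros(p)])[:p]   # pad in case p exceeds their number

    # waterfilling: (Lambda_A)_{ii}^2 proportional to (zeta*lam_i - 1)^+, with zeta
    # found by bisection so the squared entries sum to a fixed budget (here simply p)
    power = lambda zeta: np.sum(np.maximum(zeta * lam - 1.0, 0.0))
    lo, hi = 0.0, 1.0
    while power(hi) < p:
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if power(mid) < p else (lo, mid)
    Lam = np.sqrt(np.maximum(0.5 * (lo + hi) * lam - 1.0, 0.0))

    # rotation making the quantizer-input covariance diagonal entries identical;
    # a normalized Hadamard matrix achieves this when p is a power of two
    H = np.array([[1.0]])
    while H.shape[0] < p:
        H = np.block([[H, H], [H, -H]])
    U_A = H / np.sqrt(p)

    V_A = Vt.T[:, :p]                              # leading right singular vectors
    return U_A @ np.diag(Lam) @ V_A.T @ np.linalg.inv(Sigma_half)

rng = np.random.default_rng(2)
Gamma = rng.standard_normal((2, 4))
C = rng.standard_normal((4, 4))
Sigma_x = C @ C.T + np.eye(4)
A = waterfilling_combiner(Gamma, Sigma_x, p=2)
cov_in = A @ Sigma_x @ A.T   # quantizer-input covariance: identical diagonal entries
```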

Theorem 1 also provides guidelines for selecting the dimension p of the output of the analog combiner, as stated in the following corollary:

###### Corollary 1.

In order to minimize the MSE, p must not be larger than the rank of the covariance matrix of ~s.

Proof: See Appendix -D.

Corollary 1 indicates that analog combining should project the observed vector such that the signal which undergoes the serial scalar quantization has reduced dimensionality, not larger than the rank of the covariance of ~s. This follows since, by reducing the dimensionality of the input to the ADC while keeping the overall number of quantization levels fixed, the quantization error induced by the scalar quantization is reduced. The exact optimal value of p is determined by the values of the non-zero singular values λ~Γ,i. In particular, the MSE expression in Theorem 1 implies that decreasing p below the number of non-zero singular values results in a tradeoff between improving quantization precision and increasing the estimation error. In the numerical analysis in Section VI we demonstrate that using the proposed hardware-limited task-based system, the quantization error is made negligible for a relatively small number of bits, and the performance approaches that of the MMSE estimator.

Finally, we show that when the quantization resolution is sufficiently large, the proposed system produces the MMSE estimate ~s. To that aim, we assume that the covariance matrix of ~s is non-singular, and thus set p = k. When the quantization resolution is such that ~Mp is sufficiently large, the quantization noise introduced by the ADC becomes negligible, and the output of the system can be written as

 \hat{s}\approx B^o A^o x\approx\tilde{\Gamma}V_A\Lambda_A^T\left(\Lambda_A\Lambda_A^T\right)^{-1}\Lambda_A V_A^T\Sigma_x^{-1/2}x. (10)

Furthermore, for large ~Mp, the parameter ζ becomes arbitrarily large. Thus, the diagonal entries in (9a) become strictly positive for every non-zero λ~Γ,i. By writing the singular value decomposition (SVD) ~Γ = U~ΓΛ~ΓV^T_~Γ in (10), and recalling that VA = V~Γ, we have

 \hat{s}\approx U_{\tilde{\Gamma}}\Lambda_{\tilde{\Gamma}}\Lambda_A^T\left(\Lambda_A\Lambda_A^T\right)^{-1}\Lambda_A V_{\tilde{\Gamma}}^T\Sigma_x^{-1/2}x\overset{(a)}{=}U_{\tilde{\Gamma}}\Lambda_{\tilde{\Gamma}}V_{\tilde{\Gamma}}^T\Sigma_x^{-1/2}x\overset{(b)}{=}\Gamma x=\tilde{s},

where (a) follows since, for this setting of ΛA, Λ^T_A(ΛAΛ^T_A)^{−1}ΛA = Ik, and (b) follows since ~Γ = ΓΣx^{1/2}. Consequently, for sufficiently large quantization resolution, ^s approaches the MMSE estimate ~s.

### IV-C Suboptimal Quantization Systems

In the previous subsection we characterized the optimal hardware-limited task-based quantization system. In the following we study two suboptimal systems of interest: a system which does not carry out any processing in the analog domain, and a system which mimics the optimal vector task-based quantizer by quantizing the MMSE estimate. Our results in the following are based on the characterization of the achievable MSE for a fixed analog combining matrix in Lemma 1.

We begin with the suboptimal case where processing is carried out only in the digital domain. Here, p = n, and the analog combiner is given by A = In, the n × n identity matrix. This structure accommodates the majority of systems studied in the literature in the context of tasks performed with low-precision ADCs, e.g., [12, 15, 17, 13, 14, 18, 16]. The optimal digital processing matrix for this case and the resulting MSE are stated in the following corollary:

###### Corollary 2.

When the analog combiner is A = In, the minimal achievable MSE is given by

 \mathbb{E}\left\{\|\tilde{s}-\hat{s}\|^2\right\}=\mathrm{Tr}\left(\tilde{\Gamma}^T\tilde{\Gamma}\left(I_n+\frac{3\tilde{M}_n^2}{2\kappa_n\sigma_{x,\max}^2}\Sigma_x\right)^{-1}\right), (11a)

and the corresponding optimal digital matrix is

 B^o(I_n)=\Gamma\Sigma_x\left(\Sigma_x+\frac{2\kappa_n\sigma_{x,\max}^2}{3\tilde{M}_n^2}I_n\right)^{-1}, (11b)

where \sigma_{x,\max}^2\triangleq\max_{i=1,\ldots,n}(\Sigma_x)_{i,i}.
###### Proof:

The corollary follows directly from Lemma 1. In particular, (11b) is obtained from the optimal digital processing matrix in Lemma 1 by setting A = In, and (11a) is obtained from the resulting MSE via the matrix inversion lemma. ∎
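The matrix-inversion-lemma step can be checked numerically: the direct Lemma 1 expression with the identity combiner and the form of (11a) coincide. A small sketch with our own variable names, where noise_var stands in for 2κnσ²x,max/(3~Mn²):

```python
import numpy as np

rng = np.random.default_rng(3)
k, n = 2, 5
Gamma = rng.standard_normal((k, n))
C = rng.standard_normal((n, n))
Sigma_x = C @ C.T + np.eye(n)      # non-singular input covariance

noise_var = 0.7                    # illustrative quantization noise variance
# direct form: Lemma 1 MSE with A = I_n
S = Sigma_x + noise_var * np.eye(n)
mse_lemma = np.trace(Gamma @ Sigma_x @ Gamma.T
                     - Gamma @ Sigma_x @ np.linalg.inv(S) @ Sigma_x @ Gamma.T)

# corollary form (11a): Tr(Gt^T Gt (I + Sigma_x / noise_var)^{-1}), Gt = Gamma Sigma_x^{1/2}
w, V = np.linalg.eigh(Sigma_x)
Gt = Gamma @ (V @ np.diag(np.sqrt(w)) @ V.T)
mse_cor = np.trace(Gt.T @ Gt @ np.linalg.inv(np.eye(n) + Sigma_x / noise_var))
```

The two traces agree, confirming the algebraic simplification used in the proof.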

The resulting suboptimal system bears some similarity to the task-ignorant system discussed in Section III in the sense that quantization is carried out independently of the task. However, it should be noted that the system discussed in Section III performs joint vector quantization, while (11a) is achievable with a serial ADC. As a result, the system considered here can operate only when log2 M ≥ n, otherwise the scalar quantizers are assigned zero bits, while the task-ignorant system of Section III can operate with any positive number of bits.

Next, we consider a system in which the analog combining is designed to recover the MMSE estimate ~s. Here, p = k, and the analog combiner is A = Γ, such that the input to the ADC is ~s = Γx. As noted in the discussion following Theorem 1, this approach is suboptimal when working with serial scalar ADCs, unlike the case with vector quantizers discussed in Section III. The optimal digital processing matrix for this setup and the resulting MSE are stated in the following corollary:

###### Corollary 3.

When the analog combiner is A = Γ, the minimal achievable MSE is given by

 \mathbb{E}\left\{\|\tilde{s}-\hat{s}\|^2\right\}=\mathrm{Tr}\left(\tilde{\Gamma}^T\tilde{\Gamma}\left(I_n+\frac{3\tilde{M}_k^2}{2\kappa_k\sigma_{\tilde{s},\max}^2}\tilde{\Gamma}^T\tilde{\Gamma}\right)^{-1}\right), (12a)

and the corresponding optimal digital matrix is

 B^o(\Gamma)=\tilde{\Gamma}\tilde{\Gamma}^T\left(\tilde{\Gamma}\tilde{\Gamma}^T+\frac{2\kappa_k\sigma_{\tilde{s},\max}^2}{3\tilde{M}_k^2}I_k\right)^{-1}, (12b)

where \sigma_{\tilde{s},\max}^2\triangleq\max_{i=1,\ldots,k}\mathbb{E}\{(\tilde{s})_i^2\}.
###### Proof:

The corollary follows directly from Lemma 1 using the same arguments as in the proof of Corollary 2. ∎

The approach of quantizing the MMSE estimate is in general suboptimal. When the entries of ~s are not linearly dependent, namely, the covariance matrix of ~s is non-singular [45, Ch. 8.1], designing the analog combiner to recover the MMSE estimate is optimal if and only if the condition stated in the following corollary is satisfied:

###### Corollary 4.

When the covariance matrix of ~s is non-singular, quantizing the MMSE estimate is optimal if and only if the covariance matrix of ~s is a scaled identity matrix.

Proof: See Appendix -E.

Corollary 4 indicates that, except for very specific statistical relationships between s and x, quantizing the entries of the MMSE estimate vector is strictly suboptimal. In the numerical study presented in Section VI we evaluate the achievable MSE of the considered systems, and illustrate that both the system proposed in Theorem 1 and the suboptimal system discussed in Corollary 4 are able to approach the performance of the optimal vector quantizer for a large number of quantization levels M, and that the optimal system of Theorem 1 outperforms the suboptimal system of Corollary 4 for all considered values of M. Additionally, we illustrate that for large n and relatively small M, a notable gap in MSE is observed between the optimal hardware-limited task-based quantizer of Theorem 1 and the suboptimal system of Corollary 4.

## V Guidelines for Hardware-Limited Task-Based Quantization for Arbitrary Models

### V-A Design Guidelines

In the previous section we characterized the distortion of hardware-limited task-based quantizers when the MMSE estimator of the desired vector s from the observations vector x, denoted ~s, is a linear function of x. To that aim, we designed the analog combining and the digital mapping such that, if the quantization error induced by the serial scalar ADC is negligible, the resulting output approaches the MMSE estimate. Since in Section IV the MMSE estimate is linear, one can design linear A and B such that the resulting quantized representation approaches ~s as ~Mp increases, for any value of p, which denotes the dimension of the output of the analog linear mapping. In particular, it was noted that when p decreases down to the rank of the covariance matrix of ~s, the performance of the quantization system improves. This improvement follows as more bits can be assigned to each scalar quantizer of the ADC, thus reducing the error induced by scalar quantization without modifying the overall number of bits used by the system, log2 M.

The principles used for designing the linear analog mapping and the digital mapping for the case when the MMSE estimate is linear give rise to guidelines for designing hardware-limited task-based quantization systems with arbitrary relationships between s and x. In particular, we propose to set the analog and digital mappings according to the following guidelines:

1. The mappings are such that, when ~Mp is large enough, the system output ^s approaches the MMSE estimate ~s.

2. The dimension of the output of the analog mapping, p, is as small as possible.

The first guideline implies that when the quantization error induced by the ADC is sufficiently small, the output of the system approaches the MMSE estimate. The second guideline guarantees that more bits are assigned to the serial scalar ADC, thus reducing the quantization error.

We note that in some scenarios, it may not be possible to obtain or approximate the MMSE estimate ~s from a linear function of x of reduced dimensions; for example, when s is estimated from the second-order statistical moments of x, as in eigen-spectrum estimation [5], subspace learning [16], DOA estimation [18, 19], and source localization [17]. In such cases, ~s generally cannot be obtained from a linear function of x of reduced dimensions. Nonetheless, the proposed guidelines can still be applied to design the quantization system. As an illustrative example, in the following subsection we explicitly show how these guidelines can be used for recovering the empirical covariance of an input signal.

### V-B Example: Recovery from Empirical Covariance

We next demonstrate how the guidelines for designing hardware-limited task-based quantization systems discussed in Subsection V-A can be applied for recovering the empirical covariance of the input. Unlike the results presented in Section IV, here we will not be able to explicitly characterize the resulting distortion. However, in the numerical study carried out in Section VI we empirically illustrate the benefits of the proposed design, and show that it outperforms processing the observations only in the digital domain, which is the more popular approach in the literature.

In particular, consider the case where the observed vector x consists of nx zero-mean i.i.d. vectors {xi} in R^{mx}, i.e., x = [x1^T, …, x_{nx}^T]^T and n = nx·mx. The desired vector s (or its MMSE estimate ~s) can be recovered from the empirical covariance of {xi}, namely, from

 R_x\triangleq\frac{1}{n_x}\sum_{i=1}^{n_x}x_i x_i^T.

For example, when s is the eigenspectrum of the covariance of {xi}, the MMSE estimate is obtained from Rx via [5, Eq. (22)]. At first glance, the proposed guidelines cannot be used here, as, in general, for p < n there exists no linear transformation A such that Rx can be recovered from Ax. However, an approximation of Rx can be obtained via the following steps:

• Divide the set {xi}_{i=1}^{nx} into ns distinct sets, each consisting of ms = nx/ns vectors, namely, the l-th set is given by {xi}_{i=(l−1)ms+1}^{l·ms}, l ∈ {1, …, ns}.

• Fix the analog combining such that the input to the serial scalar ADC consists of the ns vectors {¯zl}_{l=1}^{ns}, where ¯zl = Σ_{i=(l−1)ms+1}^{l·ms} xi. This is achieved by setting p = ns·mx, where the entries of A are given by

 (A)_{(p_1-1)m_x+q_1,\,(p_2-1)m_x+q_2}=\delta_{q_1,q_2}\sum_{l=1}^{m_s}\delta_{(p_1-1)m_s+l,\,p_2},

for p1 ∈ {1, …, ns}, p2 ∈ {1, …, nx}, and q1, q2 ∈ {1, …, mx}.

• In the digital domain, we approximate Rx from the quantized vectors {¯zl} via

 \hat{R}_x\triangleq\frac{1}{n_x}\sum_{l=1}^{n_s}\bar{z}_l\bar{z}_l^T.

The rationale behind these steps is that, when the quantization error is negligible and the number of sets ns is sufficiently large, ^Rx approaches the true covariance matrix by the law of large numbers. Note that as ns decreases, fewer scalar quantizers are needed, thus the quantization error induced by the serial scalar ADC is expected to decrease, while as ns increases up to nx, ^Rx approaches the true covariance of {xi} (up to a constant factor). We therefore expect an optimal value of ns in the range [1, nx] for each value of M, as also illustrated in the empirical study in Subsection VI-B.
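The three steps above amount to summing each set of ms consecutive vectors in analog, quantizing the sums, and averaging outer products digitally. A minimal sketch follows (quantization of the combined vectors is omitted, and all names are our own):

```python
import numpy as np

def covariance_recovery_sketch(X, n_s):
    """X: (n_x, m_x) array of i.i.d. zero-mean vectors; n_s: number of sets."""
    n_x, m_x = X.shape
    m_s = n_x // n_s
    # analog combining: each output vector is the sum of m_s consecutive inputs
    Z = X[: n_s * m_s].reshape(n_s, m_s, m_x).sum(axis=1)
    # (the serial scalar ADC would quantize the entries of Z here)
    # digital domain: empirical covariance estimate from the combined vectors
    return (Z.T @ Z) / n_x

rng = np.random.default_rng(4)
true_cov = np.array([[2.0, 0.5], [0.5, 1.0]])
L = np.linalg.cholesky(true_cov)
X = rng.standard_normal((20000, 2)) @ L.T            # i.i.d. vectors with cov true_cov
R_full = covariance_recovery_sketch(X, n_s=20000)    # n_s = n_x: standard estimate
R_coarse = covariance_recovery_sketch(X, n_s=200)    # m_s = 100 vectors per set
```

With n_s below n_x the estimate remains unbiased but has larger variance, which is the tradeoff against the reduced quantization error discussed above.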

The benefits of the proposed approach are illustrated in the numerical study presented in the following section. In particular, in Subsection VI-B it is illustrated that for the problem of eigenspectrum recovery, a quantization system designed according to the above guidelines outperforms a system which performs no analog combining prior to quantization, and that the performance gap depends on the overall number of quantization levels M and on the number of sets ns.

## VI Applications and Numerical Study

In this section we study the application of the hardware-limited task-based quantization systems proposed in Sections IV-V, in two scenarios involving parameter acquisition from quantized measurements: First, in Subsection VI-A, we study the achievable MSE in estimating a scalar channel with finite intersymbol interference (ISI) from a fixed number of quantized measurements, as in, e.g., [12, 13, 14], using the hardware-limited task-based quantizer proposed in Section IV. Then, in Subsection VI-B, we consider the problem of estimating the eigen-spectrum from a set of i.i.d. measurements, see, e.g., [5], and evaluate the achievable distortion of the quantization system design detailed in Section V.

### VI-A ISI Channel Estimation

We first consider the estimation of a scalar ISI channel from quantized observations, as in [13, 12, 14]. In this scenario, the parameter vector represents the coefficients of a multipath channel with a finite number of taps. The channel is estimated from a set of noisy observations