Consider the problem of conveying a real–valued parameter $u$ by means of $n$ uses of the additive white Gaussian noise (AWGN) channel,
$$y_i = x_i(u) + z_i, \qquad i = 1, \ldots, n,$$
where $x_i(u)$ is the $i$–th component of a channel input vector, $x(u) = (x_1(u), \ldots, x_n(u))$, that depends on the parameter $u$, and that is subjected to a power constraint, $\|x(u)\|^2 \le nS$, $z_i$ is an additive Gaussian noise sample, and $y_i$ is the $i$–th component of the channel output vector, $y = (y_1, \ldots, y_n)$. In a nutshell, our central interest in this work is to address the following question: how well can one estimate $u$ based on $y$ when one is allowed to optimize, not only the estimator, but also the modulator, that is, the function that maps $u$ into a channel input vector? How fast does the estimation error decay as a function of $n$ when the best modulator and estimator are used?
In principle, this problem, which is the discrete–time analogue of the classical problem of “waveform communication” (in the terminology of [10, Chap. 8]), can be viewed from both the information–theoretic and the estimation–theoretic perspectives. In the former, it is an instance of the joint source–channel coding problem, where a single source symbol is transmitted by using the channel $n$ times, and it also falls within the framework of Shannon–Kotel’nikov mappings. From the estimation–theoretic point of view, every given modulator induces a certain parametric family of conditional densities of $y$ given $u$, and estimation theory provides a plethora of both Bayesian and non–Bayesian lower bounds (depending on whether $u$ is a deterministic parameter or a random variable) on the estimation performance, as well as useful estimators that perform well, e.g., the maximum likelihood (ML) estimator in the non–Bayesian case, the maximum a–posteriori (MAP) estimator in the Bayesian case, and others.
A well–known inherent problem in the design of non–linear modulators is the threshold effect (see, e.g., [10, Chap. 8]): a sharp transition between two modes of behavior when the signal–to–noise ratio (SNR) crosses a certain critical value. At high SNR, a.k.a. the weak–noise regime, the estimation error behaves similarly to the one attained by a linear modulator, roughly achieving the Cramér–Rao lower bound (CRLB). But beyond a certain noise level, the estimation performance breaks down rather abruptly. As explained in [10, Chap. 8], for a given non–linear modulator, one can identify a certain anomaly event, or outage event, whose probability becomes (quite abruptly) appreciably large as the threshold noise level is crossed.
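To make the threshold effect concrete, the following toy Monte Carlo sketch (not from the paper: the helical modulator, all parameter values, and the grid–search ML estimator are illustrative assumptions) contrasts the weak–noise regime with the post–threshold regime:

```python
import numpy as np

rng = np.random.default_rng(0)

K = 10   # number of helix turns: the "stretch factor" of the signal locus
P = 1.0  # power budget per channel use
n = 3    # channel uses

def modulator(u):
    """Helical modulator: stretches [0, 1] into a long curve, trading
    weak-noise accuracy against fold-ambiguity (anomalous) errors."""
    a = np.sqrt(2 * P / 3)
    return np.array([a * np.cos(2 * np.pi * K * u),
                     a * np.sin(2 * np.pi * K * u),
                     np.sqrt(P / 3) * (2 * u - 1)])

grid = np.linspace(0.0, 1.0, 2001)
codebook = np.stack([modulator(u) for u in grid])

def mse(sigma, trials=2000):
    err = 0.0
    for _ in range(trials):
        u = rng.random()
        y = modulator(u) + sigma * rng.standard_normal(n)
        u_hat = grid[np.argmin(((codebook - y) ** 2).sum(axis=1))]  # grid-search ML
        err += (u_hat - u) ** 2
    return err / trials

mse_weak = mse(sigma=0.01)   # weak-noise regime: error set by the curve's stretch
mse_strong = mse(sigma=0.5)  # past the threshold: fold confusions dominate
print(mse_weak, mse_strong)
```

In the weak–noise regime the MSE tracks the local geometry of the stretched curve, while past the threshold the estimator confuses folds of the helix and the MSE jumps by several orders of magnitude.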
The literature on the problem of non–linear modulation and estimation contains plenty of results in the form of performance bounds, in which there is no distinction between weak–noise errors and anomalous errors, i.e., both types of errors weigh in the evaluation of the total mean square error (MSE). A little thought, however, suggests that it makes a lot of sense to separate the two kinds of errors, because estimation in the presence of an outage event is rather meaningless. Indeed, separate treatments of the two kinds of errors appear already in [10, pp. 661–674], but not quite in a formal manner. A more systematic approach in this direction, of separating weak–noise errors from anomalous errors, appears in [7, Section IV.A], where the problem was posed in terms of designing a communication system, along with a definition of an outage event (albeit with a different motivation in mind), with the target of minimizing the MSE given that the outage event has not occurred, subject to the constraint that the outage probability not exceed a given (small) constant. It was shown in [7] that the data processing lower bound is asymptotically achieved, in this setup, by a conceptually simple scheme that first quantizes the parameter and then maps the set of quantized parameter values into a good digital channel code for the Gaussian channel. The receiver first decodes the digital message and then maps it back to the corresponding quantized parameter value. The outage event is then the error event of the digital part, and the weak–noise MSE is simply the quantization error. The missing link in this result, however, is that the data processing lower bound is not quite compatible with this setting, as it corresponds to a situation where there is no freedom to allow an outage event.
In this paper, we sharpen the approach of [7] in two ways. Firstly, both in the lower bound and in the upper bound, we allow an outage event, whose definition is subject to optimization, depending on the communication system itself; therefore, the lower bound and the upper bound refer to the same setting. Secondly, we somewhat refine the “resolution” in the quantitative aspect of the problem being addressed: instead of constraining the outage probability to be upper bounded by a small constant, we impose the constraint that the outage probability not exceed a given exponential function of $n$, that is, $e^{-nE}$ for some prescribed positive constant $E$. Under such a constraint, we seek the fastest possible decay rate of the MSE (as a function of $n$ and the SNR, $S/\sigma^2$), or more generally, of the expectation of an arbitrary symmetric, convex function of the estimation error. More precisely, we derive an upper bound and a lower bound, which coincide in the limit of high SNR for a certain range of values of $E$. Our proposed achievability scheme is based on quantization and modulation, as described before, but since the optimal outage event turns out to be the complement of a sphere in the space of noise vectors (due to the Gaussianity of the noise), this suggests the use of good lattice codes, whose Voronoi cells can indeed be shaped arbitrarily closely to $n$–dimensional spheres [12, Chap. 7], for large $n$.
The paper is organized as follows. In Section II, we define notation conventions and formalize the problem being addressed along with the main assumptions. In Section III, we provide the converse bound. In Section IV, we describe the achievability scheme, and finally, in Section V, we summarize and conclude.
II. Notation, Problem Formulation and Assumptions
Consider the following communication system. The transmitter wishes to convey to the receiver a real–valued parameter $u$, taking on values in a finite interval, say, without essential loss of generality, the interval $[0,1]$. To this end, the transmitter uses the channel $n$ times, subject to a given power constraint. The receiver has to estimate $u$ from the noisy channel outputs. More precisely, given $u$, the transmitter sends a vector $x(u) = (x_1(u), \ldots, x_n(u))$, whose power is limited by $\|x(u)\|^2 \le nS$, where $S$ is the maximum allowed average power per channel use. The received vector is
$$y = x(u) + z,$$
where $z$ is a zero–mean Gaussian noise vector with covariance matrix $\sigma^2 I$, $I$ being the $n \times n$ identity matrix. The receiver implements an estimator $\hat{u} = \hat{u}(y)$ (with $y$ designating a realization of the random vector $Y$) of the parameter $u$, where $\hat{u}: \mathbb{R}^n \to [0,1]$. For every $u \in [0,1]$, let $\mathcal{E}(u)$ designate an event defined in the space of noise vectors, $\mathbb{R}^n$, which is henceforth referred to as the outage event (or the anomalous error event) given $u$.
Our general objective is to design a communication system, defined by a transmitter (which is a modulator), $x(\cdot)$, that satisfies the aforementioned power constraint, and a receiver (which is an estimator), $\hat{u}(\cdot)$, along with a family of outage events, $\{\mathcal{E}(u),\ u \in [0,1]\}$, so as to minimize
$$\sup_{u \in [0,1]} \boldsymbol{E}\left\{\rho(\hat{u}(Y) - u) \cdot \mathcal{I}[\mathcal{E}^c(u)]\right\} \eqno(3)$$
subject to the constraint that
$$\Pr\{\mathcal{E}(u)\} \le e^{-nE} \quad \mbox{for all}\ u \in [0,1], \eqno(4)$$
where the expectation in (3) and the probability in (4) are with respect to (w.r.t.) the randomness of the noise vector $z$. Here, $E > 0$ is a prescribed constant (independent of $n$), henceforth referred to as the outage exponent, $\mathcal{E}^c(u)$ is the complement of $\mathcal{E}(u)$, and $\rho(\cdot)$ is referred to as the error cost function (ECF), which is assumed to have the following properties: (i) symmetry: $\rho(t) = \rho(-t)$, (ii) convexity, (iii) increasing monotonicity for $t \ge 0$ (and hence decreasing monotonicity for $t \le 0$), and (iv) $\rho(0) = 0$. Let $\mathcal{G}$ denote the class of all families of outage events, $\{\mathcal{E}(u),\ u \in [0,1]\}$, that satisfy (4).
For certain ECFs, like the squared error, $\rho(t) = t^2$, and more generally, $\rho(t) = |t|^s$, $s \ge 1$, it is well known from classical communication theory (see, e.g., [10, Chap. 8]) that there exist modulators, receivers and families of outage events for which (3) decays exponentially with $n$. Since the above defined constrained optimization problem is difficult to solve in a precise closed form, we will adopt the customary information–theoretic approach of characterizing the fastest possible exponential decay rate of (3) subject to (4). More formally, for given sequences of modulators (all satisfying the power constraint), estimators, and families of outage events in $\mathcal{G}$, let
$$\Gamma = \liminf_{n \to \infty}\left[-\frac{1}{n} \ln \sup_{u \in [0,1]} \boldsymbol{E}\left\{\rho(\hat{u}(Y) - u) \cdot \mathcal{I}[\mathcal{E}^c(u)]\right\}\right]. \eqno(5)$$
Our purpose is to derive upper and lower bounds to the best achievable value of $\Gamma$ as functions of the outage exponent, $E$, and the SNR, $S/\sigma^2$. In particular, we derive simple formulas for upper and lower bounds to the best achievable high–SNR error–cost exponent, denoted $\overline{\Gamma}(E)$ and $\underline{\Gamma}(E)$, respectively, which are asymptotically compatible in the sense that $\overline{\Gamma}(E) = \underline{\Gamma}(E)$ for a certain range of values of the outage exponent, $E$.
To this end, we will make three additional assumptions: the first one concerns the ECF, the second is about the modulator, and the third one regards the outage events.
A.1 For any given constant $a > 0$, we assume that $\rho(e^{-an}) \doteq e^{-n\phi(a)}$, where $\phi(\cdot)$ is some continuous function with the property that $a' > a$ implies $\phi(a') > \phi(a)$, and where $\doteq$ denotes equivalence in the exponential scale: the notation $a_n \doteq b_n$, for two positive sequences, $\{a_n\}$ and $\{b_n\}$, means that $\frac{1}{n}\ln\frac{a_n}{b_n} \to 0$, as $n \to \infty$. Moreover, we will assume that $a \to \infty$ implies $\phi(a) \to \infty$. (The function $\phi$ is, of course, induced by the function $\rho$. For example, if $\rho(t) = |t|^s$, then obviously, $\phi(a) = sa$. As a side remark, note that the function $\rho$ is allowed to depend on $n$.)
A.2 Denoting $\delta = e^{-n\theta}$ (with $\theta > 0$ to be defined in Section III), consider the partition of the unit interval into $1/\delta$ non–overlapping sub–intervals of length $\delta$. Then, it will be assumed that for all sufficiently large $n$, the number of sub–intervals in which $x(\cdot)$ is continuous is of the exponential order of $1/\delta$, i.e., $\doteq e^{n\theta}$. Also, in each sub–interval of continuity, the integral $\int \|\mathrm{d}x(u)\|$ exists and is finite. (Note that continuity alone does not guarantee that this first order variation integral is finite; a simple counter–example is Brownian motion.) Henceforth, we define $\mathcal{M}$ as the class of modulators that satisfy these conditions, in addition to the power constraint, $\|x(u)\|^2 \le nS$ for all $u \in [0,1]$.
A.3 If, for a certain vector $z \in \mathcal{E}(u)$, one of the components vanishes, then upon replacing this component by any non–zero number, the resulting vector remains in $\mathcal{E}(u)$. In addition, it is assumed that as the noise variance $\sigma^2$ tends to zero, the covering radius of $\mathcal{E}^c(u)$ tends to zero as well. (The first point is reasonable since the new vector designates “stronger noise” than the original one. The second requirement also makes sense, because as the noise becomes weaker, the non–outage events may shrink too (and in all directions of $\mathbb{R}^n$), in order to improve the weak–noise estimation performance without violating the outage constraint.)
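As a quick numeric sanity check of Assumption A.1 for the power ECF (a representative choice, not the only one admitted by the assumptions), the relation $\rho(e^{-an}) \doteq e^{-n\phi(a)}$ with $\phi(a) = sa$ can be verified directly:

```python
import numpy as np

# For the power ECF rho(t) = |t|**s, the induced exponent function is exactly
# phi(a) = s * a, since rho(exp(-a*n)) = exp(-n * s * a).
def phi_empirical(s, a, n):
    rho = lambda t: abs(t) ** s
    return -np.log(rho(np.exp(-a * n))) / n

for s in (1, 2, 4):
    for a in (0.1, 0.5):
        assert np.isclose(phi_empirical(s, a, n=50), s * a)
print("phi(a) = s*a verified for the power ECF")
```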
The lower bound, $\underline{\Gamma}(E)$, will be achieved by a conceptually simple achievability scheme, described as follows: we first uniformly quantize the parameter $u$ into a grid of $M$ ($M = e^{nR}$) points, and then map this set of points into a good rate–$R$ lattice code of dimension $n$. The coding rate, $R$, will be used to control the trade–off between the outage exponent, $E$, and the error–cost exponent, $\Gamma$. The reason for the choice of lattice codes will become apparent from the derivation of the upper bound, $\overline{\Gamma}(E)$.
A few comments are in order concerning the above problem formulation.
The assumption that $u$ takes on values in the unit interval is made largely for the sake of convenience. The extension to any other finite interval, $[a, b]$, is straightforward. Concerning the extension to the entire real line, the converse part may remain intact, but for the achievability part, the supremum over $u$ in (5) would have to be confined to an interval, $[-A_n, A_n]$, that grows with $n$ slowly enough, i.e., where $A_n$ grows at a sub–exponential rate. Anyway, with this formulation, the entire real line is eventually covered in the limit $n \to \infty$.
Here, we adopt the minimax approach, a.k.a. the worst–case approach, of optimizing the estimation performance for the worst–case value of $u$ in $[0,1]$. Owing to the fact that we focus on exponential error bounds, this is asymptotically equivalent to the Bayesian setting as long as the prior of $u$ is bounded away from zero and infinity. This is because, given a function $f(\cdot)$ (that does not depend on $n$) and such a prior $w(\cdot)$, the integral $\int_0^1 w(u) e^{-nf(u)}\,\mathrm{d}u$ is dominated by $\sup_{u \in [0,1]} e^{-nf(u)}$ in the exponential scale.
Since the estimation errors considered are of an exponentially small scale, it is actually only important how the function $\rho(\cdot)$ behaves in the vicinity of the origin. For most conceivable convex symmetric cost functions, $\rho(t)$ is proportional to $|t|^s$ near the origin, for some $s \ge 1$. Thus, power cost functions are essentially as general as any convex cost function for our purposes.
While this point is well known, we nonetheless feel compelled to emphasize that the index $i$ of each component of the vector $x(u)$ (as well as those of $y$ and $z$) need not necessarily designate discrete time. More generally, $x(u)$ should be thought of as a vector of coefficients that represent the transmitted signal as a linear combination of arbitrary orthonormal basis functions (in either discrete time or continuous time). In particular, if these basis functions are taken to be sinusoids (or complex exponentials), then the index of each component designates frequency. For example, the space of time–limited and (approximately) band–limited signals, of duration $T$ and bandwidth $W$, is well known to be spanned by $n = 2WT$ basis functions, for $2WT \gg 1$. Thus, our setup can easily be translated into the framework of the continuous–time, band–limited AWGN channel, with $\sigma^2$ being replaced by the noise spectral density, the discrete block–length $n$ replaced by the time duration $T$, and all the exponents multiplied by the factor $2W$, owing to the substitution of $n$ by $2WT$ in all places.
The achievability part in [7, Section IV.A], where the outage probability was kept below a small constant, is clearly a special case of our problem formulation with $E = 0$. The performance of the achievability scheme in [7], however, was contrasted with the traditional data processing converse bound, which allows no outage at all, namely, $\mathcal{E}(u)$ must be an empty set for all $u$. But this is equivalent to the special case of our framework with $E = \infty$. This observation displays a considerable mismatch between the settings of the achievability and the converse parts in [7], when formalized in our framework. As a side remark, we should point out that in [7], the Bayesian approach was adopted, where the parameter was assumed to be a Gaussian random variable.
III. Converse Bound
Given a positive real $E$, let $\omega(E)$ be the unique solution, $\omega \ge 1$, to the equation
$$\frac{\omega - \ln\omega - 1}{2} = E. \eqno(6)$$
Let us also define
$$\overline{\Gamma}(E) = \phi\left(\frac{1}{2}\ln\frac{S/\sigma^2}{\omega(E)}\right),$$
where $\phi(\cdot)$ is as defined in Assumption A.1, in Section II. Our first main result is the following.
Theorem 1. Under the assumptions of Section II, for every sequence of modulators in $\mathcal{M}$, every sequence of estimators, and every sequence of families of outage events in $\mathcal{G}$,
$$\Gamma \le \overline{\Gamma}(E)\cdot(1 + o(1)), \eqno(8)$$
where $o(1)$ designates a term that tends to zero in the high–SNR limit.
Before we prove Theorem 1, a brief informal discussion of the plan of the proof is in order. Consider first the relatively simple case where the modulator, $x(\cdot)$, is continuous across the entire unit interval (the idea can then be extended also to the case where there are discontinuities). As $u$ exhausts the unit interval, $x(u)$ draws a curve inside the sphere $\{x:\ \|x\| \le \sqrt{nS}\}$. We refer to this curve as the signal locus associated with the modulator $x(\cdot)$. The length of the signal locus is given by
$$L = \int_0^1 \|\mathrm{d}x(u)\|,$$
which is an integral that exists according to Assumption A.2. If, in addition, $x(\cdot)$ is differentiable for all $u \in [0,1]$, we may write
$$L = \int_0^1 \|\dot{x}(u)\|\,\mathrm{d}u,$$
where $\dot{x}(u)$ is the vector of derivatives of the components of $x(u)$ w.r.t. $u$. Our proof plan is as follows: we first derive a lower bound on the estimation performance (3), which is a monotonically non–increasing function of $L$. Then, we derive an upper bound, $\bar{L}$, on $L$, that must apply (at least in the exponential scale) to every modulator in $\mathcal{M}$ in the high–SNR regime. This is carried out using certain volume considerations, a.k.a. “tube packing” (in analogy to sphere packing in channel coding), which will also imply that for all $u$, the best choice of the outage event, $\mathcal{E}(u)$, is the complement of the sphere, centered at the origin, whose radius is exactly large enough to satisfy the outage constraint. Finally, upon substituting $L$ by $\bar{L}$ in the lower bound to the estimation performance, we arrive at our ultimate lower bound, which no longer depends on the specific modulator, $x(\cdot)$. The proof of Theorem 1 is then completed by assessing the exponential rate of this lower bound in the high–SNR regime.
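The signal locus length that drives the whole proof plan can be approximated numerically by summing chord lengths over a fine grid; the two example modulators below (a circular arc and a linear modulator) are purely illustrative:

```python
import numpy as np

def locus_length(modulator, m=20000):
    """Approximate L = integral of ||dx(u)|| over [0, 1] by summing chord lengths."""
    u = np.linspace(0.0, 1.0, m + 1)
    pts = np.stack([modulator(t) for t in u])
    return np.linalg.norm(np.diff(pts, axis=0), axis=1).sum()

# A circular arc of radius r swept over angle theta has length r * theta:
r, theta = 2.0, 5.0
arc = lambda u: np.array([r * np.cos(theta * u), r * np.sin(theta * u)])
assert abs(locus_length(arc) - r * theta) < 1e-3

# A linear modulator traces a straight segment; here of length 5:
line = lambda u: np.array([3.0 * u, 4.0 * u])
assert abs(locus_length(line) - 5.0) < 1e-9
```

The chord-sum converges to the first-order variation integral whenever the latter exists, which is exactly the condition imposed by Assumption A.2.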
We should point out that, generally speaking, the derivations in [10, Chap. 8] are guided by essentially the same considerations, but there are some important differences. The main difference is that in [10], there is not really a derivation of a lower bound on the estimation performance, but rather an informal argument that concerns the weak–noise MSE of the ML estimator. Now, the weak–noise MSE of the ML estimator is inversely proportional to the energy of the signal derivative, $\|\dot{x}(u)\|^2$. Since this norm is not related to the length, $L$, in a unique manner, a minimax consideration is invoked in [10, p. 620] in order to convince the reader that it is best to confine attention to modulators for which $\|\dot{x}(u)\|$ is identical to a constant (independent of $u$); then $L = \|\dot{x}(u)\|$, the energy of the derivative is simply $L^2$, and so, the two quantities become related in a simple manner. There are at least three weaknesses in this kind of argument: (i) it assumes that $x(\cdot)$ is differentiable, (ii) it applies to the ML estimator, but it does not yield a lower bound for an arbitrary estimator, and (iii) it is not quite clear that the confinement to modulators with the property $\|\dot{x}(u)\| \equiv \mbox{const}$ does not harm the trade–off with the outage performance. Our derivation, on the other hand, does not suffer from these weaknesses, because, as said, the estimation performance is bounded directly in terms of the signal locus length, not in terms of the energy of the derivative, which may not even exist for a general modulator.
Proof of Theorem 1. Let a modulator $x(\cdot) \in \mathcal{M}$, an estimator $\hat{u}(\cdot)$, and a family of outage events in $\mathcal{G}$ be given, and let Assumptions A.1–A.3 be satisfied. Let $K$ be a positive integer, which will be specified later. For simplicity, we assume first that $x(\cdot)$ is continuous along the entire interval $[0,1]$ (and describe the modifications needed when this is not the case in a short discussion in the sequel). Then, denoting the density of the noise vector, $z$, by $f(z)$, and the indicator function of an event $\mathcal{A}$ by $\mathcal{I}\{\mathcal{A}\}$, we have the chain of inequalities (12),
where the labeled inequalities are explained as follows: (a) is due to the assumed symmetry of $\rho(\cdot)$; (b) is by its assumed convexity; (c) follows from the union bound and by identifying the first integral of the preceding line as the error probability of the ML decision rule in distinguishing between two equiprobable hypotheses, $u$ and $u'$, and by using the fact that this error probability is given by $Q\left(\frac{d}{2\sigma}\right)$, where $d = \|x(u) - x(u')\|$ and $Q(\cdot)$ is the Gaussian tail function; (d) is by the outage constraint (4); (e) is due to the convexity of the function $t \mapsto Q(\sqrt{t})$ for $t \ge 0$ (which can easily be verified from its second derivative), and finally, (f) is because $Q(\cdot)$ is monotonically decreasing and because $\|x(u) - x(u')\| \le L$, which in turn follows from the fact that the Euclidean norm is a metric (and so, a straight line between any two points is always shorter than any other curve connecting them).
Our next step is to derive an upper bound on $L$. As a preparatory step toward this end, we present the following consideration. For the given $u$, let us represent the noise vector as $z = (z_1, \tilde{z})$, where $z_1$ is the component of $z$ in the direction of $\dot{x}(u)$ and $\tilde{z}$ is the orthogonal component of $z$. From the outage constraint (4), we have that
$$e^{-nE} \ge \Pr\{\mathcal{E}(u)\} \ge \Pr\left\{(z_1, \tilde{z}) \in \mathcal{E}(u),\ \tilde{z} \in \tilde{\mathcal{E}}(u)\right\},$$
where we have defined $\tilde{\mathcal{E}}(u) = \{\tilde{z}:\ (0, \tilde{z}) \in \mathcal{E}(u)\}$. By Assumption A.3, $(z_1, \tilde{z}) \in \mathcal{E}(u)$ for every real $z_1$ whenever $\tilde{z} \in \tilde{\mathcal{E}}(u)$, and so, we can continue the above chain of inequalities as follows:
$$e^{-nE} \ge \Pr\left\{(z_1, \tilde{z}) \in \mathcal{E}(u),\ \tilde{z} \in \tilde{\mathcal{E}}(u)\right\} = \Pr\{\tilde{z} \in \tilde{\mathcal{E}}(u)\}.$$
Thus, for every $u$, $\tilde{\mathcal{E}}(u)$ must satisfy the same outage constraint as $\mathcal{E}(u)$, with exponent $E$. Now, due to the Neyman–Pearson theorem and due to the Gaussianity of the noise vector, the minimum volume of $\tilde{\mathcal{E}}^c(u)$, subject to the constraint $\Pr\{\tilde{z} \in \tilde{\mathcal{E}}(u)\} \le e^{-nE}$, is attained when $\tilde{\mathcal{E}}^c(u)$ is an $(n-1)$–dimensional Euclidean sphere centered at the origin. (This argument follows also from one of the iso–perimetric inequalities, see, e.g., [12, Theorem 7.1.1, eq. (7.12d)].) Using the Chernoff bound (which is well known to be exponentially tight) and the given outage constraint, the radius of this sphere is easily found to be $\sigma\sqrt{n\omega(E)}$.
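The radius computed from the Chernoff bound can be sanity-checked numerically. Assuming (as in the converse development here) that $\omega(E)$ solves $(\omega - \ln\omega - 1)/2 = E$ with $\omega \ge 1$, the solution is readily found by bisection, since the left-hand side vanishes at $\omega = 1$ and increases for $\omega > 1$:

```python
import math

def omega(E):
    """Solve (w - ln w - 1) / 2 = E for w >= 1 by bisection."""
    f = lambda w: 0.5 * (w - math.log(w) - 1.0) - E
    lo, hi = 1.0, 2.0
    while f(hi) < 0.0:   # expand the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# The rate function of ||Z||^2 / (n * sigma^2) for Z ~ N(0, sigma^2 I_n) is
# (w - ln w - 1) / 2, so Pr{||Z|| >= sigma * sqrt(n * omega(E))} decays as exp(-n*E).
for E in (0.05, 0.2, 1.0):
    w = omega(E)
    assert w > 1.0 and abs(0.5 * (w - math.log(w) - 1.0) - E) < 1e-9
```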
We next invoke a “tube–packing” argument in the spirit of [10, pp. 672–673] (see also [4, Subsection 2.3.1]). To comply with the outage constraint, the signal locus curve must have the property that $\{x(u) + \tilde{\mathcal{E}}^c(u)\}$ and $\{x(u') + \tilde{\mathcal{E}}^c(u')\}$ are disjoint whenever $u$ and $u'$ are far apart in the sense of being points that belong to different folds of the signal locus curve. This is because no point in space can possibly belong simultaneously to the non–outage regions of two remote values of the parameter. In mathematical terms, this means that in the high–SNR limit (meaning that the volumes of $\tilde{\mathcal{E}}^c(u)$ are relatively very small: in particular, they shrink when $\sigma^2$ is small, and their radius is assumed very small compared to the radius of curvature of the signal locus curve), the volume of the body $\mathcal{T} = \bigcup_{u \in [0,1]}\{x(u) + \tilde{\mathcal{E}}^c(u)\}$ is of the exponential order of $L \cdot \mathrm{Vol}\{\tilde{\mathcal{E}}^c(u)\}$. We next derive an upper bound and a lower bound to $\mathrm{Vol}\{\mathcal{T}\}$, so that by comparing these two bounds, we can find an upper bound to $L$.
As for the upper bound to $\mathrm{Vol}\{\mathcal{T}\}$, due to the power constraint, it cannot exceed the volume of a sphere whose radius is upper bounded by $\sqrt{nS} + r_c$, where $r_c$ is the covering radius of $\mathcal{E}^c(u)$, which in the high–SNR limit is negligible compared to $\sqrt{nS}$ (due to Assumption A.3), that is, $\sqrt{nS} + r_c = \sqrt{nS}(1 + o(1))$, where $o(1)$ is a term that tends to zero as $\sigma^2 \to 0$. Thus,
$$\mathrm{Vol}\{\mathcal{T}\} \le \mathrm{Vol}\left\{x:\ \|x\| \le \sqrt{nS}(1 + o(1))\right\} \doteq \left[2\pi e S(1 + o(1))\right]^{n/2},$$
where we have used the fact that the volume of an $n$–dimensional sphere of radius $\sqrt{n\varrho}$ is of the exponential order of $(2\pi e\varrho)^{n/2}$ (see, e.g., [12, eq. (7.30)]).
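The exponential order quoted above can be checked numerically from the exact ball-volume formula $V_n(r) = \pi^{n/2} r^n / \Gamma(n/2 + 1)$:

```python
import math

def log_ball_volume(n, radius):
    """Exact log-volume of an n-dimensional Euclidean ball:
    V_n(r) = pi^(n/2) * r^n / Gamma(n/2 + 1)."""
    return (0.5 * n * math.log(math.pi)
            + n * math.log(radius)
            - math.lgamma(0.5 * n + 1.0))

# Exponential order: (1/n) * log V_n(sqrt(n * rho)) -> (1/2) * log(2*pi*e*rho).
rho = 2.0
target = 0.5 * math.log(2.0 * math.pi * math.e * rho)
for n in (100, 1000, 10000):
    per_dim = log_ball_volume(n, math.sqrt(n * rho)) / n
    print(n, per_dim, target)
assert abs(per_dim - target) < 0.01
```

The per-dimension log-volume converges to the stated limit at a rate of roughly $\ln(n)/n$, so the approximation is already tight at moderate dimensions.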
For the lower bound to $\mathrm{Vol}\{\mathcal{T}\}$, we have, by the aforementioned Neyman–Pearson/iso–perimetric consideration and by the outage constraint,
$$\mathrm{Vol}\{\mathcal{T}\}\ \ge\ L \cdot \left[2\pi e\sigma^2\omega(E)\right]^{(n-1)/2} \cdot e^{-o(n)}. \eqno(19)$$
Before we proceed, an important comment is now in order: the r.h.s. of (19) is of the same exponential order as the volume of the union of $n$–dimensional spheres of radius $\sigma\sqrt{n\omega(E)}$ (as opposed to the $(n-1)$–dimensional spheres described above), centered at $x(u)$, where the union runs from $u = 0$ to $u = 1$. This means that in the high–SNR limit, the optimal choice of $\mathcal{E}(u)$ is the complement of a sphere of radius $\sigma\sqrt{n\omega(E)}$, independently of $u$. This observation will be important for the achievability part (Section IV).
Comparing the upper and the lower bounds to $\mathrm{Vol}\{\mathcal{T}\}$, we obtain
$$L\ \le\ \left[\frac{S(1 + o(1))}{\sigma^2\omega(E)}\right]^{n/2} \cdot e^{o(n)} \stackrel{\triangle}{=} \bar{L}.$$
Returning to (12), and choosing $K$ proportional to $\bar{L}$, with an arbitrary constant of proportionality (in principle, the lower bound can be maximized over this constant, but such a maximization will not affect the exponential order of the bound), we obtain a lower bound on (3) whose exponential order, in the high–SNR limit, is $e^{-n\overline{\Gamma}(E)}$,
completing the proof of Theorem 1 for the case where $x(\cdot)$ is continuous.
Relaxing now the continuity assumption, and referring to Assumption A.2, we may allow discontinuities of $x(\cdot)$, even in the majority of the sub–intervals, provided that the number of intervals of continuity is still of the exponential order defined above. In such a case, the modification of the above derivation would have the following ingredients: (i) the summation in the first steps of (12) should be further lower bounded by excluding terms for which there are discontinuities of $x(\cdot)$ in the corresponding sub–interval; (ii) the application of the Jensen inequality to the function $Q(\sqrt{\cdot})$ (step (e) in (12)) would be limited to the remaining terms; (iii) consequently, the lower bound above would be multiplied by the fraction of continuity sub–intervals (which will not affect the exponent, thanks to Assumption A.2), and (iv) $L$, in the argument of the $Q$–function, would mean the sum of the lengths of the continuous parts of the signal locus curve. The tube–packing argument will continue to apply to the total volume of the disjoint union of all tubes formed by the separate continuous parts of the signal locus curve. This completes the proof of Theorem 1.
Upon careful inspection of the proof of Theorem 1, one may find some suggestive indications of what would constitute a good achievability scheme.
First, let us look at the last line of eq. (12): it is natural to think of the term $\rho\left(\frac{1}{2K}\right)$ as one that can be achieved by uniform quantization: if we quantize the unit interval uniformly and create a grid of $K$ points with equal spacings of $1/K$ in between, then upon quantizing an arbitrary $u$ into the nearest grid point, the absolute value of the quantization error cannot exceed $\frac{1}{2K}$, and so, the error cost will never be above $\rho\left(\frac{1}{2K}\right)$. Thus, if we can come up with an achievability scheme for which the weak–noise error is exactly this quantization error, we can achieve the weak–noise error bound up to a constant factor (given by the other multiplicative terms in the last line of (12)). But once we have quantized $u$ into one out of $K$ quantization points, it only remains to map each one of these points into a corresponding channel input vector, namely, to apply channel coding. The error event of this channel code will then have to be the outage event. But according to the comment that follows eq. (19), the best choice of the outage event (at least in the high–SNR limit) is the complement of a sphere, independently of the transmitted code vector. It is well known that for good lattice codes in high dimension, the Voronoi cells become closer and closer to spheres [12, Chap. 7], and so, the error event (which is independent of the transmitted code vector) is roughly the event that the noise vector falls outside a sphere. The outage exponent must then be the error exponent of such a lattice code. The coding rate would then be a free parameter that controls the trade–off between the weak–noise (quantization) error and the outage (decoding error) exponent.
More precisely, let $M = e^{nR}$ be a positive integer, where $R > 0$ is a design parameter to be selected later. Consider the uniform quantization of the unit interval into $M$ non–overlapping bins, each of size $1/M$, and let $[u]$ be the midpoint of the bin to which $u$ belongs, i.e.,
$$[u] = \frac{\lfloor Mu\rfloor + 1/2}{M}.$$
Let us define the modulator according to
$$x(u) = c([u]),$$
where $\{c([u])\}$, with $[u]$ ranging over the $M$ bin midpoints, is (part of) a channel lattice code whose members all lie within the sphere of radius $\sqrt{nS}$ around the origin, so as to comply with the power constraint. At the receiver side, we first decode the digital message from the channel output, and then $u$ is estimated according to
$$\hat{u}(y) = \widehat{[u]}(y),$$
where $\widehat{[u]}(y)$ is the decoded version of $[u]$ (which is, of course, a function of $y$ only). Thus, if $[u]$ is decoded correctly, the (weak–noise) estimation error is simply the quantization error, whose cost is of the exponential order of
$$\rho\left(\frac{1}{2M}\right) = \rho\left(\frac{e^{-nR}}{2}\right) \doteq e^{-n\phi(R)}.$$
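The quantize–encode–decode–map-back pipeline can be sketched end to end as follows. A random codebook on the power sphere stands in for the lattice code (an illustrative substitution: only the sphere-like decoding regions matter for the argument), and all parameter values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)

n, R, S, sigma = 40, 0.2, 1.0, 0.3
M = int(np.exp(n * R))  # number of quantization bins = number of codewords

# Random spherical codebook in place of the lattice code:
C = rng.standard_normal((M, n))
C *= np.sqrt(n * S) / np.linalg.norm(C, axis=1, keepdims=True)

def transmit(u):
    j = min(int(u * M), M - 1)  # index of the quantization bin of u
    return C[j]

def receive(y):
    j_hat = int(np.argmin(np.linalg.norm(C - y, axis=1)))  # nearest-codeword decoding
    return (j_hat + 0.5) / M    # map the decoded message back to a bin midpoint

u = rng.random()
y = transmit(u) + sigma * rng.standard_normal(n)
u_hat = receive(y)
# With correct decoding, the residual is pure quantization error (<= 1/(2M)):
print(abs(u_hat - u), 0.5 / M)
```

At this (conservatively low) rate the digital part decodes correctly with overwhelming probability, so the realized error is the quantization error, bounded by half the bin width.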
Referring to the converse bound in Theorem 1 (see eq. (8)), it is suggestive that, in the quest for asymptotically optimal performance, the coding rate of the lattice code be set to
$$R = \frac{1}{2}\ln\frac{S/\sigma^2}{\omega(E)} = \frac{1}{2}\ln\frac{S}{\sigma^2} - \frac{1}{2}\ln\omega(E).$$
The first term on the right–most side is obviously identified as the high–SNR approximation to the channel capacity, $\frac{1}{2}\ln\left(1 + \frac{S}{\sigma^2}\right)$, and the second term is interpreted as the inevitable gap to capacity that must be suffered in order to comply with the outage constraint.
It now remains to examine the achievable error exponents of lattice codes at rate $R$ and to compare them to $E$. According to Zamir [12, Theorem 13.4.1], the random coding exponent of lattice codes (w.r.t. the MHS ensemble) is given by $E_r(\mu)$, where
$$E_r(\mu) = \left\{\begin{array}{ll}
\frac{1}{2}\left[\mu - 1 - \ln\mu\right], & 1 < \mu \le 2\\
\frac{1}{2}\ln\frac{e\mu}{4}, & \mu \ge 2
\end{array}\right.$$
and $\mu$ is the normalized volume–to–noise ratio (NVNR), a constant that depends on the lattice $\Lambda$ and the noise variance $\sigma^2$, defined as
$$\mu(\Lambda, P_e) = \frac{\left[\mathrm{Vol}(\mathcal{V})\right]^{2/n}}{2\pi e\sigma^2(P_e)},$$
where $\mathrm{Vol}(\mathcal{V})$ is the volume of the Voronoi cell of the lattice, and $\sigma^2(P_e)$ is the value of the noise variance such that the error probability is equal to a prescribed value, $P_e$. For such a fixed $P_e$, we have $\mu(\Lambda, P_e) > 1$, but the lower bound of unity can be approached arbitrarily closely by some sequence of lattices [12, Theorem 7.7.1]. Now, if one enlarges those lattice cells by a factor of $\beta$ in each dimension (by means of reducing the rate from $R$ to $R - \ln\beta$), keeping the noise variance intact, this would increase the NVNR by a factor of $\beta^2$, and then the error exponent would be $E_r(\beta^2)$, which for $\beta^2 = \omega(E)$ (or equivalently, $R = \frac{1}{2}\ln\frac{S/\sigma^2}{\omega(E)}$), gives
$$E_r(\omega(E)) = \frac{1}{2}\left[\omega(E) - 1 - \ln\omega(E)\right] = E,$$
where the second equality is by the definition of the function $\omega(\cdot)$ (see eq. (6)). Thus, the outage constraint is met in the range where $\omega(E) \le 2$, that is, $E \le \frac{1 - \ln 2}{2}$, and the converse bound is asymptotically achieved.
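Assuming the piecewise form of the random coding exponent used above (an assumption of this numerical sketch), the identity $E_r(\omega(E)) = E$ in the range $\omega(E) \le 2$ can be verified directly:

```python
import math

def E_r(mu):
    """Random coding exponent as a function of the NVNR mu (form assumed here)."""
    if mu <= 2.0:
        return 0.5 * (mu - 1.0 - math.log(mu))
    return 0.5 * math.log(math.e * mu / 4.0)

def omega(E):
    """Inverse of (w - ln w - 1)/2 on w >= 1, computed by bisection."""
    lo, hi = 1.0, 2.0
    while 0.5 * (hi - math.log(hi) - 1.0) < E:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 0.5 * (mid - math.log(mid) - 1.0) < E:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# For E <= (1 - ln 2)/2 ~ 0.1534 we have omega(E) <= 2, and operating the code
# at NVNR omega(E) yields a decoding-error (outage) exponent of exactly E:
for E in (0.05, 0.10, 0.15):
    assert abs(E_r(omega(E)) - E) < 1e-9
print("outage exponent matches E for E <=", 0.5 * (1.0 - math.log(2.0)))
```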
In [12, Theorem 13.7.1], an expurgated error exponent for lattice codes is presented, which is given by $E_{ex}(\mu)$, where
$$E_{ex}(\mu) = \frac{\mu}{8}, \qquad \mu \ge 4.$$
The best achievable exponent is then obtained by combining these two error exponents, which results in the Poltyrev error exponent, $E_p(\mu) = \max\{E_r(\mu), E_{ex}(\mu)\}$ [12, Theorem 13.7.1 and subsequent discussion].
For a general $E > 0$, let us define $\mu(E)$ as the inverse function of $E_p(\mu)$, i.e., the value of $\mu$ for which $E_p(\mu) = E$. In order to comply with the outage constraint, one must have $E_p(\mu) \ge E$, which means a coding rate whose high–SNR approximation is given by
$$R = \frac{1}{2}\ln\frac{S/\sigma^2}{\mu(E)}.$$
Thus, for a general value of $E$, the weak–noise error cost exponent of the proposed communication system is given by
$$\underline{\Gamma}(E) = \phi\left(\frac{1}{2}\ln\frac{S/\sigma^2}{\mu(E)}\right).$$
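Combining the pieces, the following sketch (with the piecewise exponent form and the power-ECF choice $\phi(a) = sa$ as stated assumptions) computes the achievable weak–noise exponent and exhibits the trade–off between the outage exponent and the error–cost exponent:

```python
import math

def E_p(mu):
    """Combined (Poltyrev-type) exponent, in the piecewise form assumed here."""
    if mu <= 2.0:
        return 0.5 * (mu - 1.0 - math.log(mu))
    if mu <= 4.0:
        return 0.5 * math.log(math.e * mu / 4.0)
    return mu / 8.0

def mu_of(E):
    """Inverse of E_p by bisection (E_p is continuous and increasing for mu >= 1)."""
    lo, hi = 1.0, 4.0
    while E_p(hi) < E:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if E_p(mid) < E:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gamma_lower(E, snr, s=2.0):
    """Weak-noise error-cost exponent phi(R) with phi(a) = s * a (power ECF)
    and rate R = (1/2) * ln(snr / mu(E))."""
    return s * 0.5 * math.log(snr / mu_of(E))

# A more stringent outage exponent E forces a larger NVNR, hence a lower rate
# and a smaller weak-noise exponent:
g1, g2 = gamma_lower(0.05, snr=100.0), gamma_lower(0.5, snr=100.0)
print(g1, g2)
assert g1 > g2 > 0.0
```

The monotone decrease of the weak–noise exponent in $E$ is exactly the trade–off that the coding rate $R$ mediates in the scheme above.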