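Let $|\cdot|$ denote the Euclidean norm on $\mathbb{R}^d$ induced by an inner product $\langle\cdot\,|\,\cdot\rangle$, and let the distance between a point $\xi\in\mathbb{R}^d$ and a set $A\subset\mathbb{R}^d$ be defined by $d(\xi,A)=\inf_{a\in A}|\xi-a|$.
For $p\in[1,+\infty)$, let $\mathcal{P}_p(\mathbb{R}^d)$ denote the set of all probability measures on $\mathbb{R}^d$ with a finite $p$-th moment. Let $X$ be an $\mathbb{R}^d$-valued random variable defined on a probability space $(\Omega,\mathcal{F},\mathbb{P})$, with distribution $\mu\in\mathcal{P}_2(\mathbb{R}^d)$. The (quadratic) quantization procedure of $\mu$ (or of $X$) at level $K$ consists in finding a discrete approximating grid $x^*=(x_1^*,\dots,x_K^*)\in(\mathbb{R}^d)^K$ whose quantization error $e_{K,\mu}(x^*)$ achieves the optimal quantization error $e^*_{K,\mu}$ (also written $e^*_{K,X}$) for the distribution $\mu$ at level $K$, defined as follows:
$$e^*_{K,\mu}=\inf_{y\in(\mathbb{R}^d)^K}e_{K,\mu}(y),\qquad e_{K,\mu}(y)=\Big[\int_{\mathbb{R}^d}\min_{1\le k\le K}|\xi-y_k|^2\,\mu(d\xi)\Big]^{1/2}.\tag{1}$$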
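If $e_{K,\mu}(x^*)=e^*_{K,\mu}$, we call $x^*$ an optimal grid (or an optimal cluster center) of $\mu$ (or of $X$) at level $K$ (see footnote 2 below). We denote by $G_K(\mu)$ the set of all optimal quantization grids at level $K$ of $\mu$.

Footnote 2. In many references, the quantization grid at level $K$ is defined as a set of points $\Gamma\subset\mathbb{R}^d$ with cardinality $\mathrm{card}(\Gamma)\le K$, and the quadratic quantization error function is defined by $e_{\mu}(\Gamma)=\big[\int_{\mathbb{R}^d}\min_{a\in\Gamma}|\xi-a|^2\,\mu(d\xi)\big]^{1/2}$. However, for every $\Gamma$ with $\mathrm{card}(\Gamma)\le K$, one can always find a $K$-tuple $x^{\Gamma}\in(\mathbb{R}^d)^K$ (by repeating some elements of $\Gamma$) such that $e_{K,\mu}(x^{\Gamma})=e_{\mu}(\Gamma)$. For example, if $\Gamma=\{x_1,\dots,x_k\}$ with $k<K$ (the $x_i$ are pairwise distinct), one may set $x^{\Gamma}=(x_1,\dots,x_k,x_k,\dots,x_k)$ or $x^{\Gamma}=(x_1,\dots,x_1,x_2,\dots,x_k)$, among many other possibilities. In [Theorem 4.12], the authors have proved that if the cardinality of the support of $\mu$ is at least $K$, an optimal grid $\Gamma^*$ at quantization level $K$ satisfies $\mathrm{card}(\Gamma^*)=K$. Hence, the infimum of the quantization error over sets $\Gamma$ with $\mathrm{card}(\Gamma)\le K$ coincides with the infimum over $K$-tuples. Therefore, in this paper, with a slight abuse of notation, we will mostly use $x=(x_1,\dots,x_K)$ but also use $\Gamma$ (in Section 1.1) with $\mathrm{card}(\Gamma)\le K$ to represent a quantization grid at level $K$.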
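The distortion function is often used to describe the quantization error at a grid $x$. It is defined as follows.
Definition 1.1 (Distortion function).
Let $K$ be the quantization level. Let $X$ be an $\mathbb{R}^d$-valued random variable and let $\mu$ denote its probability distribution. Assume that $\mu\in\mathcal{P}_2(\mathbb{R}^d)$ and $\mathrm{card}(\mathrm{supp}(\mu))\ge K$. The (quadratic) distortion function $\mathcal{D}_{K,\mu}$ of $\mu$ at level $K$ is defined on $(\mathbb{R}^d)^K$ by
$$\mathcal{D}_{K,\mu}:\;x=(x_1,\dots,x_K)\longmapsto\int_{\mathbb{R}^d}\min_{1\le k\le K}|\xi-x_k|^2\,\mu(d\xi)=\mathbb{E}\Big[\min_{1\le k\le K}|X-x_k|^2\Big].$$
It is clear that for any grid $x\in(\mathbb{R}^d)^K$, $\mathcal{D}_{K,\mu}(x)=e_{K,\mu}(x)^2$. Hence, if $x^*\in G_K(\mu)$, then $\mathcal{D}_{K,\mu}(x^*)=\inf_{y\in(\mathbb{R}^d)^K}\mathcal{D}_{K,\mu}(y)=(e^*_{K,\mu})^2$. We sometimes drop the subscript $K$ of $\mathcal{D}_{K,\mu}$ when the quantization level is fixed by the context.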
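As an illustration of the distortion function, here is a minimal Python sketch for the special case where the distribution is the uniform measure on a finite sample. The function names `distortion` and `quantization_error`, and the toy sample, are hypothetical choices made for this example only; they do not come from the paper.

```python
import math

def distortion(sample, grid):
    """Quadratic distortion at `grid` of the uniform measure on `sample`.

    `sample` and `grid` are lists of points (tuples of floats) in R^d.
    """
    total = 0.0
    for xi in sample:
        # squared Euclidean distance from xi to its nearest grid point
        total += min(math.dist(xi, a) ** 2 for a in grid)
    return total / len(sample)

def quantization_error(sample, grid):
    """Quadratic quantization error: square root of the distortion."""
    return math.sqrt(distortion(sample, grid))

# Four points on the unit square, quantized at level K = 2.
sample = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
grid = [(0.0, 0.0), (1.0, 1.0)]
d2 = distortion(sample, grid)  # (0 + 1 + 1 + 0) / 4 = 0.5

# Repeating a grid point turns the 2-point set into a 3-tuple
# without changing the distortion.
d3 = distortion(sample, grid + [grid[0]])
```

The last two lines mirror the remark that a set with fewer than $K$ points can always be padded into a $K$-tuple with the same quantization error.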
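Let $\Pi(\mu,\nu)$ denote the set of all probability measures on $\mathbb{R}^d\times\mathbb{R}^d$ with marginals $\mu$ and $\nu$. For $p\ge1$, the Wasserstein distance $\mathcal{W}_p$ on $\mathcal{P}_p(\mathbb{R}^d)$ is defined by
$$\mathcal{W}_p(\mu,\nu)=\Big(\inf_{\pi\in\Pi(\mu,\nu)}\int_{\mathbb{R}^d\times\mathbb{R}^d}|x-y|^p\,\pi(dx,dy)\Big)^{1/p}.$$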
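$\mathcal{P}_p(\mathbb{R}^d)$ equipped with the Wasserstein distance $\mathcal{W}_p$ is a separable and complete space (see ). If $\mu,\nu\in\mathcal{P}_2(\mathbb{R}^d)$, then for any $p\in[1,2]$, $\mathcal{W}_p(\mu,\nu)\le\mathcal{W}_2(\mu,\nu)$.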
The target measure $\mu_\infty$ for the optimal quantization is sometimes unknown. In this case, in order to obtain an optimal grid of $\mu_\infty$, we apply optimal quantization to a known sequence of distributions $(\mu_n)_{n\ge1}$ which converges (in the Wasserstein distance) to $\mu_\infty$, and we look for the limiting points of the optimal grids of $\mu_n$. For $n\ge1$, let $x^{(n)}$ denote an optimal grid of $\mu_n$. The consistency of $(x^{(n)})_{n\ge1}$, i.e. the fact that every limiting point of $(x^{(n)})_{n\ge1}$ is an optimal grid of $\mu_\infty$, has been proved by D. Pollard in [see Theorem 9]. A further question is therefore: at which rate does the optimal grid of $\mu_n$ converge to an optimal grid of $\mu_\infty$?
In the literature, there are two perspectives to study the convergence rate of optimal grids:
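The convergence rate of the optimal grids $x^{(n)}$ themselves towards an optimal grid of $\mu_\infty$;
The convergence rate of the distortion function of $\mu_\infty$ evaluated at $x^{(n)}$: $\mathcal{D}_{K,\mu_\infty}(x^{(n)})-\inf_{x\in(\mathbb{R}^d)^K}\mathcal{D}_{K,\mu_\infty}(x)$.
The latter quantity is also called the “performance” of $x^{(n)}$, since it describes how close the quantization error of $x^{(n)}$, considered as a quantization grid for $\mu_\infty$, is to the optimal quantization error of $\mu_\infty$ (even though $x^{(n)}$ is in general not “optimal” for $\mu_\infty$).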
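A typical example of the setting described above is the quantization of the empirical measure. Let $X_1,\dots,X_n,\dots$ be i.i.d. $\mathbb{R}^d$-valued observations of $X$ with an unknown probability distribution $\mu$. The empirical measure is then defined by:
$$\mu_n^{\omega}=\frac1n\sum_{i=1}^n\delta_{X_i(\omega)},$$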
where $\delta_a$ denotes the Dirac mass at $a$. The almost sure weak convergence of the empirical measure $\mu_n^{\omega}$ to $\mu$ and the convergence $\mathcal{W}_2(\mu_n^{\omega},\mu)\to0$ have been proved in many references, for example [Theorem 7] and [Theorem 1], so that the optimal grids of $\mu_n^{\omega}$ are consistent. Moreover, as far as we know, most convergence rate results for optimal grids concern the empirical measure. A first example is . In that paper, the author proved that if $x^{(\infty)}$ denotes the unique limiting point of $x^{(n)}$, the convergence rate (convergence in law) of $|x^{(n)}-x^{(\infty)}|$ is $O(n^{-1/2})$. For the second perspective, it is proved in a recent work that if $\mu$ has a support contained in $B(0,R)$, where $B(0,R)$ denotes the ball in $\mathbb{R}^d$ centered at $0$ with radius $R$, then $\mathbb{E}\,\mathcal{D}_{K,\mu}(x^{(n),\omega})-\inf_{x\in(\mathbb{R}^d)^K}\mathcal{D}_{K,\mu}(x)\le\frac{12KR^2}{\sqrt{n}}$.
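To make the quantization of an empirical measure concrete, the sketch below runs a Lloyd-type fixed-point iteration on a one-dimensional sample. Lloyd's iteration is not discussed in this section and is used here only as a convenient stand-in: it produces a stationary grid of the empirical distortion, which need not be an optimal grid. The sample size, the Gaussian sample, the starting grid and all function names are assumptions made for this example.

```python
import random

def distortion(sample, grid):
    """Quadratic distortion of the empirical measure of a 1-D sample."""
    return sum(min((xi - a) ** 2 for a in grid) for xi in sample) / len(sample)

def lloyd_step(sample, grid):
    """One Lloyd step: nearest-point assignment, then cell averages."""
    cells = [[] for _ in grid]
    for xi in sample:
        k = min(range(len(grid)), key=lambda j: (xi - grid[j]) ** 2)
        cells[k].append(xi)
    # an empty cell keeps its previous grid point
    return [sum(c) / len(c) if c else a for c, a in zip(cells, grid)]

def lloyd(sample, grid, n_iter=50):
    """Iterate Lloyd steps and record the empirical distortion after each one."""
    history = [distortion(sample, grid)]
    for _ in range(n_iter):
        grid = lloyd_step(sample, grid)
        history.append(distortion(sample, grid))
    return grid, history

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(2000)]  # i.i.d. observations
grid, history = lloyd(sample, [-1.0, 0.0, 1.0])         # level K = 3
```

Each step cannot increase the empirical distortion, so `history` is non-increasing; whether the limit is a global minimizer depends on the starting grid.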
In this paper, we generalise these two earlier works:
In Section 2, we will study the general case, that is, the convergence rate of and the performance for any probability distribution sequence which converges in Wasserstein distance to . We obtain that, if and the Hessian matrix of distortion function is positive definite at all points , then for large enough,
where and are both bounded by a constant only depending on . If , we also establish a non-asymptotic upper bound for the performance: for every , there exist a constant depending on and a constant depending on , such that
under the condition that for some and .
where and is the maximum radius of -optimal grids, defined by
In particular, we will give a precise upper bound for , the multidimensional normal distribution
where and . If , .
We start with a brief review of the properties of optimal grids and of the distortion function.
1.1 Properties of optimal grid and the distortion function
Let $X$ be an $\mathbb{R}^d$-valued random variable with probability distribution $\mu$ such that $\mu\in\mathcal{P}_2(\mathbb{R}^d)$ and $\mathrm{card}(\mathrm{supp}(\mu))\ge K$. Let $G_K(\mu)$ denote the set of all optimal quantization grids at level $K$ of $\mu$ and let $e^*_{K,\mu}$ denote the optimal quantization error of $\mu$ defined in (1). The properties below recall some classical background on the optimal quantization of probability measures.
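Proposition 1.2. Let $K\in\mathbb{N}^*$. Let $\mu\in\mathcal{P}_2(\mathbb{R}^d)$ and $\mathrm{card}(\mathrm{supp}(\mu))\ge K$.
(i) (Decreasing of $K\mapsto e^*_{K,\mu}$) $e^*_{K,\mu}<e^*_{K-1,\mu}$ for $K\ge2$.
(ii) (Existence and boundedness of optimal grids) $G_K(\mu)$ is a nonempty compact set, so that $\rho_K(\mu)$ defined in (6) is finite for any fixed $K$. Moreover, if is an optimal grid of , then . In particular, if , then and vice versa.
(iii) If $\mu$ has a compact support and if the norm on $\mathbb{R}^d$ is Euclidean, derived from an inner product $\langle\cdot\,|\,\cdot\rangle$, then all the optimal grids are contained in the closure of the convex hull of $\mathrm{supp}(\mu)$, denoted by $\mathcal{H}_\mu$.
(iv) (Non-asymptotic Zador’s theorem) Let $\eta>0$. If $\mu\in\mathcal{P}_{2+\eta}(\mathbb{R}^d)$, then there exists a constant $C_{d,\eta}\in(0,+\infty)$, which depends only on $d$ and $\eta$, such that for every quantization level $K$,
$$e^*_{K,\mu}\le C_{d,\eta}\,\sigma_{2+\eta}(\mu)\,K^{-1/d},$$
where for $r>0$, $\sigma_r(\mu)=\min_{a\in\mathbb{R}^d}\big[\int_{\mathbb{R}^d}|\xi-a|^r\,\mu(d\xi)\big]^{1/r}$.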
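The $K^{-1/d}$ decay described by Zador-type bounds can be checked by hand in one classical case. For the uniform distribution on $[0,1]$ (so $d=1$), the midpoint grid $\big((2k-1)/(2K)\big)_{1\le k\le K}$ is known to be optimal, with quantization error $1/(2\sqrt{3}\,K)$. The Python sketch below evaluates the distortion in closed form on each Voronoï cell; the function names are hypothetical and the example is restricted to this particular distribution.

```python
import math

def uniform_distortion(grid):
    """Exact quadratic distortion of U([0, 1]) at a grid of distinct points in [0, 1]."""
    pts = sorted(grid)
    # cell boundaries: 0, midpoints between neighbouring grid points, 1
    bounds = [0.0] + [(a + b) / 2 for a, b in zip(pts, pts[1:])] + [1.0]
    total = 0.0
    for a, left, right in zip(pts, bounds, bounds[1:]):
        # closed form of the integral of (xi - a)^2 over [left, right]
        total += ((right - a) ** 3 - (left - a) ** 3) / 3
    return total

def midpoint_grid(K):
    """Midpoints of the K equal sub-intervals of [0, 1]."""
    return [(2 * k - 1) / (2 * K) for k in range(1, K + 1)]

levels = [1, 2, 4, 8, 16, 32]
# K * (quantization error) stays constant, i.e. the error decays like K^{-1}
scaled_errors = [K * math.sqrt(uniform_distortion(midpoint_grid(K))) for K in levels]
```

Every entry of `scaled_errors` equals $1/(2\sqrt{3})\approx0.2887$ up to rounding, which is the $K^{-1/d}$ rate with $d=1$.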
For the proof of the non-asymptotic Zador’s theorem, we refer to  and [see Theorem 5.2]. When $\mu$ has an unbounded support, we know from  that $\lim_{K\to+\infty}\rho_K(\mu)=+\infty$. The same paper also gives an asymptotic upper bound of $\rho_K(\mu)$ when $\mu$ has a polynomial tail or a hyper-exponential tail. We first give the definitions of these different tails of a probability measure.
Definition 1.3. Let $\mu$ be absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}^d$ and let $f$ denote its density function.
A distribution has a -th radial-controlled tail if there exists and a function such that
A distribution has a -th polynomial tail if there exists and such that .
A distribution has a -hyper-exponential tail if there exists and such that .
The purpose of the radial-controlled tail condition is to control the rate at which the density function $f(\xi)$ converges to 0 as $\xi$ goes to infinity in every direction. Note that the -th polynomial tail with and the hyper-exponential tail are sufficient conditions for the -th radial-controlled tail. A typical example of a hyper-exponential tail is the multidimensional normal distribution .
([see Theorem 1.2]) Assume that
Polynomial tail. For , if has a -th polynomial tail with , then
Hyper-exponential tail. If has a -hyper-exponential tail, then
Furthermore, if , .
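Quantization theory has a close connection with Voronoï partitions. Let $x=(x_1,\dots,x_K)$ be a grid at level $K$ and let $|\cdot|$ be any norm on $\mathbb{R}^d$. The Voronoï cell (or Voronoï region) generated by $x_k$ is defined by
$$V_{x_k}(x)=\Big\{\xi\in\mathbb{R}^d:\;|\xi-x_k|=\min_{1\le j\le K}|\xi-x_j|\Big\},$$
and $\big(V_{x_k}(x)\big)_{1\le k\le K}$ is called the Voronoï diagram of $x$, which is a locally finite covering of $\mathbb{R}^d$. A Borel partition $\big(C_{x_k}(x)\big)_{1\le k\le K}$ is called a Voronoï partition of $\mathbb{R}^d$ induced by $x$ if
$$C_{x_k}(x)\subset V_{x_k}(x)\quad\text{for every }k\in\{1,\dots,K\}.$$
We also define the open Voronoï cell generated by $x_k$ by
$$V^{o}_{x_k}(x)=\Big\{\xi\in\mathbb{R}^d:\;|\xi-x_k|<\min_{j\ne k}|\xi-x_j|\Big\}.$$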
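Since we mostly work with the Euclidean norm on $\mathbb{R}^d$, we know from [Proposition 1.3] that $\operatorname{int}V_{x_k}(x)=V^{o}_{x_k}(x)$, where $\operatorname{int}A$ denotes the interior of a set $A$. Moreover, if we denote by $\lambda_d$ the Lebesgue measure on $\mathbb{R}^d$, we have $\lambda_d\big(\partial V_{x_k}(x)\big)=0$, where $\partial A$ denotes the boundary of $A$ (see [Theorem 1.5]). If $\mu\in\mathcal{P}_2(\mathbb{R}^d)$ and $x^*$ is an optimal grid of $\mu$, then even if $\mu$ is not absolutely continuous with respect to $\lambda_d$, we have $\mu\big(\partial V_{x^*_k}(x^*)\big)=0$ for all $k\in\{1,\dots,K\}$ (see [Theorem 4.2]).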
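For any $K$-tuple $x=(x_1,\dots,x_K)$ such that $x_i\ne x_j$ for $i\ne j$, one can rewrite the distortion function by means of a Voronoï partition as follows,
$$\mathcal{D}_{K,\mu}(x)=\sum_{k=1}^{K}\int_{C_{x_k}(x)}|\xi-x_k|^2\,\mu(d\xi).$$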
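The rewriting of the distortion as a sum over the cells of a Voronoï partition can be checked on a finite sample. In the Python sketch below, ties between equidistant grid points are sent to the lowest index, which is one arbitrary way (an assumption of this example) to turn the Voronoï diagram into a partition. All names and the toy data are hypothetical.

```python
import math

def voronoi_partition(sample, grid):
    """Split `sample` into disjoint cells, one per grid point (ties: lowest index)."""
    cells = [[] for _ in grid]
    for xi in sample:
        dists = [math.dist(xi, a) for a in grid]
        cells[dists.index(min(dists))].append(xi)
    return cells

def distortion_by_cells(sample, grid):
    """Distortion of the empirical measure written as a sum over Voronoi cells."""
    cells = voronoi_partition(sample, grid)
    total = sum(math.dist(xi, a) ** 2 for a, cell in zip(grid, cells) for xi in cell)
    return total / len(sample)

def distortion_by_min(sample, grid):
    """Distortion of the empirical measure written with the minimum over the grid."""
    return sum(min(math.dist(xi, a) ** 2 for a in grid) for xi in sample) / len(sample)

sample = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.0), (0.0, 3.0), (2.0, 2.0)]
grid = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]
cells = voronoi_partition(sample, grid)
# (1, 0) is equidistant from the first two grid points and goes to the first cell.
```

Both `distortion_by_cells(sample, grid)` and `distortion_by_min(sample, grid)` return the same value here, $(0+0+1+1+2)/5=0.8$ up to rounding.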
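For $\mu,\nu\in\mathcal{P}_2(\mathbb{R}^d)$, let $\mathcal{D}_{K,\mu}$ denote the distortion function of $\mu$ and $\mathcal{D}_{K,\nu}$ the distortion function of $\nu$. Then, for every $x\in(\mathbb{R}^d)^K$,
$$\big|e_{K,\mu}(x)-e_{K,\nu}(x)\big|=\big|\mathcal{D}_{K,\mu}(x)^{1/2}-\mathcal{D}_{K,\nu}(x)^{1/2}\big|\le\mathcal{W}_2(\mu,\nu)\tag{16}$$
by a simple application of the triangle inequality for the $L^2$-norm (see  Formula (4.4) and Lemma 3.4). Hence, if $(\mu_n)_{n\ge1}$ is a sequence in $\mathcal{P}_2(\mathbb{R}^d)$ converging for the $\mathcal{W}_2$-distance to $\mu_\infty$, then for every $x\in(\mathbb{R}^d)^K$, $\mathcal{D}_{K,\mu_n}(x)\to\mathcal{D}_{K,\mu_\infty}(x)$ as $n\to+\infty$.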
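The following Python sketch gives a numerical sanity check, assuming the inequality in question is the classical bound $|e_{K,\mu}(x)-e_{K,\nu}(x)|\le\mathcal{W}_2(\mu,\nu)$. It is restricted to dimension one and to two empirical measures with the same number of atoms, where the Wasserstein distance is obtained by pairing sorted samples. The samples, grids and function names are assumptions made for the illustration.

```python
import math
import random

def quantization_error(sample, grid):
    """Quadratic quantization error of the empirical measure of a 1-D sample."""
    return math.sqrt(sum(min((xi - a) ** 2 for a in grid) for xi in sample) / len(sample))

def wasserstein2_1d(xs, ys):
    """W_2 distance between two empirical measures on R with equally many atoms."""
    assert len(xs) == len(ys)
    # in dimension 1, pairing the sorted samples is an optimal coupling
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(sorted(xs), sorted(ys))) / len(xs))

random.seed(1)
mu = [random.gauss(0.0, 1.0) for _ in range(500)]
nu = [random.gauss(0.3, 1.2) for _ in range(500)]
w2 = wasserstein2_1d(mu, nu)

# Compare the two quantization errors at 200 random grids of level K = 4.
gaps = []
for _ in range(200):
    grid = [random.uniform(-3.0, 3.0) for _ in range(4)]
    gaps.append(abs(quantization_error(mu, grid) - quantization_error(nu, grid)))
```

For every grid tried, the gap between the two quantization errors stays below `w2`, as the triangle inequality argument predicts for any coupling of the two measures.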
We can also define the quantization error function (resp. the distortion function ) for any order as follows,
For and for every , we have the similar inequality as (16):
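Let $\mu_n,\mu_\infty\in\mathcal{P}_2(\mathbb{R}^d)$, $n\ge1$, be such that $\mathcal{W}_2(\mu_n,\mu_\infty)\to0$. For a fixed quantization level $K$, the consistency of optimal grids was first established by D. Pollard, who used a set $\Gamma\subset\mathbb{R}^d$ with $\mathrm{card}(\Gamma)\le K$ to represent a quantization “grid” at level $K$; such a $\Gamma$ is called “optimal” for a probability measure $\mu$ if it achieves the optimal quantization error of $\mu$ at level $K$. We state the consistency theorem differently, letting a $K$-tuple $x^{(n)}\in(\mathbb{R}^d)^K$ represent the optimal grid of $\mu_n$ (we still call the result “Pollard’s Theorem”), and we give the proof of Pollard’s Theorem with this representation in Appendix B.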
Theorem (Pollard’s Theorem).
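Let $K$ be the quantization level. Let $\mu_n,\mu_\infty\in\mathcal{P}_2(\mathbb{R}^d)$, $n\ge1$, be such that $\mathcal{W}_2(\mu_n,\mu_\infty)\to0$. Assume $\mathrm{card}(\mathrm{supp}(\mu_n))\ge K$ for $n\in\mathbb{N}^*\cup\{\infty\}$. For $n\ge1$, let $x^{(n)}$ be a $K$-optimal grid for $\mu_n$. Then the grid sequence $(x^{(n)})_{n\ge1}$ is bounded in $(\mathbb{R}^d)^K$ and any limiting point of $(x^{(n)})_{n\ge1}$, denoted by $x^{(\infty)}$, is an optimal grid of $\mu_\infty$.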
2 General case
2.1 Convergence rate of optimal grid sequence
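Let $\mu_n,\mu_\infty\in\mathcal{P}_2(\mathbb{R}^d)$ be such that $\mathcal{W}_2(\mu_n,\mu_\infty)\to0$ as $n\to+\infty$. Fix a quantization level $K$ throughout this section. For every $n\ge1$, let $x^{(n)}\in\operatorname{argmin}\mathcal{D}_{K,\mu_n}$, which is, by Proposition 1.2 - (ii), an optimal quantization grid of $\mu_n$ at level $K$.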
Recall that a probability distribution has a -th radial-controlled tail (Definition 1.3) if and there exists a function such that
Under the radial-controlled tail assumption, the convergence rate of the optimal grids and their performance can be bounded by a constant times the convergence rate of the probability sequence in the Wasserstein distance, as described in the following theorem.
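Theorem 2.1. Let $K$ be the quantization level. Let $\mu_n,\mu_\infty\in\mathcal{P}_2(\mathbb{R}^d)$ with $\mathrm{card}(\mathrm{supp}(\mu_n))\ge K$ for all $n\in\mathbb{N}^*\cup\{\infty\}$. Assume that $\mathcal{W}_2(\mu_n,\mu_\infty)\to0$. For $n\ge1$, let $x^{(n)}$ be an optimal quantization grid of $\mu_n$.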
If , suppose that
has a -th radial-controlled tail,
For any $x\in G_K(\mu_\infty)$, the Hessian matrix of $\mathcal{D}_{K,\mu_\infty}$ evaluated at $x$, denoted by $H_{\mathcal{D}_{K,\mu_\infty}}(x)$, is positive definite.
denotes the smallest eigenvalue of all matrices, . Then for large enough,
where and .
Non-asymptotic upper bound for the performance. If , suppose that for some such that . Then for any
where is a constant depending on and depends on .
The proof of Theorem 2.1 relies on the following lemma.
Lemma 2.2. Let $\mu$ be absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}^d$. If $\mu$ has a -th radial-controlled tail, then every entry of the Hessian matrix of the distortion function is a continuous function. As a consequence, if the Hessian matrix is positive definite at some point $x$, then it is positive definite in a neighbourhood of $x$.
The proof of Lemma 2.2 is in Appendix C.
Proof of Theorem 2.1.
(a) Since the quantization level $K$ is fixed throughout the proof, we drop the subscripts $K$ and $\mu$ of the distortion function and denote by $\mathcal{D}_n$ (respectively, $\mathcal{D}_\infty$) the distortion function of $\mu_n$ (resp. $\mu_\infty$).
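By Pollard’s theorem in Section 1.1, $(x^{(n)})_{n\ge1}$ is bounded and any limiting point of $(x^{(n)})_{n\ge1}$ is in $G_K(\mu_\infty)$. We may assume that, up to a subsequence of $(x^{(n)})$, still denoted by $(x^{(n)})$, we have $x^{(n)}\to x^{(\infty)}\in G_K(\mu_\infty)$. Hence .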
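It follows from (15) that $\mathcal{D}_\infty$ is differentiable at $x^{(\infty)}$. Hence, the Taylor expansion of $\mathcal{D}_\infty$ at $x^{(\infty)}$ reads:
$$\mathcal{D}_\infty(x^{(n)})=\mathcal{D}_\infty(x^{(\infty)})+\big\langle\nabla\mathcal{D}_\infty(x^{(\infty)})\,\big|\,x^{(n)}-x^{(\infty)}\big\rangle+\frac12\,H_{\mathcal{D}_\infty}(\zeta^{(n)})\big(x^{(n)}-x^{(\infty)}\big)^{\otimes2},$$
where $H_{\mathcal{D}_\infty}$ denotes the Hessian matrix of $\mathcal{D}_\infty$, $\zeta^{(n)}$ lies in the geometric segment $(x^{(n)},x^{(\infty)})$, and for a matrix $A$ and a vector $u$, $Au^{\otimes2}$ stands for $u^{T}Au$.
As $x^{(\infty)}\in G_K(\mu_\infty)=\operatorname{argmin}\mathcal{D}_\infty$ and $\mathcal{D}_\infty$ is differentiable at $x^{(\infty)}$, one has $\nabla\mathcal{D}_\infty(x^{(\infty)})=0$ by Fermat’s theorem on stationary points. Hence
$$\mathcal{D}_\infty(x^{(n)})-\mathcal{D}_\infty(x^{(\infty)})=\frac12\,H_{\mathcal{D}_\infty}(\zeta^{(n)})\big(x^{(n)}-x^{(\infty)}\big)^{\otimes2}.$$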
Since , , it follows that