1.1 Problem formulation and background
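The problem of density deconvolution can be formulated as follows. Suppose that we observe a random sample $Y_1,\ldots,Y_n$ generated from the model
$$Y_j = X_j + \epsilon_j,\qquad j=1,\ldots,n,$$
where $X_1,\ldots,X_n$ are i.i.d. random variables with density $f$, and the measurement errors $\epsilon_1,\ldots,\epsilon_n$ are i.i.d. random variables with a known distribution $G$. Furthermore, assume that $\epsilon_1,\ldots,\epsilon_n$ are independent of $X_1,\ldots,X_n$. Then the probability density $f_Y$ of $Y_1$ is given by the convolution
$$f_Y(y) = \int_{-\infty}^{\infty} f(y-x)\,\mathrm{d}G(x). \tag{1}$$
Our goal is to estimate $f$ from the observations $Y_1,\ldots,Y_n$.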
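As a purely illustrative sanity check of the convolution formula for the density of the observations, the following sketch simulates the observation model and compares a histogram of the simulated data with the closed-form convolution. The standard normal density for the variable of interest, the errors uniform on $[-1,1]$, the sample size and the function names are all assumptions made for this example and are not taken from the paper.

```python
import math
import numpy as np

def simulate_observations(n, rng):
    """Draw Y = X + eps with X ~ N(0,1) and eps ~ Uniform[-1,1] (illustrative choices)."""
    x = rng.standard_normal(n)
    eps = rng.uniform(-1.0, 1.0, size=n)
    return x + eps

def density_of_y(y):
    """Convolution of the N(0,1) density with the Uniform[-1,1] law: (Phi(y+1) - Phi(y-1)) / 2."""
    normal_cdf = np.vectorize(lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0))))
    return 0.5 * (normal_cdf(y + 1.0) - normal_cdf(y - 1.0))

rng = np.random.default_rng(0)
y = simulate_observations(200_000, rng)

# Histogram estimate of the density of the observations on a fixed grid.
hist, edges = np.histogram(y, bins=50, range=(-5.0, 5.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# Largest discrepancy between the histogram and the convolution density.
max_gap = float(np.max(np.abs(hist - density_of_y(centers))))
print(f"largest gap between histogram and convolution density: {max_gap:.4f}")
```

The histogram only recovers the density of the noisy observations; recovering the density of the variable of interest from it is the deconvolution problem itself.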
An estimator $\tilde f$ of the density $f$ of interest is a measurable function of the observations, and the accuracy of $\tilde f$ is measured by its maximal risk over a class of density functions. The risk is defined through a loss function and the expectation with respect to the probability measure of the observations when the density to be estimated is $f$. In this paper we are interested in estimating $f$ at a single point $x_0$ and in the $L_2$–norm, which correspond to the loss functions $|\tilde f(x_0)-f(x_0)|$ and $\|\tilde f-f\|_2$, respectively. The minimax risk is then defined as the infimum of the maximal risk over all possible estimators. An estimator is called rate–optimal if its maximal risk is of the same order as the minimax risk as $n\to\infty$, and our goal is to construct rate–optimal estimators for natural functional classes of densities.
The problem of density deconvolution has been extensively studied in the literature; see, e.g., Carroll-Hall88, Zhang90, Fan91, Butucea-T1, Lounici-Nickl, Comte-Lacour13 and Lepski-Willer19. We also refer to the book of Meister, where many additional references can be found.
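Deconvolution estimators are usually constructed using Fourier transform techniques, and the majority of results in the existing literature assume that the characteristic function of the measurement errors has no zeros on the real line. Specifically, let $\hat g$ denote the bilateral Laplace transform of the measurement error distribution $G$,
$$\hat g(z) = \int_{-\infty}^{\infty} e^{-zx}\,\mathrm{d}G(x),$$
with $\hat g(i\omega)$, $\omega\in\mathbb{R}$, being the characteristic function of the measurement errors. The standard assumptions on $\hat g$ in the density deconvolution problem are the following:
(A) $\hat g(i\omega)$ does not vanish, $\hat g(i\omega)\neq 0$ for all $\omega\in\mathbb{R}$;
(B) $|\hat g(i\omega)|$ decreases in an appropriate way as $|\omega|\to\infty$: for some $\gamma>0$,
(B1) $|\hat g(i\omega)|\asymp|\omega|^{-\gamma}$, or (B2) $|\hat g(i\omega)|\asymp\exp\{-c|\omega|^{\gamma}\}$,
as $|\omega|\to\infty$ with $c>0$.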
The setting under conditions (A)–(B1) is usually referred to as the case of smooth measurement error densities, while conditions (A)–(B2) correspond to the so–called super–smooth case. Under assumption (A) the achievable estimation accuracy is determined by the rate at which decreases as , and by the smoothness of the density to be estimated. In particular, it is well known that in the smooth case for the Hölder class and for the Sobolev class of regularity one has
see, e.g., Zhang90 and Fan91. The definitions of classes and are deferred to Section 5. In what follows we will refer to the rate as the standard rate of convergence.
It is worth noting that condition (A) is rather restrictive and excludes many settings of interest. This condition may fail if the distribution of the measurement errors is compactly supported. For instance, if the error density is uniform on $[-1,1]$ then its characteristic function is $\sin\omega/\omega$, which vanishes at $\omega=\pi k$, $k=\pm1,\pm2,\ldots$. Another typical situation in which condition (A) is violated is the case of measurement errors having discrete distributions. In general, if the characteristic function of the measurement errors has zeros, the standard Fourier–transform–based estimation methods are not directly applicable. This fact raises the following natural questions.
(i) How can rate–optimal estimators be constructed when assumption (A) does not hold, that is, when the characteristic function of the measurement errors has zeros, and what is the best achievable rate of convergence under these circumstances?
(ii) Under which conditions can one achieve the standard rates of convergence (2) without assuming (A)?
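The difficulty caused by zeros of the error characteristic function can be seen numerically in the uniform case. The sketch below evaluates the characteristic function $\sin(\theta\omega)/(\theta\omega)$ of the uniform law on $[-\theta,\theta]$ at the points $\omega=\pi k/\theta$ and shows that its reciprocal, which Fourier–transform–based estimators divide by, becomes very large near a zero. The half-width `theta`, the helper name `uniform_cf` and the offset `delta` are assumptions introduced only for this illustration.

```python
import numpy as np

def uniform_cf(omega, theta=1.0):
    """Characteristic function sin(theta*omega)/(theta*omega) of the uniform law on [-theta, theta]."""
    # np.sinc(x) computes sin(pi*x)/(pi*x), hence the rescaling by pi.
    return np.sinc(theta * np.asarray(omega, dtype=float) / np.pi)

theta = 1.0  # half-width of the support (illustrative value)
k = np.array([-3, -2, -1, 1, 2, 3])
zero_locations = np.pi * k / theta
values_at_zeros = uniform_cf(zero_locations, theta)

# Dividing by the characteristic function, as Fourier-based estimators do,
# blows up in the vicinity of each zero.
delta = 1e-6
inverse_near_zero = 1.0 / abs(uniform_cf(np.pi / theta + delta, theta))

print(values_at_zeros)
print(inverse_near_zero)
```

The values at the candidate zeros are zero up to floating-point rounding, while the reciprocal close to the first zero is of order $\pi/\delta$.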
The existing literature contains only partial and fragmentary answers to questions (i) and (ii). Devroye89 constructed a consistent estimator of the density of interest under the assumption that the characteristic function of the measurement errors is non–zero almost everywhere. The proposed estimator is a certain modification of the standard Fourier–transform–based kernel density estimator. Hall-etal01 consider the setting with the uniform measurement error density and develop an estimator under the assumption that the density to be estimated is compactly supported. Other works dealing with uniform density deconvolution are GroenJong03 and FeuerKim08. The first cited paper assumes that is non–negative, and shows that for a class of twice continuously differentiable densities, the pointwise risk of the proposed estimators converges to zero at the standard rate corresponding to . FeuerKim08 studied estimation of densities from Sobolev functional classes with the –risk; they show that the standard rate of convergence with can be achieved in this setting provided that , the zeros of the characteristic function of have no effect on the minimax rate of convergence.
Hall-Meister and Meister-1 considered a density deconvolution problem with an oscillating Fourier transform that vanishes periodically. They proposed several modifications of the standard Fourier–transform–based estimators, considered the –risk and showed that for certain nonparametric classes of probability densities, zeros of the characteristic function do affect the rate of convergence. Delaigle-Meister demonstrated that if the density to be estimated has a finite left endpoint, then it can be estimated with the standard rate as in the case where does not have zeros. Meister-Neumann considered a setting where may have zeros, but there are two observations of the same variable with independent measurement errors. In this setting zeros of have no influence on the rate of convergence.
The existing results in the literature leave open a fundamental question about the construction of optimal density deconvolution estimators under general assumptions on the measurement error distribution. Specifically, it is not clear whether and under which conditions the zeros of the characteristic function of the measurement errors have no influence on the minimax rates of convergence.
The current paper addresses the aforementioned issues. First, we develop a general methodology for constructing optimal density deconvolution estimators under general conditions on the measurement error distribution. These conditions cover settings with vanishing and non–vanishing characteristic functions of the measurement errors, and the proposed methodology treats all these settings in a unified way. The estimation methods we propose are based on the Laplace transform; in this sense they generalize the Fourier–transform–based estimation techniques used in the literature on density deconvolution. Second, we derive upper bounds on the risk of the proposed estimators and provide sufficient conditions on the density to be estimated under which the standard rate of convergence can be achieved under general assumptions on the measurement error distribution. In particular, we prove that if, in addition to the smoothness restriction (membership in a Hölder or a Sobolev class), the density to be estimated has bounded moments of a sufficiently large order, then the standard rate of convergence can be achieved even without assumption (A). The required number of bounded moments is characterized in terms of a sequence of coefficients (the zero set sequence) which, in turn, is determined by the geometry of the zeros of the Laplace transform of the measurement error distribution. Third, we specialize our general methodology to specific problem instances in which the zero set sequences can be explicitly calculated. Finally, we show that the derived sufficient moment conditions are also necessary to guarantee the standard rate of convergence in the absence of (A) for some specific problem instances.
The rest of the paper is organized as follows. In Section 2 we present the general idea behind the construction of the proposed estimators. Section 3 introduces assumptions on the distribution of the measurement errors and presents examples of distributions satisfying these assumptions. Section 4 discusses the construction of the estimator kernel and develops its infinite series representation. In Section 5 we define the estimator and present upper bounds on its risk. Settings corresponding to specific problem instances are discussed in Section 6, and lower bounds showing the necessity of the moment conditions are presented in Section 7. Some concluding remarks are given in Section 8. Proofs of all theorems are given in the Appendix.
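For a generic locally integrable function $\phi$ the bilateral Laplace transform is defined by
$$\hat\phi(z) = \int_{-\infty}^{\infty} \phi(t)\,e^{-zt}\,\mathrm{d}t.$$
The Laplace transform $\hat\phi(z)$ is an analytic function in the convergence region $\Sigma_\phi$ of the above integral which, in general, is a vertical strip:
$$\Sigma_\phi = \{z\in\mathbb{C}: \sigma_\phi^- < \mathrm{Re}(z) < \sigma_\phi^+\}\quad\text{for some } -\infty\le\sigma_\phi^-\le\sigma_\phi^+\le\infty.$$
The convergence region can degenerate to a vertical line $\{z\in\mathbb{C}:\mathrm{Re}(z)=\sigma_\phi\}$, $\sigma_\phi\in\mathbb{R}$, in the complex plane. If $\phi$ is a probability density then the imaginary axis always belongs to $\Sigma_\phi$, that is, $\{z:\mathrm{Re}(z)=0\}\subseteq\Sigma_\phi$, and
$$\hat\phi(i\omega) = \int_{-\infty}^{\infty}\phi(t)\,e^{-i\omega t}\,\mathrm{d}t,\qquad \omega\in\mathbb{R},$$
is the characteristic function (the Fourier transform of $\phi$). The degenerate case corresponds to distributions whose characteristic function cannot be analytically continued to a strip around the imaginary axis in the complex plane. The inverse Laplace transform is given by the formula
$$\phi(t) = \frac{1}{2\pi i}\int_{s-i\infty}^{s+i\infty}\hat\phi(z)\,e^{zt}\,\mathrm{d}z = \frac{1}{2\pi}\int_{-\infty}^{\infty}\hat\phi(s+i\omega)\,e^{(s+i\omega)t}\,\mathrm{d}\omega,\qquad \sigma_\phi^-<s<\sigma_\phi^+.$$
The uniqueness property of the bilateral Laplace transform states that if $\hat\phi_1(z)=\hat\phi_2(z)$ in a common strip of convergence then $\phi_1(t)$ is equal to $\phi_2(t)$ for almost all $t$ [Widder46, Theorem 6b].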
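To make the notion of a bilateral Laplace transform concrete, the following sketch evaluates it numerically for a uniform density on $[-\theta,\theta]$ and compares the result with the closed form $\sinh(\theta z)/(\theta z)$, which on the imaginary axis reduces to $\sin(\theta\omega)/(\theta\omega)$. The sign convention $e^{-zt}$ in the integrand, the value of `theta`, the grid size and the function names are assumptions made for this illustration.

```python
import numpy as np

def laplace_uniform_numeric(z, theta=1.0, n_grid=20001):
    """Trapezoidal approximation of the integral of exp(-z t) / (2 theta) over [-theta, theta]."""
    t = np.linspace(-theta, theta, n_grid)
    values = np.exp(-z * t) / (2.0 * theta)
    return np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(t))

def laplace_uniform_closed(z, theta=1.0):
    """Closed form sinh(theta z) / (theta z), valid for every complex z != 0."""
    return np.sinh(theta * z) / (theta * z)

z_off_axis = 0.3 + 2.0j   # a point with nonzero real part
z_on_axis = 1j * np.pi    # a point on the imaginary axis (omega = pi)

err = abs(laplace_uniform_numeric(z_off_axis) - laplace_uniform_closed(z_off_axis))
value_on_axis = abs(laplace_uniform_closed(z_on_axis))
print(err, value_on_axis)
```

Because the uniform density has bounded support, its transform is defined for every complex argument; the value at $z=i\pi$ (with $\theta=1$) is zero up to rounding, which is exactly a zero of the characteristic function.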
2 General idea for estimator construction
Let be the measurement error distribution function, and let be the corresponding convergence region of its Laplace transform:
Throughout the paper we suppose that is a vertical strip in the complex plane, with and satisfying (see Assumption 1 in Section 3). As discussed above, if the Laplace transform of the measurement error distribution has zeros on the imaginary axis in the complex plane, then the usual Fourier–transform–based methods are not directly applicable. We will be mainly interested in this case.
2.1 Linear functional strategy
The construction of our estimators follows the so-called linear functional strategy that is frequently used for solving ill–posed inverse problems [see, e.g., goldberg1979amethod and anderssen1980ontheuse]. In the context of the density deconvolution problem the main idea of the strategy is as follows. Our aim is to find two kernels, say, and with the following properties:
(i) the integral approximates “well” the value to be estimated;
(ii) the kernel is related to the kernel via the equation:
Under conditions (i) and (ii) the obvious estimator of from the observations is an empirical estimator of the integral on the right hand side of (3),
Let be a kernel with standard properties that will be specified later. For denote . Assume that has bounded support so that is an entire function, that is, . Furthermore, assume that there exist real numbers and satisfying , such that
In words, is the union of two open strips (with the imaginary axis as the boundary), where the function does not have zeros. Therefore we can define
and this function is analytic in . Let
with Observe that the kernel is defined by the inverse Laplace transform of the function , and the denominator of the integrand in (6) does not vanish as . If the integral on the right hand side of (6) is absolutely convergent then (6) defines the same function for any value of or . In other words, depending on the sign of , equation (6) defines two different functions which will be denoted by and correspondingly. The estimator of is then defined by
The parameters and will be specified in the sequel.
2.2 Relationship between kernels and
Suppose that for any the integral on the right hand side of (6) is absolutely convergent, and
then for any
Proof. Fix . By Fubini’s theorem
Now we show that for almost all
Applying the bilateral Laplace transform to the left hand side of the previous display formula we obtain
In view of (5), the function on the right hand side of the last display formula is analytic and equal to on . On the other hand,
Thus, the bilateral Laplace transforms of the functions on both sides of (9) coincide on ; therefore (9) holds by the uniqueness property of the bilateral Laplace transform. This proves the lemma.
Note that the relation (8) holds for both kernels and corresponding to and respectively. Thus, both or can be used in the estimator construction.
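A naive approach to constructing an estimator of the density of interest could be based on a direct application of the Laplace transform inversion formula. In particular, (1) implies that the Laplace transform of the density of the observations is the product of the Laplace transforms of the density to be estimated and of the measurement error distribution. An empirical estimator of the former can be constructed in the standard way from the available data; then division by the Laplace transform of the measurement error distribution, with a proper regularization, followed by an application of the inverse Laplace transform formula yields an estimator of the density of interest. We note, however, that this estimator is well defined only under very restrictive assumptions: the Laplace transform of the density to be estimated should be analytic in a strip containing the imaginary axis, that is, this density must have very light tails. We emphasize that our construction does not require the existence of this Laplace transform outside the imaginary axis; only the analyticity of the Laplace transform of the measurement error distribution is needed.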
3 Distribution of measurement errors
The accuracy of the estimator defined in (7) will be studied under the following general assumptions on the distribution of the measurement errors.
Assumption 1. The Laplace transform of the measurement error distribution exists in a vertical strip , , and admits the following representation:
where are positive real numbers, , are non-negative integer numbers, and the pairs , are distinct. The function is represented as
where , (and hence also ) is analytic, and does not vanish in a vertical strip with
Several remarks on Assumption 1 are in order.
Assumption 1 states that factorizes into a product of two functions. While the first function is of the form and has zeros only on the imaginary axis, the second one does not have zeros in ; the latter fact follows from analyticity of in .
In addition to Assumption 1 we require some conditions on the growth of the function in (10) on the imaginary axis. These conditions are similar to the standard conditions on in the smooth case [see condition (B1) in Section 1].
Assumption 2. Assume that there exist constants , and , such that
In addition, suppose that for some non-negative integer and
3.2 Examples of distributions
Assumptions 1 and 2 define a broad class of distributions containing densities with characteristic functions that vanish on the real line. In addition, discrete distributions are covered by Assumptions 1 and 2. All this is illustrated in the following examples.
Example 1 (Uniform distribution)
Example 2 (Convolution of uniform distributions)
Consider a convolution of the uniform distributions , with distinct parameters , each of multiplicity In this case
Therefore Assumption 1 holds with , for , , and
Thus, satisfies Assumption 2 with . Of special interest is the case of identical uniform distributions . Here , , , , and . Note also that in this case .
Example 3 (Discrete distributions)
be a discrete random variable taking values in the set, with corresponding probabilities , , where . Then
where . Let denote the roots of the polynomial ; then we have
and , where , and . In this example if all with are distinct, then , , and . It is obvious that Assumption 2 holds with .
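The link between roots of a polynomial and zeros of the Laplace transform on the imaginary axis can be illustrated with a small numerical sketch. It assumes, only for this example, that the error takes the integer values $0,1,\ldots,M$ with probabilities $p_0,\ldots,p_M$, so that with $x=e^{-z}$ the transform $\sum_k p_k e^{-zk}$ becomes the polynomial $\sum_k p_k x^k$; the specific probabilities and all names below are hypothetical and not taken from the paper.

```python
import numpy as np

# Hypothetical error law: values 0, 1, 2 with Binomial(2, 1/2) probabilities.
probs = np.array([0.25, 0.5, 0.25])

def laplace_discrete(z, probs):
    """Laplace transform sum_k p_k exp(-z k) of a law on {0, 1, ..., M}."""
    k = np.arange(len(probs))
    return np.sum(probs * np.exp(-z * k))

# With x = exp(-z) the transform is the polynomial P(x) = sum_k p_k x^k.
# np.roots expects the highest-degree coefficient first.
poly_roots = np.roots(probs[::-1])

# Roots on the unit circle, x = exp(-i omega), give zeros on the imaginary axis.
on_unit_circle = np.isclose(np.abs(poly_roots), 1.0, atol=1e-6)
# x = exp(-i omega) means omega = -arg(x); reduce to [0, 2 pi).
omegas = np.mod(-np.angle(poly_roots[on_unit_circle]), 2.0 * np.pi)

value_at_pi = abs(laplace_discrete(1j * np.pi, probs))
print(poly_roots, omegas, value_at_pi)
```

Here the polynomial is $(1+x)^2/4$, so it has a double root at $x=-1$; correspondingly the characteristic function has a zero of multiplicity two at $\omega=\pi$, repeated with period $2\pi$.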
Example 4 (Convolution of uniform and smooth density)
Let be a probability density with Laplace transform in a strip satisfying , . Assume also that for some as , that is, is a smooth density. Let be a convolution of the uniform density on with ; then
4 Kernel representation
where is the set where does not vanish. Thus, for any the denominator of the integrand in (15) is not zero. Below we demonstrate that can be formally represented as an infinite series.
4.1 Infinite series representation
To develop the infinite series representation we need the following notation. According to Assumption 1, the set of zeros of on the imaginary axis is determined by three -tuples , and
. For a given vector define
The set can be represented as an ordered set of real numbers , where . Define also
In fact, is the number of weak compositions of into parts [see, e.g., Stanley, p. 25]. Recall that a $q$–tuple $(j_1,\ldots,j_q)$ of non–negative integers with $j_1+\cdots+j_q=k$ is called a weak composition of $k$ into $q$ parts.
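The combinatorial notion used above can be checked by brute force. The sketch below enumerates weak compositions and compares their number with the standard closed-form count $\binom{k+q-1}{q-1}$; the letters $k$ (the total) and $q$ (the number of parts) and the function name are placeholders chosen for this example and need not match the paper's notation.

```python
from math import comb

def weak_compositions(k, q):
    """Yield every q-tuple of non-negative integers whose entries sum to k."""
    if q == 1:
        yield (k,)
        return
    for first in range(k + 1):
        for rest in weak_compositions(k - first, q - 1):
            yield (first,) + rest

compositions = list(weak_compositions(3, 2))
print(compositions)  # [(0, 3), (1, 2), (2, 1), (3, 0)]
print(len(compositions), comb(3 + 2 - 1, 2 - 1))
```

For a fixed number of parts the count is a polynomial in $k$ of degree $q-1$, which is in line with the polynomial growth of the coefficients noted in the text.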
The coefficients and of the linear combination are completely determined by the structure of the zero set of on the imaginary axis. The sequences , will play an important role in the sequel, and we call them the zero set sequences. The definitions in (18) and (20) imply that the coefficients , may grow at most polynomially in as . Note also that