Density deconvolution under general assumptions on the distribution of measurement errors

07/25/2019 ∙ by Denis Belomestny, et al. ∙ Universität Duisburg-Essen ∙ University of Haifa

In this paper we study the problem of density deconvolution under general assumptions on the measurement error distribution. Typically deconvolution estimators are constructed using Fourier transform techniques, and it is assumed that the characteristic function of the measurement errors does not have zeros on the real line. This assumption is rather strong and is not fulfilled in many cases of interest. We develop a methodology for constructing optimal density deconvolution estimators in the general setting that covers vanishing and non-vanishing characteristic functions of the measurement errors. We derive upper bounds on the risk of the proposed estimators and provide sufficient conditions under which zeros of the corresponding characteristic function have no effect on estimation accuracy. Moreover, we show that the derived conditions are also necessary in some specific problem instances.

1 Introduction

1.1 Problem formulation and background

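The problem of density deconvolution can be formulated as follows. Suppose that we observe a random sample $Y_1, \ldots, Y_n$ generated from the model

$$ Y_j = X_j + \epsilon_j, \qquad j = 1, \ldots, n, $$

where $X_1, \ldots, X_n$ are i.i.d. random variables with density $f$, and the measurement errors $\epsilon_1, \ldots, \epsilon_n$ are i.i.d. random variables with a known distribution $G$. Furthermore, assume that $\epsilon_1, \ldots, \epsilon_n$ are independent of $X_1, \ldots, X_n$. Then the probability density $f_Y$ of $Y_1$ is given by the convolution

$$ f_Y(y) = \int_{-\infty}^{\infty} f(y - x)\,\mathrm{d}G(x). \tag{1} $$

Our goal is to estimate $f$ from the observations $Y_1, \ldots, Y_n$.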
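To make the observation model concrete, the following minimal sketch simulates it for one illustrative choice that is not taken from the paper: a standard normal density for the unobserved variable and errors uniform on a symmetric interval of half-width `theta`. All function and variable names are hypothetical. In this special case the convolution in (1) has a closed form, which the sketch compares with a crude local frequency of the simulated observations.

```python
import math
import random

def simulate_observations(n, theta, rng):
    """Draw n observations Y = X + eps with X ~ N(0, 1) and eps ~ Uniform(-theta, theta)."""
    return [rng.gauss(0.0, 1.0) + rng.uniform(-theta, theta) for _ in range(n)]

def normal_cdf(t):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def convolved_density(y, theta):
    """Density of Y: the N(0, 1) density averaged over the window [y - theta, y + theta]."""
    return (normal_cdf(y + theta) - normal_cdf(y - theta)) / (2.0 * theta)

def local_frequency(sample, y, half_width):
    """Fraction of observations within half_width of y, rescaled to a density value."""
    hits = sum(1 for v in sample if abs(v - y) <= half_width)
    return hits / (len(sample) * 2.0 * half_width)

rng = random.Random(0)  # fixed seed so the run is reproducible
theta = 1.0             # assumed half-width of the uniform error distribution
sample = simulate_observations(100_000, theta, rng)
empirical = local_frequency(sample, 0.5, 0.05)
exact = convolved_density(0.5, theta)
print(f"empirical density near 0.5: {empirical:.3f}, convolution formula: {exact:.3f}")
```

The two numbers should agree up to sampling noise. The simulated sample plays the role of the observations; the deconvolution problem is the reverse task of recovering the density of the unobserved variable from it.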

An estimator of is a measurable function of observations , and the accuracy of is measured by the maximal risk

where

is a loss function,

is a class of density functions, and is the expectation with respect to the probability measure of observations when the density of is . In this paper we will be interested in estimating at a single point and in the –norm which corresponds to the loss functions and respectively. The minimax risk is then defined by

where is taken over all possible estimators. An estimator is called rate–optimal if as , and our goal is to construct rate–optimal estimators for natural functional classes of densities.

The problem of density deconvolution has been extensively studied in the literature; see, e.g., Carroll-Hall88, Zhang90, Fan91, Butucea-T1, Lounici-Nickl, Comte-Lacour13 and Lepski-Willer19. We also refer to the book of Meister, where many additional references can be found.

Deconvolution estimators are usually constructed using Fourier transform techniques, and the majority of results in the existing literature assumes that the characteristic function of the measurement errors has no zeros on the real line. Specifically, let denote the bilateral Laplace transform of ,

with being the characteristic function of the measurement errors. The standard assumptions on in the density deconvolution problem are the following:

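  (A) the characteristic function of the measurement errors does not vanish on the real line;

  (B) the modulus of the characteristic function decreases in an appropriate way at infinity: for some positive exponent it decays either

    (B1) polynomially, or

    (B2) exponentially.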

The setting under conditions (A)–(B1) is usually referred to as the case of smooth measurement error densities, while conditions (A)–(B2) correspond to the so–called super–smooth case. Under assumption (A) the achievable estimation accuracy is determined by the rate at which decreases as , and by the smoothness of the density to be estimated. In particular, it is well known that in the smooth case for the Hölder class and for the Sobolev class of regularity one has

(2)

see, e.g., Zhang90 and Fan91. The definitions of these classes are deferred to Section 5. In what follows we refer to the rate in (2) as the standard rate of convergence.

It is worth noting that condition (A) is rather restrictive and excludes many settings of interest. This condition does not hold if the distribution of the measurement errors is compactly supported. For instance, if the error density is uniform on a bounded interval, then its characteristic function vanishes at an equally spaced sequence of nonzero points on the real line. Another typical situation in which condition (A) is violated is the case of measurement errors having discrete distributions. In general, if the characteristic function of the measurement errors has zeros, the standard Fourier–transform–based estimation methods are not directly applicable. This fact raises the following natural questions.

  (i) How can one construct rate–optimal estimators when assumption (A) does not hold, that is, when the characteristic function of the measurement errors has zeros, and what is the best achievable rate of convergence under these circumstances?

  (ii) Under which conditions on the density to be estimated can one achieve the standard rates of convergence (2) without assuming (A)?
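The failure of condition (A) for uniform errors can be checked numerically. The sketch below is only an illustration under an assumed parametrization (errors uniform on a symmetric interval of half-width `theta`; the names are hypothetical). It contrasts the uniform characteristic function with that of a Laplace distribution, which never vanishes, and shows how the reciprocal used by naive Fourier inversion blows up near a zero.

```python
import math

def uniform_cf(omega, theta):
    """Characteristic function of Uniform(-theta, theta): sin(theta*omega)/(theta*omega)."""
    if omega == 0.0:
        return 1.0
    return math.sin(theta * omega) / (theta * omega)

def laplace_cf(omega, b):
    """Characteristic function of the Laplace distribution with scale b."""
    return 1.0 / (1.0 + (b * omega) ** 2)

theta = 1.0
# Zeros of the uniform characteristic function: omega = pi*k/theta for nonzero integers k
zeros = [math.pi * k / theta for k in (1, 2, 3)]
for w in zeros:
    print(f"omega = {w:.4f}: uniform cf = {uniform_cf(w, theta):+.2e}, "
          f"laplace cf = {laplace_cf(w, 1.0):.4f}")

# Close to a zero, the factor 1/cf required by naive Fourier inversion is huge
near = zeros[0] - 1e-6
blow_up = 1.0 / abs(uniform_cf(near, theta))
print(f"1/|cf| at distance 1e-6 from the first zero: {blow_up:.2e}")
```

The Laplace characteristic function is positive everywhere, so division by it is harmless at any fixed frequency, whereas for the uniform case the division is undefined on the whole lattice of zeros.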

The existing literature contains only partial and fragmentary answers to questions (i) and (ii). Devroye89 constructed a consistent estimator of the density of interest under the assumption that the characteristic function of the measurement errors is nonzero at almost all points of the real line. The proposed estimator is a certain modification of the standard Fourier–transform–based kernel density estimator. Hall-etal01 consider the setting with a uniform measurement error density and develop an estimator under the assumption that the density to be estimated is compactly supported. Other works dealing with uniform density deconvolution are GroenJong03 and FeuerKim08. The first cited paper assumes that the unobserved variable is non–negative, and shows that for a class of twice continuously differentiable densities, the pointwise risk of the proposed estimators converges to zero at the corresponding standard rate. FeuerKim08 studied estimation of densities from Sobolev functional classes; they show that the standard rate of convergence can be achieved in this setting provided that the density to be estimated has two bounded moments. These results demonstrate that, in the problem with uniformly distributed measurement errors and under the aforementioned assumptions on the density to be estimated, the zeros of the characteristic function of the measurement errors have no effect on the minimax rate of convergence.

Hall-Meister and Meister-1 considered a density deconvolution problem with an oscillating Fourier transform that vanishes periodically. They proposed several modifications of the standard Fourier–transform–based estimators and showed that, for certain nonparametric classes of probability densities, zeros of the characteristic function do affect the rate of convergence. Delaigle-Meister demonstrated that if the density to be estimated has a finite left endpoint, then it can be estimated with the standard rate, as in the case where the characteristic function of the measurement errors does not have zeros. Meister-Neumann considered a setting where the characteristic function may have zeros, but there are two observations of the same variable with independent measurement errors. In this setting zeros of the characteristic function have no influence on the rate of convergence.

The existing results in the literature leave open a fundamental question about the construction of optimal density deconvolution estimators under general assumptions on the measurement error distribution. Specifically, it is not clear whether, and under which conditions, the zeros of the characteristic function of the measurement errors have no influence on the minimax rates of convergence.

The current paper addresses the aforementioned issues. First, we develop a general methodology for constructing optimal density deconvolution estimators under general conditions on the measurement error distribution. These conditions cover settings with vanishing and non–vanishing characteristic functions of the measurement errors, and the proposed methodology treats all these settings in a unified way. The estimation methods we propose are based on the Laplace transform. In this sense they generalize the Fourier–transform–based estimation techniques used in the literature on density deconvolution. Second, we derive upper bounds on the risk of the proposed estimators and provide sufficient conditions on the density to be estimated under which the standard rate of convergence can be achieved under general assumptions on the measurement error distribution. In particular, we prove that if, in addition to the smoothness restriction (membership in a Hölder or a Sobolev class), the density to be estimated has bounded moments of a sufficiently large order, then the standard rate of convergence can be achieved even without assumption (A). The number of bounded moments is characterized in terms of a sequence of coefficients (zero set sequence) which, in turn, is determined by the geometry of the zeros of the Laplace transform of the measurement error distribution. Third, we specialize our general methodology to specific problem instances in which the zero set sequences can be explicitly calculated. Last but not least, we show that the derived sufficient moment conditions are also necessary in order to guarantee the standard rate of convergence in the absence of (A) for some specific problem instances.

The rest of the paper is organized as follows. In Section 2 we present the general idea behind the construction of the proposed estimators. Section 3 introduces assumptions on the distribution of the measurement errors and presents examples of distributions satisfying these assumptions. Section 4 discusses the construction of the estimator kernel and develops its infinite series representation. In Section 5 we define the estimator and present upper bounds on its risk. Settings corresponding to specific problem instances are discussed in Section 6, and lower bounds showing the necessity of moment conditions are presented in Section 7. Some concluding remarks are given in Section 8. Proofs of all theorems are given in the Appendix.

1.2 Notation

For a generic locally integrable function the bilateral Laplace transform is defined by

The Laplace transform is an analytic function in the convergence region of the above integral which, in general, is a vertical strip:

The convergence region can degenerate to a vertical line , , in the complex plane. If is a probability density then the imaginary axis always belongs to , that is, , and

is the characteristic function (the Fourier transform of ). This degenerate case corresponds to distributions whose characteristic function cannot be analytically continued to a strip around the imaginary axis in the complex plane. The inverse Laplace transform is given by the formula

The uniqueness property of the bilateral Laplace transform states that if in a common strip of convergence then is equal to for almost all [Widder46, Theorem 6b].

2 General idea for estimator construction

Let be the measurement error distribution function, and let be the corresponding convergence region of its Laplace transform:

Throughout the paper we suppose that is a vertical strip in the complex plane, with and satisfying (see Assumption 1 in Section 3). As discussed above, if the Laplace transform of the measurement error distribution has zeros on the imaginary axis in the complex plane, then the usual Fourier–transform–based methods are not directly applicable. We will be mainly interested in this case.

2.1 Linear functional strategy

The construction of our estimators follows the so-called linear functional strategy that is frequently used for solving ill–posed inverse problems [see, e.g., goldberg1979amethod and anderssen1980ontheuse]. In the context of the density deconvolution problem the main idea of the strategy is as follows. Our aim is to find two kernels, say, and with the following properties:

  (i) the integral approximates “well” the value to be estimated;

  (ii) the kernel is related to the kernel via the equation:

    (3)

Under conditions (i) and (ii) the obvious estimator of from the observations is an empirical estimator of the integral on the right hand side of (3),

Let be a kernel with standard properties that will be specified later. For denote . Assume that has bounded support so that is an entire function, that is, . Furthermore, assume that there exist real numbers and satisfying , such that

(4)

In words, is the union of two open strips (with the imaginary axis as the boundary), where the function does not have zeros. Therefore we can define

(5)

and this function is analytic in . Let

(6)

with Observe that the kernel is defined by the inverse Laplace transform of the function , and the denominator of the integrand in (6) does not vanish as . If the integral on the right hand side of (6) is absolutely convergent then (6) defines the same function for any value of or . In other words, depending on the sign of , equation (6) defines two different functions which will be denoted by and correspondingly. The estimator of is then defined by

(7)

The parameters and will be specified in the sequel.
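The linear functional strategy is easiest to see in the classical setting where the characteristic function of the errors has no zeros. The following minimal sketch is not the construction of this paper: it swaps in Laplace errors with scale `b` and a Gaussian smoothing kernel, for which the second kernel is available in closed form as the smoothing kernel minus `b**2` times its second derivative (a standard fact for Laplace errors). All names and parameter values are hypothetical. The estimator is the sample average of that second kernel over the observations.

```python
import math
import random

def gauss_kernel(u):
    """Standard normal density, used here as the smoothing kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def deconvolution_kernel(y, h, b):
    """L_h(y) = K_h(y) - b^2 K_h''(y) for a Gaussian K and Laplace(b) errors.

    Its Fourier transform equals that of K_h divided by the error
    characteristic function 1 / (1 + b^2 omega^2).
    """
    u = y / h
    k_h = gauss_kernel(u) / h
    k_h_second = (u * u - 1.0) * gauss_kernel(u) / h ** 3
    return k_h - b * b * k_h_second

def estimate_density(sample, x0, h, b):
    """Sample average of L_h(Y_j - x0): the empirical version of the integral against the density of Y."""
    return sum(deconvolution_kernel(y - x0, h, b) for y in sample) / len(sample)

def laplace_draw(rng, b):
    """Difference of two independent exponentials with mean b is Laplace with scale b."""
    return rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)

rng = random.Random(1)  # fixed seed so the run is reproducible
b, h, n = 0.5, 0.3, 100_000
sample = [rng.gauss(0.0, 1.0) + laplace_draw(rng, b) for _ in range(n)]
estimate = estimate_density(sample, 0.0, h, b)
target = gauss_kernel(0.0)  # true N(0, 1) density at x0 = 0
print(f"estimate: {estimate:.3f}, true density: {target:.3f}")
```

The estimate differs from the true value by a smoothing bias governed by `h` and by sampling noise. When the characteristic function has zeros, no such second kernel can be obtained by plain Fourier division, which is what the Laplace transform construction of this section is designed to circumvent.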

2.2 Relationship between kernels and

The following lemma demonstrates that (3) holds for the kernels and given by (6).

Lemma 1

Suppose that for any the integral on the right hand side of (6) is absolutely convergent, and

then for any

(8)

Proof: Fix . By Fubini’s theorem

Now we show that for almost all

(9)

Applying the bilateral Laplace transform to the left hand side of the previous display formula we obtain

In view of (5), the function on the right hand side of the last display formula is analytic and equal to on . On the other hand,

Thus, the bilateral Laplace transforms of the functions on both sides of (9) coincide on ; therefore (9) holds by the uniqueness property of the bilateral Laplace transform. This implies the lemma statement.    

Note that the relation (8) holds for both kernels and corresponding to and respectively. Thus, either of them can be used in the estimator construction.

Remark 1

A naive approach towards construction of an estimator for could be based on a direct application of the Laplace transform inversion formula. In particular, (1) implies that . The empirical estimator of can be constructed in the standard way using the available data ; then a division by with a proper regularization and application of the inverse Laplace transform formula yields an estimator of . We note, however, that this estimator is well defined only under very restrictive assumptions on : should be analytic in a strip containing the imaginary axis, that is, must have very light tails. We emphasize that our construction does not require existence of for outside the imaginary axis; only the analyticity of is needed.

3 Distribution of measurement errors

3.1 Assumptions

Accuracy of the estimator defined in (7) will be studied under the following general assumptions on the distribution of the measurement errors.

Assumption 1

The Laplace transform of the measurement error distribution exists in a vertical strip , , and admits the following representation:

(10)

where are positive real numbers, , are non-negative integer numbers, and the pairs , are distinct. The function is represented as

(11)

where , (and hence also ) is analytic, and does not vanish in a vertical strip with

Several remarks on Assumption 1 are in order.

Remark 2

Assumption 1 states that factorizes into a product of two functions. While the first function is of the form and has zeros only on the imaginary axis, the second one does not have zeros in ; the latter fact follows from analyticity of in .

The zeros of on the imaginary axis are , , where , , and the multiplicity of each zero is equal to . Thus, Assumption 1 implies that does not vanish in , and (4) holds with , that is, and .

The form of in (11) immediately follows from (10) and the fact that . Moreover we have

In addition to Assumption 1 we require some conditions on the growth of the function in (10) on the imaginary axis. These conditions are similar to the standard conditions on in the smooth case [see condition (B1) in Section 1].

Assumption 2

Assume there exist constants , and , such that

(12)

In addition, suppose that for some non-negative integer and

(13)

The condition (12) on is rather standard in the literature; it corresponds to the so-called smooth error densities. Note however that here (12) is imposed on the function .

3.2 Examples of distributions

Assumptions 1 and 2 define a broad class of distributions containing densities with characteristic functions that vanish on the real line. In addition, discrete distributions are covered by Assumptions 1 and 2. All this is illustrated in the following examples.

Example 1 (Uniform distribution)

Let then

In this case representation (10) holds with , , , and , . Clearly, satisfies Assumption 2 with . Note that has simple zeros on the imaginary axis at , , and .
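As a worked illustration of how such a factorization arises, the following derivation assumes errors uniform on $[-\theta, \theta]$ and the sign convention $\hat g(z) = \int e^{-zx} g(x)\,dx$ for the bilateral Laplace transform; both the parametrization and the symbols are assumptions made for this sketch and may differ from the notation of the example. The last line is the geometric series that turns the reciprocal of the vanishing factor into a sum of exponentials, that is, of translations in the original domain.

```latex
% Assumed setting: eps ~ U(-theta, theta), \hat g(z) = \int e^{-zx} g(x) dx
\begin{align*}
\hat g(z) &= \frac{1}{2\theta}\int_{-\theta}^{\theta} e^{-zx}\,dx
           = \frac{e^{\theta z} - e^{-\theta z}}{2\theta z}
           = \frac{e^{\theta z}}{2\theta z}\,\bigl(1 - e^{-2\theta z}\bigr), \\
\hat g(i\omega) &= \frac{\sin(\theta\omega)}{\theta\omega} = 0
  \iff \omega = \frac{\pi k}{\theta},\quad k = \pm 1, \pm 2, \ldots, \\
\frac{1}{1 - e^{-2\theta z}} &= \sum_{k=0}^{\infty} e^{-2\theta k z},
  \qquad \operatorname{Re}(z) > 0 .
\end{align*}
```

The factor $1 - e^{-2\theta z}$ carries all the zeros, which are simple and lie on the imaginary axis, while the remaining factor $e^{\theta z}/(2\theta z)$ decays like $|z|^{-1}$ along the imaginary axis, in line with the smooth-case behaviour.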

Example 2 (Convolution of uniform distributions)

Consider a convolution of the uniform distributions , with distinct parameters , each of multiplicity In this case

Therefore Assumption 1 holds with , for , , and

(14)

Thus, satisfies Assumption 2 with . Of special interest is the case of identical uniform distributions . Here , , , , and . Note also that in this case .

Example 3 (Discrete distributions)

Let

be a discrete random variable taking values in the set

, with corresponding probabilities , , where . Then

where . Let denote the roots of the polynomial ; then we have

Note that , that is, is an entire function. Representations (10) and (11) hold with

and , where , and . In this example if all with are distinct, then , , and . It is obvious that Assumption 2 holds with .

In the special case of the Bernoulli distribution with the success probability parameter

we have ; hence (10) holds with , , , , and . If is a binomial random variable with the number of trials and a success probability , then , and (10) holds with , , , , and .
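The location of the zeros in the Bernoulli and binomial cases can be verified directly. The sketch below assumes the sign convention E[exp(-z*eps)] for the Laplace transform and uses hypothetical function names. It shows that the Bernoulli transform vanishes on the imaginary axis only for success probability 1/2, and that the binomial transform, being a power of the Bernoulli one, vanishes at the same points.

```python
import cmath
import math

def bernoulli_laplace(z, p):
    """E[exp(-z * eps)] for eps ~ Bernoulli(p)."""
    return (1.0 - p) + p * cmath.exp(-z)

def binomial_laplace(z, m, p):
    """E[exp(-z * eps)] for eps ~ Binomial(m, p): the m-th power of the Bernoulli transform."""
    return bernoulli_laplace(z, p) ** m

def min_modulus_on_axis(p):
    """Smallest modulus on the imaginary axis; it is attained at omega = pi and equals |1 - 2p|."""
    return abs(bernoulli_laplace(1j * math.pi, p))

for p in (0.5, 0.3):
    print(f"p = {p}: minimal modulus on the imaginary axis = {min_modulus_on_axis(p):.3e}")

# For p = 1/2 the zeros sit at z = i*pi*(2k + 1); check a few of them numerically
zeros = [1j * math.pi * (2 * k + 1) for k in (-2, -1, 0, 1)]
residuals = [abs(bernoulli_laplace(z, 0.5)) for z in zeros]
print("residuals at the candidate zeros:", residuals)
```

For a success probability different from 1/2 the modulus stays bounded away from zero on the imaginary axis, so the characteristic function has no real zeros in that case.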

Example 4 (Convolution of uniform and smooth density)

Let be a probability density with Laplace transform in a strip satisfying , . Assume also that for some as , that is, is a smooth density. Let be a convolution of the uniform density on with ; then

and (10) obviously holds with . For instance, let be the density of the Gamma distribution with parameters and , that is, . Then , , and .

4 Kernel representation

Under Assumption 1, the kernel defined in (6) can be rewritten as follows

(15)

where is the set where does not vanish. Thus, for any the denominator of the integrand in (15) is not zero. Below we demonstrate that can be formally represented as an infinite series.

4.1 Infinite series representation

To develop the infinite series representation we need the following notation. According to Assumption 1, the set of zeros of on the imaginary axis is determined by three -tuples , and

. For a given vector

define

The set can be represented as an ordered set of real numbers , where . Define also

(16)

and

In fact, is the number of weak compositions of into parts [see, e.g., Stanley, p. 25]. Recall that an –tuple of non–negative integers with is called a weak composition of into parts.
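The counting fact used here can be checked by brute force. In the sketch below, `j` denotes the integer being decomposed and `q` the number of parts; both names are chosen for illustration only. The enumeration is compared with the standard closed form, the binomial coefficient of `j + q - 1` over `q - 1`.

```python
from itertools import product
from math import comb

def weak_compositions(j, q):
    """All q-tuples of non-negative integers summing to j (brute-force enumeration)."""
    return [t for t in product(range(j + 1), repeat=q) if sum(t) == j]

def count_weak_compositions(j, q):
    """Closed-form count: binomial(j + q - 1, q - 1)."""
    return comb(j + q - 1, q - 1)

print(weak_compositions(2, 3))
for j, q in [(2, 3), (4, 2), (5, 3)]:
    assert len(weak_compositions(j, q)) == count_weak_compositions(j, q)
```

For a fixed number of parts `q`, the closed form is a polynomial in `j` of degree `q - 1`, so the count grows only polynomially.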

Lemma 2

Let Assumption 1 hold, and .

  • If then

    (17)
    (18)

    provided that the summation on the right hand side of (17) defines a finite function for any .

  • If then

    (19)
    (20)

    provided that the summation on the right hand side of (19) is finite for any .

Remark 3

Lemma 2 shows that under Assumption 1 the kernel can be represented as an infinite linear combination of one–sided translations of , where the translation parameter takes values in the set .

The coefficients and of the linear combination are completely determined by the structure of the zero set of on the imaginary axis. The sequences , will play an important role in the sequel, and we call them the zero set sequences. The definitions in (18) and (20) imply that the coefficients , may grow at most polynomially in as . Note also that