Multiplicative deconvolution estimator based on a ridge approach

08/03/2021 ∙ by Sergio Brenner Miguel, et al.

We study the non-parametric estimation of an unknown density f with support on R+ based on an i.i.d. sample with multiplicative measurement errors. The proposed fully data-driven procedure consists of the estimation of the Mellin transform of the density f and a regularisation of the inverse of the Mellin transform by a ridge approach. The resulting bias-variance trade-off is dealt with by a data-driven choice of the ridge parameter. In order to discuss the bias term, we consider the Mellin-Sobolev spaces, which characterise the regularity of the unknown density f through the decay of its Mellin transform. Additionally, we show minimax-optimality of the ridge density estimator over Mellin-Sobolev spaces.


1 Introduction

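In this paper we are interested in estimating the unknown density $f:\mathbb{R}_+\to\mathbb{R}_+$ of a positive random variable $X$ given independent and identically distributed (i.i.d.) copies of $Y = XU$, where $X$ and $U$ are independent of each other and $U$ has a known density $g:\mathbb{R}_+\to\mathbb{R}_+$. In this setting the density $f_Y:\mathbb{R}_+\to\mathbb{R}_+$ of $Y$ is given by

$$f_Y(y) = [f*g](y) = \int_0^\infty f(x)\,g(y/x)\,x^{-1}\,dx,\quad y\in\mathbb{R}_+.$$

Here $*$ denotes multiplicative convolution. The estimation of $f$ using an i.i.d. sample $Y_1,\dots,Y_n$ from $f_Y$ is thus an inverse problem called multiplicative deconvolution.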
This particular model was studied by Brenner Miguel et al. (2021). Inspired by the work of Belomestny and Goldenshluger (2020), the authors of Brenner Miguel et al. (2021) introduced an estimator based on the estimation of the Mellin transform of the unknown density and a spectral cut-off regularisation of the inverse of the Mellin transform. In Belomestny and Goldenshluger (2020) a pointwise kernel density estimator was proposed and investigated, while the authors of Brenner Miguel et al. (2021) studied the global risk of the density estimation. For the model of multiplicative measurement errors, the multivariate case of global density estimation and the univariate case of global survival function estimation were considered by Brenner Miguel (2021) and Brenner Miguel and Phandoidaen (2021), respectively, based on a spectral cut-off approach.
In this work, we borrow the ridge approach from the additive deconvolution literature, used for instance by Hall and Meister (2007) and Meister (2009), to build a new density estimator and compare it with the spectral cut-off estimator proposed by Brenner Miguel et al. (2021). The contribution of this work to the existing literature is the inclusion of the ridge approach and the comparison to the spectral cut-off approach. We discuss in which situations the corresponding estimators are comparable and when the ridge approach is favourable. Furthermore, the ridge approach can be used in future work considering oscillatory error densities or unknown error densities, compare Hall and Meister (2007) and Meister (2009).

1.1 Related works

The model of multiplicative measurement errors was motivated in the work of Belomestny and Goldenshluger (2020) as a generalisation of several models, for instance the multiplicative censoring model or the stochastic volatility model.
Vardi (1989) and Vardi and Zhang (1992) introduce and analyse intensively multiplicative censoring, which corresponds to the particular multiplicative deconvolution problem with multiplicative error uniformly distributed on $[0,1]$. This model is often applied in survival analysis as explained and motivated in van Es et al. (2000). The estimation of the cumulative distribution function of the variable of interest is discussed in Vardi and Zhang (1992) and Asgharian and Wolfson (2005). Series expansion methods are studied in Andersen and Hansen (2001) treating the model as an inverse problem. The density estimation in a multiplicative censoring model is considered in Brunel et al. (2016) using a kernel estimator and a convolution power kernel estimator. Assuming a uniform error distribution on an interval $[1-a,1+a]$ for $a\in(0,1)$, Comte and Dion (2016) analyse a projection density estimator with respect to the Laguerre basis. Belomestny et al. (2016) investigate a beta-distributed error.
In the work of Belomestny and Goldenshluger (2020), the authors used the Mellin transform to construct a kernel estimator for the pointwise density estimation. Moreover, they point out that the following widely used naive approach is a special case of their estimation strategy. Transforming the data by applying the logarithm, the model $Y = XU$, with observation $Y$, variable of interest $X$ and error $U$, writes as $\log(Y) = \log(X) + \log(U)$. In other words, multiplicative convolution becomes convolution for the $\log$-transformed data. As a consequence, the density of $\log(X)$ is eventually estimated employing usual strategies for non-parametric deconvolution problems (see for example Meister (2009)) and then transformed back to an estimator of $f$. However, it is difficult to interpret regularity conditions on the density of $\log(X)$. Furthermore, the analysis of the global risk of an estimator using this naive approach is challenging, as Comte and Dion (2016) pointed out.
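The log-transform argument can be illustrated numerically. The following is a minimal sketch, assuming purely for illustration a log-normal variable of interest and an error that is uniform on (0, 1]; the names `x`, `u`, `y` and the helper `variance` are hypothetical and not taken from the paper. It checks that the product model becomes a sum on the log scale and that, by independence, the variances of the logarithms add up.

```python
import math
import random

random.seed(1)
n = 20000

# Illustrative choices (assumptions): X log-normal, U uniform on (0, 1].
x = [random.lognormvariate(0.0, 0.5) for _ in range(n)]
u = [1.0 - random.random() for _ in range(n)]  # values in (0, 1], avoids log(0)
y = [xi * ui for xi, ui in zip(x, u)]          # multiplicative model Y = X * U


def variance(values):
    """Unbiased sample variance."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


log_x = [math.log(v) for v in x]
log_u = [math.log(v) for v in u]
log_y = [math.log(v) for v in y]

# The product becomes a sum on the log scale (up to rounding) ...
max_gap = max(abs(ly - (lx + lu)) for ly, lx, lu in zip(log_y, log_x, log_u))
# ... so, by independence, the variances add (0.25 + 1 in theory for these choices).
var_sum = variance(log_x) + variance(log_u)
var_log_y = variance(log_y)
print(max_gap, var_log_y, var_sum)
```

The additive structure is what makes standard deconvolution tools applicable to the transformed data; the difficulty discussed above lies in translating assumptions and risk bounds back to the original scale.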

1.2 Organisation

The paper is organised as follows. In Section 1 we recapitulate the definition of the Mellin transform and collect its frequently used properties. To be able to compare our estimator with the spectral cut-off estimator proposed by Brenner Miguel et al. (2021), we revisit its construction, state the assumption necessary for the estimator to be well-defined, and present the ridge density estimator. In Section 2 we show that the ridge density estimator is minimax-optimal over the Mellin-Sobolev spaces, by stating an upper bound and using the lower bound result given in Brenner Miguel (2021). A data-driven procedure based on a Goldenshluger-Lepski method is described and analysed in Section 3. Finally, results of a simulation study are reported in Section 4, which visualise the reasonable finite sample performance of our estimators. The proofs of Section 2 and Section 3 are postponed to the Appendix.

1.3 The spectral cut-off and ridge estimator

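We define for any weight function $\omega:\mathbb{R}_+\to\mathbb{R}_+$ the corresponding weighted norm by $\|h\|_{\omega}^2 := \int_0^\infty |h(x)|^2\,\omega(x)\,dx$ for a measurable, complex-valued function $h$. Denote by $\mathbb{L}^2(\mathbb{R}_+,\omega)$ the set of all measurable, complex-valued functions with finite $\|\cdot\|_{\omega}$-norm and by $\langle h_1,h_2\rangle_{\omega} := \int_0^\infty h_1(x)\,\overline{h_2(x)}\,\omega(x)\,dx$ for $h_1,h_2\in\mathbb{L}^2(\mathbb{R}_+,\omega)$ the corresponding weighted scalar product. Similarly, define $\mathbb{L}^2(\mathbb{R})$ and $\mathbb{L}^1(\mathbb{R}_+,\omega)$. In the introduction we already mentioned that the density of the observations can be written as the multiplicative convolution of the unknown density and the error density. We will now define this convolution in a more general setting. Let $c\in\mathbb{R}$. For two functions $h_1,h_2\in\mathbb{L}^1(\mathbb{R}_+,x^{c-1})$, where we use the notation $x^{c-1}$ for the weight function $x\mapsto x^{c-1}$, we define the multiplicative convolution $h_1*h_2$ of $h_1$ and $h_2$ by

$$(h_1*h_2)(y) := \int_0^\infty h_1(x)\,h_2(y/x)\,x^{-1}\,dx,\quad y\in\mathbb{R}_+. \quad (1)$$

In fact, one can show that the function $h_1*h_2$ is well-defined, $h_1*h_2 = h_2*h_1$ and $h_1*h_2\in\mathbb{L}^1(\mathbb{R}_+,x^{c-1})$, compare Brenner Miguel (2021). It is worth pointing out that the definition of the multiplicative convolution in equation (1) is independent of the model parameter $c\in\mathbb{R}$. We also know for densities $h_1,h_2$ that $h_1,h_2\in\mathbb{L}^1(\mathbb{R}_+,x^{0})$. If additionally $h_1\in\mathbb{L}^2(\mathbb{R}_+,x^{2c-1})$ then $h_1*h_2\in\mathbb{L}^2(\mathbb{R}_+,x^{2c-1})$.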

Mellin transform

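We will now define the Mellin transform for functions $h_1\in\mathbb{L}^1(\mathbb{R}_+,x^{c-1})$, that is, functions with $\int_0^\infty |h_1(x)|\,x^{c-1}\,dx<\infty$, and present the convolution theorem. Further properties of the Mellin transform, which will be used in the upcoming theory, are collected in Appendix A. Proof sketches of these properties can be found in Brenner Miguel et al. (2021), respectively Brenner Miguel (2021). Let $h_1\in\mathbb{L}^1(\mathbb{R}_+,x^{c-1})$. Then, we define the Mellin transform of $h_1$ at the development point $c\in\mathbb{R}$ as the function $\mathcal{M}_c[h_1]:\mathbb{R}\to\mathbb{C}$ with

$$\mathcal{M}_c[h_1](t) := \int_0^\infty x^{c-1+it}\,h_1(x)\,dx,\quad t\in\mathbb{R}. \quad (2)$$

The key property of the Mellin transform, which makes it so appealing for the use of multiplicative deconvolution, is the so-called convolution theorem, that is, for $h_1,h_2\in\mathbb{L}^1(\mathbb{R}_+,x^{c-1})$,

$$\mathcal{M}_c[h_1*h_2](t) = \mathcal{M}_c[h_1](t)\,\mathcal{M}_c[h_2](t),\quad t\in\mathbb{R}. \quad (3)$$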
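As a small worked example of the convolution theorem, consider the illustrative case (an assumption made here, not an example from the paper) in which both factors are uniform densities on $(0,1)$ and $c>0$. Writing $\mathcal{M}_c[h](t)=\int_0^\infty x^{c-1+it}h(x)\,dx$ for the Mellin transform, both sides of the theorem can be computed in closed form:

```latex
\begin{align*}
  h_1(x) = h_2(x) &= \mathbb{1}_{(0,1)}(x),
  & \mathcal{M}_c[h_1](t) &= \int_0^1 x^{c-1+it}\,dx = \frac{1}{c+it},\\
  (h_1*h_2)(y) &= \int_y^1 x^{-1}\,dx = -\log(y),\quad y\in(0,1),
  & \mathcal{M}_c[h_1*h_2](t) &= \int_0^1 (-\log y)\,y^{c-1+it}\,dy = \frac{1}{(c+it)^2},
\end{align*}
% so that M_c[h_1 * h_2](t) = M_c[h_1](t) M_c[h_2](t), as the convolution theorem states.
```

In particular, the transform of the uniform density decays only like $|t|^{-1}$, which is the kind of polynomial decay that governs how hard the deconvolution problem is.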

Let us now revisit the definition of the spectral cut-off estimator.

Spectral cut-off estimator

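The family of spectral cut-off estimators proposed by Brenner Miguel et al. (2021), respectively Brenner Miguel (2021), is based on the estimation of the Mellin transform of $f$ and a spectral cut-off regularisation of the inverse Mellin transform. Here $\mathcal{M}_c[\cdot]$ denotes the Mellin transform from eq. (2), $f_Y$ the density of the observations and $g$ the error density. Given the sample $(Y_j)_{j\in[\![n]\!]}$, where $[\![n]\!] := [1,n]\cap\mathbb{N}$ for any $n\in\mathbb{N}$, an unbiased estimator of $\mathcal{M}_c[f_Y](t)$ is given by the empirical Mellin transform

$$\widehat{\mathcal{M}}_c(t) := n^{-1}\sum_{j\in[\![n]\!]} Y_j^{c-1+it},\quad t\in\mathbb{R},$$

if $\mathbb{E}(Y_1^{c-1})<\infty$ for $c\in\mathbb{R}$. Exploiting the convolution theorem, eq. (3), under the assumption that $\mathcal{M}_c[g](t)\neq 0$ for all $t\in\mathbb{R}$, we can define the unbiased estimator $\widehat{\mathcal{M}}_c(t)/\mathcal{M}_c[g](t)$ of $\mathcal{M}_c[f](t)$ for $t\in\mathbb{R}$. To construct an estimator of the unknown density $f$, the authors of Brenner Miguel et al. (2021) used a spectral cut-off approach. That is, for $k\in\mathbb{R}_+$ we assume that $\mathbb{1}_{[-k,k]}\mathcal{M}_c[g]^{-1}\in\mathbb{L}^2(\mathbb{R})$, then we can ensure that $\mathbb{1}_{[-k,k]}\widehat{\mathcal{M}}_c/\mathcal{M}_c[g]\in\mathbb{L}^2(\mathbb{R})$ since $|\widehat{\mathcal{M}}_c(t)|\le\widehat{\mathcal{M}}_c(0)<\infty$ almost surely. Now, the spectral cut-off density estimator can be defined by

$$\widehat{f}_k(x) := \frac{1}{2\pi}\int_{-k}^{k} x^{-c-it}\,\frac{\widehat{\mathcal{M}}_c(t)}{\mathcal{M}_c[g](t)}\,dt,\quad x\in\mathbb{R}_+. \quad (4)$$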

Here we used two minor assumptions on the error density , that is,

([G0])

This assumption implies that the Mellin transform of the error density does not approach zero too fast. Although this assumption is fulfilled for a large class of error densities, we will now show that one can define an estimator under an even weaker assumption on the error density. An extensive study of this estimator, including the minimax optimality and data-driven choice of the cut-off parameter, can be found in Brenner Miguel et al. (2021).
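The construction of the spectral cut-off estimator can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes the empirical Mellin transform has the form n^{-1} Σ_j Y_j^{c-1+it}, that the estimator is the truncated inverse Mellin integral (1/2π) ∫_{-k}^{k} x^{-c-it} M̂_c(t) / M_c[g](t) dt, and, as a hypothetical error model, a Uniform(0,1) error whose Mellin transform is 1/(c+it) for c > 0. All function names are illustrative.

```python
import cmath
import math
import random


def empirical_mellin(sample, c, t):
    """n^{-1} * sum_j Y_j^{c-1+it}, computed as exp((c-1+it) * log Y_j)."""
    s = complex(c - 1.0, t)
    return sum(cmath.exp(s * math.log(y)) for y in sample) / len(sample)


def mellin_uniform_error(c, t):
    """Mellin transform of the Uniform(0,1) density: 1 / (c + it), valid for c > 0."""
    return 1.0 / complex(c, t)


def spectral_cutoff_estimate(sample, x, k, c=1.0, steps=400):
    """Trapezoidal approximation of (1/2pi) * int_{-k}^{k} x^{-c-it} M_hat(t)/M_g(t) dt."""
    h = 2.0 * k / steps
    total = 0.0 + 0.0j
    for i in range(steps + 1):
        t = -k + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        kernel = cmath.exp(-complex(c, t) * math.log(x))  # x^{-c-it}
        ratio = empirical_mellin(sample, c, t) / mellin_uniform_error(c, t)
        total += weight * kernel * ratio
    # The integrand at -t is the complex conjugate of the integrand at t,
    # so the imaginary part cancels up to rounding.
    return (total * h / (2.0 * math.pi)).real


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical data: log-normal X contaminated by a Uniform(0,1] error.
    data = [random.lognormvariate(0.0, 0.5) * (1.0 - random.random()) for _ in range(500)]
    print(spectral_cutoff_estimate(data, x=1.0, k=2.0))
```

The cut-off `k` plays the role of the regularisation parameter: a larger value reduces the bias but amplifies the noise through the division by the Mellin transform of the error density.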

Ridge estimator

Inspired by the work of Meister (2009) and Hall and Meister (2007), let such that Then for any we define the function by

and the set Now for all holds . We define next the ridge density estimator by . In fact, it can be written explicitly for as

(5)

By the construction of the quotient in the integrand in eq. (5) is well-defined even without assumption [G0].
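To make the idea of a ridge regularisation concrete, the following sketch replaces the division by the Mellin transform of the error density with a bounded multiplier. It is an illustration under stated assumptions and not the definition used in eq. (5): the multiplier conj(M_g) / max(|M_g|², ((1+|t|)^ξ / k)²) is a common ridge form from the additive deconvolution literature, and the names `ridge_multiplier`, `ridge_estimate`, `mellin_g`, `xi` and `t_max` are hypothetical. By construction the multiplier is bounded in modulus by k / (1+|t|)^ξ, equals 1 / M_g(t) wherever |M_g(t)| is above the ridge, and is zero where M_g(t) vanishes.

```python
import cmath
import math


def ridge_multiplier(m_g, t, k, xi=1.0):
    """Regularised stand-in for 1 / M_g(t) that stays bounded when M_g(t) is small or zero."""
    floor = ((1.0 + abs(t)) ** xi / k) ** 2
    return m_g.conjugate() / max(abs(m_g) ** 2, floor)


def ridge_estimate(sample, x, k, mellin_g, c=1.0, xi=1.0, t_max=50.0, steps=1000):
    """Trapezoidal approximation of (1/2pi) * int x^{-c-it} M_hat(t) R_k(t) dt on [-t_max, t_max]."""
    logs = [math.log(y) for y in sample]
    log_x = math.log(x)
    h = 2.0 * t_max / steps
    total = 0.0 + 0.0j
    for i in range(steps + 1):
        t = -t_max + i * h
        s = complex(c - 1.0, t)
        m_hat = sum(cmath.exp(s * ly) for ly in logs) / len(logs)  # empirical Mellin transform
        weight = 0.5 if i in (0, steps) else 1.0
        kernel = cmath.exp(-complex(c, t) * log_x)                 # x^{-c-it}
        total += weight * kernel * m_hat * ridge_multiplier(mellin_g(c, t), t, k, xi)
    return (total * h / (2.0 * math.pi)).real
```

A possible call, again with a hypothetical Uniform(0,1) error, is `ridge_estimate(data, 1.0, 3.0, lambda c, t: 1.0 / complex(c, t))`. Unlike a hard cut-off, the integration runs over a wide frequency range and the damping comes from the multiplier itself, which is why no lower bound on the error transform is needed for the estimator to be well-defined.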

2 Minimax theory

In this section, we will see that an even milder assumption on the error density than [G1] is sufficient to ensure that the presented ridge estimator is consistent. We finish Section 2 by showing that the estimator is minimax optimal over the Mellin-Sobolev ellipsoids. We denote by the expectation corresponding to the distribution of . Respectively we define and .

2.1 General consistency

Although the sequence is obviously nested, that is for all , we want to stress that the squared bias, of , defined in eq. (5), might not tend to zero for going to infinity. For instance, one may consider the case where vanishes on an open, nonempty set and does not vanish on . A more detailed discussion about identifiability and consistency in the context of additive deconvolution problems can be found in the work of Meister (2009). The discussion there can be directly transferred to the case of multiplicative deconvolution problems. Based on the discussion presented in Meister (2009), we will give a minimal assumption to ensure that we can define a consistent estimator using the ridge approach. We will from now on assume that the Mellin transform of the error density is nonzero almost everywhere, that is,

([G-1])

Under assumption [G-1] we can use the dominated convergence theorem to show that the bias vanishes for going to infinity. Further, it is worth stressing that for and we have . We then get the following result, whose proof is postponed to Appendix B.

Proposition 1.

Let such that and . Then for any with we have

where and is defined in equation (5).
If additionally [G-1] holds and satisfies and for then

for

Although the assumptions on in Proposition 1 seem to be rather technical, we will see that they are fulfilled when considering more specific classes of error densities, so-called smooth error densities. Before we define this family of error densities, let us briefly comment on the consistency of the presented estimator.

Remark 1 (Strong consistency).

In Proposition 1 we have seen that we can determine a set of assumptions which ensures by application of the Markov inequality, that

in probability. Here, we needed the additional assumption that and to construct the estimator and show its properties. A less restrictive metric which can be considered would be the -metric, since for any density, holds. Further, the Mellin transform developed in is well-defined for any density . In the book of Meister (2009), an estimator of the density of a real random variable given i.i.d. copies of where and are stochastically independent was proposed. It was shown there that this estimator is strongly consistent in the -sense, that is, almost surely. Given the transformed data, , we can use the estimator for and deduce the estimator for any Then , implying that the estimator is strongly consistent in the . Although it might be tempting to generalise this result for the -distance for any , it would need an additional moment assumption on which contradicts the idea of considering the most general case.

2.2 Noise assumption

Up to now, we have only assumed that the Mellin transform of the error density is nonzero almost everywhere, i.e. [G-1]. To develop the minimax theory for the estimator we will specify the class of considered error densities through an assumption on the decay of the corresponding Mellin transform . This assumption will allow us to determine the growth of the variance term more precisely. In the context of additive deconvolution problems, compare Fan (1991), densities whose Fourier transform decays polynomially are called smooth error densities. In keeping with this terminology, we say that an error density is a smooth error density if there exists such that

([G1])

This assumption on the error density was also considered in the works of Belomestny and Goldenshluger (2020), Brenner Miguel et al. (2021) and Brenner Miguel (2021). We focus on the case where , and use the abbreviation , respectively . Then under the assumptions of Proposition 1 and assumption [G1] we can show that for each there exists a constant such that , which leads to the following corollary whose proof can be found in Appendix B. Here for

Corollary 1.

Let the assumptions of Proposition 1 and [G1] be fulfilled. Then for ,

If one chooses such that and then for

Considering the bound of the variance term, a choice of increasing slowly in , would imply a faster decay of the variance term. On the other hand, the opposite effect on the bias term can be observed. In fact, to balance both terms, an assumption on the decay of the Mellin transform of the unknown density is needed. In non-parametric statistics and in the inverse problems community this is usually done by considering regularity spaces.
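The balancing argument can be made explicit in a generic form. As an assumption made only for this illustration, suppose the squared bias is bounded by $C_1 k^{-a}$ and the variance by $C_2 k^{b}/n$ for a regularisation parameter $k$, constants $C_1, C_2 > 0$ and exponents $a, b > 0$; these symbols are placeholders and are not the notation of the results in this section.

```latex
\[
  \underbrace{C_1\,k^{-a}}_{\text{squared bias}}
  \;+\;
  \underbrace{C_2\,\frac{k^{b}}{n}}_{\text{variance}}
  \quad\text{is balanced for}\quad
  k_n \asymp n^{1/(a+b)},
  \quad\text{which gives the rate}\quad
  n^{-a/(a+b)}.
\]
% Example: a = 2s (regularity s of the density) and b = 2\gamma + 1 (polynomial decay \gamma
% of the error transform) yield the familiar nonparametric deconvolution rate n^{-2s/(2s+2\gamma+1)}.
```

The balanced choice depends on the exponent $a$, that is, on the regularity of the unknown density, which is why a data-driven selection of the parameter is needed in practice.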

2.3 The Mellin-Sobolev space

We will now introduce the so-called Mellin-Sobolev spaces, which are, for instance, considered by Brenner Miguel et al. (2021) for the case and Brenner Miguel (2021) for the multivariate case. In the work of Brenner Miguel et al. (2021) their connection to regularity properties, in terms of analytical properties, and their connection to the Fourier-Sobolev spaces are intensely studied. For and we define the Mellin-Sobolev spaces by

and their corresponding ellipsoids by . Then for and under assumption [G1] we can show that . Since is a density and to control the variance term, it is natural to consider the following subset of

Then we can state the following theorem, whose proof is postponed to Appendix B.

Theorem 1 (Upper bound over ).

Let , and . Let further [G1] be fulfilled and . Then the choice leads to

A similar result was presented by the authors of Brenner Miguel et al. (2021) and Brenner Miguel (2021), showing that for the spectral cut-off estimator the choice leads to the same rate of uniformly over the classes . For the case the authors of Brenner Miguel et al. (2021) presented a lower bound result, showing that in many cases the rate given in Theorem 1 is the minimax rate for the density estimation given the i.i.d. sample . For the multivariate case, the author of Brenner Miguel (2021) has generalised the proof for all . The following theorem follows as a special case of the lower bound presented in Brenner Miguel (2021) for the dimension and its proof is thus omitted.

Theorem 2 (Lower bound over ).

Let , and assume that [G1] holds. Additionally, assume that for and that there exist constants such that

where for and for .
Then there exist constants such that for all and for any estimator of based on an i.i.d. sample ,

We want to emphasize that the additional assumptions on the error densities are for technical reasons. To ensure that is well-defined, we need to additionally assume that for the case of . If then follows from , compare Proposition 1.
In the work of Brenner Miguel et al. (2021) the authors showed that for the case of the spectral cut-off estimator , defined in eq. (4), is minimax optimal for some examples of error densities. In fact, they stressed that for Beta-distributed , considered for instance by Belomestny et al. (2016), all assumptions on are fulfilled.

3 Data-driven method

In Section 2 we determined a choice of the parameter such that the resulting ridge estimator is consistent, see Corollary 1. Setting we additionally found a choice of the parameter which makes the estimator minimax optimal over the Mellin-Sobolev ellipsoids , compare Theorem 1. We want to emphasize that the latter choice of might not be explicitly dependent on the exact unknown density but is still dependent on its regularity parameter which again is unknown.
We will now present a data-driven version of the estimator only dependent on the sample . For the data-driven choice of we will use a version of the Goldenshluger-Lepski method. That is, we will define the random functions for by

for and . Here and for any real numbers . Generally, the random function is an empirical version of which mimics the behaviour of the variance term, compare Proposition 1. Analogously, is an empirical version of which behaves like the bias term. For we then set

(6)

Then we can show the following result where we denote by the essential supremum of a measurable function and the essential supremum of .

Theorem 3.

Let and . Assume that , and [G1] is fulfilled. Then for ,

where is a positive constant depending on and is a positive constant depending on , and .

Assuming now that the density lies in a Mellin-Sobolev ellipsoid, we can deduce directly the following corollary whose proof is thus omitted.

Corollary 2.

Let , and . Assume further that , and [G1] is fulfilled. Then for ,

where is a positive constant depending on and .
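The selection rule of eq. (6) follows the general Goldenshluger-Lepski pattern, which can be sketched generically. The code below is an illustration under assumptions, not the procedure analysed in Theorem 3: it takes estimators evaluated on a common grid, a user-supplied variance proxy V̂(k), and hypothetical tuning constants `chi1` and `chi2`; the bias proxy is the largest penalised squared distance between estimators, using the minimum of the two parameters as comparison index. The plain Riemann-sum distance is also an assumption (a weighted norm could be substituted).

```python
def squared_distance(a, b, dx):
    """Riemann-sum approximation of the squared L2 distance between two gridded functions."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) * dx


def lepski_select(estimates, variance_proxy, dx, chi1=1.0, chi2=1.0):
    """Return the parameter k minimising A_hat(k) + chi2 * V_hat(k).

    estimates: dict mapping k -> list of estimator values on a common grid
    variance_proxy: dict mapping k -> V_hat(k), a proxy for the variance term
    """
    ks = sorted(estimates)

    def bias_proxy(k):
        # A_hat(k) = max over k' of ( ||f_k' - f_min(k, k')||^2 - chi1 * V_hat(k') )_+
        return max(
            max(
                squared_distance(estimates[kp], estimates[min(k, kp)], dx)
                - chi1 * variance_proxy[kp],
                0.0,
            )
            for kp in ks
        )

    return min(ks, key=lambda k: bias_proxy(k) + chi2 * variance_proxy[k])
```

For instance, if the estimators stop changing beyond some parameter value while the variance proxy keeps growing, the rule selects the smallest parameter at which the estimators have stabilised.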

Conclusion

Let us summarise the presented results of the ridge estimator in comparison to the properties of the spectral cut-off estimator , considered by Brenner Miguel et al. (2021) and Brenner Miguel (2021). For the definition of the estimator, the spectral cut-off estimator needs the assumption [G0]. This assumption already implies the existence of a consistent version of the spectral cut-off estimator. For the definition of the ridge estimator the assumption [G0] is not necessary. Nevertheless, in order to show that there exists a consistent version of the ridge estimator, we needed assumption [G-1], which is weaker than [G0]. In this scenario, the estimator seems to be favourable if one aims to consider minimal assumptions on the error density, for instance to construct a strongly consistent estimator, compare Remark 1. As soon as we are interested in developing the minimax theory of the estimators, the assumption [G1] is a natural one to consider. It is worth pointing out that [G1] implies [G0] and therefore [G-1]. Here the assumptions of Proposition 1, which are needed for the minimax optimality of both estimators, are identical to the assumptions of Brenner Miguel (2021). Thus neither of the estimators seems to be more favourable in terms of minimax-optimality. Again, for the data-driven estimators and , proposed by Brenner Miguel et al. (2021), the assumptions on the error densities are identical. Here it should be mentioned that the authors of Brenner Miguel et al. (2021) have proven the case . The general case for can be easily shown using the same strategies as in the proof of Theorem 3. In total, we can say that for the construction of an estimator with minimal assumptions on the error density , the ridge estimator seems to be favourable, in the sense that it requires weaker assumptions on .
As soon as we consider smooth error densities, that is under assumption [G1], neither the ridge estimator nor the spectral cut-off estimator seems to be more favourable in terms of minimax-optimality and data-driven estimation.

4 Numerical study

In this section, we illustrate the behaviour of the data-driven ridge estimator presented in eq. (5) and (6) and compare it with the spectral cut-off estimator , presented in Brenner Miguel et al. (2021), where

with and . To do so, we use the following examples for the unknown density ,

  1. Beta Distribution: ,

  2. Log-Gamma Distribution: ,

  3. Gamma Distribution: , and

  4. Log-Normal Distribution: .

A detailed discussion of these examples in terms of the decay of their Mellin transform can be found in Brenner Miguel (2021). To visualize the behaviour of the estimator, we use the following examples of error densities ,

  1. Symmetric noise: , and

  2. Beta Distribution: .

Here it is worth pointing out that the example and fulfill [G1] with and . By minimising an integrated weighted squared error over a family of histogram densities with randomly drawn partitions and weights we select for for and for . For the case we choose and . In both cases, we have set .

Fig. 1: The estimators (top) and (bottom) are depicted for 50 Monte-Carlo simulations with sample size in the case under the error density (left) and (right) for . The true density is given by the black curve while the red curve is the point-wise empirical median of the 50 estimates.

Figure 1 shows that both estimators behave similarly. As suggested by the theory, the reconstruction of the density from the observation seems to be less difficult if the error variable is uniformly distributed, case , than if the error variable is Beta distributed, case .

Fig. 2: The estimators (top) and (bottom) are depicted for 50 Monte-Carlo simulations with sample size in the case under the error density for (left), (middle) and (right). The true density is given by the black curve while the red curve is the point-wise empirical median of the 50 estimates.

Again we see that both estimators react analogously to varying values of the model parameter . Looking at the medians in Figure 2, for the median seems to be closer to the true density for smaller values of . For the opposite effect seems to occur. For , the case of the unweighted -distance, such effects cannot be observed. Regarding the risk, this seems natural as the weight function for is monotonically decreasing, while for it is monotonically increasing.

Case | Sample size | Ridge | Spectral | Ridge | Spectral

Table 1: The entries showcase the MISE (scaled by a factor of 100) obtained by Monte-Carlo simulations each with 500 iterations. We take a look at different densities and , two distinct sample sizes and for both estimators and we set .

Appendix A Preliminary

We will start by defining the Mellin transform for square-integrable functions and collect some of its major properties. Proof sketches for all the mentioned results can be found in Brenner Miguel et al. (2021), respectively Brenner Miguel (2021).

The Mellin transform

To define the Mellin transform of a square-integrable function, that is for , we make use of the definition of the Fourier-Plancherel transform. To do so, let and be its inverse. Then, as diffeomorphisms, map Lebesgue null sets on Lebesgue null sets. Thus the isomorphism is well-defined. Moreover, let be its inverse. Then for we define the Mellin transform of developed in by

where is the Plancherel-Fourier transform. Due to this definition several properties of the Mellin transform can be deduced from the well-known theory of Fourier transforms. In the case that we have

(7)

which coincides with the usual notion of Mellin transforms as considered in Paris and Kaminski (2001).
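The link between the Mellin and the Fourier transform can be checked numerically. The sketch below assumes the integral form M_c[h](t) = ∫_0^∞ x^{c-1+it} h(x) dx and uses the substitution x = e^u, which turns it into the Fourier-type integral ∫ e^{(c+it)u} h(e^u) du; the exact change of variables and normalisation used in the construction above may differ, and the function names are hypothetical. For the test function h(x) = exp(-x) the Mellin transform is the Gamma function Γ(c+it), so the result can be compared with the classical identity |Γ(1+it)|² = πt / sinh(πt).

```python
import cmath
import math


def mellin_via_log_substitution(h, c, t, lower=-40.0, upper=6.0, steps=9200):
    """Approximate int_0^inf x^{c-1+it} h(x) dx as int e^{(c+it)u} h(e^u) du (x = e^u)."""
    step = (upper - lower) / steps
    total = 0.0 + 0.0j
    for i in range(steps + 1):
        u = lower + i * step
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoidal rule
        total += weight * cmath.exp(complex(c, t) * u) * h(math.exp(u))
    return total * step


def exp_density(x):
    """Density of the Exp(1) distribution; its Mellin transform is Gamma(c + it)."""
    return math.exp(-x)


if __name__ == "__main__":
    m = mellin_via_log_substitution(exp_density, 1.0, 1.0)
    # Compare |Gamma(1 + i)|^2 with pi / sinh(pi).
    print(abs(m) ** 2, math.pi / math.sinh(math.pi))
```

Because the transformed integrand is smooth and decays rapidly in both directions, the plain trapezoidal rule is already very accurate here, which reflects how properties of the Mellin transform can be inherited from Fourier theory.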

Now, due to the construction of the operator it can easily be seen that it is an isomorphism. We denote by its inverse. If additionally to