We consider the following statistical problem. Suppose a random variable (r.v.)
and its probability density function (pdf)
, cumulative distribution function (cdf)
, and various quantiles are of interest, but only a random sample of noisy observationsare available with which to estimate the pdf, cdf, and quantiles. The underlying model is , where the ’s represent observation errors and are independent of the ’s. As is typical in the extensive literature on density estimation with noisy observations, (e.g., Carroll and Hall (1988); Stefanski (1990); Fan (1991); Diggle and Hall (1993); Delaigle and Gijbels (2004); Hall and Meister (2007); Meister (2009)), the pdf of is assumed to be known. Existing estimators for this problem have slow convergence rates and poor finite-sample accuracy. Although their asymptotic convergence rates are optimal and thus cannot be improved, in this paper we propose new estimates based on quadratic programming whose finite-sample performance improves over existing estimators substantially. We note that most of the prior work on this topic casts the problem directly in terms of pdf estimation and refers to it as density deconvolution, recognizing that estimates of the cdf and the quantiles can be obtained in the obvious manner from an estimate of the pdf. We adopt the same convention in this paper, although we are interested in cdf and quantile estimation, in addition to pdf estimation.
For the additive measurement error model, the pdf of is the convolution
This convolution in the spatial domain corresponds to multiplication in the Fourier domain, where
denotes the Fourier transform of(likewise for and ), and denotes frequency. In light of this, one classic and popular method is the Fourier-based kernel deconvolution (KD) (e.g., Carroll and Hall (1988), Stefanski and Carroll (1990), Diggle and Hall (1993)). One estimates as the inverse Fourier transform of (the overscore symbol denotes an estimate). The additional term is a frequency-domain kernel weighting function that gives less weight to higher frequency values in the Fourier inversion integral to avoid numerical conditioning problems, and
here is the bandwidth parameter for kernel smoothing. This approach is referred to as KD, because it is equivalent to kernel density estimation in the spatial domain, where the spatial domain kernel is the inverse Fourier transform of, instead of some standard (e.g., Gaussian) kernel. Thus, KD is related to kernel density estimation for data observed without error (Rosenblatt et al., 1956; Parzen, 1962; Silverman, 1986).
Although KD methods have a sound theoretical foundation with well-understood asymptotic properties, their performance is sensitive to choice of and its bandwidth parameter that dictates the amount of smoothing (Fan (1991); Barry and Diggle (1995); Delaigle and Gijbels (2004)), and it may be difficult to achieve a desirable balance between over- and under-smoothing, as illustrated in the example below. Moreover, methods having desirable asymptotic results do not necessarily perform well in typical finite sample situations. Other existing methodologies for density estimation include the spline-based smoothing method by Silverman (1984), the wavelet-based method by Pensky et al. (1999), and the wavelet-like method by Comte et al. (2006). The smoothing splines methods are similar to KD in that they correspond approximately to smoothing by a kernel method with bandwidth depending locally rather than globally on the design points. Hence, such spline-based methods can suffer from similar issues with KD methods. Wavelet-based density deconvolution methods often have advantages over traditional KD methods for pdfs that have discontinuities and sharp peaks, but they can sometimes perform poorly for smooth functions.
Fig. 1 illustrates the performance of KD methods with two types of kernels for a gamma example in which (5 is the shape parameter and 1 is the rate parameter), , and . A histogram of the observed data , along with the true density , are shown in each panel. Panel (a) also shows the KD estimate with rectangular frequency domain kernel for bandwidth parameter . The salient characteristic here is the pronounced oscillation on the tails of . This oscillation can be reduced by increasing , but the downside of this is oversmoothing of . Even the largest has not eliminated the tail oscillation, and yet the peak of is already being oversmoothed. Panel (b) shows similar results, but for triweight kernel . The same problematic tradeoff regarding the choice of bandwidth parameter is evident: If we choose a large enough bandwidth to avoid tail oscillation, this causes oversmoothing; and if we choose a small enough bandwidth to avoid oversmoothing, this causes tail oscillation. There may exist no value of bandwidth parameter that mitigates the tail oscillation without oversmoothing peaks.
Another undesirable characteristic of the KD method is that may be negative, as can be seen in Fig. 1. One can easily add a postprocessing adjustment of so that it is nonnegative and integrates-to-one, but this generally does not improve overall measures of quality of the estimator. As we demonstrate later, it is much more effective to incorporate these constraints more directly into the estimation process, as we do in our proposed estimator. Moreover, it is even more difficult to incorporate more complex shape constraints (e.g., tail monotonicity or convexity, unimodality, etc.) into the KD method. In contrast, it is straightforward to incorporate such shape constraints into our approach, when such knowledge is available, which we also demonstrate improves performance.
Motivated by the preceding, we develop a quadratic programming (QP) optimization approach for density deconvolution, together with an accompanying R package QPdecon. Specifically, our QP estimator is chosen to minimize a quadratic objective function that measures the difference between the convolution and an empirical density estimator . A variety of shape constraints are translated into linear and convex constraints and can be easily incorporated into our QP formulation. In our objective function, we also include a quadratic regularization penalty for the purpose of ensuring the most appropriate level of smoothing. In order to select the regularization parameter (analogous to the bandwidth parameter in the KD method) we develop a simple and computationally efficient method based on a concept similar to Stein’s unbiased risk estimator (SURE), which originates from Mallows (1973), Stein (1981) and Efron (1986).
Our examples indicate that, even without shape constraints, our QP estimator performs substantially better than both the classic KD method implemented with our own codes and the newer wavelet-like penalized contrast (PC) method (Comte et al. (2006)) implemented by the R package deamer (Stirnemann et al. (2012)), which is the best performing existing package we have found so far. With shape constraints (when applicable), the performance improvement is even larger. Even when the error density is Gaussian, which is notoriously difficult to deconvolve because of its smoothness (Carroll and Hall (1988); Stefanski (1990); Stefanski and Carroll (1990); Fan (1992); Wang and Wang (2011)), our QP estimator can achieve reasonable performance. The conclusion that performance can be improved when appropriate shape constraints are incorporated is consistent with findings in the large body of prior work that has incorporated shape constraints in density estimation with error-free data, e.g., Turnbull and Ghosh (2014), Zhang (1990), Dupačová (1992), Papp and Alizadeh (2014), Royset and Wets (2013), and in the limited prior work that has incorporated shape constraints in KD (Carroll et al. (2011); Birke (2009)). Although there are no proofs of asymptotic performance for the QP method, our focus is on density deconvolution in finite-sample situations, and we demonstrate that our proposed method works well via numerical studies on a variety of examples.
Optimization criteria like the quadratic objective function that we use in our QP estimator are much more amenable to incorporating shape constraints than other density deconvolution approaches. Optimization-based estimators using a regularized version of likelihood (Staudenmayer et al. (2008); Lee et al. (2013)) or least squares (Lee et al. (2015)) as the objective function were recently considered for density deconvolution, although these works did not investigate the effects of incorporating shape constraints, as we do in this work. Another difference between our work and Lee et al. (2015) is that we derive a computationally efficient SURE-like approach for selecting the most appropriate value for the regularization parameter, whereas Lee et al. (2015) used the simulation-based approach of Lee et al. (2013). We also introduce a simple graphical method that serves as a check on the selected regularization parameter, and we demonstrate that it is effective at preventing poor estimation results in the small proportion of cases where the SURE-like method selects the regularization parameter that results in too little regularization.
The remainder of the article is organized as follows. Section 2 describes our quadratic programming (QP) objective function for the density deconvolution problem (Section 2.1) and how to represent various shape constraints as linear constraints in the QP optimization (Section 2.2). Section 3 first derives the SURE-like method for selecting the regularization parameter and method of regularization (Section 3.1) and then develops the simple, yet effective graphical check on the selected value (Section 3.2). Section 4 uses simulation examples to demonstrate the superior estimation performance of the QP approach, relative to the KD and PC approaches. We also discuss the effects of incorporating shape constraints on QP estimator performance and the performance of the SURE-like approach and the graphical check for selecting the regularization parameter. Section 5 concludes the paper.
2 QP approach for density deconvolution
2.1 Basic QP Problem Formulation
In the QP approach, we work with a discretized version of the continuous convolution in Eq. (1) over a grid of equally spaced points for and , where , and . More specifically, defining , we use the discrete approximation
and similarly for , as illustrated in Fig. 2
. Let the vectorsand represent the pdfs and , respectively. As an estimate of , we will use the histogram of with bins centered at the same set of support points . That is, the estimate of is the histogram bin height at . Our discretized estimator of the pdf will also be represented as a -length vector. It should be noted that the QP approach inherently produces a smoothed estimate , so that further smoothing is unnecessary. Guidelines for selecting are discussed in Section 3.1.
The discretized version of Eq. (1) can be written as
where the elements of the convolution matrix are determined from the noise distribution, which is assumed known. At first glance, one may be tempted to use the estimate , which is an exact solution to Eq. (2) with and replaced by their estimates. However, as is well known in the deconvolution literature, is typically (for typical noise distributions) so poorly conditioned that is an unusable estimator subject to wild high-frequency oscillations.
Noting that is the solution to , this suggests using the estimator
where is a regularization term that penalizes an that is poorly behaved in some respect, and is a regularization parameter to be selected based on the data. For example, penalizing a large second derivative of can be achieved by using , where is an appropriately defined second-order difference matrix operator. We refer to this as second derivative regularization. Another option is to use where is some easily-determined and well-behaved approximation to . In our examples, we take
to be a Gaussian distribution with mean
and variance, where and are the sample mean and variance of . We refer to this as Gaussian regularization. Based on our simulation studies, the two regularization approaches performed comparably overall, with one method working better for some examples, and vice-versa for other examples. In Section 3, we present approaches for choosing the regularization method , as well as .
Because all pdfs integrate-to-one and are nonnegative, it makes sense to incorporate this knowledge into the estimation of by including constraints in the QP formulation:
where is a column vector of ones, and means that all elements of are nonnegative. Regarding quantifying the sampling variability in the QP deconvolution estimator, if desired, bootstrapping methods could be used
2.2 Additional Shape Constraints
One might have prior knowledge of various constraints on the shape of , e.g., that it is unimodal or that it has only nonnegative support. In this section, we discuss a number of such constraints that are common and that can be conveniently represented via linear constraints in the QP formulation (2.1
). Here, “linear constraint” means that a linear transformation of the vectorsatisfies some specified equality or inequality constraint and not that is constrained to be a linear function of . It is intuitively reasonable to suppose that including any such prior knowledge of shape constraints will improve the estimation, and in Section 4 we demonstrate that this is indeed the case.
Tail monotonicity. Many pdfs have nonincreasing right tails and/or nondecreasing left tails. Suppose we know that is nonincreasing for for some specified . This can be handled by incorporating additional inequality constraints into the QP formulation (2.1), as follows.
and the first non-zero column of corresponds to . A nondecreasing left tail can be handled in a similar manner, by augmenting with additional rows.
Tail convexity. Many pdfs also have one or both tails that are convex. Suppose we know that is convex for for some specified . This can be handled by adding the inequality constraints , where
and the first non-zero column of corresponds to the location of . A convex left tail can be handled similarly. We impose the convexity constraints as linear inequality constraints in the QP formulation and only for the tails of the pdf, not more general convex constraints to the entire pdf.
Unimodality. If we know the pdf is unimodal with mode at known location , this is equivalent to a nonincreasing monotonicity constraint for and a nondecreasing monotonicity constraint for . In analogy with the form of the monotonicity constraint given earlier, this can be handled by adding the inequality constraints , where
and the row of in which the order of the elements transitions from to corresponds to the mode location . The preceding is relevant when the mode location is known in advance, which generally will not be the case. For unknown mode locations, one can add as an additional decision variable and solve separate QPs, each with a different unimodality constraint corresponding to each candidate . The value of resulting in the smallest QP objective function value would be concluded the mode location.
Support constraints. If there is information on the support of , e.g., that for , this can be easily taken into account. As an example, suppose that is the concentration of a trace impurity in a chemical production process, and is a noisy measurement of that can assume negative values, even though is nonnegative. In situations like this, it is reasonable to suppose that we can improve our estimate by taking into account the information that over certain regions, even though over these regions. To handle this, supposing we know that the support of lies within the interval for some specified , one could solve (2.1) with the additional constraints that for and . In an equivalent but more computationally efficient formulation, one could simply replace the -dimensional in (2.1) by the reduced -dimensional counterpart and also replace the matrix by its counterpart comprised of columns of .
3 Parameter And Regularization Method Selection
To use the QP method, one must select the number of histogram bins to represent the empirical density , the regularization parameter , and the regularization method , which we restrict to either Gaussian regularization or second derivative regularization in this paper. For selecting both and the regularization method, we develop an approach in this section that is based on the Stein’s Unbiased Risk Estimate (SURE) method (Stein (1981)).
Regarding the choice of , we have found no adverse consequences to using a conservatively large , other than an increase in computational expense. Thus, our recommended approach is to choose a large enough that it introduces negligible smoothing-related bias in , but not so large that it unnecessarily increases computational expense. Our general rule-of-thumb that we have used in our examples is . That is, we select roughly three times the common rule-of-thumb used in regular histogram density estimation, but no greater than 200. According to Ruppert (2002), selecting a relatively large but fixed number of bins is satisfactory (which can be illustrated by Figure 5 in Yang et al. (2018)), and it is not necessary to select the number of bins by some commonly used criteria like GCV or SURE in that using such criteria occasionally causes overfitting.
) criterion that has found widespread use for parameter and model selection in many supervised learning problems (Efron (2004)). This method uses an analytical estimate of the expected test error, and, as a result, requires less computational expense than methods like cross-validation. In Section 3.2, we develop a graphical method for selecting , which can serve as either a check to avoid using an inappropriate value selected by the SURE-like method (occasionally the automated SURE-like method selects an inappropriate value for ) or as a stand-alone method (if an automated method is not needed).
3.1 A SURE Criterion for Selecting the Regularization Parameter and Method of Regularization
Let denote the histogram (viewed as a -length random vector with the bin locations treated as predetermined) of for the “training” data , and let denote the same but for some hypothetical new “test” sample of observations of drawn from the same distribution but independent of the training data. Let denote the estimate of from our QP method described in Section 2 applied to the training data with regularization parameter . We have added a subscript to to explicitly indicate its dependence on . To serve as the basis for our approach for selecting , define
as the training and test error, respectively. As in the standard SURE method, our approach is to select the value of that minimizes the expected test error , i.e.
where the expectation is with respect to both the training and the test data.
In the remainder of Section 3.1 we derive a tractable approximation for in Eq. (7), and in later sections we demonstrate that it usually provides an effective means to select . In this respect, as a criterion to select , represents a reasonable balance between tractability and meaningfulness as a measure of quality of the estimate .
The derivation of in our context follows the standard derivation used in other SURE-type approaches (Mallows (1973); Stein (1981); Efron (2004)) and begins with the well-known covariance penalty result (see the Appendix or Efron (2004)):
where denotes the cross-covariance matrix between the random vectors and . To estimate , we estimate the two terms in the right-hand-side of (8) separately. As an estimate of the expected training error, we use the observed training error in (5), which is commonly done in SURE approaches. That is, we use .
To estimate the covariance penalty term in Eq. (8), we require a closed-form expression for . For the general constrained QP formulations discussed in Section 2.2, must be solved by numerical optimization, and no closed-form expression for exists. However, for a slightly simplified QP formulation with only equality constraints,
we can apply the Lagrange multiplier method to find a closed-form solution of the form , where and are functions of . Via Lagrange multipliers, it is not difficult to derive that and in the solution to (9) are
In the preceding expressions, for second derivative regularization [i.e., for ], , and is a -length vector of zeros. For Gaussian regularization [i.e., for ], , and .
Using the preceding, the covariance penalty term in Eq. (8) can be simplified to
where denotes the covariance matrix of . To obtain an expression for , note that in our QP formulation, the density is taken to be heights of the histogram bins (scaled to represent a density). Thus, the vector follows a multinomial distribution
where is a -dimensional vector with elements denoted by . From the properties of a multinomial distribution, we know that , and for . Thus, given that , the elements of are and , i.e.,
which, for small (i.e., small multinomial probabilities , which will generally be the case if one chooses an appropriate number of histogram bins), can be approximated by
In the right hand side of Eq. (13), and depend on , and the other terms do not. To simplify notation, rewrite Eq. (13) as , where . Our SURE-like criterion (7) for selecting the optimal regularization parameter with only the integrate-to-one constraint becomes:
The preceding SURE derivation is not strictly valid if more constraints than the integrate-to-one constraint are used. With additional constraints, one might consider using (3.1) to select the best , and then reconducting the optimization (3) for that value of with all constraints included to produce the final estimate . However, we have found the following more compact modification of (3.1) to be generally more effective. Namely, our SURE-like approach for selecting the optimal regularization parameter with multiple constraints is:
The method of regularization (Gaussian or second derivative) can also be selected via SURE. This is accomplished by performing the optimization in (3.1) separately, for both regularization methods, and then choosing the method that gives the smallest SURE expected error .
3.2 A Graphical Scree-plot Approach for Selecting the Regularization Parameter
On a relatively small percentage of Monte Carlo (MC) replicates in the numerous examples that we have investigated, from (3.1) is chosen inappropriately. This is usually because is chosen too small, and the QP method results in a high-variance estimator. This is illustrated in Fig. 3 for 8,000 replicates of the same example considered in Fig. 1. For each replicate, we generated a random sample of size for the random variables and and then used the values of as the observed data. For all replicates in Fig. 3, was estimated using the QP method with only the two universal shape constraints of integrate-to-one and nonnegativity. Fig. 2(a) plots the estimation error measure against . We observe that 9.4% of replications have error more than twice the median error (the median is 0.089), and about 5.7% have error more than three times the median. The QP estimator in Fig. 2(b) corresponds to one of the occasional replicates for which is extremely underestimated, and its error is represented by the open red diamond in Fig. 2(a). In comparison, Fig. 2(c) shows a much-improved estimation result using a corrected (corrected via the scree plot method, described below) for the data from the same replicate featured in Fig. 2(b), and the error of the improved result is represented by the solid green diamond in Fig. 2(a). Notice that the error is reduced from to , a level that is far below the level for the extreme case and much more consistent with typical cases (twice the median error).
The corrected was obtained by inspection of the scree plot in Fig. 4, which is a simple graphical method that we have found to provide an effective means for selecting an appropriate regularization parameter and avoiding the poor pdf estimation that results on the occasional replicates in which the SURE-like method selects an inappropriate value for . As illustrated in Fig. 4, the scree plot is a plot of versus , and we look for the elbow in the plot. Namely, our scree-plot choice for is the smallest value of
that is comfortably to the right of the elbow. This is analogous to how a plot of the norm of the estimated ridge regression coefficient vector versus the regularization parameter is used to select the regularization parameter in ridge regression (Hoerl and Kennard (1970)).
A number of conclusions can be drawn from Fig. 2(a). First, we note that the best single value for in this example was roughly , which we found by comparing the MC average error values for a range of fixed values (the results of which are omitted, for brevity). We refer to this best single value of as the “oracle” value. The oracle value is also somewhat apparent from Fig. 2(a), because if we smooth the scatter plot, the smoothed error would be smallest at approximately . Also from Fig. 2(a), the mode of the 8,000 values produced over the 8,000 MC replicates was also , the same as the oracle value, and in this respect the SURE-like method did an overall good job of selecting .
Another conclusion from Fig. 2(a) is that on replicates for which the SURE-like method did a poor job of selecting , resulting in large error, it was always because was underestimated. Moreover, and significantly, for all of the replicates with underestimated, the scree plots (not shown here, for brevity) always looked very much like the one shown in Fig. 4, and the corrected (selected to the right of the elbow) always substantially improved the pdf estimate, as in Figure 2(c). We conclude that the scree plot provides a simple and effective means of selecting .
We now illustrate the performance of the automated SURE-like selection method and also how to select (or correct an underestimated ) using the scree plot with one replicate of the simple example, i.e , , , and .
Fig. 5 shows the histogram for the sample of observations of as well as the true and estimated pdf of (black dashed curve and red solid curve, respectively), and the estimated pdf uses only the nonnegativity and integrate-to-one constraints. The automatically selected regularization parameter for this example is , and we can see this automated selection worked quite well for this example despite having a slight oscillation on the right tail.
As an alternative to using the SURE-like method to select , or as a check that is appropriate, we can use the scree-plot method. The scree plot is constructed by repeatedly calculating the QP estimator for a set of values of , and the scree plot for the data depicted in Fig. 5 is shown in Fig. 5(a). The long green arrow in Fig. 5(a) indicates , which was obtained from the automated SURE-like method. We have added the two dashed vertical lines to indicate roughly what may be viewed as the lower and upper bounds of the candidate values suggested by the scree-plot method, and we denote any falling in this range as . The arrow to the left of indicates a value () that falls substantially below the range and is clearly to the left of the elbow. The arrow to the right of indicates a value () that is within the range. The QP pdf estimators corresponding to these two values are shown in Fig. 5(b), from which we can see that using a value that is too small results in apparent tail oscillations and over-estimation in the middle quantiles, whereas using a moderate size of within the range of smooths out the oscillations of the QP estimator without deteriorating (oversmoothing) its performance in the middle quantiles. For this particular replicate, the SURE-like method provides a regularization parameter that falls within the range of and results in good performance. However, as discussed earlier, there are replicates on which is chosen too small, and when this happens, the scree plot clearly indicates this (because falls to the left of the elbow, as in Fig. 4), so that a more appropriate can be selected to improve the performance of the QP method.
4 Discussion and Performance Comparisons
To further investigate the performance of our QP method and compare it to the other two density deconvolution methods KD and PC, we investigate two examples using the Monte Carlo simulations. The first is (mean and variance) and ; and the second is (mean and variance), and . We choose a Gaussian distribution for the noise not only because Gaussian noise is common, but also because it is considered supersmooth and notoriously difficult to deconvolve. Moreover, the smaller the signal-to-noise ratio , the more difficult is the deconvolution. For both examples, the signal-to-noise ratio is about 1.56, which is relatively small. Consequently, we use a relatively large sample size of for all numerical studies. We use various sets of constraints, which we denote by the following initials: integrate-to-one (), nonnegativity (), tail monotonicity (), tail convexity (), unimodality () and support (). We denote combinations of constraints by combinations of the initials. For convenient reference, we display the constraints we consider and their acronyms in Table 1. For all examples, the basic constraints are used for the QP approach because they apply to all pdfs. In order to have a more common basis for comparison, the KD and PC estimators are scaled retrospectively so that they integrate-to-one and are nonnegative, which we refer to as retro-. This also improved their performances overall, relative to not using the retro- adjustment. More specifically, in the retro- approach, all negative elements of are reset to zero, after which all elements of are scaled proportionately so that integrates-to-one.
|integrate-to-one||nonnegativity||tail monotonicity||tail convexity||unimodality||support|
In Section 4.1, we compare the quantile estimation performances of the QP method with only constraints versus the KD and PC methods with retro-, and in Section 4.2 we investigate the improvement in QP estimation performance that results from including shape constraints. Section 4.3 compares performances directly in terms of pdf estimation.
4.1 Performance Comparisons with Only Constraints
To distinguish estimation performance in left tail, right tail and central portions of the distribution, we consider the median absolute errors (s) for estimating the nine quantiles corresponding to probabilities for each example. We use
rather than mean square error, because the former is more robust to occasional outliers that can occur whenis underestimated by the SURE method. Downweighting such outliers is justified, because our graphical scree-plot method for choosing can effectively eliminate most of these outliers (see, e.g., Fig. 2(a)). We use the automated when selecting in the MC analysis because the scree-plot method for selecting requires user input based on inspection of the scree plots.
Recall that the KD estimator (Carroll and Hall (1988)) is:
We consider two common kernels for KD in the subsequent comparisons. One is the rectangular kernel, , and the other is the triweight kernel, . Regarding the selection of the KD bandwidth , we follow the recommendations in Delaigle and Gijbels (2004) and use the bootstrap method with their rule-of-thumb initial guess . Henceforth, we will refer to QP’s regularization parameter and KD’s bandwidth parameter both as “regularization parameters”. We will compare the performances of five estimators, and they are QP with only the constraints (denoted as QP), QP with additional constraints, KD with either rectangular kernel (KD) or triweight kernel (KD), as well as the PC estimator.
Table 2 and Table 3 show the quantile results (averaged across all MC replicates) with automated selection of the regularization parameters for the exponential and gamma examples, respectively. For the
s, the median was obtained over 500 and 8,000 MC replicates for the exponential and gamma examples, respectively. We choose the number of replicates so that the standard errors of thes are less than or equal to 1% of the value. Figures 6(a) and 6(b) provide a visual display of the relative performances of all five estimators in Tables 2 and 3, respectively. Each curve is the ratio (plotted in log-scale) across all nine probabilities for a particular estimator. The numerator of the ratio is the of the corresponding estimator, and the denominator is the geometric average of the s across all five estimators.
From Tables 2 and 3, together with Figures 6(a) and 6(b), we can see that QP outperforms the KD, KD and PC estimators for almost every quantile, often by a wide margin. The performance differences are most pronounced on the right tail for the exponential example and both the left and right tails for the gamma example. For example, at the quantile in Table 3, the for QP is about five times smaller than that for KD and six times smaller than that for KD and PC. This indicates that the undesirable tail oscillation illustrated in Fig. 1 for the KD method is substantially mitigated by the QP method. The advantage of the QP method is more obvious when adding the additional shape constraints, especially for the exponential examples. The performance of the QP estimators with additional shape constraints, i.e. QP in Table 2 and QP in Table 3, is as much as seven to eight times better than the KD and PC estimators for some quantiles. See Sec. 4.2 for details on these and other constraints for the gamma and exponential examples. The only situations for which QP did not outperform all other methods are for and for the gamma example in Table 3, where KD slightly outperformed QP. Overall, the KD and PC methods are comparable with each other, with PC being slightly better than KD for the exponential example, which has a sharper pdf, and slightly worse than KD for the relatively smooth gamma example.
To eliminate any adverse effects of selecting the regularization parameter inappropriately and focus on the inherent performances of the approaches, we repeat the preceding MC simulations, but we use the oracle regularization parameters instead of automated parameter selection. The corresponding s are listed in Tables 4 and 5. Since the PC method implemented by the R package deamer does not have a user-specified regularization parameter, the s for the PC column listed in Tables 4 and 5 are the same as those in Tables 2 and 3, although we still include it in Tables 4 and 5 for comparison purposes. The ratio comparisons are displayed in Figures 6(c) and 6(d).
The oracle for each example is determined by trying a broad range of values and choosing the one that results in the smallest aggregate measure of across all nine quantiles. The aggregate measure is , where is the over the MC simulation for the quantile corresponding to . The oracle values for KD and KD are determined similarly. Notice that the s of KD are identical in Tables 2 and 4 to three significant digits. This is because the KD bandwidth selection method in the exponential example is consistent across all MC replicates, and the selected bandwidth is quite close to the oracle . Specifically, for the exponential example, the oracle regularization parameter values for QP, QP, KD and KD are selected at 0.00995, 0.316, 0.720, 0.457, respectively. As for the gamma example, the oracle regularization parameter values for QP, QP, KD and KD are selected at 0.011, 0.016, 0.867, 0.467, respectively.
Comparing Tables 4 and 5 with Table 2 and 3, all methods perform only slightly worse in terms of performance when the regularization parameters are chosen automatically, relative to when the oracle values are used. Together with the results in Section 3.2, this implies that is reasonably selected for most replicates, and the primary drawback of the automated SURE-like method is underestimation of on a relatively small percentage of replicates (which can be easily corrected using the scree-plot method discussed in Section 3.2).
4.2 The Effects of Incorporating Shape Constraints on the QP Estimator
The performance advantages of incorporating relevant shape constraints into the pdf estimation can be gauged from Tables 2–5. In this section, we investigate this in more depth. Table 6 demonstrates the performance improvement that can be achieved by including various shape constraints in the QP method for the same gamma and exponential examples. Each number in Table 6 is the QP using the indicated constraint, divided by the QP using only the retro- constraints. The QP