Exhaustive goodness-of-fit via smoothed inference and graphics

05/26/2020
by Sara Algeri, et al.
University of Minnesota

Classical tests of goodness-of-fit aim to validate the conformity of a postulated model to the data under study. Given their inferential nature, they can be considered a crucial step of confirmatory data analysis. In their standard formulation, however, they do not allow one to explore how the hypothesized model deviates from the truth, nor do they provide any insight into how the rejected model could be improved to better fit the data. The main goal of this work is to establish a comprehensive framework for goodness-of-fit which naturally integrates modeling, estimation, inference, and graphics. Modeling and estimation focus on a novel formulation of smooth tests that easily extends to arbitrary distributions, either continuous or discrete. Inference and adequate post-selection adjustments are performed via a specially designed smoothed bootstrap, and the results are summarized via an exhaustive graphical tool called the CD-plot.


1 Introduction

Goodness-of-fit tests such as those of pearson, anderson and shapiro are among the most popular methods used to assess whether a model postulated by the scientist deviates significantly from the true data distribution. Because of their inferential nature, they can be framed in the context of confirmatory data analysis, but they provide little or no insight from an exploratory and/or modeling perspective. Specifically, when the postulated model is rejected, they do not equip practitioners with any insight into how the latter deviates from the truth, nor do they provide indications on how the rejected model can be improved to better fit the data.

A more comprehensive approach to goodness-of-fit that naturally addresses these drawbacks is given by smooth tests. They were first introduced by neyman37 as a generalization of the Pearson test and were further extended by barton53; barton55; barton56. The main idea is to conduct a test of hypothesis where the alternative model embeds the null as a special case through a series of orthonormal functions. As a result, if the hypothesized model is rejected, the alternative model naturally allows correcting it to provide a better fit for the data. Although smooth tests exist for many regular distributions (e.g., rayner86; rayner88), the specification of the orthonormal functions is not universal and typically depends on the distribution under comparison. Furthermore, the resulting inference is strongly affected by the model selection process and often relies on asymptotic results.

The main goal of this work is to generalize smooth tests by introducing a unifying framework which (i) easily generalizes to both continuous and discrete data in arbitrarily large samples, (ii) naturally leads to an efficient sampling scheme to perform variance estimation and inference while accounting for model selection, and (iii) allows us to graphically assess where a significant deviation of the hypothesized model from the truth occurs. In our strategy, the key ingredient to tackle (i) is a specially designed orthonormal basis first introduced by LP13 and LP14 in the context of the so-called LP approach to statistical modeling. (In the LP acronym, the letter L typically denotes nonparametric methods based on quantiles, whereas P stands for polynomials; see LPksamples, Supp. S1.) To address (ii), we study and further extend a novel smoothed bootstrap which naturally applies to both continuous and discrete data and, most importantly, allows us to efficiently perform variance estimation, inference and adequate post-selection adjustments for arbitrarily large samples. To achieve (iii), we propose the so-called Comparison Density plot, or CD-plot, a graphical tool for goodness-of-fit that allows us to assess simultaneously if and how significant deviations of the hypothesized model from the true underlying model occur. The CD-plot was originally proposed in algeri20 in the context of signal detection in astrophysical searches involving large continuous-valued data samples. Here, we extend algeri20 to arbitrarily large samples from either continuous or discrete distributions. Furthermore, we introduce a novel sampling scheme, called the bidirectional acceptance sampling algorithm, which allows us to simulate from two different distributions simultaneously and, consequently, improves the computational performance of the LP-smoothed bootstrap in our setting. Finally, we apply the methods proposed to identify the distribution of the time from the onset of symptoms to hospitalization of COVID-19 patients.

The remainder of the article is organized as follows. In Section 2, we review smooth tests, and in Section 3 we reformulate them in the context of LP modeling. In Section 4, we introduce the CD-plot and the LP-smoothed bootstrap. Section 5 is dedicated to bidirectional acceptance sampling. Important extensions of the proposed method are covered in Section 6. The analysis of COVID-19 hospitalization data is presented in Section 7. A discussion is provided in Section 8. Proofs are collected in Appendix A. R codes are freely downloadable at https://salgeri.umn.edu/my-research.

2 Background: comparison distributions and smooth tests

Smooth tests are a class of goodness-of-fit inferential methods that rely on the specification of a smooth model, of which the model hypothesized by the researcher is a special case. Several classes of smooth models have been proposed in the literature (e.g., neyman37; barton56) and involve an orthonormal expansion of the so-called comparison density (parzen83).

Definition 2.1.

Let $X$ be a random variable, either discrete or continuous, with probability function $f$ and cumulative distribution function (cdf) $F$, and let $g$ be a suitable probability function, with cdf $G$, the same support as $f$, and quantile function $G^{-1}$. The comparison density between $f$ and $g$ can then be specified as

$d(u; G, F) = \frac{f(G^{-1}(u))}{g(G^{-1}(u))}, \quad u \in [0,1],$   (1)

and we assume that $f(x) = 0$ whenever $g(x) = 0$.

In (1), $g$ is known in the literature as the parametric start (hjort95) or reference distribution (morris). The comparison distribution is defined as $D(u; G, F) = F(G^{-1}(u))$. As noted by parzen2004, in the continuous case $D(u; G, F)$ is differentiable with derivative $d(u; G, F)$ for all $u \in [0,1]$; whereas, for $X$ discrete, $D(u; G, F)$ is piecewise linear at the values $u_j = G(x_j)$, where the $x_j$ are the probability mass points of $g$ and

$D(u_j; G, F) = F(x_j), \quad j = 1, 2, \dots$   (2)

It follows that, in the discrete case, the comparison density is a step function (e.g., bottom left panel in Figure 1) with values $d(u; G, F) = f(x_j)/g(x_j)$ for $u \in (G(x_{j-1}), G(x_j)]$ (see morris, p. 18, for more details). Throughout the manuscript, we will mainly consider the case where the parametric start $g$ is fully known; the case where $g$ is characterized by a set of unknown parameters is discussed in Section 6.
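To fix ideas, the short R sketch below evaluates (1) for a discrete pair of distributions. The specific choices of $f$, $g$ and the Poisson rate are hypothetical and purely illustrative; they mimic the binomial-versus-truncated-Poisson example used in Figure 1.

## Sketch: evaluating the comparison density (1) for a discrete pair (f, g).
## Hypothetical setup: f is Binomial(15, 0.5); the parametric start g is a
## Poisson pmf truncated to {0, ..., 15} (the rate 7.5 is an arbitrary choice).
xs <- 0:15
f  <- dbinom(xs, size = 15, prob = 0.5)
g  <- dpois(xs, lambda = 7.5); g <- g / sum(g)    # truncate and renormalize

G  <- cumsum(g)                                   # cdf of g at the mass points
d  <- f / g                                       # step values d(u) = f(x_j)/g(x_j)

## d(u; G, F) is a step function of u in (0, 1]:
## it takes value d[j] for u in (G(x_{j-1}), G(x_j)].
plot(stepfun(G[-length(G)], d), xlim = c(0, 1),
     xlab = "u", ylab = "d(u; G, F)", main = "Comparison density (discrete)")
abline(h = 1, lty = 2)                            # d = 1 everywhere iff f = g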

On the basis of (1), a smooth model can be specified as

$f(x) = g(x)\, d(G(x); G, F),$   (3)

where $d(G(x); G, F)$ is a representation of (1) by means of a series of functions of $G(x)$, denoted by $\eta_1, \dots, \eta_m$, which together form a complete orthonormal basis of $L^2[0,1]$. Table 1 summarizes possible specifications of $d(G(x); G, F)$ proposed in the literature. Clearly, $d(G(x); G, F) = 1$ and $f(x) = g(x)$, for all $x$, whenever $F = G$; here and in what follows, sums replace integrals when $X$ is discrete, and vice versa when $X$ is continuous. For the moment, we consider $m$ to be chosen arbitrarily; a discussion on the choice of $m$ is postponed to Section 6.

Method   $d(G(x); G, F)$
Neyman (1937)   $\exp\bigl\{\sum_{j=1}^m \theta_j \eta_j(G(x)) - K(\boldsymbol{\theta})\bigr\}$
Barton (1956)   $1 + \sum_{j=1}^m \theta_j \eta_j(G(x))$
Devroye-Györfi (1985)   $\bigl[1 + \sum_{j=1}^m \theta_j \eta_j(G(x))\bigr]^+ / c(\boldsymbol{\theta})$
Gajek (1986)   $\bigl[1 + \sum_{j=1}^m \theta_j \eta_j(G(x)) - \theta_0\bigr]^+$
Table 1: Possible representations of $d(G(x); G, F)$. The functions $\eta_1, \dots, \eta_m$ form a complete orthonormal basis of $L^2[0,1]$; $K(\boldsymbol{\theta})$, $c(\boldsymbol{\theta})$ and $\theta_0$ are normalizing constants, whereas $\boldsymbol{\theta} = (\theta_1, \dots, \theta_m)^T$ denotes a vector of unknown coefficients such that $d(G(x); G, F) \ge 0$, for all $x$.

Finally, given a set of independent and identically distributed (i.i.d.) observations $x_1, \dots, x_n$ from $X \sim F$, a smooth test is constructed by testing, for any of the models in Table 1, the hypotheses $H_0\colon \theta_1 = \dots = \theta_m = 0$ versus $H_1\colon \theta_j \ne 0$ for some $j = 1, \dots, m$ (the same hypotheses apply to the exponential formulation of neyman37). The test statistic takes the form

$W_m = n \sum_{j=1}^m \hat\theta_j^2$   (4)

and follows, asymptotically, a $\chi^2_m$ distribution under $H_0$ (see thas for a self-contained review on smooth tests). Finally, a rejection of $H_0$ implies that the smooth model fits the data significantly better than the hypothesized model $g$.
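As a concrete illustration, the following R sketch computes a smooth test statistic of the form (4) for a fully specified continuous null. The exponential null, the sample size, and the choice of $m = 3$ orthonormal functions (normalized shifted Legendre polynomials) are all hypothetical choices made for the sake of the example.

## Sketch of a smooth test in the spirit of (4), for a continuous null.
## Hypothetical example: H0: X ~ Exp(1), tested with m = 3 normalized
## shifted Legendre polynomials evaluated at u = G(x).
set.seed(1)
x <- rexp(200, rate = 1)              # data (here actually drawn under H0)
u <- pexp(x, rate = 1)                # probability integral transform u = G(x)

phi <- list(function(u) sqrt(3) * (2*u - 1),
            function(u) sqrt(5) * (6*u^2 - 6*u + 1),
            function(u) sqrt(7) * (20*u^3 - 30*u^2 + 12*u - 1))

theta.hat <- sapply(phi, function(p) mean(p(u)))   # sample coefficients
W <- length(x) * sum(theta.hat^2)                  # statistic as in (4)
p.value <- pchisq(W, df = length(phi), lower.tail = FALSE)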

3 Smooth tests via LP score functions

The smooth tests discussed in Section 2 can be applied to both continuous and discrete distributions (e.g., rayner2009, Ch. 8); however, they often require the specification of an adequate orthonormal system on a case-by-case basis (e.g., rayner2009, Ch. 9-11). The goal of this section is to exploit the novel LP approach to statistical modeling, first introduced by LP14, to provide a generalized formulation of smooth tests that extends to arbitrary distributions.

LP modeling allows the unification of many of the standard results of classical statistics by expressing them in terms of quantiles and comparison densities (e.g., LPfdr; LPbayes), and it provides a simple and powerful framework for data analysis (e.g., LPmode; LPtimes; LPcopula; LPksamples). This approach lays its foundations in a specially designed orthonormal basis of LP score functions which, unlike standard polynomial bases, can be used to express general functions of continuous or discrete random variables and thus provides a universal representation for arbitrary data distributions.

In our setting, given a random variable $X$, either continuous or discrete, with cdf $G$, we can construct a complete orthonormal basis of LP score functions by setting the first component to be $T_0(x; G) = 1$, whereas subsequent components can be obtained by Gram-Schmidt orthonormalization of powers of

$T_1(x; G) = \frac{G^{mid}(x) - \mathbb{E}[G^{mid}(X)]}{\sqrt{\mathbb{V}[G^{mid}(X)]}},$   (5)

where $G^{mid}(x) = G(x) - \frac{1}{2}g(x)$ is the so-called mid-distribution function, $\mathcal{X}$ is the set of distinct points in the support of $g$ when $X$ is discrete, and the mean and variance of $G^{mid}(X)$ are given by $\mathbb{E}[G^{mid}(X)] = \frac{1}{2}$ and $\mathbb{V}[G^{mid}(X)] = \frac{1}{12}\bigl[1 - \sum_{x \in \mathcal{X}} g^3(x)\bigr]$, respectively (parzen2004). Interestingly, when $X$ is continuous, $\sum_{x \in \mathcal{X}} g^3(x) = 0$, and the LP score functions can be expressed as normalized shifted Legendre polynomials.
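The construction above is easy to carry out numerically. The R sketch below builds the first few LP score functions for a hypothetical discrete start (a Poisson pmf truncated to {0, ..., 15}; the rate 7.5 is an arbitrary illustrative choice), following (5) and Gram-Schmidt orthonormalization with respect to the weights $g(x_j)$.

## Sketch: LP score functions for a discrete start g via Gram-Schmidt, as in (5).
xs <- 0:15
g  <- dpois(xs, lambda = 7.5); g <- g / sum(g)    # hypothetical truncated-Poisson start
Gmid <- cumsum(g) - 0.5 * g                       # mid-distribution at the mass points
v  <- (1 - sum(g^3)) / 12                         # Var[Gmid(X)] (parzen2004)
T1 <- (Gmid - 0.5) / sqrt(v)                      # first LP score function, as in (5)

m  <- 4
P  <- sapply(1:m, function(j) T1^j)               # powers of (5) to orthonormalize
ip <- function(a, b) sum(a * b * g)               # inner product under G
Tmat <- matrix(0, length(xs), m)
for (j in 1:m) {
  tj <- P[, j] - ip(P[, j], rep(1, length(xs)))   # remove the component along T0 = 1
  if (j > 1) for (k in 1:(j - 1)) tj <- tj - ip(tj, Tmat[, k]) * Tmat[, k]
  Tmat[, j] <- tj / sqrt(ip(tj, tj))              # normalize so that E[Tj^2] = 1
}
## columns of Tmat: T_1(x; G), ..., T_m(x; G) evaluated at the mass points xs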

Figure 1: Comparison of different estimators of $f$ and $d(u; G, F)$ when $X$ is discrete (left panels) and when $X$ is continuous (right panels). The upper left panel shows various estimators of the probability mass function (pmf) when the data are generated from a Binomial(15,0.5) random variable and the parametric start is the pmf of a truncated Poisson random variable. The bottom left panel compares the respective comparison density estimators. In the upper right panel, the true probability density function (pdf) is that of a truncated Normal(-15,15) random variable. In this case, the parametric start is the polynomial density in (15). In this setting, (14) leads to a bona fide estimate and thus it coincides with (13). The respective comparison densities are shown in the bottom right panel. In all the cases considered, the smoothed estimators have been computed choosing the same value of $m$.

To generalize our framework to both the continuous and discrete settings, it is particularly useful to express the quantities of interest in the quantile domain by means of the probability integral transform $u = G(x)$. Specifically, denote with $S_j(u; G)$ the basis of LP score functions expressed in the quantile domain and with respect to the probability measure induced by $G$, i.e., $S_j(u; G) = T_j(G^{-1}(u); G)$, $u \in [0,1]$. We can then rewrite the models in Table 1 in terms of score functions by setting $\eta_j(u) = S_j(u; G)$; in particular, in Barton's formulation,

$d(u; G, F) = 1 + \sum_{j=1}^m LP_j\, S_j(u; G),$   (6)

and $\boldsymbol{\theta}$ satisfies the constraints

$LP_j = \int_0^1 S_j(u; G)\, d_{\boldsymbol{\theta}}(u; G, F)\, du, \quad j = 1, \dots, m,$   (7)

where $d_{\boldsymbol{\theta}}(u; G, F)$ in (7) is the comparison density representation in the formulation of neyman37 (see Table 1).

From (3), it follows that smooth estimators of $f$ can be specified as

$\hat f(x) = g(x)\, \hat d(G(x); G, \hat F),$   (8)

where $\hat d(u; G, \hat F)$ is an estimate of the comparison density obtained by plugging suitable estimators of the $LP_j$ and/or $\theta_j$ coefficients into any of the models in Table 1. A simple strategy to estimate the $LP_j$ coefficients is to consider their sample counterparts. Specifically, let $\hat F$ be the empirical cdf, i.e., $\hat F(x) = \frac{1}{n}\sum_{i=1}^n \mathbb{1}\{x_i \le x\}$, where $\mathbb{1}\{\cdot\}$ is the indicator function, and denote with $d(u; G, \hat F)$ the empirical comparison density, i.e., the quantity obtained by replacing $F$ with $\hat F$ in (1). The coefficients can be estimated via

$\widehat{LP}_j = \frac{1}{n}\sum_{i=1}^n T_j(x_i; G)$   (9)

$= \int_0^1 S_j(u; G)\, d(u; G, \hat F)\, du.$   (10)

From (9), it follows that

$\mathbb{E}\bigl[\widehat{LP}_j\bigr] = LP_j, \quad j = 1, \dots, m,$   (11)

with $LP_j$ as in (6). Finally, estimates of the $\theta_j$ coefficients can be obtained numerically by replacing $LP_j$ with $\widehat{LP}_j$ in (7) (e.g., LPmode).

Finally, the equivalent of (4) can be specified by considering the so-called deviance statistic (LPfdr; algeri20)

$D_m = n \sum_{j=1}^m \widehat{LP}_j^2,$   (12)

which, similarly to (4), follows asymptotically, and under $H_0$, a $\chi^2_m$ distribution.
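In code, the estimates in (9) and the deviance statistic (12) amount to a few lines. The R sketch below treats the continuous case, where the $T_j(x; G)$ reduce to normalized shifted Legendre polynomials in $G(x)$; the Exp(1) start, the gamma-distributed data, and $m = 4$ are hypothetical choices for illustration only.

## Sketch: LP coefficient estimates (9) and the deviance statistic (12).
set.seed(2)
x <- rgamma(500, shape = 1.3, rate = 1)            # here the true F differs from G
u <- pexp(x, rate = 1)                             # u = G(x) under the Exp(1) start
Tj <- cbind(sqrt(3) * (2*u - 1),                   # normalized shifted Legendre
            sqrt(5) * (6*u^2 - 6*u + 1),           # polynomials, degrees 1 to 4
            sqrt(7) * (20*u^3 - 30*u^2 + 12*u - 1),
            3 * (70*u^4 - 140*u^3 + 90*u^2 - 20*u + 1))
LP.hat <- colMeans(Tj)                             # (9): sample LP coefficients
deviance <- length(x) * sum(LP.hat^2)              # (12)
p.asy <- pchisq(deviance, df = ncol(Tj), lower.tail = FALSE)  # asymptotic p-value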

Although any of the models in Table 1 can be used to obtain estimates of $d(u; G, F)$ and, consequently, of $f$, in this article we mainly focus on the model proposed by gajek86 and estimated via

$\hat d(u; G, \hat F) = \bigl[\hat d_m(u; G, \hat F) - \hat\theta_0\bigr]^+,$   (13)

where $\hat\theta_0$ is chosen to guarantee that (13) integrates/sums to one. Specifically, (13) is obtained by adequately correcting Barton's estimate

$\hat d_m(u; G, \hat F) = 1 + \sum_{j=1}^m \widehat{LP}_j\, S_j(u; G)$   (14)

to provide a bona fide estimate of $d(u; G, F)$, i.e., nonnegative and with integral/sum equal to one. Furthermore, letting the Mean Integrated Squared Error (MISE) of an estimator $\tilde d$ of $d$ be $\mathbb{E}\bigl[\int_0^1 \bigl(\tilde d(u) - d(u; G, F)\bigr)^2\, du\bigr]$, the work of gajek86 showed that the MISE of (13) is never larger than that of (14), whereas the same result is not guaranteed when considering bona fide corrections in the formulation of Devroye-Györfi in Table 1 (kaluszka).

The estimators in (10), (13) and (14) are compared using two illustrative examples in Figure 1. The left panels show the results obtained when considering observations from a Binomial(15,0.5) random variable with parametric start (red vertical crosses) corresponding to the pmf of a truncated Poisson random variable. In the upper left panel, Barton's estimator of $f$ (purple dots) leads to negative values, and thus the respective Gajek correction in (13) is computed (green circles); both (13) and (14) are computed choosing the same value of $m$. The two smoothed estimators of $f$ show only minor differences from one another; however, they differ substantially from the empirical mass function (gray triangles) and provide estimates which are closer to the truth (blue crosses). To highlight the differences between the discrete and the continuous settings, the right panels show the results obtained when considering observations from a truncated Normal(-15,15) random variable. Here, the parametric start (red dashed lines) corresponds to the polynomial density

(15)

where $c$ is a normalizing constant. In this case, the Barton estimator in (14) leads to a bona fide estimate (green dotted-dashed line) and coincides with (13).
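For completeness, here is a minimal R sketch of the Gajek correction in (13): given a Barton-type estimate (14), it finds the constant $\hat\theta_0$ such that the truncated estimate integrates to one. The coefficient values are made up for illustration; in practice they would be the $\widehat{LP}_j$ estimated from the data.

## Sketch of the Gajek correction (13) in the continuous case.
LP.hat <- c(0.15, -0.30, 0.05)                    # illustrative coefficients
S <- function(u) cbind(sqrt(3) * (2*u - 1),       # quantile-domain score functions
                       sqrt(5) * (6*u^2 - 6*u + 1),
                       sqrt(7) * (20*u^3 - 30*u^2 + 12*u - 1))
d.barton <- function(u) 1 + drop(S(u) %*% LP.hat)           # Barton estimate (14)

mass <- function(theta0)                                     # integral of [dhat - theta0]^+
  integrate(function(u) pmax(d.barton(u) - theta0, 0), 0, 1)$value
theta0 <- uniroot(function(t) mass(t) - 1, c(-1, 1))$value   # normalizing constant
d.gajek <- function(u) pmax(d.barton(u) - theta0, 0)         # (13): bona fide estimate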

An important advantage of referring to estimators of the form in (8) is that the graph of $\hat d(u; G, \hat F)$ allows us to visualize where and how the true model of the data deviates from the hypothesized model $g$. For our toy examples, the graphs of different estimators of $d(u; G, F)$ are displayed in the bottom panels of Figure 1. In the discrete case, the comparison density estimators considered are below one in correspondence of the most extreme quantiles, suggesting that the Poisson pmf overestimates the tails of the true underlying binomial distribution. In the continuous setting, the comparison density estimator deviates mildly from one, suggesting that the polynomial pdf may overestimate the right tail of the truncated normal. Finally, the upper panels show how, by virtue of (8), $\hat f$ automatically updates $g$ in the direction of the true distribution.

4 Inference and graphics via LP-smoothed bootstrap

Although the graph of (13) (or, more broadly, of a suitable estimator of $d(u; G, F)$) allows us to explore the nature of the deviation of $F$ from $G$, it does not provide any insight into the significance of such deviations. Conversely, smooth tests based on (4), and equivalently (12), implicitly aim to test

$H_0\colon d(u; G, F) = 1 \ \text{for all } u \in [0,1] \quad \text{versus} \quad H_1\colon d(u; G, F) \ne 1 \ \text{for some } u \in [0,1],$   (16)

while in practice we test for

$H_0\colon LP_1 = \dots = LP_m = 0 \quad \text{versus} \quad H_1\colon LP_j \ne 0 \ \text{for some } j = 1, \dots, m.$   (17)

Notice that while $H_0$ in (16) implies $H_0$ in (17), the opposite is not true in general. Whereas, $H_1$ in (17) does imply $H_1$ in (16), and thus smooth tests allow us to determine if $F$ deviates significantly from $G$. However, they do not assess where and how significant departures of $F$ from $G$ occur. Therefore, to gain more insight on this aspect, in the next section we discuss how to complement the graph of $\hat d(u; G, \hat F)$ with suitable confidence bands and graphically assess the validity of $H_0$ in (16).

4.1 Confidence bands and CD-plot

When constructing confidence bands, we must take into account that their center and their width are determined by the bias and the variance of the comparison density estimator considered. In this section and in those to follow, we focus on the estimator in (13); however, our considerations can be easily extended to any other estimator of $d(u; G, F)$ described in Section 3 (see also Table 1).

Our estimator only accounts for the first $m$ LP score functions. Therefore, unless one were to assume that the true model is a special case of (3) (e.g., neyman37), (13) is a biased estimator of $d(u; G, F)$, and confidence bands constructed around it can potentially be shifted away from the true comparison density. Although the bias cannot be easily quantified in a general setting, it is easy to show that, when $H_0$ in (16) (and consequently in (17)) is true, the estimator in (14) is an unbiased estimator of $d(u; G, F) = 1$ (e.g., algeri20). Hence, we can exploit this property to construct reliable confidence bands under the null hypothesis. Furthermore, the variance of the estimator is likely to vary substantially over the range $[0,1]$. For instance, when dealing with moderate sample sizes, it is natural to expect its standard error to be particularly large around the tails of the distribution.

INPUTS: observed sample $x_1, \dots, x_n$, parametric start $g$, significance level $\alpha$,
     number of LP score functions $m$, number of Monte Carlo replicates $B$.
Step 1: Estimate $LP_1, \dots, LP_m$ via (9) on $x_1, \dots, x_n$.
Step 2: Compute (12) and call it $d_{obs}$.
Step 3: For $b = 1, \dots, B$:
    A. Sample $x_1^b, \dots, x_n^b$ from $g$.
    B. On $x_1^b, \dots, x_n^b$, compute $\widehat{LP}_1^b, \dots, \widehat{LP}_m^b$, $d^b$, $\hat d^b(u; G, \hat F)$ via (9), (12), (13), respectively.
Step 4: For each $u \in [0,1]$:
    A. Compute the Monte Carlo mean $\bar d(u) = \frac{1}{B}\sum_{b=1}^B \hat d^b(u; G, \hat F)$.
    B. Compute the standard error $\widehat{SE}(\hat d(u)) = \sqrt{\frac{1}{B-1}\sum_{b=1}^B \bigl(\hat d^b(u; G, \hat F) - \bar d(u)\bigr)^2}$.
    C. For each $b$, compute $\bigl|\hat d^b(u; G, \hat F) - 1\bigr| / \widehat{SE}(\hat d(u))$.
Step 5: Estimate the quantile of order $1 - \alpha$ of the distribution of $\sup_u \bigl|\hat d^b(u; G, \hat F) - 1\bigr| / \widehat{SE}(\hat d(u))$, i.e.,
      $\hat c_\alpha$.
Step 6: Combine Step 3B and Step 4 and compute (18).
Step 7: Estimate the deviance test p-value via $\frac{1}{B}\sum_{b=1}^B \mathbb{1}\{d^b \ge d_{obs}\}$.
Algorithm 1 Computing confidence bands and deviance tests via Monte Carlo.
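A compact R implementation of Algorithm 1 might look as follows. For brevity, this sketch uses a hypothetical Exp(1) start with gamma-distributed data, a fixed $m = 3$, and the Barton estimate (14) in place of (13) when evaluating the bands; all of these choices are illustrative assumptions rather than the exact implementation accompanying the paper.

## Sketch of Algorithm 1 for a continuous start G (hypothetical Exp(1) null).
set.seed(3); n <- 300; B <- 1000; alpha <- 0.05
S <- function(u) cbind(sqrt(3) * (2*u - 1),
                       sqrt(5) * (6*u^2 - 6*u + 1),
                       sqrt(7) * (20*u^3 - 30*u^2 + 12*u - 1))
dev.stat <- function(u) length(u) * sum(colMeans(S(u))^2)   # deviance (12)

u.obs <- pexp(rgamma(n, shape = 1.3, rate = 1))   # Steps 1-2: observed PIT sample
d.obs <- dev.stat(u.obs)

ugrid <- seq(0.01, 0.99, by = 0.01)
dev.null <- numeric(B)
dhat.null <- matrix(0, B, length(ugrid))
for (b in 1:B) {                                  # Step 3: simulate under H0
  ub <- runif(n)                                  # under H0, G(X) ~ Uniform(0,1)
  dev.null[b] <- dev.stat(ub)
  dhat.null[b, ] <- 1 + S(ugrid) %*% colMeans(S(ub))   # Barton estimate on a grid
}
se.H0 <- apply(dhat.null, 2, sd)                  # Step 4: SE under H0 at each u
sup.b <- apply(abs(t(dhat.null) - 1) / se.H0, 2, max)
c.alpha <- quantile(sup.b, 1 - alpha)             # Step 5
bands <- rbind(1 - c.alpha * se.H0, 1 + c.alpha * se.H0)   # bands as in (18) below
p.dev <- mean(dev.null >= d.obs)                  # Step 7: deviance p-value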

To account for the issues associated with both bias and variance, we aim to construct confidence bands of the form

$1 \pm c_\alpha\, \widehat{SE}(\hat d(u)),$   (18)

where $\widehat{SE}(\hat d(u))$ denotes the standard error of $\hat d(u; G, \hat F)$ under $H_0$ at $u$ and, letting $\alpha$ be the desired significance level, $c_\alpha$ is the value that satisfies

$P\Bigl(\sup_{u \in [0,1]} \frac{\bigl|\hat d(u; G, \hat F) - 1\bigr|}{\widehat{SE}(\hat d(u))} \ge c_\alpha \,\Big|\, H_0\Bigr) = \alpha.$   (19)

Although tube formulas can be used to approximate (18) in an asymptotic, continuous regime (e.g., algeri20), such an approach does not apply to the discrete setting, when the sample size is not sufficiently large to rely on asymptotic approximations, or when model selection is performed (see Sections 6 and 7). Therefore, to guarantee the generalizability of our procedure to all these situations, we rely on simulation methods to estimate $c_\alpha$ and $\widehat{SE}(\hat d(u))$, and to compute (18).

Figure 2: Examples of CD-plots when $X$ is discrete (left panel) and when $X$ is continuous (right panel). Similarly to Figure 1, in the left panel the hypothesized model is the pmf of a truncated Poisson random variable, whereas the data sample is generated from a Binomial(15,0.5) random variable. The hypothesized model in the right panel is the polynomial density in (15), whereas the data sample is generated from a truncated Normal(-15,15) random variable. The Gajek estimator of the comparison density in (13) is displayed using green lines, with dots corresponding to the probability integral transform of the mass points in the discrete case. In both cases, the gray bands correspond to the confidence bands obtained by simulating from the parametric start, whereas the green bands are the standard errors obtained by simulating from the Gajek estimator in (13).

4.2 Variance estimation via LP-smoothed bootstrap

To compute (18) via simulations, we estimate $c_\alpha$ and $\widehat{SE}(\hat d(u))$ by drawing $B$ Monte Carlo samples from $g$. Similarly, an approximate p-value to test (17) can be obtained by simulating the distribution of the deviance statistic in (12). The main steps of the simulation procedure are summarized in Algorithm 1. The left panel of Figure 2 shows the results obtained for our binomial example introduced in Section 3. This plot is an example of a Comparison Density plot, or CD-plot, which offers the advantage of visualizing where significant departures of the data distribution from the hypothesized parametric start occur. Specifically, if $\hat d(u; G, \hat F)$ is within the confidence bands (gray areas) over the entire range $[0,1]$, we conclude that there is no evidence that $F$ deviates significantly from $G$ anywhere over the range considered. Conversely, we expect significant departures to occur in regions where $\hat d(u; G, \hat F)$ lies outside the confidence bands. In our example, both the confidence bands and the p-value for the deviance test in (12) have been computed considering $B$ datasets simulated from the parametric start.

Although the bands in (18) are only affected by the variance and the distribution of the estimator under $H_0$, it is important to acquire a sense of the efficiency of (13) as an estimator of $d(u; G, F)$ in the more general scenario where $F \ne G$. In contrast to the Barton estimator in (14), however, an explicit formula for the variance of (13) cannot be easily derived, and thus, also in this case, we must rely on simulation methods to obtain a reliable estimate of the latter.

As recognized by LPmode (see also parzen2004), (8) naturally leads to a smoothed bootstrap scheme in which samples from a smooth estimator of $f$ are obtained using an accept/reject algorithm where $g$ plays the role of the instrumental distribution. In contrast to the nonparametric bootstrap (efron79), in the smoothed bootstrap (efron79) the sampling is performed from a smoothed version of the empirical distribution, and thus it avoids producing samples with several repeated values from the original data. Here, we introduce an LP-smoothed bootstrap based on (13), and we discuss its usefulness in estimating the variance of the latter.

Denote with $\hat f$ the estimate of $f$ associated with (13) via (8), and let $x^*$ and $u^*$ be observations simulated from $g$ and a Uniform(0,1) distribution, respectively. We accept $x^*$ as an observation from $\hat f$, i.e., we set $x_i^* = x^*$, if

$u^* \le \frac{\hat d(G(x^*); G, \hat F)}{\sup_{u} \hat d(u; G, \hat F)},$   (20)

and we reject it otherwise. From a practical perspective, a smoothed bootstrap scheme based on (20) allows us to sample from cells (or values) which have not been observed in the original data; therefore, it is particularly advantageous when dealing with discrete or categorized data and/or, more broadly, when the sample size is small (e.g., upper left panel of Figure 1). Moreover, expressing $\hat d(u; G, \hat F)$ using LP score functions naturally provides a smoothed bootstrap scheme that automatically generalizes to both the continuous and discrete settings.

The green bands in the left panel of Figure 2 correspond to the estimated standard error of (13) for our binomial example, obtained by simulating $B$ datasets from $\hat f$ via (20). In this setting, the parametric start is the pmf of a Poisson random variable, and thus it is straightforward to first simulate observations from it, use them to compute (18), and then accept/reject them as observations from $\hat f$ to compute the standard errors. In practical applications, however, the hypothesized model does not always enjoy such a simple formulation, and thus (20) must be extended further to sample efficiently from both $g$ and $\hat f$, as described in Section 5.
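As an illustration, the R sketch below implements the accept/reject step (20) for a hypothetical discrete start (a Poisson pmf truncated to {0, ..., 15}, rate 7.5, as in the earlier sketches); the step values of the comparison density estimate are made-up numbers standing in for a bona fide estimate such as (13).

## Sketch of the LP-smoothed bootstrap accept/reject step (20), discrete case.
xs <- 0:15
g  <- dpois(xs, lambda = 7.5); g <- g / sum(g)    # hypothetical parametric start
## dhat.step: values of a bona fide comparison density estimate at the mass
## points (illustrative numbers standing in for the Gajek estimate (13)).
dhat.step <- c(rep(0.2, 4), rep(1.3, 8), rep(0.6, 4))
dhat.step <- dhat.step / sum(dhat.step * g)       # ensure fhat = g * dhat sums to 1
M <- max(dhat.step)                               # envelope constant in (20)

rsmooth <- function(n) {                          # draw n observations from fhat
  out <- numeric(0)
  while (length(out) < n) {
    x <- sample(xs, n, replace = TRUE, prob = g)  # proposals from g
    keep <- runif(n) <= dhat.step[x + 1] / M      # accept w.p. dhat/M, as in (20)
    out <- c(out, x[keep])
  }
  out[1:n]
}
xstar <- rsmooth(500)                             # one LP-smoothed bootstrap sample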

4.2.1 A brief comparison with the nonparametric bootstrap

In our setting, the level of smoothing is determined by $m$. As widely discussed by Young and co-authors (smooth1; smooth2; smooth3; smooth4), in the continuous case an adequate amount of smoothing may lead to estimators with a lower mean squared error (MSE) than those obtained with the nonparametric bootstrap; however, these corrections are only up to the second order for large samples. In the discrete case, guerra emphasize the advantages of the smoothed bootstrap in constructing confidence intervals but do not investigate the amount of smoothing required. Although an extensive comparison of the LP-smoothed bootstrap and the classical nonparametric bootstrap is beyond the scope of this paper, here we briefly discuss its advantages in the estimation of linear functionals of the type

$T(F) = \int \psi(x)\, dF(x).$   (21)

Extensions to more general functionals can be derived as in smooth1.

A quantile representation of (21) is

$T(F) = \int_0^1 \psi(G^{-1}(u))\, d(u; G, F)\, du,$   (22)

with $u = G(x)$. Similarly, the estimates of $T(F)$ obtained by means of the classical nonparametric bootstrap and the LP-smoothed bootstrap, namely $\hat T_{NP}$ and $\hat T_{LP}$, respectively, can be specified as

$\hat T_{NP} = \int_0^1 \psi(G^{-1}(u))\, d(u; G, \hat F)\, du \quad \text{and} \quad \hat T_{LP} = \int_0^1 \psi(G^{-1}(u))\, \hat d(u; G, \hat F)\, du,$   (23)

where $d(u; G, \hat F)$ is the empirical comparison density in (10). Denote with $\hat F_m$ the cdf associated with (14); interestingly, when (14) is bona fide, $\hat T_{LP}$ coincides with the plug-in estimator $T(\hat F_m)$. This aspect allows us to establish Theorem 4.1 below; the proof is provided in Appendix A.

Theorem 4.1.

If the estimator in (14) is bona fide, then the MSE of $\hat T_{LP}$ can be lowered below that of $\hat T_{NP}$ for any $m$ satisfying

(24)

where the bound in (24) involves quantities that depend on the unknown $F$.

Because (24) depends on unknown quantities, it can be estimated by replacing them with consistent estimators. Furthermore, it has to be noted that the estimation of (24) may lead to numerical issues for large $m$, but it is feasible for small $m$. It follows that, in practice, Theorem 4.1 is only useful for sufficiently large discrete-valued samples and small $m$. Conversely, for large continuous-valued samples, a suitable level of smoothing can be identified as in smooth3 or smooth2, noticing that (14) enjoys a kernel representation with a bandwidth parameter governed by $m$ (LPmode).
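To make the comparison concrete, the short R sketch below contrasts the Monte Carlo variability of the two bootstrap estimates in (23) for the hypothetical functional $\psi(x) = x^2$. It assumes a data vector x of counts on {0, ..., 15} and the sampler rsmooth() from the accept/reject sketch above; none of this reflects the exact experimental setup of the paper.

## Sketch: nonparametric vs LP-smoothed bootstrap estimates of T(F) = E[psi(X)].
psi <- function(x) x^2
boot.np <- replicate(1000, mean(psi(sample(x, replace = TRUE))))  # nonparametric
boot.lp <- replicate(1000, mean(psi(rsmooth(length(x)))))         # LP-smoothed
c(sd(boot.np), sd(boot.lp))   # Monte Carlo variability of the two estimates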

INPUTS: sample $x_1, \dots, x_n$, parametric start $g$, instrumental probability function $h$.
Step 1: Obtain $\hat d(u; G, \hat F)$ and $\hat f$ via (13) and (8).
Step 2: Set $d_{\hat f}(u) = \hat f(H^{-1}(u))/h(H^{-1}(u))$ and $d_g(u) = g(H^{-1}(u))/h(H^{-1}(u))$, and obtain $M$ in (29).
Step 3: Sample $y$ from $h$ and $u^*$ from a Uniform(0,1) distribution:
   a. if $u^* \le \min\{d_{\hat f}(H(y)), d_g(H(y))\}/M$, accept $y$ as an observation from both $\hat f$ and $g$;
   b. else, accept $y$ as an observation from $\hat f$ only if $u^* \le d_{\hat f}(H(y))/M$, from $g$ only if $u^* \le d_g(H(y))/M$, and reject it otherwise.
Algorithm 2 Bidirectional acceptance sampling
Remark 4.2.

Notice that Theorem 4.1 is only valid when (14) is bona fide. When that is not the case, $\hat F_m$ is not a proper cdf, and the equivalent of Theorem 4.1 cannot be easily derived. However, it can be shown that the MSE of $\hat T_{LP}$ approaches that of its bona fide counterpart as the probability of observing a non-bona-fide estimate approaches zero, and thus (24) can still be used as an approximate criterion to identify suitable values of $m$.

5 The bidirectional acceptance sampling

The bidirectional acceptance sampling extends (20) by sampling simultaneously from both $\hat f$ and $g$. The idea at the core of the algorithm is to consider an instrumental probability function $h$, with cdf $H$, from which it is easy to sample. Samples from $h$ are then accepted/rejected as samples from both $\hat f$ and $g$, from $\hat f$ or $g$ only, or from neither of the two. The main steps of this algorithm are described below and are summarized in Algorithm 2.

In principle, given pairs of observations $(y, u^*)$ drawn from $h$ and a Uniform(0,1) distribution, respectively, samples from $\hat f$ and/or $g$ can be obtained as in (20), i.e., $y$ is accepted as an observation from $\hat f$ if

$u^* \le \frac{\hat f(y)}{M_{\hat f}\, h(y)},$   (25)

and as an observation from $g$ if

$u^* \le \frac{g(y)}{M_g\, h(y)},$   (26)

with $M_{\hat f} = \sup_x \hat f(x)/h(x)$ and $M_g = \sup_x g(x)/h(x)$. Interestingly, (25) and (26) can be easily combined by noticing that

$\frac{\hat f(y)}{h(y)} = d_{\hat f}(H(y)) \quad \text{and} \quad \frac{g(y)}{h(y)} = d_g(H(y)),$   (27)

where $d_{\hat f}(u)$ and $d_g(u)$ are known exactly and play the role of auxiliary comparison densities.

Specifically, let

$A = \bigl\{u \in [0,1] \colon d_{\hat f}(u) \le d_g(u)\bigr\}$   (28)

and define

$M = \sup_{u \in [0,1]} \max\bigl\{d_{\hat f}(u), d_g(u)\bigr\}.$   (29)

From (28) it follows that $d_{\hat f}(u) \le d_g(u)$ for any $u \in A$, and the opposite is true for $u \notin A$. Therefore, if $H(y) \in A$, we have that $y$ is accepted as a sample from both $\hat f$ and $g$ if $u^* \le d_{\hat f}(H(y))/M$. Conversely, if $d_{\hat f}(H(y))/M < u^* \le d_g(H(y))/M$, then $y$ is accepted as a sample from $g$ but not from $\hat f$. Finally, $y$ is rejected whenever $u^* > d_g(H(y))/M$. Similar reasoning applies for the case of $H(y) \notin A$.

Compared with (25) and (26), Algorithm 2 substantially reduces the number of evaluations of $\hat f$ and $g$ and can increase the acceptance rates for both $\hat f$ and $g$. Theorem 5.1 below summarizes these aspects and guarantees the validity of the bidirectional acceptance sampling. The proof of Theorem 5.1 is given in Appendix A.
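A minimal R sketch of the scheme is given below. The two target densities and the uniform instrumental density are hypothetical stand-ins (truncated normals on an arbitrary interval), chosen only to show how a single stream of proposals from h can feed both samples.

## Sketch of bidirectional acceptance sampling (Algorithm 2), hypothetical setup.
a <- -7; b <- 7
fhat <- function(x) dnorm(x, 0.5, 2) / (pnorm(b, 0.5, 2) - pnorm(a, 0.5, 2))
g    <- function(x) dnorm(x, 0.0, 3) / (pnorm(b, 0.0, 3) - pnorm(a, 0.0, 3))
h    <- function(x) dunif(x, a, b)                 # instrumental density

xg <- seq(a, b, length.out = 2001)
M  <- max(pmax(fhat(xg) / h(xg), g(xg) / h(xg)))   # envelope constant as in (29)

nsim <- 5000
y <- runif(nsim, a, b)                             # one stream of proposals from h
u <- runif(nsim)
sample.f <- y[u <= fhat(y) / (M * h(y))]           # accepted as draws from fhat
sample.g <- y[u <= g(y) / (M * h(y))]              # accepted as draws from g
## each proposal requires one evaluation of fhat and of g,
## and the same proposal may enter both samples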

Figure 3: Choosing a suitable instrumental density $h$. The left panel shows how $h$ in (31) (purple dotted line) compares with the densities $\hat f$ (red dashed line) and $g$ (green dotted-dashed line) from which we aim to simulate. The right panel shows the respective comparison densities, with the purple dotted line corresponding to the function being optimized in (30).
Theorem 5.1.

Let $M_{\hat f} = \sup_u d_{\hat f}(u)$ and $M_g = \sup_u d_g(u)$. The bidirectional acceptance sampling enjoys the following properties:

  1. it allows us to sample from both $\hat f$ and $g$;

  2. its acceptance rate is $1/M$ for both $\hat f$ and $g$;

  3. for every $N$ observations drawn from $h$, the total number of evaluations of $\hat f$ and $g$ is always less than or equal to $2N$, and the average number of evaluations per proposal converges almost surely to a constant smaller than two.

The first property in Theorem 5.1 guarantees the validity of the algorithm, i.e., it ensures that the resulting samples are effectively drawn from the desired distributions $\hat f$ and $g$. The second property ensures that the acceptance rate is the same for both $\hat f$ and $g$ and, as discussed in more detail in Section 5.1, it can be lowered below that of both (25) and (26) for suitable choices of $h$. Finally, the third property guarantees that the number of evaluations of $\hat f$ and $g$ is reduced compared with (25) and (26). This is a substantial advantage in terms of efficiency of the algorithm, mostly when dealing with complex functional forms for $\hat f$ and $g$.

Remark 5.2.

Notice that, in principle, the acceptance sampling schemes in (20) and Algorithm 2 can be implemented considering any smooth estimator of the form of (8). However, when considering estimators that are not bona fide, such as (14), it is easy to show that the resulting samples are not drawn from $\hat f$ but rather from its bona fide counterpart in the formulation of Devroye-Györfi (see Table 1). Consequently, the acceptance rate differs from $1/M$ by a multiplicative factor corresponding to the respective normalizing constant. Similarly, when $g$ is only known up to a normalizing constant, the acceptance rate differs from $1/M$ by a multiplicative factor corresponding to that constant.

5.1 Choice of the instrumental probability function

Although the choice of the instrumental probability function $h$ in Algorithm 2 is in principle arbitrary, it should be (i) easy to sample from and (ii) sufficiently flexible to generalize to different settings. Possible choices of $h$ that satisfy these two criteria include the density of a mixture of normal random variables when $X$ is continuous with unbounded support, the density of a mixture of truncated normal random variables when $X$ is continuous with bounded support, and the pmf of a mixture of Poisson and/or negative binomial random variables when $X$ is discrete.

Notice that $h$ plays a fundamental role because it affects both the acceptance rate and the computational efficiency of the sampling scheme. Therefore, once a parametric form for $h$ is selected, one can calibrate its parameters to minimize $M$ in (29), i.e.,

$\min_{h} \sup_{u \in [0,1]} \max\bigl\{d_{\hat f}(u), d_g(u)\bigr\}.$   (30)

Finally, the acceptance rate of $\hat f$ and $g$ is reduced below that of (25)-(26) whenever $M$ exceeds $M_{\hat f}$ or $M_g$, respectively.
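In practice, the calibration in (30) can be carried out with a generic optimizer over a grid of quantile values. The R sketch below does so for a hypothetical one-component truncated-normal instrumental density (rather than the three-component mixture used for Figure 2), reusing fhat, g, a and b from the bidirectional-sampling sketch above.

## Sketch: calibrating h as in (30), i.e., choosing its parameters to
## minimize M = sup_x max{ fhat(x)/h(x), g(x)/h(x) }.
xg <- seq(a, b, length.out = 1001)
M.of <- function(par) {                      # par = (mean, log sd) of h
  hx <- dnorm(xg, par[1], exp(par[2])) /
        (pnorm(b, par[1], exp(par[2])) - pnorm(a, par[1], exp(par[2])))
  max(pmax(fhat(xg) / hx, g(xg) / hx))       # envelope constant for this h
}
opt <- optim(c(0, log(2)), M.of)             # minimize M over the parameters of h
opt$par; opt$value                           # calibrated parameters and resulting M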

The right panel of Figure 2 shows the CD-plot obtained for our truncated normal example. Confidence bands, deviance p-value and standard errors have been computed by simultaneously sampling $B$ datasets from $g$ and $\hat f$ by means of Algorithm 2. In this case, the instrumental density $h$ is chosen to be the pdf of a mixture of truncated normals with three components. The mixture weights, means and variances have been chosen to be the solutions of (30) and lead to the density

(31)

where $\phi(x; \mu, \sigma)$ denotes the pdf of a normal distribution with mean $\mu$ and standard deviation $\sigma$, truncated over the range of interest. The resulting value of $M$ guarantees an acceptance rate of $1/M$ for both $\hat f$ and $g$ and is such that $M \approx M_{\hat f} \approx M_g$. Therefore, it leads to approximately the same acceptance rate as (25) and (26), while reducing the number of evaluations of the auxiliary densities. Figure 3 compares $h$, $\hat f$, $g$ and the respective comparison densities.

Finally, both the confidence bands and the p-value for the deviance test in (12) suggest that the polynomial parametric start in (15) overestimates the tail of the distribution.

INPUTS: observed sample $x_1, \dots, x_n$, parametric start $g$, significance level $\alpha$,
     number of LP score functions $m$, number of Monte Carlo replicates $B$,
     instrumental probability function $h$ (optional).
Step 1: Estimate the unknown parameters of $g$ via (32) on $x_1, \dots, x_n$.
Step 2: Estimate $LP_1, \dots, LP_m$ via (33) on $x_1, \dots, x_n$.
Step 3: Select "significant" coefficients via (35) and set all the others to zero.
Step 4: Compute (12) and call it $d_{obs}$.
Step 5: Obtain $\hat d(u; G, \hat F)$ and $\hat f$ via (34).
Step 6: For $b = 1, \dots, B$:
    Obtain samples from $g$ and from $\hat f$ via (20) or Algorithm 2.
    A. On the sample from $g$:
      i. Estimate the unknown parameters via (32) and call the resulting start $\hat g^b$.
      ii. Estimate $LP_1, \dots, LP_m$ via (33).
      iii. Set to zero nonsignificant coefficients via (35).
      iv. Compute $d^b$ via (12).
      v. Estimate the comparison density via (34) and call it $\hat d^b$.
    B. On the sample from $\hat f$:
      i. Estimate the unknown parameters via (32) and call the resulting start $\hat g^{*b}$.
      ii. Estimate