
# Bayes Calculations from Quantile Implied Likelihood

A Bayesian model can have a likelihood function that is analytically or computationally intractable, perhaps due to large data sample size or high parameter dimensionality. For such a model, this article introduces a likelihood function that approximates the exact likelihood through its quantile function, and is defined by an asymptotic chi-square distribution based on confidence distribution theory. This Quantile Implied Likelihood (QIL) gives rise to an approximate posterior distribution, which can be estimated either by maximizing the penalized log-likelihood, or by a standard adaptive Metropolis or importance sampling algorithm. The QIL approach to Bayesian Computation is illustrated through the Bayesian modeling and analysis of simulated and real data sets having sample sizes that reach the millions. Models include the Student's t, g-and-h, and g-and-k distributions; the Bayesian logit regression model; and a novel high-dimensional Bayesian nonparametric model for distributions under unknown stochastic precedence order-constraints.


## 1 Introduction

For any Bayesian model, statistical inference focuses on the posterior distribution of the model’s parameters, which combines the model’s data likelihood with a prior distribution on the parameter space. However, a realistic Bayesian model for the data may be defined by a likelihood function that is intractable or difficult to manage, possibly due to large data sample size, number of parameters, or complex likelihood functional form. Then, computations of the posterior distribution can become slow, cumbersome, or impossible.

In such a situation, a likelihood-free method can provide a more tractable approximate likelihood which, combined with the prior distribution, yields an approximate posterior distribution that can be more rapidly estimated using any suitable Markov chain Monte Carlo (MCMC), importance sampling (IS), or other Monte Carlo (MC) or optimization-based computational algorithm (Robert & Casella, 2004). Each likelihood-free method is based on a specific approximate likelihood that is constructed in each iteration of an MC posterior sampling algorithm run, as follows (Karabatsos & Leisen, 2018). Standard Approximate Bayesian Computation (ABC) employs an indicator function of whether the distance between summary statistics of the data set and summary statistics of a synthetic data set sampled from the intractable likelihood falls within a user-chosen threshold. The synthetic likelihood (SL) method (Price et al. 2018) employs a multivariate normal pdf for the data summary statistics, with mean and covariance parameters estimated from synthetic data sets that are iid sampled from the intractable likelihood. Brute-force SL employs nonparametric kernel density estimation from synthetic data sets without summary statistics (Turner & Sederberg, 2014). Other methods either estimate an empirical likelihood (EL) under constraints defined by the given intractable likelihood per sampling iteration (Mengersen et al. 2013), or estimate a bootstrap likelihood (BL) before running a posterior sampling algorithm (Zhu et al. 2016).

Likelihood-free methods have clearly become one of the standard statistical tools and have allowed the Bayesian paradigm to be routinely applied to sophisticated models that adequately describe modern data, including models defined by intractable likelihoods. These methods have been applied in many fields over the past two decades, including archaeology, astronomy, many biology subfields, climate science, economics, health sciences, hydrology, social sciences, signal processing, stereology, structural dynamics, and related fields (Karabatsos & Leisen, 2018).

However, scientific fields will continue to raise their expectations for likelihood-free methods: to provide Bayesian inferences for models with intractable likelihoods of possibly more complex forms, and for larger and higher-dimensional data sets, which are continually becoming easier to collect with ongoing advances in computer technology. Despite their past successes, the standard likelihood-free methods mentioned earlier are not fully satisfactory (Karabatsos & Leisen, 2018), for the following reasons. They construct approximate likelihoods that depend on subjective choices of tuning parameters, such as the summary statistics, tolerance level, and distance measure (ABC), or the number of synthetic data sets to sample per MC sampling iteration (SL); or they rely on iterative fast sampling or point estimation based on the given intractable likelihood (ABC, SL, EL, BL), which can be slow or cumbersome per sampling iteration when the likelihood function is complex, or when the data sample size or the number of model parameters is large. Besides, point estimators are not available for many Bayesian models. These rising expectations can be addressed through the development of a theoretically-driven and -justifiable likelihood-free methodology, defined by a general approximate likelihood that can be constructed and computed in an automatic, routine, and computationally efficient manner in each iteration of a posterior distribution estimation algorithm.

We introduce the Quantile Implied Likelihood (QIL), a novel method for constructing approximate likelihoods based on pivotal inference (Barnard, 1980) and on the asymptotic implied likelihood approach (Efron, 1993) to confidence distribution theory (Schweder & Hjort, 2016). The QIL is useful for providing approximate posterior inference for Bayesian models with intractable likelihoods. The QIL is an asymptotic chi-square ($\chi^2$) pdf, and thus can be efficiently computed and avoids many of the problems of the previous likelihood-free methods. The QIL has lower computational cost than the synthetic, empirical, and bootstrap likelihoods, and does not rely on ABC tuning parameters, point estimation, or synthetic-data or bootstrap sampling. Indeed, low computational cost is a hallmark of implied likelihoods (Efron, 1993). In short, the QIL provides an answer to the scientific fields' rising expectations for likelihood-free methods.

The QIL was developed from the fundamental observation that a likelihood-free method can be based on an approximate likelihood defined by the pdf of the sampling distribution of a 'distribution test' statistic. The QIL is the asymptotic $\chi^2$ pdf of the sampling distribution of a pivotal quantity for a new quantile-based distribution test of the null hypothesis that the true data-generating distribution equals the likelihood distribution given model parameters, against the alternative hypothesis of inequality. This pivotal quantity is defined by the Mahalanobis distance between the data sample quantiles and the quantiles of the model likelihood given parameters, on $d$ evenly-spaced quantile probabilities and with $d$ degrees of freedom; it has a $\chi^2_d$ distribution under the null hypothesis by virtue of the asymptotic multivariate normality of the sample quantiles given model parameters (Walker, 1968). The corresponding $\chi^2_d$ cdf provides a p-value function (Fraser, 1991) for an asymptotically uniformly most powerful (and unbiased) distribution test. This test is based on an exact, minimum-volume, and transformation-invariant confidence set for the unknown quantiles of the true data-generating distribution, conditionally on any given model parameters and associated quantile function. The QIL may be efficiently computed from a number of quantiles $d$ that can be chosen to yield sample quantiles that are close in Kolmogorov distance to the full data set, even for Big Data (large $n$) sets, while still factoring the total sample size $n$ into the likelihood computation.

The QIL can be applied to univariate or multivariate, iid or non-iid observations. The QIL can handle multivariate observations after taking one-dimensional transformations of the data using the Mahalanobis depth function (Liu & Singh, 1993), to provide a basis for multivariate quantiles. The QIL can be extended to non-iid grouped (or regression) data by employing a product of chi-square pdfs over the multiple groups of observations, which employs the same conditional independence assumption as that of the exact model likelihood. Such a QIL for non-iid data is an example of a composite likelihood (Varin et al. 2011).

For the given Bayesian model, the approximate posterior distribution is formed by combining the QIL and the chosen prior distribution for the model parameters. The QIL is a partial likelihood that goes beyond the median (Doksum & Lo, 1990) by summarizing the data by a vector of quantiles, to give rise to a new type of 'limited information likelihood' posterior distribution. Also, under the uninformative flat prior, the posterior distribution for the model parameters is based on the matching prior for the pivotal quantity (Welch & Peers, 1963). In addition, the QIL's simple and explicit $\chi^2_d$ pdf form makes it possible to use any standard MC or optimization algorithm for posterior estimation, unlike the EL method, which relies exclusively on IS algorithms. Further, the maximum a posteriori (MAP) estimate and the posterior covariance matrix of the model parameters can be quickly estimated through penalized QIL maximization, adding to previous work on maximum intractable likelihood estimation (Rubio & Johansen, 2013).

The QIL approach goes beyond providing posterior inference for Bayesian models with intractable likelihoods. This approach gives rise to posterior distributions that are fully interpretable as confidence distributions, since the QIL is based on a linear location model (Fraser, 2011a) representing the asymptotic multivariate normal distribution for the sample quantiles. This is important because, arguably, the concept of confidence is the most substantive ingredient in modern model-based statistical theory (Fraser, 2011b). In contrast, the EL method does not necessarily admit posterior distributions that are easily interpretable as confidence distributions, because it is possible under the EL for the data to relate nonlinearly with the model parameters (Fraser, 2011a). The same is true for the BL method.

More details about the QIL approach to Bayesian inference are described next in Section 2, which also reviews some penalized likelihood, adaptive Metropolis, and IS methods for inference from the approximate posterior distribution based on the QIL. Section 3 illustrates the QIL approach on Bayesian models through the analysis of many large or complex real and simulated data sets, including multiple versions of some of these models employing different prior distributions. Results include computation times and accuracies of QIL-based posterior inferences from simulated data, in terms of Root Mean Squared Error (RMSE).

The Bayesian models include familiar low-dimensional parametric probability models which provide basic benchmarks for QIL (e.g., Student’s t model). They also include models with more intractable likelihoods as follows:

1. (Univariate iid data). The generalized g-and-h and g-and-k distributions (MacGillivray, 1992) are each defined by a likelihood function specified through its quantile function. This likelihood has no closed-form expression and high computational cost for large data sets. For each model, we will show that the QIL provides improved accuracy in posterior inferences and competitive computational speed compared to standard ABC methods.

2. (Univariate non-iid data).

The Bayesian binary regression model has regression coefficients that may be assigned a multivariate normal prior, or even a LASSO prior with unknown shrinkage hyperparameter for variable selection. For the binary logit or probit model, standard Gibbs sampling MCMC methods are computationally slow for large data sample sizes because they rely on iterative sampling of a latent variable for each observed binary dependent response. Such a method is further decelerated when a LASSO prior is applied to the regression coefficients, because then, per sampling iteration, additional steps are needed to perform sampling updates of the coefficient scale parameters and the shrinkage hyperparameter, and to perform a matrix inversion to obtain the conditional posterior covariance matrix of the coefficients (Park & Casella, 2008). We will show that the QIL, as an approximate likelihood for the Bayesian binary regression model, provides improved speed in posterior computations compared to Gibbs sampling MCMC methods, for any choice of smooth link function.

3. (Matrix-variate iid data). The Bayesian exponential random graph (ERG) model for network data is a doubly-intractable model, defined by a likelihood with an intractable normalizing constant formed by a sum over a large number of possible network events, and by an intractable normalizing constant in its posterior density function (Caimo & Friel, 2011). Similar examples of doubly-intractable likelihood models include the Ising, Potts, spatial point process, massive Gaussian Markov random field, and Mallows models. Due to the intractability of these normalizing constants, standard (e.g., MCMC) posterior estimation algorithms are inapplicable for doubly-intractable models, and it may also be difficult to sample directly from the given likelihood. Different MC posterior sampling methods have been proposed for such models, which either eliminate or estimate the normalizing constants, but these methods are non-trivial to implement and not fully satisfactory (Liang et al. 2016). The QIL approach can provide posterior inferences for a doubly-intractable model by specifying the QIL as a surrogate to the model's implied low-dimensional model that does not depend on the intractable normalizing constant; for example, a binary logit model implied by the ERG model (Strauss & Ikeda, 1990). Also, we will show that a LASSO prior can lead to more reasonable marginal posterior variances of the ERG model parameters, compared to the overly-high variances (standard errors) that can be obtained from Maximum Likelihood Estimation (MLE) methods (Caimo & Friel, 2011).

4. (Multivariate iid data). The Bayesian multivariate skew normal distribution is an attractive model that is robust to empirical violations of distributional symmetry. However, it has a rather unmanageable likelihood function for parameter estimation purposes (Liseo & Parisi, 2013; Azzalini & Capitanio, 1999). Also, in many statistical applications, including those where the ratio of the number of variables to the sample size is large, it is of interest to perform posterior distribution inferences of the inverse covariance matrix (Fan et al. 2014), whose off-diagonal entries describe the partial correlation between each variable pair after removing the linear effects of the other variables, and whose diagonal entries describe the partial variance of each variable after removing the linear effects of the other variables (Pourahmadi, 2011). A robust inference method for the inverse covariance matrix had previously been developed only from a frequentist (non-Bayesian) perspective (Zhao & Liu, 2014). We will show that after approximating the multivariate skew-normal likelihood by a QIL, this model can provide robust and computationally-fast posterior inferences of the inverse covariance matrix through importance sampling of the posterior distribution, while avoiding computations of costly matrix inverses and nuisance parameters, which in turn simplifies posterior distribution estimation.

5. (Multivariate non-iid data).

The Bayesian approach to the multivariate Wallenius (noncentral hypergeometric) distribution (Wallenius, 1963; Chesson, 1976) is useful for the analysis of preference data from individuals, where each person chooses (without replacement) a number of objects from a total set of objects belonging to mutually-exclusive categories (Grazian et al. 2018). The exact likelihood of this model contains a computationally-costly integral for each individual in the sample. In this study, we will show that the QIL approach can provide tractable and accurate posterior inferences of the Wallenius model choice weight parameters, based on a QIL that depends on means and variances calculated from all model parameters. The original Bayesian Wallenius model (Grazian et al. 2018) assumes that the choice weight parameters are the same for all persons in a given sample. The current study also introduces a new hierarchical Bayesian Wallenius model, which allows the choice weight parameters to differ across persons. It turns out that the QIL can be easily extended to provide inference of the posterior distribution of this hierarchical model.

As we shall see, our QIL framework is suitable for Bayesian inference with the above models, but it extends far beyond these applications and allows us to push further the boundaries of the class of problems that can be addressed by likelihood-free methods. The Supporting Information provides the software code that was used for all data analyses.

## 2 Quantile Implied Likelihood (QIL) for Bayesian Inference

More formally, any Bayesian statistical model for a set $Y_n = (y_1, \ldots, y_n)$ of iid data observations specifies a likelihood $f(Y_n \mid \theta) = \prod_{i=1}^{n} f(y_i \mid \theta)$, with prior density function $\pi(\theta)$ (cdf $\Pi(\theta)$) defined on the parameter space $\Theta$. The (exact) likelihood has cdf $F_\theta$ and, for univariate $y$, quantile function $q_\theta(u) = F_\theta^{-1}(u)$, for $0 < u < 1$. According to Bayes' theorem, the data set $Y_n$ updates the prior to a posterior distribution, defined by the density function $\pi(\theta \mid Y_n) = f(Y_n \mid \theta)\,\pi(\theta)/m(Y_n)$, with marginal likelihood $m(Y_n) = \int_\Theta f(Y_n \mid \theta)\,\mathrm{d}\Pi(\theta)$ and posterior predictive density function $f(y \mid Y_n) = \int_\Theta f(y \mid \theta)\,\mathrm{d}\Pi(\theta \mid Y_n)$. These ideas easily extend to a Bayesian model for non-iid data, defined by the likelihood $\prod_{k=1}^{K} f(Y_{n_k} \mid \theta)$ for $K$ independent groups. The QIL, described next, provides an approximation to a likelihood that may be intractable.

### 2.1 QIL for Univariate (iid or non-iid) Data

First, consider a data set $Y_n = (y_1, \ldots, y_n)$ of size $n$ sampled as $y_i \sim_{\text{iid}} F$, where $F$ ($f$) is a given but unknown true continuous cdf (pdf) on $\mathbb{R}$, with corresponding quantile function $F^{-1}(u)$ defined for any quantile probability $u \in (0,1)$. The data are fully described by the empirical cdf $\hat{F}_n$ and the order statistics $y_{(1)} \le \cdots \le y_{(n)}$.

The QIL is based on a subset of $d$ sample quantiles $\hat{q}_{n,d} = (\hat{q}_n(\lambda_1), \ldots, \hat{q}_n(\lambda_d))^\top$, with empirical distribution $\hat{F}_d$, on equally-spaced quantile probabilities $\lambda_j = j/(d+1)$, $j = 1, \ldots, d$, in $(0,1)$. These sample quantiles can be found by linear interpolation (Hyndman & Fan, 1996). For large $n$, we can write $\hat{q}_n(\lambda_j) = y_{(\lceil n\lambda_j \rceil)}$, where $\lceil \cdot \rceil$ is the ceiling function.

As an aside, a more computationally efficient Bayesian analysis of data having large sample size $n$ can be achieved by focusing the analysis on a subset of $d$ sample quantiles on equally-spaced quantile probabilities in $(0,1)$. Here, $d$ can be selected as the value which yields, for a chosen small constant $\epsilon \ge 0$, an empirical distribution $\hat{F}_d$ that well-approximates the full-data empirical distribution $\hat{F}_n$, according to:

$$d(\epsilon; \hat{F}_n) = \min\big\{ d \in \{1,\ldots,n\} : |\hat{F}_n(y_i) - \hat{F}_d(y_i)| \le \epsilon,\ i = 1,\ldots,n \big\}. \tag{2.1}$$
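As a sketch of this selection rule, the following assumes equally-spaced probabilities $\lambda_j = j/(d+1)$ and linearly-interpolated sample quantiles; the function name `select_d` is illustrative, not from the paper.

```python
import numpy as np

def select_d(y, eps):
    """Smallest number of quantiles d whose empirical cdf is within
    Kolmogorov distance eps of the full-data empirical cdf, as in (2.1)."""
    y = np.sort(np.asarray(y, dtype=float))
    n = len(y)
    F_n = np.arange(1, n + 1) / n                 # empirical cdf at y_(1..n)
    for d in range(1, n + 1):
        lam = np.arange(1, d + 1) / (d + 1)       # equally-spaced probabilities
        q_d = np.quantile(y, lam)                 # interpolated sample quantiles
        # empirical cdf formed by the d quantiles, evaluated at each y_i
        F_d = np.searchsorted(q_d, y, side="right") / d
        if np.max(np.abs(F_n - F_d)) <= eps:
            return d
    return n
```

For very large $n$, the linear scan over $d$ can be replaced by a doubling search, since in practice the achievable Kolmogorov distance shrinks as $d$ grows.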

We now formally state the assumptions of the QIL for univariate data.

Assumption 1. The (univariate) empirical distribution of the data, $\hat{F}_n$, is well-approximated by the empirical distribution $\hat{F}_d$ formed by $\hat{q}_{n,d}$, the sample quantiles on the $d$ equally-spaced quantile probabilities $\lambda_1, \ldots, \lambda_d$.

Assumption 2. For the given (univariate) data set generated by the unknown true distribution $F$, the specified Bayesian model is correct in the sense that the equality $F = F_{\theta_0}$ holds for some parameter $\theta_0 \in \Theta$ supported by the prior $\Pi$ with pdf $\pi$.

The QIL is based on asymptotic large-sample theory. If Assumption 2 holds, then for any $\theta$ the corresponding sample quantiles $\hat{q}_{n,d}$ have a $d$-variate normal distribution ($N_d$) asymptotically, as $n \to \infty$ (Walker, 1968; Ferguson, 1996), with mean $q_{\theta,d} = (q_\theta(\lambda_1), \ldots, q_\theta(\lambda_d))^\top$ and covariance matrix $\tfrac{1}{n}V(f_\theta)$, where:

$$V(f_\theta) = \left( \frac{\min(\lambda_j, \lambda_k)\,[1 - \max(\lambda_j, \lambda_k)]}{f_\theta(q_\theta(\lambda_j))\, f_\theta(q_\theta(\lambda_k))} \right)_{d \times d}. \tag{2.2}$$

Then asymptotically,

$$\hat{q}_{n,d} \xrightarrow{\ L\ } N_d\!\left(q_{\theta,d},\ \tfrac{1}{n}V(f_\theta)\right) \ \text{as } n \to \infty, \tag{2.3}$$

which implies the (asymptotic) normal linear location model for the observed quantiles $\hat{q}_{n,d}$,

$$\hat{q}_{n,d} = q_{\theta,d} + C\varepsilon, \quad \varepsilon \sim N_d(0, I_d), \ \text{as } n \to \infty, \quad \left(CC^\top = \tfrac{1}{n}V(f_\theta)\right); \tag{2.4a}$$

and that the following pivotal quantity:

$$t_\theta(Y_n) = n\,(\hat{q}_{n,d} - q_{\theta,d})^\top [V(f_\theta)]^{-1} (\hat{q}_{n,d} - q_{\theta,d}) \tag{2.5}$$

follows a chi-square ($\chi^2$) distribution on $d$ degrees of freedom:

$$t_\theta(Y_n) \sim \chi^2_d, \tag{2.6}$$

since if $X \sim N_d(\mu, \Sigma)$, then $(X - \mu)^\top \Sigma^{-1} (X - \mu) \sim \chi^2_d$. By definition, the pivotal quantity $t_\theta(Y_n)$ is a function of the observations $Y_n$ and of the parameters $\theta$ (via $q_{\theta,d}$ and $V(f_\theta)$), such that the ($\chi^2_d$) distribution of $t_\theta(Y_n)$ is the same for all $\theta \in \Theta$ (DeGroot & Schervish, 2012). When $f_\theta$ is intractable, $V(f_\theta)$ in (2.2) can be calculated by using the equiprobability pdf (Breiman, 1973):

$$f^e_\theta(\cdot) = \sum_{j=1}^{d} \frac{\mathbf{1}\big(q_\theta(\lambda_{j-1}) < \cdot \le q_\theta(\lambda_j)\big)}{(d+1)\,\big(q_\theta(\lambda_j) - q_\theta(\lambda_{j-1})\big)}, \tag{2.7}$$

where $\mathbf{1}(\cdot)$ is the indicator function and $\lambda_0$ is a small constant.
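Putting (2.2), (2.5), and (2.7) together, a minimal sketch of the log-QIL computation for iid univariate data might look as follows; the function name `qil_loglik`, the interface `q_theta(u)` for the model quantile function, and the choice of the small constant `lam0` are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import chi2

def qil_loglik(y, q_theta, d, lam0=1e-4):
    """Log of the QIL: chi^2_d log-pdf evaluated at the pivotal quantity
    (2.5), with V(f_theta) in (2.2) built from the equiprobability pdf
    (2.7) in place of an intractable f_theta."""
    n = len(y)
    lam = np.arange(1, d + 1) / (d + 1)          # lambda_j = j/(d+1)
    q_hat = np.quantile(y, lam)                  # sample quantiles
    q_mod = np.array([q_theta(u) for u in lam])  # model quantiles

    # equiprobability pdf heights f_e(q_theta(lambda_j)), per (2.7)
    q_ext = np.array([q_theta(u) for u in np.concatenate(([lam0], lam))])
    f_e = 1.0 / ((d + 1) * np.diff(q_ext))

    # covariance (2.2): min(l_j,l_k)[1 - max(l_j,l_k)] / (f_j f_k)
    V = (np.minimum.outer(lam, lam) * (1.0 - np.maximum.outer(lam, lam))
         / np.outer(f_e, f_e))

    diff = q_hat - q_mod
    t = n * diff @ np.linalg.solve(V, diff)      # pivotal quantity (2.5)
    return chi2.logpdf(t, df=d)
```

With a correctly specified model, $t$ concentrates near its $\chi^2_d$ mean $d$, so the log-QIL stays moderate; a badly misspecified $\theta$ inflates $t$ and the log-QIL collapses, which is what drives the approximate posterior.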

The $\chi^2_d$ distribution in (2.6) has the following attractive features:

• It provides an example of the pivotal approach to statistical inference (Barnard, 1980), formally defined by a sample space for the sample quantiles $\hat{q}_{n,d}$; a basic pivotal $t_\theta$ with its pivotal space, whose inverse mapping is one-to-one and measurable; and a space of probability distributions for the basic pivotal, defined by $\chi^2$ pdfs labelled by degrees-of-freedom parameters $d \in \{1, \ldots, n\}$, where $d$ can be estimated by the procedure (2.1). Here, $t_\theta$ is called a robust pivotal because it is pivotal for all $\theta \in \Theta$;

• It provides an asymptotic confidence distribution for $\theta$, meaning that:

$$\Pr\big[t_\theta(Y_n) \le \chi^{-2}_d(u)\big] = \Pr\big[\chi^2_d(t_\theta(Y_n)) \le u\big] \xrightarrow{\ L\ } u, \ \text{as } n \to \infty, \tag{2.8}$$

holds for all $u \in (0,1)$ and all $\theta \in \Theta$, with cdf $\chi^2_d(\cdot)$, quantile function $\chi^{-2}_d(\cdot)$, and stochastic confidence interval of coverage probability $u$ (Xie & Singh, 2013; Nadarajah et al. 2015). The confidence cdf $\chi^2_d(t_\theta(Y_n))$ constitutes a pivot, because (2.8) is an invertible function of $t_\theta(Y_n)$ which has a uniform distribution independent of $\theta$ (Barndorff-Nielsen & Cox, 1994; Schweder & Hjort, 2002); and it is a p-value function (Fraser, 1991) of the possible values of the parameter $\theta$, which outputs the probability for pivotal quantities located to the 'left' of the pivotal quantity observed from the data $Y_n$, given $\theta$;

• It corresponds to a pdf that defines a confidence density and an asymptotic implied likelihood (Efron, 1993), and defines the QIL (for iid data), given by:

$$f^Q_\theta(Y_n) \equiv \frac{\mathrm{d}\chi^2_d(t_\theta(Y_n))}{\mathrm{d}t_\theta(Y_n)} = \frac{[t_\theta(Y_n)]^{d/2 - 1} \exp\big[-\tfrac{1}{2}t_\theta(Y_n)\big]}{K(d)}, \tag{2.9}$$

with $K(d) = 2^{d/2}\,\Gamma(d/2)$. The implied likelihood approach to statistical inference is based on a real-valued function of all model parameters (Efron, 1993). The QIL in particular employs the real-valued pivotal function $t_\theta(Y_n)$. Notably, while the QIL (2.9) is computed from a subset of $d$ quantiles, it still depends on the total sample size $n$ of the full data set, via the pivotal statistic in (2.5). Also, the QIL gives rise to a proper scoring rule for predictive quality (Gneiting & Raftery, 2007);
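As a quick sanity check on the explicit form in (2.9), the pdf with normalizer $K(d) = 2^{d/2}\Gamma(d/2)$ can be evaluated directly and compared against a library chi-square pdf; the helper name `qil_pdf` is illustrative.

```python
import math
from scipy.stats import chi2

def qil_pdf(t, d):
    """Explicit chi^2_d form of the QIL pdf (2.9):
    f(t) = t^(d/2 - 1) * exp(-t/2) / K(d), with K(d) = 2^(d/2) * Gamma(d/2)."""
    K = 2.0 ** (d / 2.0) * math.gamma(d / 2.0)
    return t ** (d / 2.0 - 1.0) * math.exp(-t / 2.0) / K

# agrees with scipy's chi-square pdf to machine precision
assert abs(qil_pdf(7.3, 5) - chi2.pdf(7.3, df=5)) < 1e-12
```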

• It provides a new (asymptotic) distribution test of the null hypothesis $H_0\!: F = F_\theta$ versus the alternative $H_1\!: F \ne F_\theta$, with the unknown $F$ (and its quantiles $q_d$) estimated by $\hat{F}_n$ (and $\hat{q}_{n,d}$), and with $F_\theta$ (and $q_{\theta,d}$) known, as before. This hypothesis test has p-value $\chi^2_d(t_\theta(Y_n))$, based on the observed pivotal quantity in (2.5). The null is rejected at level $\alpha$ if and only if the pivotal quantity $t_\theta(Y_n)$, which summarizes $Y_n$, falls within the rejection region $\{t : t > \chi^{-2}_d(1-\alpha)\}$, corresponding to the non-rejection region $\{t : t \le \chi^{-2}_d(1-\alpha)\}$. This is by virtue of the close relationship between the confidence interval and the p-value. Indeed, based on Assumption 1, a confidence set for the vector of quantiles $q_d$ of the unknown true distribution $F$, estimated by $\hat{q}_{n,d}$, and based on the asymptotic normal distribution (2.3) conditionally on a given $\theta$, is given by:

$$C^\alpha_\theta(q_d) = \big\{ q_d \mid n\,(q_d - q_{\theta,d})^\top [V(f_\theta)]^{-1} (q_d - q_{\theta,d}) \le \chi^{-2}_d(1-\alpha) \big\}. \tag{2.10}$$

This confidence set in (2.10), asymptotically as $n \to \infty$, is analogous to a confidence set for the mean of a known $d$-variate normal population distribution $N_d$ (Siotani, 1964); is an exact confidence set for $q_d$, that is, of coverage $1-\alpha$ for all $\alpha \in (0,1)$, and thus is a transformation-invariant confidence set; and has the smallest volume among all level-$(1-\alpha)$ confidence sets for each $\alpha$, implying that it provides a uniformly most powerful (asymptotic) test of the hypothesis $H_0\!: F = F_\theta$, and that the confidence set averaged over any distribution of $q_d$ is also minimized in volume (Pratt, 1961). Also, given a level-$(1-\alpha)$ confidence interval for a scalar parameter, the interval obtained by applying any one-to-one function of the parameter retains level $1-\alpha$. However, when the quantity of interest is of dimension greater than one and viewed as a function of $\theta$ with confidence set (2.10), in general we can only assert that the coverage is at least $1-\alpha$, and therefore the set estimator has confidence coefficient greater than $1-\alpha$ (Rao, 1965). This reflects that the QIL approximates the given exact model likelihood $f(Y_n \mid \theta)$.

For non-iid observations from $K$ groups, with exact likelihood $\prod_{k=1}^{K} f(Y_{n_k} \mid \theta)$, the QIL is defined by:

$$f^Q_\theta(Y_n) \equiv \prod_{k=1}^{K} f^Q_{\theta,k}(Y_{n_k}) = \prod_{k=1}^{K} \frac{\mathrm{d}\chi^2_{d_k}(t_\theta(Y_{n_k}))}{\mathrm{d}t_\theta(Y_{n_k})} = \prod_{k=1}^{K} \frac{[t_\theta(Y_{n_k})]^{d_k/2 - 1} \exp\big[-\tfrac{1}{2}t_\theta(Y_{n_k})\big]}{K(d_k)}, \tag{2.11}$$

with corresponding pivotal quantities:

$$t_\theta(Y_{n_k}) = n_k\,(\hat{q}_{n_k,d_k} - q_{\theta,d_k})^\top [V(f_{\theta,k})]^{-1} (\hat{q}_{n_k,d_k} - q_{\theta,d_k}), \quad k = 1, \ldots, K, \tag{2.12}$$

and with $K(d_k) = 2^{d_k/2}\,\Gamma(d_k/2)$, for $k = 1, \ldots, K$. Equation (2.11) is an example of a composite likelihood (Varin et al. 2011).
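Since (2.11) is a product of independent chi-square pdfs, its log is just a sum of $\chi^2_{d_k}$ log-pdfs evaluated at the group pivotal quantities (2.12). A minimal sketch (the function name and call signature are illustrative):

```python
import numpy as np
from scipy.stats import chi2

def composite_qil_loglik(pivots, dfs):
    """Log of the composite QIL (2.11) for K independent groups:
    sum over k of the chi^2_{d_k} log-pdf at t_theta(Y_{n_k})."""
    return float(np.sum(chi2.logpdf(np.asarray(pivots, dtype=float),
                                    df=np.asarray(dfs))))
```

Each group contributes its own pivotal quantity and degrees of freedom $d_k$, mirroring the conditional independence assumed by the exact likelihood.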

#### 2.1.1 Other Considerations

Despite the similarity in name, the QIL for univariate iid data (2.9) is not the 'product spacings' or 'quantile' likelihood (Cheng & Amin, 1983; Ranneby, 1984; Titterington, 1985; Heathcote et al. 2002; Heathcote & Brown, 2004). The product spacings likelihood is instead defined by a multinomial distribution that assigns the univariate iid data observations into disjoint interval categories bounded by sample quantiles, and is based on multiple integrals of the exact model likelihood, which can be time-consuming to compute when the likelihood is intractable.

Also, as an alternative to the QIL, one may consider the pdf of the sampling distribution of a classical distribution test, such as the Anderson-Darling or Kolmogorov-Smirnov test (Stephens, 1974). However, such a classical test requires computation of the likelihood cdf, which can be time-consuming when the likelihood is intractable. For example, the g-and-k distribution has no closed-form likelihood expression, and evaluating this likelihood's cdf for a given input requires time-consuming numerical optimization methods (Prangle, 2017).

The above computational cost issues are compounded in the context of a posterior estimation algorithm, which in practice typically computes the likelihood for hundreds or thousands of values of $\theta$.

### 2.2 QIL for Multivariate Data

The QIL (2.9) or (2.11) can be extended to multivariate iid or non-iid data. Specifically, first consider a multivariate iid data set $Y_n = (y_1, \ldots, y_n)$ sampled as $y_i \sim_{\text{iid}} F$, with unknown continuous pdf $f$ (cdf $F$) defined on $\mathbb{R}^p$, $p \ge 2$. For a Bayesian model with likelihood $f(Y_n \mid \theta)$, the Mahalanobis depth function (Liu & Singh, 1993) is defined by:

$$D_M(y; \mu_\theta, \Sigma_\theta) = \big[1 + (y - \mu_\theta)^\top \Sigma_\theta^{-1} (y - \mu_\theta)\big]^{-1} = [1 + M_\theta(y)]^{-1}, \tag{2.13}$$

where $y \in \mathbb{R}^p$, $(\mu_\theta, \Sigma_\theta)$ are the mean and covariance matrix of $F_\theta$, and $M_\theta(y)$ is a (squared) Mahalanobis distance. The Mahalanobis depth, which obviously depends on the existence of the second moments, is consistent with four axioms that every reasonable notion of depth should satisfy (Mosler, 2013). These axioms are:

• Affine invariance: the depth of $Ay + b$ under the distribution of the transformed data equals the depth of $y$, for any nonsingular matrix $A$ and constant vector $b$;

• Vanishing at infinity: the depth tends to zero as $\|y\| \to \infty$;

• Upper semicontinuity: for each $\alpha > 0$, the $\alpha$-central region $\{y : D_M(y; \mu_\theta, \Sigma_\theta) \ge \alpha\}$ is closed;

• Monotone on rays: if $y^{*}$ is a point of maximal depth, then for each ray emanating from $y^{*}$, the depth is monotone decreasing along the ray.

Also, the Mahalanobis depth is strictly monotone in the sense that, for each depth value $\alpha$ below the maximum, each neighborhood of a point of depth $\alpha$ contains points of depth larger than $\alpha$; that is, the $\alpha$-central region is the closure of the set of all points with depth larger than $\alpha$ (Dyckerhoff, 2017).

The Mahalanobis depth provides a coherent basis for multivariate quantiles, and has probabilistic interpretations analogous to the univariate case (Serfling, 2002b). The rank order of the sample Mahalanobis depths defines multivariate order statistics (Liu et al. 1999), where the observation with the highest depth defines a multivariate median (Rousseeuw & Leroy, 1987), and the observation with the smallest depth is the most outlying point. For the one-dimensional transformed data $D_M(y_i; \hat{\mu}, \hat{\Sigma})$, $i = 1, \ldots, n$, it is possible to select $d$ sample quantiles on equally-spaced quantile probabilities using (2.1), with the empirical distribution of the transformed data in place of $\hat{F}_n$. Here, $(\hat{\mu}, \hat{\Sigma})$ denotes a robust estimator of the mean and covariance matrix (Rousseeuw & Van Driessen, 1999).
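A minimal sketch of the depth transformation follows, using the plain sample mean and covariance in place of the robust (e.g., minimum covariance determinant) estimator that the text calls for; the function name `depth_transform` is illustrative.

```python
import numpy as np

def depth_transform(Y):
    """Map multivariate rows of Y to one-dimensional Mahalanobis depths
    D_M(y) = 1 / (1 + M(y)), per (2.13), with M(y) the squared
    Mahalanobis distance to the sample mean."""
    Y = np.asarray(Y, dtype=float)
    mu = Y.mean(axis=0)
    Sigma = np.cov(Y, rowvar=False)
    centered = Y - mu
    # row-wise squared Mahalanobis distances M(y_i)
    M = np.einsum('ij,ij->i', centered @ np.linalg.inv(Sigma), centered)
    return 1.0 / (1.0 + M)   # depths in (0, 1]; larger = deeper
```

The deepest observation plays the role of a multivariate median, the smallest depth flags the most outlying point, and sample quantiles of these depths then feed the univariate machinery of Section 2.1.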

It turns out that the QIL for multivariate data has a tractable and convenient form, as a consequence of the following assumption and theorem.

Assumption 3. For the given (multivariate) data set sampled from the unknown true distribution $F$, the specified Bayesian model is correct in the sense that $F = F_{\theta_0}$ for some parameter $\theta_0$ supported by the prior $\Pi$, and $F_{\theta_0}$ is in the family of multivariate skew normal distributions.

Recall that the multivariate skew normal distribution is defined by the pdf:

$$\mathrm{sn}_p(y; \xi, \Sigma, \alpha) = 2\,\phi_p(y - \xi; \Sigma)\,\Phi\big(\alpha^\top \omega^{-1} (y - \xi)\big), \tag{2.14}$$

with zero-mean $p$-variate normal pdf $\phi_p(\cdot\,; \Sigma)$, standard normal cdf $\Phi(\cdot)$, diagonal scale matrix $\omega$ formed from the square roots of the diagonal of $\Sigma$, and parameters of location $\xi$, shape $\alpha$, and normal covariance matrix $\Sigma$, all of which define the skew normal with mean $\mu_\theta$ and covariance matrix $\Sigma_\theta$ (Azzalini & Capitanio, 1999). The following theorem establishes conditions that will ultimately provide a convenient form for the QIL for multivariate data.

###### Theorem 2.1.

If Assumption 3 holds and $Y \sim F_\theta$, then
(a) $M_\theta(Y) \sim \chi^2_p$;
(b) the Mahalanobis depth, for a fixed $y$, has cdf and monotone transformation given by the complementary cdf:

$$D_R(y; \mu_\theta, \Sigma) = \Pr_{F_\theta}\big(D_M(Y; \mu_\theta, \Sigma) \le D_M(y; \mu_\theta, \Sigma)\big) \tag{2.15a}$$
$$\phantom{D_R(y; \mu_\theta, \Sigma)} = \Pr_{F_\theta}\big(M_\theta(Y) \ge M_\theta(y)\big) = 1 - \chi^2_p(M_\theta(y)); \tag{2.15b}$$

(c) $D_R(Y; \mu_\theta, \Sigma) \sim \mathrm{Uniform}(0,1)$.

Proof. (a) holds by a simple extension of Proposition 5 of Azzalini and Capitanio (1999), which gives the chi-square distribution of the squared Mahalanobis distance under skew normality. (b) holds true because, as a consequence of the skew-normality assumption for $F_\theta$ and the strict monotonicity and affine invariance properties of the Mahalanobis depth (the latter implying the affine invariance of $D_R$), the contours of constant depth are contours of constant $M_\theta(y)$ (Liu & Singh, 1993). Also, it is easy to verify that the depth satisfies strict monotonicity for the family of skew-normal distributions; that is, for any point of non-maximal depth, it is always possible to find a nearby point with higher depth. (c) follows from the probability integral transform, given (b), which establishes the continuity of the Mahalanobis depth distribution (Liu & Singh, 1993).

Now, if Assumption 3 holds, then the sample quantiles of the transformed data $D_R(y_i; \mu_\theta, \Sigma)$, on the quantile probabilities $\lambda_1, \ldots, \lambda_d$, have an asymptotic normal distribution (Serfling, 2002a), here with mean $(\lambda_1, \ldots, \lambda_d)^\top$ and covariance matrix $\tfrac{1}{n}V(f_\theta)$, where:

$$V(f_\theta) = \left( \frac{\min(\lambda_j, \lambda_k)\,[1 - \max(\lambda_j, \lambda_k)]}{f_{R,\theta}(q_{R,\theta}(\lambda_j))\, f_{R,\theta}(q_{R,\theta}(\lambda_k))} \right)_{d \times d} = \Big( \min(\lambda_j, \lambda_k)\,\big[1 - \max(\lambda_j, \lambda_k)\big] \Big)_{d \times d}, \tag{2.16}$$

with $f_{R,\theta}$ and $q_{R,\theta}$ the $\mathrm{Uniform}(0,1)$ pdf and quantile function (resp.), so that $f_{R,\theta} \equiv 1$.

It then follows that for multivariate iid observations, the QIL is still given by (2.9), with the $\chi^2_d$-distributed pivotal statistic (2.5)-(2.6), and the direct connections between pivotal inference, the confidence distribution, and hypothesis testing of $H_0\!: F = F_\theta$ (vs. $H_1\!: F \ne F_\theta$) still remain. For the multivariate setting, such a test is sensitive to location or dispersion departures from $F_\theta$ (Liu & Singh, 1993). Also, by extension, for multivariate non-iid observations in $K$ groups, the QIL is still given by the composite likelihood (2.11), with corresponding pivotal statistics (2.12).
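Under Theorem 2.1, the transformed data $D_R$ are Uniform(0,1), so the model quantiles are the $\lambda_j$ themselves and $V$ reduces to the parameter-free Brownian-bridge form in (2.16). A sketch of the resulting multivariate log-QIL (the function name and interface are illustrative):

```python
import numpy as np
from scipy.stats import chi2

def multivariate_qil_loglik(depth_ranks, d):
    """Log-QIL for multivariate iid data: pivotal quantity built from
    sample quantiles of D_R(y_i), whose Uniform(0,1) null quantiles are
    the lambda_j, with covariance V from (2.16)."""
    r = np.asarray(depth_ranks, dtype=float)
    n = len(r)
    lam = np.arange(1, d + 1) / (d + 1)
    q_hat = np.quantile(r, lam)                  # sample quantiles of D_R
    V = np.minimum.outer(lam, lam) * (1.0 - np.maximum.outer(lam, lam))
    diff = q_hat - lam                           # Uniform(0,1) quantiles are lam
    t = n * diff @ np.linalg.solve(V, diff)      # pivotal quantity
    return chi2.logpdf(t, df=d)
```

Note that no model density enters $V$ here, which is what makes the multivariate QIL especially cheap to evaluate inside a posterior sampling loop.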

#### 2.2.1 Other Considerations

If $F = F_{\theta_0}$ for some parameter $\theta_0$, but $F_{\theta_0}$ is not in the family of multivariate skew normal distributions, then the squared Mahalanobis distance has a non-standard distribution (e.g., a beta mixture) (Fang, 2003; Shah & Li, 2005), with corresponding depth functions $D_M$ and $D_R$ and a multivariate QIL that would be computationally cumbersome in general. This is especially true in the context of a posterior distribution estimation algorithm, which would need to compute this distribution for hundreds or thousands of $\theta$ values. However, additional variables can be introduced into a MC posterior estimation algorithm, such that the conditional data distribution remains within the skew normal family while retaining the required marginal pdf, to provide a tractable QIL via (2.9) or (2.11) after substituting this conditional distribution for $F_\theta$.

The multivariate QIL may instead be defined by the Bates distribution of the mean of iid Uniform(0,1) random variables (Johnson et al., 1995). This is the exact distribution of the test statistic under the null hypothesis (Liu & Singh, 1993). Unfortunately, the Bates pdf is based on combinatorial terms and which cannot be accurately computed when or is large.
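To illustrate the numerical difficulty, here is a minimal sketch (not from the paper) of the standard Bates pdf formula from Johnson et al. (1995); its alternating sum of large binomial terms suffers catastrophic cancellation as the number of averaged uniforms grows:

```python
import math

# Bates pdf: density of the mean of m iid Uniform(0,1) variables,
#   f(x; m) = m/(m-1)! * sum_{k=0}^{floor(m x)} (-1)^k C(m,k) (m x - k)^(m-1).
# The alternating sum of huge binomial terms loses floating-point precision
# for large m, which is the computational drawback noted in the text.

def bates_pdf(x, m):
    if not 0.0 <= x <= 1.0:
        return 0.0
    s = 0.0
    for k in range(int(math.floor(m * x)) + 1):
        s += (-1) ** k * math.comb(m, k) * (m * x - k) ** (m - 1)
    return m / math.factorial(m - 1) * s

# Sanity checks against known special cases:
#   m = 1 gives the Uniform(0,1) density (height 1);
#   m = 2 gives the triangular density on [0,1] (height 2 at x = 0.5).
f1 = bates_pdf(0.3, 1)
f2 = bates_pdf(0.5, 2)
```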

### 2.3 The Posterior Distribution Based on the QIL

For any Bayesian model, the QIL , which acts as a surrogate to the exact model likelihood, combines with its prior to yield an approximate posterior distribution defined by probability density function , with cdf and predictive density . Also, the posterior distribution gives rise to a posterior distribution for the pivotal quantity .

The QIL-based posterior distribution has at least two interesting characterizations, which are most easily explained based on the QIL for iid observations (2.9), but can be easily extended to the composite QIL (2.11) for non-iid data. First, the posterior density provides a new type of ‘limited information likelihood’ posterior, based on an approximate likelihood that depends not only on the median (Doksum & Lo, 1990) but also on multiple () quantiles. Second, since the QIL is based on the asymptotic linear location model (2.4) for the observed quantiles , giving rise to the $\chi^2_d$-distributed pivotal quantity (2.6), it follows that the posterior distribution provides a confidence distribution (Fraser, 2011a). As mentioned earlier, confidence is perhaps the most substantive ingredient in modern model-based statistical theory (Fraser, 2011b). In contrast, a model that assumes nonlinear relationships between the data and parameters will yield posterior distributions that depart from confidence distributions (Fraser, 2011a).

### 2.4 Posterior Distribution Based on QIL with Uninformative Flat Prior

Under the uninformative flat prior for the model parameters , the QIL-based posterior distribution has connections with certain approaches to providing “prior-free” statistical inference, specifically, for the inference of the probability distribution of or of . This is most easily explained in terms of the QIL (2.9) for iid observations, with no loss of generality, as follows. (It is straightforward to extend the following ideas in terms of the QIL (2.11) for non-iid observations, using additional notation that indexes the given data groups.)

The pivotal approach to “prior-free” statistical inference (Barnard, 1980), generally speaking, proceeds by taking a one-to-one transformation of the given pivotal quantity of interest to yield the partition , and then inferring the distribution of conditionally on the ancillary statistic (Barnard, 1980). Recall that an ancillary statistic is a function of the data but not of the parameters , which has a sampling distribution that does not depend on (Ghosh et al., 2010). The QIL (2.9), for the pivotal quantity in (2.5), provides “prior-free” pivotal inference under the specification and the flat prior . Then, marginally, , and the conditional distribution of given is defined by the pdf, which is identical to the QIL confidence density (2.9) and to the “prior-free” posterior distribution of the original pivotal quantity , with .

Furthermore, according to the asymptotic implied likelihood approach to inference (Efron, 1993), the posterior distribution is based on a matching prior distribution (Welch & Peers, 1963) which coincides with the confidence interval system of the pivotal quantity . Then by definition, for all , the QIL  and the posterior density must satisfy (Efron & Hastie, 2016):

$$\int_0^{\chi_d^{-2}(u)} \frac{d\,\chi^2_d\bigl(t_\theta(Y_n)\bigr)}{d\,t_\theta(Y_n)}\,dt_\theta \;=\; \int_0^{\chi_d^{-2}(u)} \pi_Q\bigl(t_\theta \mid Y_n\bigr)\,dt_\theta \;=\; u. \tag{2.17}$$

Further, this relation (2.17) holds for all , because is a pivotal quantity which by definition has the same ($\chi^2_d$) distribution for all .
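As a quick numerical check of (2.17), here is a sketch under the assumption $d=2$, where the $\chi^2_2$ quantile function has the closed form $\chi_2^{-2}(u) = -2\log(1-u)$: integrating the $\chi^2_2$ pdf from 0 up to its $u$-quantile recovers $u$.

```python
import math

# Numerical check of relation (2.17) for d = 2 degrees of freedom:
# integrating the chi-square(2) pdf from 0 to its u-quantile returns u.
# For d = 2 the pdf is (1/2) exp(-t/2) and the quantile is -2 log(1 - u).

def chi2_2_pdf(t):
    return 0.5 * math.exp(-0.5 * t)

def chi2_2_quantile(u):
    return -2.0 * math.log(1.0 - u)

def integral_to_quantile(u, steps=100000):
    # Trapezoidal integration of the pdf from 0 to the u-quantile.
    q = chi2_2_quantile(u)
    h = q / steps
    total = 0.5 * (chi2_2_pdf(0.0) + chi2_2_pdf(q))
    for i in range(1, steps):
        total += chi2_2_pdf(i * h)
    return total * h
```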

### 2.5 Methods for Inference from the Posterior Distribution Based on QIL

We now describe some methods for performing inference from the QIL-based posterior distribution .

#### 2.5.1 MAP and Posterior Covariance Estimation Using Penalized QIL

The MAP estimator is the mode of the approximate posterior distribution , which coincides with the mean when the posterior is unimodal and symmetric (specifically, ). The posterior covariance estimate is the inverse of the Hessian of the negative log-posterior, evaluated at the mode. Under the flat prior , the posterior mode is called the QMLE, which approximates the MLE ( ) having standard errors (SEs) given by the square roots of the diagonal elements of the inverse Fisher information matrix. As $n \to \infty$, the posterior approaches a multivariate normal distribution with mean and covariance equal to $n^{-1}$ times the inverse Fisher information matrix (Ferguson, 1996).

For any Bayesian model, the MAP estimate is the solution which minimizes the negative penalized log-likelihood:

$$\hat{\theta}_Q(Y_n)=\underset{\theta\in\Theta}{\arg\min}\left[-\sum_{k=1}^{K}\log\left\{t_\theta(Y_{n_k})^{\,d/2-1}\exp\left[-\tfrac{1}{2}\,t_\theta(Y_{n_k})\right]\right\}-\log\pi(\theta)\right]$$
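A minimal sketch of this minimization, with all ingredients hypothetical: a one-dimensional $\theta$, a single data group ($K=1$), a stand-in pivotal statistic $t_\theta$ whose minimum value equals the $\chi^2_d$ mode $d-2$, a flat prior, and a simple grid search in place of a proper numerical optimizer:

```python
import math

# Sketch of QMLE/MAP estimation by minimizing the negative penalized
# log-likelihood of a chi-square(d)-distributed pivotal statistic.
# Everything here is illustrative: t_theta is a hypothetical pivotal
# statistic minimized at theta = 1.0 with value d - 2 (the chi-square
# mode), the prior is flat (log-prior = 0), K = 1 data group, and grid
# search stands in for a numerical optimizer.

d = 5  # degrees of freedom of the pivotal chi-square distribution

def log_chi2_pdf(t, d):
    # log of the chi-square(d) density at t > 0
    return ((d / 2.0 - 1.0) * math.log(t) - t / 2.0
            - (d / 2.0) * math.log(2.0) - math.lgamma(d / 2.0))

def t_theta(theta):
    # Hypothetical pivotal statistic; its minimum d - 2 = 3 is attained
    # at theta = 1.0, where the chi-square(5) density is maximal.
    return (d - 2.0) + 50.0 * (theta - 1.0) ** 2

def neg_penalized_loglik(theta):
    log_prior = 0.0  # flat prior
    return -log_chi2_pdf(t_theta(theta), d) - log_prior

# Grid search over theta; the minimizer is the MAP (here, QMLE) estimate.
grid = [i / 100.0 for i in range(1, 200)]
map_estimate = min(grid, key=neg_penalized_loglik)
```

In practice the paper's estimation methods (penalized-likelihood maximization, adaptive Metropolis, or importance sampling) would replace the grid search, but the objective function has exactly this shape.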