On the validity of bootstrap uncertainty estimates in the Mallows-Binomial model

by Michael Pearce, et al.
University of Washington

The Mallows-Binomial distribution is the first joint statistical model for rankings and ratings (Pearce and Erosheva, 2022). Because frequentist estimation of the model parameters and their uncertainty is challenging, it is natural to consider the nonparametric bootstrap. However, it is not clear that the nonparametric bootstrap is asymptotically valid in this setting, because the Mallows-Binomial model is parameterized by continuous quantities whose discrete order affects the likelihood. In this note, we demonstrate that bootstrap uncertainty estimates for the maximum likelihood estimates in the Mallows-Binomial model are asymptotically valid.



1 Introduction

Rankings and ratings are two types of preference data, which have primarily been studied separately in the literature. Rankings may be modeled via a variety of specialized distributions, such as the Mallows (Mallows1957), Plackett-Luce (Plackett1975), and Bradley-Terry (Bradley1952) models. Ratings are not commonly modeled using statistical methods; instead, they are typically summarized using simple statistics such as the mean or median (lee2013bias, tay2020beyond, NIHPeerReview). In recent years, a growing body of work suggests that modeling rankings and ratings jointly may improve the accuracy of preference modeling and preference aggregation (ovadia2004ratings, shah2018design, liu2022integrating). The Mallows-Binomial is the first joint statistical model for rankings and ratings (pearce2022unified).

pearce2022unified estimated the Mallows-Binomial model in the frequentist setting via the method of maximum likelihood. As with the standard Mallows model, maximum likelihood estimation for the Mallows-Binomial is provably NP-hard (Meila2012). However, when a relatively small number of objects are assessed, the MLE can be computed exactly via a tree-search algorithm based on A*, and approximate algorithms can be used otherwise (pearce2022unified). Still, analytic standard errors are difficult to derive for the Mallows-Binomial, which led the authors to propose the nonparametric bootstrap for uncertainty quantification.

The bootstrap is a very general tool for uncertainty quantification that was first proposed in efron1979. The nonparametric bootstrap estimates the sampling uncertainty of an estimator without making assumptions about its distributional form. Given $n$ i.i.d. observations, $Y_1, \dots, Y_n$, the nonparametric bootstrap is performed using the following steps:

  1. Re-sample $n$ observations with replacement from the original dataset $B$ times, for large $B$. Denote each bootstrap sample $\mathcal{Y}^{(b)} = (Y^{(b)}_1, \dots, Y^{(b)}_n)$, $b = 1, \dots, B$.

  2. Estimate the unknown statistic(s) of interest, $\eta$, separately using each bootstrap sample. Denote the estimates from each bootstrap sample $\hat\eta^{(b)}$, $b = 1, \dots, B$.

  3. Form an empirical distribution for $\hat\eta$ using the values $\hat\eta^{(1)}, \dots, \hat\eta^{(B)}$.

Quantiles from the empirical distribution of the unknown statistic(s) of interest may be used for the purpose of creating confidence regions.
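A minimal sketch of steps 1 to 3 for a scalar statistic follows. It assumes percentile intervals as the way of turning the bootstrap distribution into a confidence interval (one common choice; the note does not prescribe a specific construction), and the function name `bootstrap_percentile_ci` is hypothetical. The parameter `n_boot` plays the role of the number of resamples.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

def bootstrap_percentile_ci(
    data: Sequence[float],
    estimator: Callable[[Sequence[float]], float],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Nonparametric bootstrap with a percentile interval for a scalar statistic."""
    rng = random.Random(seed)
    n = len(data)
    estimates: List[float] = []
    for _ in range(n_boot):
        # Step 1: draw n observations with replacement from the original data.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        # Step 2: recompute the statistic on the bootstrap sample.
        estimates.append(estimator(resample))
    # Step 3: the sorted estimates form the empirical bootstrap distribution;
    # its alpha/2 and 1 - alpha/2 quantiles give a percentile interval.
    estimates.sort()
    lower = estimates[int(alpha / 2 * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Example with simulated data and the sample mean as the statistic.
rng = random.Random(1)
sample = [rng.gauss(5.0, 1.0) for _ in range(200)]
print(bootstrap_percentile_ci(sample, statistics.fmean))
```

Any estimator that maps a list of observations to a number can be passed in; for the Mallows-Binomial model each "observation" would be one judge's ratings together with their ranking.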

Despite its wide applicability, the nonparametric bootstrap does not always yield asymptotically valid confidence intervals. A canonical example in which the bootstrap fails is the estimation of the unknown parameter $\theta$ given i.i.d. samples from a Uniform$(0, \theta)$ distribution. Here, the MLE of $\theta$ is the maximum order statistic $Y_{(n)}$, and $n(\theta - Y_{(n)})/\theta$ has a limiting exponential distribution. The bootstrap empirical distribution, however, cannot replicate this continuous limit from the fixed sample: every bootstrap estimate of $\theta$ is less than or equal to the full-sample maximum order statistic, and it equals $Y_{(n)}$ exactly with probability $1 - (1 - 1/n)^n \to 1 - e^{-1} \approx 0.632$ (bickel1981some). Another canonical example in which the bootstrap fails is the estimation of the location parameter of a Cauchy distribution by the sample mean. The sample mean of i.i.d. Cauchy observations is itself Cauchy distributed and therefore has infinite variance; as a result, the bootstrap distribution of the sample mean behaves poorly even in large samples (politis1998computer).
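The point mass behind the uniform failure can be checked by simulation. The sketch below assumes $\theta = 1$ (an arbitrary choice) and uses a hypothetical function name; it estimates how often the bootstrap maximum coincides with the sample maximum and compares that frequency with $1 - (1 - 1/n)^n$.

```python
import math
import random

def prob_boot_max_hits_sample_max(n: int, n_boot: int = 2000, seed: int = 0) -> float:
    """Share of bootstrap resamples whose maximum equals the full-sample maximum.

    Data are drawn from Uniform(0, theta) with theta = 1 (an arbitrary choice).
    """
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 1.0) for _ in range(n)]
    sample_max = max(x)
    hits = 0
    for _ in range(n_boot):
        resample = [x[rng.randrange(n)] for _ in range(n)]
        # The bootstrap MLE can never exceed sample_max; it equals it whenever
        # the largest observation is drawn at least once.
        if max(resample) == sample_max:
            hits += 1
    return hits / n_boot

n = 200
print(prob_boot_max_hits_sample_max(n))  # Monte Carlo estimate
print(1 - (1 - 1 / n) ** n)              # exact probability, about 0.633
print(1 - math.exp(-1))                  # large-n limit, about 0.632
```

A bootstrap distribution that puts roughly 63% of its mass on a single value cannot approximate the continuous exponential limit, which is the essence of the failure.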

The Mallows-Binomial likelihood has an unusual form that makes the asymptotic validity of bootstrap uncertainty estimates for the MLE unclear. Specifically, the model is parameterized by continuous parameters whose discrete order affects the likelihood. That is, the likelihood contains both continuous and discrete components, and discontinuities may occur wherever the order of certain parameters changes. Thus, frequentist estimation of the Mallows-Binomial model is simultaneously a continuous and a discrete problem. In the absence of theoretical results on the asymptotic distribution of the maximum likelihood estimators, the validity of the nonparametric bootstrap is unclear.

In this note, we demonstrate that the nonparametric bootstrap is an asymptotically valid method of uncertainty quantification for maximum likelihood estimates in the Mallows-Binomial model. The rest of the paper is organized as follows: In Section 2, we provide preliminaries regarding notation and a formal model statement. Main asymptotic results are provided in Section 3, followed by a conclusion in Section 4 that summarizes our work and suggests directions for future study.

2 Preliminaries

2.1 Notation

Suppose there exists a collection of $J$ objects which will be assessed by $I$ judges. Each judge rates each object using the integers $\{0, 1, \dots, M\}$, where $M$ is a fixed and known maximum rating. Smaller ratings are better; higher ratings are worse. We let $X_{ij}$ represent the rating that judge $i$ assigns to object $j$. Additionally, each judge provides a ranking of their $R$ most-preferred objects. For simplicity, we assume each judge provides a ranking of all the objects, i.e., $R = J$. We let $\pi_i$ denote judge $i$'s ranking.

Furthermore, suppose that each object $j$ has a true underlying quality $p_j \in [0, 1]$. The vector of true underlying qualities is written as $p = (p_1, \dots, p_J)$, such that $p \in [0, 1]^J$. Given a vector $p$, we let $\mathrm{order}(p)$ denote the object labels sorted by their values in $p$ from least to greatest. For example, if $p_3 < p_2 < p_1$, then $\mathrm{order}(p) = (3, 2, 1)$ (read as "object 3 is preferred to object 2, which is preferred to object 1"). We call $\pi_0 = \mathrm{order}(p)$ the true consensus ranking of the objects. Additionally, we assume there is a true $\theta \geq 0$, which represents the strength of population ranking consensus; $\theta$ is defined identically to the scale parameter of a traditional Mallows model.
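The $\mathrm{order}(\cdot)$ operation can be written in a couple of lines. This sketch returns 1-indexed labels to match the text's example and assumes the entries of $p$ are distinct (ties are not handled); the specific numbers used below are hypothetical.

```python
from typing import Sequence, Tuple

def order(p: Sequence[float]) -> Tuple[int, ...]:
    """Object labels (1-indexed) sorted from smallest to largest quality value.

    Smaller p_j means a better object, so the first entry is the most-preferred
    object. Assumes the entries of p are distinct.
    """
    return tuple(j + 1 for j in sorted(range(len(p)), key=lambda j: p[j]))

# Hypothetical qualities with p_3 < p_2 < p_1:
print(order([0.7, 0.4, 0.1]))  # (3, 2, 1)
```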

2.2 Mallows-Binomial

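The Mallows-Binomial model is a joint distribution for rankings and ratings that is designed to capture the above situation. Let $(X_i, \pi_i)$, $i = 1, \dots, I$, with $X_i = (X_{i1}, \dots, X_{iJ})$, denote an i.i.d. sample of size $I$ from a Mallows-Binomial$(p, \theta)$ distribution. Their joint likelihood can be written:

$$L(p, \theta \mid X, \pi) = \prod_{i=1}^{I} \left[ \frac{\exp\{-\theta\, d(\pi_i, \mathrm{order}(p))\}}{\psi(\theta)} \prod_{j=1}^{J} \binom{M}{X_{ij}} p_j^{X_{ij}} (1 - p_j)^{M - X_{ij}} \right], \tag{1}$$

where $d(\cdot, \cdot)$ is the Kendall's $\tau$ distance between two rankings and

$$\psi(\theta) = \prod_{j=1}^{J} \frac{1 - e^{-j\theta}}{1 - e^{-\theta}}$$

is the normalizing function of a Mallows model on $J$ objects. As can be seen from Equation 1, the likelihood of each observation is the product of a Mallows ranking distribution and $J$ Binomial rating distributions.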
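As a concrete reading of Equation 1, here is a sketch of the log-likelihood. Assumptions not in the note: objects are labelled 0 to $J-1$, each ranking is a tuple of labels from best to worst, $0 < p_j < 1$ and $\theta > 0$; all function names are hypothetical. `expm1` is used so that $1 - e^{-j\theta}$ stays accurate for small $\theta$.

```python
import itertools
import math
from typing import Sequence

def order(p: Sequence[float]) -> tuple:
    """0-indexed object labels sorted from smallest (best) to largest quality value."""
    return tuple(sorted(range(len(p)), key=lambda j: p[j]))

def kendall_distance(pi: Sequence[int], sigma: Sequence[int]) -> int:
    """Kendall's tau distance: number of object pairs ordered differently by pi and sigma."""
    pos = {obj: r for r, obj in enumerate(sigma)}
    # combinations(pi, 2) yields pairs (a, b) with a ranked above b in pi.
    return sum(1 for a, b in itertools.combinations(pi, 2) if pos[a] > pos[b])

def log_psi(theta: float, J: int) -> float:
    """log psi(theta) for psi(theta) = prod_{j=1}^J (1 - e^{-j theta}) / (1 - e^{-theta}), theta > 0."""
    return sum(
        math.log(-math.expm1(-j * theta)) - math.log(-math.expm1(-theta)) for j in range(1, J + 1)
    )

def log_likelihood(p, theta, X, rankings, M):
    """Log of Equation 1: X[i][j] is judge i's rating of object j, rankings[i] a full ranking."""
    J = len(p)
    pi0 = order(p)  # consensus ranking implied by the current p
    ll = 0.0
    for x_i, pi_i in zip(X, rankings):
        # Mallows part for judge i.
        ll += -theta * kendall_distance(pi_i, pi0) - log_psi(theta, J)
        # Binomial(M, p_j) part for each object.
        for j in range(J):
            ll += math.log(math.comb(M, x_i[j])) + x_i[j] * math.log(p[j]) + (M - x_i[j]) * math.log1p(-p[j])
    return ll
```

Because $p$ enters the Mallows term only through $\mathrm{order}(p)$, this function is piecewise smooth in $p$ and jumps when two entries of $p$ swap order, which is exactly the feature that motivates the note.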

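Next, we provide a preliminary expression for the MLE, $(\hat p, \hat\theta)$:

$$(\hat p, \hat\theta) = \arg\max_{p \in [0,1]^J,\ \theta \geq 0} \sum_{i=1}^{I} \left[ -\theta\, d(\pi_i, \mathrm{order}(p)) - \log \psi(\theta) + \sum_{j=1}^{J} \Big( X_{ij} \log p_j + (M - X_{ij}) \log(1 - p_j) \Big) \right].$$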

There is no closed-form solution for the MLE. However, computationally efficient frequentist estimation is possible via a tree-search method based on the A* algorithm (pearce2022unified).

In the remainder of this work, we assume that $p_j \neq p_k$ whenever $j \neq k$. Furthermore, we assume that each $p_j \in (0, 1)$ and that $\theta > 0$. Under these conditions, pearce2022unified proved that the maximum likelihood estimator $(\hat p, \hat\theta)$ is consistent for $(p, \theta)$ as the number of judges, $I$, grows to infinity.

3 Asymptotic Bootstrap Validity

We would like to show that bootstrap uncertainty estimates are asymptotically valid for the MLE in the Mallows-Binomial model. A sufficient condition for bootstrap validity is local asymptotic normality of the MLE (hall2013bootstrap, bickel1981some). As such, we show in the following subsections that the $(J+1)$-dimensional MLE $(\hat p, \hat\theta)$ is coordinate-wise locally asymptotically normal.

3.1 Local Asymptotic Normality of $\hat p$

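We begin by considering each $\hat p_j$, $j = 1, \dots, J$. Note that $\hat p_j$ is the solution to the following equation:

$$0 = \frac{\partial}{\partial p_j} \log L(p, \hat\theta \mid X, \pi) \bigg|_{p = \hat p} = \sum_{i=1}^{I} \left[ \frac{X_{ij}}{\hat p_j} - \frac{M - X_{ij}}{1 - \hat p_j} - \hat\theta\, \frac{\partial}{\partial p_j} d(\pi_i, \mathrm{order}(p)) \bigg|_{p = \hat p} \right].$$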

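A key challenge in calculating $\hat p_j$ is the derivative $\frac{\partial}{\partial p_j} d(\pi_i, \mathrm{order}(p))$, which depends on the entire $J$-dimensional vector $p$. However, as long as $p_j \neq p_k$ for all $k \neq j$, $\mathrm{order}(p)$ remains constant under small perturbations of $p_j$, and the derivative with respect to $p_j$ is thus 0.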

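Generalizing from $p_j$ to $p$, we require that there exist an $\epsilon$-ball around the $J$-dimensional vector $p$ such that $\mathrm{order}(p')$ remains constant for all $p'$ in the ball. In such cases, we have $\frac{\partial}{\partial p_j} d(\pi_i, \mathrm{order}(p)) = 0$ for every $i$ and $j$, and thus $\hat p_j$ is the standard Binomial MLE,

$$\hat p_j = \frac{1}{IM} \sum_{i=1}^{I} X_{ij},$$

which is asymptotically normal by the central limit theorem, as it is a scaled mean of $I$ i.i.d. random variables.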

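Specifically, this means that in a local region defined by the order of $p$, we have the standard Binomial result,

$$\sqrt{I} \left( \hat p_j - p_j \right) \xrightarrow{d} N\!\left( 0, \frac{p_j (1 - p_j)}{M} \right), \qquad j = 1, \dots, J,$$

which establishes the coordinate-wise local asymptotic normality of $\hat p_j$, $j = 1, \dots, J$.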
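A quick Monte Carlo check of the variance $p_j(1-p_j)/M$ is sketched below, assuming the order of $p$ is locally constant so that $\hat p_j$ reduces to the mean rating divided by $M$. Each of the $I$ ratings of object $j$ is simulated as $M$ Bernoulli trials; the function name and the values $p_j = 0.3$, $M = 5$, $I = 200$ are hypothetical.

```python
import random
import statistics

def scaled_error_variance(p_j: float, M: int, I: int, reps: int, seed: int = 0) -> float:
    """Monte Carlo variance of sqrt(I) * (p_hat_j - p_j) under Binomial(M, p_j) ratings."""
    rng = random.Random(seed)
    errors = []
    for _ in range(reps):
        # Sum of I independent Binomial(M, p_j) ratings, i.e. I * M Bernoulli trials.
        total = sum(rng.random() < p_j for _ in range(I * M))
        p_hat = total / (I * M)  # Binomial MLE: mean rating divided by M
        errors.append(I ** 0.5 * (p_hat - p_j))
    return statistics.variance(errors)

p_j, M, I = 0.3, 5, 200
print(scaled_error_variance(p_j, M, I, reps=1000))  # Monte Carlo estimate
print(p_j * (1 - p_j) / M)                          # theoretical value, about 0.042
```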

3.2 Local Asymptotic Normality of $\hat\theta$

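We now show that $\hat\theta$ is a locally asymptotically normal estimator of $\theta$. Note that $\hat\theta$ is the solution to the following equation, where $\hat\pi_0 = \mathrm{order}(\hat p)$:

$$0 = \frac{\partial}{\partial \theta} \log L(\hat p, \theta \mid X, \pi) \bigg|_{\theta = \hat\theta} = -\sum_{i=1}^{I} d(\pi_i, \hat\pi_0) - I\, \frac{\psi'(\hat\theta)}{\psi(\hat\theta)}.$$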

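For simplicity, we define

$$g(\theta) = -\frac{\psi'(\theta)}{\psi(\theta)} = \frac{J e^{-\theta}}{1 - e^{-\theta}} - \sum_{j=1}^{J} \frac{j e^{-j\theta}}{1 - e^{-j\theta}}.$$

Thus, we have

$$g(\hat\theta) = \frac{1}{I} \sum_{i=1}^{I} d(\pi_i, \hat\pi_0).$$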

No simple expression for $g^{-1}$ exists. However, it can be seen from the expression for $g$ above that, when $J \geq 2$ and for any $\theta > 0$, $g$ is a continuous and positive function with a continuous and strictly negative first derivative and a continuous and strictly positive second derivative. Thus, $g$ is a smooth, positive function. In particular, $g$ is monotone decreasing (fligner1986distance). As a result, its inverse $g^{-1}$ is well-defined, and so is $\hat\theta = g^{-1}\big( \frac{1}{I} \sum_{i=1}^{I} d(\pi_i, \hat\pi_0) \big)$ given $\hat\pi_0$.
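Since $g$ is strictly decreasing, $g^{-1}$ can be evaluated numerically. The sketch below implements $g(\theta) = -\psi'(\theta)/\psi(\theta)$ from its closed form and inverts it by bisection, a simple root-finding choice of ours rather than anything prescribed by the note. The search bracket $[10^{-6}, 50]$ and the function names are assumptions; a mean distance of 0 (perfect agreement) has no finite solution and is rejected.

```python
import math

def g(theta: float, J: int) -> float:
    """g(theta) = -psi'(theta)/psi(theta): the mean Kendall distance under Mallows(theta)."""
    # expm1 keeps 1 - e^{-x} accurate when theta is small.
    first = J * math.exp(-theta) / (-math.expm1(-theta))
    return first - sum(j * math.exp(-j * theta) / (-math.expm1(-j * theta)) for j in range(1, J + 1))

def g_inverse(dbar: float, J: int, lo: float = 1e-6, hi: float = 50.0, tol: float = 1e-12) -> float:
    """Solve g(theta) = dbar for theta by bisection, relying on g being strictly decreasing."""
    if not g(hi, J) < dbar < g(lo, J):
        # e.g. dbar = 0 (perfect agreement) has no finite solution.
        raise ValueError("dbar is outside the range of g on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid, J) > dbar:
            lo = mid  # g(mid) is still too large, so the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical example: J = 4 objects, average Kendall distance 1.2 to pi_0_hat.
theta_hat = g_inverse(1.2, J=4)
print(theta_hat)
```

As $\theta \to 0$, $g(\theta)$ approaches $J(J-1)/4$, the mean Kendall distance under a uniform ranking distribution, so mean distances at or above that value also fall outside the bracket.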

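For reasons which will be made clear later, we also write out an expression for $g'(\theta)$:

$$g'(\theta) = \sum_{j=1}^{J} \frac{j^2 e^{-j\theta}}{(1 - e^{-j\theta})^2} - \frac{J e^{-\theta}}{(1 - e^{-\theta})^2}.$$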

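Next, note that $\bar d = \frac{1}{I} \sum_{i=1}^{I} d(\pi_i, \pi_0)$ is a random variable given a fixed consensus ranking $\pi_0$, due to the randomness in the collection of rankings $\pi_1, \dots, \pi_I$. According to fligner1986distance, $\bar d$ is asymptotically normal with mean and variance depending on the true $\theta$. Specifically,

$$\sqrt{I} \left( \bar d - g(\theta) \right) \xrightarrow{d} N\!\left( 0, \sigma^2(\theta) \right), \qquad \sigma^2(\theta) = \frac{J e^{-\theta}}{(1 - e^{-\theta})^2} - \sum_{j=1}^{J} \frac{j^2 e^{-j\theta}}{(1 - e^{-j\theta})^2}.$$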

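Interestingly, comparing the expression for $g'(\theta)$ with $\sigma^2(\theta)$, we see that $\sigma^2(\theta) = -g'(\theta)$, which implies that $(g^{-1})'(g(\theta)) = 1/g'(\theta) = -1/\sigma^2(\theta)$. Since $(g^{-1})'$ is a real-valued function that does not equal 0, and since $\hat\theta = g^{-1}(\bar d)$ whenever $\hat\pi_0 = \pi_0$, the Delta method gives

$$\sqrt{I} \left( \hat\theta - \theta \right) \xrightarrow{d} N\!\left( 0, \left[ \frac{1}{g'(\theta)} \right]^2 \sigma^2(\theta) \right) = N\!\left( 0, \frac{1}{\sigma^2(\theta)} \right).$$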

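Although this asymptotic variance has no simpler form than the finite sum defining $\sigma^2(\theta)$, it is positive and finite, since $0 < \sigma^2(\theta) < \infty$ for $\theta > 0$ and $J \geq 2$. Therefore, in a local neighborhood of $\pi_0$, $\hat\theta$ is an asymptotically normal estimator of $\theta$.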
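The identity $\sigma^2(\theta) = -g'(\theta)$ can be checked numerically for small $J$. In the sketch below, $\sigma^2(\theta)$ denotes the variance of $d(\pi, \pi_0)$ under a Mallows$(\theta)$ model; the code compares the closed forms with brute-force moments over all $J!$ rankings and with a central finite difference of $g$, then turns $1/\sigma^2(\theta)$ into an approximate standard error for $\hat\theta$. Function names and the values $\theta = 0.6$, $J = 5$ are hypothetical.

```python
import itertools
import math

def g(theta: float, J: int) -> float:
    """Mean Kendall distance under Mallows(theta) on J objects (closed form)."""
    return J * math.exp(-theta) / (-math.expm1(-theta)) - sum(
        j * math.exp(-j * theta) / (-math.expm1(-j * theta)) for j in range(1, J + 1)
    )

def sigma2(theta: float, J: int) -> float:
    """Variance of the Kendall distance under Mallows(theta), claimed to equal -g'(theta)."""
    return J * math.exp(-theta) / math.expm1(-theta) ** 2 - sum(
        j * j * math.exp(-j * theta) / math.expm1(-j * theta) ** 2 for j in range(1, J + 1)
    )

def exact_moments(theta: float, J: int):
    """Mean and variance of d(pi, pi_0) by brute-force enumeration of all J! rankings."""
    weights, dists = [], []
    for perm in itertools.permutations(range(J)):
        # Kendall distance to the identity ranking = number of inversions in perm.
        d = sum(1 for a, b in itertools.combinations(perm, 2) if a > b)
        weights.append(math.exp(-theta * d))
        dists.append(d)
    z = sum(weights)
    mean = sum(w * d for w, d in zip(weights, dists)) / z
    var = sum(w * d * d for w, d in zip(weights, dists)) / z - mean ** 2
    return mean, var

def asymptotic_se_theta(theta: float, J: int, I: int) -> float:
    """Approximate standard error of theta_hat implied by N(0, 1 / sigma^2(theta))."""
    return 1.0 / math.sqrt(I * sigma2(theta, J))

theta, J, h = 0.6, 5, 1e-5
mean, var = exact_moments(theta, J)
print(mean - g(theta, J))                                                # should be close to 0
print(var - sigma2(theta, J))                                            # should be close to 0
print((g(theta + h, J) - g(theta - h, J)) / (2 * h) + sigma2(theta, J))  # should be close to 0
```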

3.3 Asymptotic Normality of $(\hat p, \hat\theta)$

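We showed in Sections 3.1 and 3.2 that, in a local neighborhood of the true consensus ranking $\pi_0$, the estimators $\hat p_j$, $j = 1, \dots, J$, and $\hat\theta$ are coordinate-wise asymptotically normal. We add that the MLE is consistent as the number of observations, $I$, grows to infinity (pearce2022unified). Thus, $\hat p \xrightarrow{P} p$ and $\hat\theta \xrightarrow{P} \theta$. Because the entries of $p$ are distinct, consistency of $\hat p$ also implies that $\mathrm{order}(\hat p) = \pi_0$ with probability tending to 1. As a result, as $I \to \infty$, the MLE will be in a local neighborhood of the true $(p, \theta)$ with probability tending to 1, and the MLE $(\hat p, \hat\theta)$ will be a coordinate-wise asymptotically normal estimator of $(p, \theta)$ in that local neighborhood. This satisfies a sufficient condition for asymptotic bootstrap validity (hall2013bootstrap, bickel1981some).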
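To tie the pieces together, the following is a minimal end-to-end sketch of the nonparametric bootstrap on simulated Mallows-Binomial data. It rests on several assumptions not found in the note: rankings are simulated with the repeated insertion method for Kendall's distance (a standard Mallows sampler, not the authors' code); the estimator is the local plug-in form derived above (Binomial MLE for $p$, $\hat\pi_0 = \mathrm{order}(\hat p)$, $\hat\theta = g^{-1}$ of the mean distance), which matches the MLE only in the local neighborhood of the truth and is not the exact A*-based estimator of pearce2022unified; and the intervals are marginal percentile intervals. All function names and parameter values are hypothetical.

```python
import itertools
import math
import random

def g(theta, J):
    """Mean Kendall distance under Mallows(theta) on J objects, i.e. -psi'(theta)/psi(theta)."""
    return J * math.exp(-theta) / (-math.expm1(-theta)) - sum(
        j * math.exp(-j * theta) / (-math.expm1(-j * theta)) for j in range(1, J + 1)
    )

def g_inverse(dbar, J, lo=1e-6, hi=50.0, iters=100):
    """Bisection for g(theta) = dbar; dbar is clipped into g's range on [lo, hi] (a crude safeguard)."""
    dbar = min(max(dbar, g(hi, J)), g(lo, J))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid, J) > dbar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def order(p):
    """0-indexed object labels sorted from smallest (best) to largest p."""
    return tuple(sorted(range(len(p)), key=lambda j: p[j]))

def kendall(pi, sigma):
    """Number of object pairs ranked in opposite order by pi and sigma."""
    pos = {obj: r for r, obj in enumerate(sigma)}
    return sum(1 for a, b in itertools.combinations(pi, 2) if pos[a] > pos[b])

def sample_mallows(pi0, theta, rng):
    """Repeated insertion: the j-th object of pi0 lands r slots from the end w.p. proportional to e^{-theta r}."""
    ranking = []
    for j, obj in enumerate(pi0, start=1):
        r = rng.choices(range(j), weights=[math.exp(-theta * s) for s in range(j)])[0]
        ranking.insert(len(ranking) - r, obj)  # adds exactly r discordant pairs
    return tuple(ranking)

def simulate(p, theta, M, I, rng):
    """I judges, each giving Binomial(M, p_j) ratings and a full Mallows ranking centred at order(p)."""
    pi0 = order(p)
    return [
        ([sum(rng.random() < pj for _ in range(M)) for pj in p], sample_mallows(pi0, theta, rng))
        for _ in range(I)
    ]

def local_estimator(data, M):
    """(p_hat_1, ..., p_hat_J, theta_hat), valid where order(p_hat) is locally constant."""
    I, J = len(data), len(data[0][0])
    p_hat = [sum(x[j] for x, _ in data) / (I * M) for j in range(J)]
    pi0_hat = order(p_hat)
    dbar = sum(kendall(pi, pi0_hat) for _, pi in data) / I
    return p_hat + [g_inverse(dbar, J)]

def bootstrap_intervals(data, M, n_boot=200, alpha=0.05, seed=0):
    """Marginal percentile intervals from resampling judges (ratings + ranking) with replacement."""
    rng = random.Random(seed)
    I = len(data)
    draws = [local_estimator([data[rng.randrange(I)] for _ in range(I)], M) for _ in range(n_boot)]
    intervals = []
    for k in range(len(draws[0])):
        vals = sorted(d[k] for d in draws)
        intervals.append((vals[int(alpha / 2 * n_boot)], vals[int((1 - alpha / 2) * n_boot) - 1]))
    return intervals

# Hypothetical truth: J = 4 objects, M = 5, theta = 1, I = 300 judges.
rng = random.Random(1)
data = simulate([0.2, 0.4, 0.7, 0.55], theta=1.0, M=5, I=300, rng=rng)
estimates = local_estimator(data, M=5)
intervals = bootstrap_intervals(data, M=5)
print(estimates)
print(intervals)
```

Each judge's rating vector and ranking are resampled together, matching the i.i.d. unit of the model. The parameters are well separated here so that $\mathrm{order}(\hat p)$ almost always equals the true consensus ranking, which is the regime the asymptotic argument describes.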

4 Conclusion

In this note, we demonstrate that bootstrap uncertainty estimates are asymptotically valid for the MLE in the Mallows-Binomial distribution. Our work proves only coordinate-wise, local asymptotic normality of the vector-valued MLE. This result technically guarantees only asymptotically accurate marginal coverage. As a result, the confidence intervals may suffer from either overcoverage or undercoverage when applied jointly. To understand why, we can think of the marginal confidence intervals as jointly creating a confidence region that is a $(J+1)$-dimensional hyperrectangle, as opposed to a $(J+1)$-dimensional confidence ellipsoid that could be created via a true joint analysis. Overcoverage may occur if the confidence hyperrectangle contains the (theoretical) confidence ellipsoid as a subset. On the other hand, if the coordinates of the estimator are independent, joint coverage becomes a multiple testing problem and may result in undercoverage. That said, the present results do not preclude the possibility that bootstrap uncertainty estimates provide asymptotically correct coverage in the joint setting, and we find no evidence that joint bootstrap confidence regions are asymptotically invalid. Further research may demonstrate proper coverage in the joint setting by establishing joint asymptotic normality of the $(J+1)$-dimensional MLE in a local neighborhood of $(p, \theta)$.
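The undercoverage scenario reduces to simple arithmetic: if $k$ interval estimates were independent, each with marginal coverage $c$, all $k$ would cover simultaneously with probability $c^k$. The sketch below assumes a hypothetical $J = 5$, so there are $J + 1 = 6$ parameters.

```python
def joint_coverage_if_independent(marginal: float, k: int) -> float:
    """Probability that k independent intervals, each with the given coverage, all cover."""
    return marginal ** k

# Hypothetical case: J = 5 objects, so J + 1 = 6 parameters, each with a 95% marginal interval.
print(round(joint_coverage_if_independent(0.95, 6), 3))  # 0.735
```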

The present work does not address the computational challenges of frequentist estimation of the Mallows-Binomial model, which is an NP-hard problem (pearce2022unified, Meila2012). As a result, forming an appropriate empirical distribution of the MLE via repeated estimation on bootstrap samples may be intractably slow. Bayesian estimation presents a natural alternative for estimating uncertainty in a unified framework, which may ultimately speed up parameter estimation and inference and render the question of bootstrap validity moot.


The authors would like to thank Yen-Chi Chen for his helpful feedback and advice while assembling this work. This work was funded by NSF Grant No. 2019901.