1 Introduction
Rankings and ratings are two types of preference data, which are primarily studied separately in the literature. Rankings may be modeled via a variety of specialized distributions, such as the Mallows Mallows1957, Plackett-Luce Plackett1975, and Bradley-Terry Bradley1952 models. Ratings are not commonly modeled using statistical methods, but are instead studied using simple summary statistics such as the mean or median lee2013bias,tay2020beyond,NIHPeerReview. In recent years, a growing body of work has suggested that modeling rankings and ratings jointly may improve the accuracy of preference modeling and preference aggregation ovadia2004ratings,shah2018design,liu2022integrating. The Mallows-Binomial is the first joint statistical model for rankings and ratings pearce2022unified.
pearce2022unified estimated the Mallows-Binomial model in the frequentist setting via the method of maximum likelihood. Similar to the standard Mallows model, estimation of the Mallows-Binomial is provably an NP-hard problem Meila2012. However, when a relatively small number of objects is assessed, the MLE can be computed exactly via a tree-search algorithm based on A*, and approximate algorithms can be used otherwise pearce2022unified. Still, analytic standard errors are difficult to derive for the Mallows-Binomial, which led the authors to propose using the nonparametric bootstrap for uncertainty quantification.
The bootstrap is a very general tool for uncertainty quantification that was first proposed in efron1979. The nonparametric bootstrap is used to estimate the inherent uncertainty of an estimator without making assumptions on its distributional form. Given $n$ i.i.d. observations, $X_1, \dots, X_n$, the nonparametric bootstrap is performed using the following steps:

1. Resample $n$ observations with replacement from the original dataset $B$ times, for large $B$. Denote each bootstrap sample $X^{*}_b$, $b = 1, \dots, B$.

2. Estimate the unknown statistic(s) of interest, $\theta$, separately using each bootstrap sample. Denote the estimates from each bootstrap sample $\hat{\theta}^{*}_b$, $b = 1, \dots, B$.

3. Form an empirical distribution for $\hat{\theta}$ using the values $\hat{\theta}^{*}_b$, $b = 1, \dots, B$.
Quantiles of this empirical distribution may be used to create confidence regions for the unknown statistic(s) of interest.
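The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used for the Mallows-Binomial model: the function name `percentile_bootstrap`, the choice of the sample mean as the statistic, and the simulated normal data are all assumptions made for the example.

```python
import numpy as np

def percentile_bootstrap(data, estimator, n_boot=2000, alpha=0.05, seed=0):
    """Nonparametric bootstrap with a percentile confidence interval."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = data.shape[0]
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        # Step 1: resample n observations with replacement.
        idx = rng.integers(0, n, size=n)
        # Step 2: re-estimate the statistic on the bootstrap sample.
        estimates[b] = estimator(data[idx])
    # Step 3: quantiles of the empirical distribution of the estimates.
    lower, upper = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    return estimates, (lower, upper)

# Hypothetical data: 200 draws from a normal distribution with mean 3.
rng = np.random.default_rng(1)
sample = rng.normal(loc=3.0, scale=1.0, size=200)
boot_estimates, interval = percentile_bootstrap(sample, np.mean)
```

Any estimator that maps a resampled dataset to a scalar can be passed in place of `np.mean`; for the Mallows-Binomial model each call would require re-solving the MLE, which is where the computational cost arises.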
Despite its wide applicability, the nonparametric bootstrap does not always yield asymptotically valid confidence intervals. A canonical example in which the bootstrap fails is the estimation of the unknown parameter $\theta$ given $n$ i.i.d. samples from a Uniform$(0, \theta)$ distribution. Here, the MLE of $\theta$ is the maximum order statistic, which, after centering and scaling by $n$, has a limiting exponential distribution. The bootstrap empirical distribution, however, cannot replicate this asymptotic distribution: given the fixed sample, the bootstrap estimates of $\theta$ will always be less than or equal to the full-sample maximum order statistic bickel1981some. Another canonical example in which the bootstrap fails is estimating the location parameter of a Cauchy distribution by the sample mean. In this setting, the sample mean is itself Cauchy distributed and therefore does not have a finite variance. As a result, the bootstrap estimator behaves poorly even given large samples politis1998computer.
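The Uniform failure can be seen in a short simulation. The sketch below assumes a Uniform$(0, 1)$ population and hypothetical sizes `n` and `n_boot`; it checks that no bootstrap maximum exceeds the sample maximum, and that a large share of replicates reproduce the sample maximum exactly. That share is $1 - (1 - 1/n)^n$, which tends to $1 - 1/e$, so the bootstrap distribution keeps a point mass that a continuous exponential limit cannot have.

```python
import numpy as np

rng = np.random.default_rng(0)
n, theta, n_boot = 500, 1.0, 5000   # hypothetical sizes for illustration
sample = rng.uniform(0.0, theta, size=n)
sample_max = sample.max()

# Bootstrap distribution of the MLE (the maximum order statistic).
boot_max = np.array([
    rng.choice(sample, size=n, replace=True).max() for _ in range(n_boot)
])

# No bootstrap replicate can exceed the full-sample maximum.
never_exceeds = bool(np.all(boot_max <= sample_max))

# Share of replicates equal to the sample maximum, against the
# theoretical value 1 - (1 - 1/n)^n.
frac_equal = float(np.mean(boot_max == sample_max))
theory = 1 - (1 - 1 / n) ** n
```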
The Mallows-Binomial likelihood has an unusual form that makes the asymptotic validity of bootstrap uncertainty for the MLE unclear. Specifically, the model is parameterized by continuous parameters whose discrete order impacts the likelihood. That is, the likelihood contains both continuous and discrete components; discontinuities may exist wherever the order of certain parameters changes. Thus, frequentist estimation of the Mallows-Binomial model is both a continuous and a discrete problem. In the absence of theoretical results regarding the asymptotic distribution of the maximum likelihood estimators, it is not known whether the nonparametric bootstrap is valid.
In this note, we demonstrate that the nonparametric bootstrap is an asymptotically valid method of uncertainty quantification for maximum likelihood estimates in the Mallows-Binomial model. The rest of the paper is organized as follows: In Section 2, we provide preliminaries regarding notation and a formal model statement. Main asymptotic results are provided in Section 3, followed by a conclusion in Section 4 that summarizes our work and suggests directions for future study.
2 Preliminaries
2.1 Notation
Suppose there exists a collection of $J$ objects which will be assessed by $I$ judges. Each judge rates each object using the integers $\{0, 1, \dots, M\}$, where $M$ is a fixed and known maximum rating. Smaller ratings are better; higher ratings are worse. We let $X_{ij}$ represent the rating that judge $i$ assigns to object $j$. Additionally, each judge provides a ranking of their most-preferred objects. For simplicity, we assume each judge provides a ranking of all the objects. We let $\Pi_i$ denote judge $i$’s ranking.
Furthermore, suppose that each object $j$ has a true underlying quality $p_j \in [0, 1]$. The vector of true underlying qualities is written as $p = (p_1, \dots, p_J)$, such that $p \in [0, 1]^J$. Given a vector $p$, we let $\mathrm{Order}(p)$ denote its order from least to greatest. For example, if $p_3 < p_2 < p_1$, then $\mathrm{Order}(p) = (3, 2, 1)$ (read as “object 3 is preferred to object 2, which is preferred to object 1"). We call $\pi_0 = \mathrm{Order}(p)$ the true consensus ranking of the objects. Additionally, we assume there is a true $\theta > 0$, which represents the strength of population ranking consensus. The parameter $\theta$ is defined identically to that of a traditional Mallows model.
2.2 Mallows-Binomial
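The Mallows-Binomial model is a joint distribution for rankings and ratings that is designed to capture the above situation. Let $(\Pi_i, X_i)$, $i = 1, \dots, I$, denote an i.i.d. sample of size $I$ from a Mallows-Binomial($p, \theta$) distribution, where $X_i = (X_{i1}, \dots, X_{iJ})$. Their joint likelihood can be written:
\begin{align}
L(p, \theta \mid \Pi, X) &= \prod_{i=1}^{I} P[\Pi_i = \pi_i, X_i = x_i \mid p, \theta] \tag{1} \\
&= \prod_{i=1}^{I} \left[ \frac{e^{-\theta \, d(\pi_i, \pi_0)}}{\psi(\theta)} \prod_{j=1}^{J} \binom{M}{x_{ij}} p_j^{x_{ij}} (1 - p_j)^{M - x_{ij}} \right] \tag{2}
\end{align}
where $\pi_0 = \mathrm{Order}(p)$, $d(\cdot, \cdot)$ is the Kendall’s $\tau$ distance between two rankings, and
\begin{align}
\psi(\theta) = \prod_{j=1}^{J} \frac{1 - e^{-j\theta}}{1 - e^{-\theta}} \tag{3}
\end{align}
is the normalizing function of a Mallows model. As can be seen from Equation 2, the likelihood of each observation corresponds to the product of a Mallows ranking distribution with Binomial rating distributions.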
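The product structure of the likelihood can be made concrete with a short Python sketch. It assumes full rankings, Kendall's distance, and the standard closed form of the Mallows normalizing function; the helper names (`kendall_distance`, `order`, `log_psi`, `log_likelihood`) are hypothetical, objects are labeled from 0, and a ranking is a tuple of object labels from most to least preferred.

```python
import math
from itertools import combinations, permutations, product

def kendall_distance(ranking_a, ranking_b):
    """Number of object pairs that the two rankings order differently."""
    pos_a = {obj: r for r, obj in enumerate(ranking_a)}
    pos_b = {obj: r for r, obj in enumerate(ranking_b)}
    return sum(
        (pos_a[u] - pos_a[v]) * (pos_b[u] - pos_b[v]) < 0
        for u, v in combinations(ranking_a, 2)
    )

def order(p):
    """Object labels sorted from smallest to largest quality p_j."""
    return tuple(sorted(range(len(p)), key=lambda j: p[j]))

def log_psi(theta, n_objects):
    """Log of the Mallows normalizing function for full rankings."""
    return sum(
        math.log1p(-math.exp(-j * theta)) - math.log1p(-math.exp(-theta))
        for j in range(1, n_objects + 1)
    )

def log_likelihood(p, theta, rankings, ratings, max_rating):
    """Mallows part plus Binomial part, summed over judges."""
    pi0 = order(p)
    total = 0.0
    for ranking, x in zip(rankings, ratings):
        total += -theta * kendall_distance(ranking, pi0) - log_psi(theta, len(p))
        for p_j, x_j in zip(p, x):
            total += (math.log(math.comb(max_rating, x_j))
                      + x_j * math.log(p_j)
                      + (max_rating - x_j) * math.log(1 - p_j))
    return total

# Sanity check with hypothetical parameters: the single-judge probabilities
# over every ranking and every rating vector should sum to one.
p, theta, M = (0.2, 0.5, 0.7), 1.3, 2
total_prob = sum(
    math.exp(log_likelihood(p, theta, [pi], [x], M))
    for pi in permutations(range(3))
    for x in product(range(M + 1), repeat=3)
)
```

Note that `p` enters the Mallows term only through `order(p)`, which is the source of the discrete component of the likelihood.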
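Next, we provide a preliminary expression for the MLE, $(\hat{p}, \hat{\theta})$:
\begin{align}
(\hat{p}, \hat{\theta}) &= \arg\max_{p, \theta} \; L(p, \theta \mid \Pi, X) \tag{4} \\
&= \arg\max_{p, \theta} \; \sum_{i=1}^{I} \left[ -\theta \, d(\pi_i, \mathrm{Order}(p)) - \log \psi(\theta) + \sum_{j=1}^{J} \left( x_{ij} \log p_j + (M - x_{ij}) \log(1 - p_j) \right) \right] \tag{5} \\
&= \arg\min_{p, \theta} \; \left[ \theta \, \bar{D}(p) + \log \psi(\theta) - \sum_{j=1}^{J} \left( \bar{x}_j \log p_j + (M - \bar{x}_j) \log(1 - p_j) \right) \right] \tag{6}
\end{align}
where
\begin{align}
\bar{D}(p) = \frac{1}{I} \sum_{i=1}^{I} d(\pi_i, \mathrm{Order}(p)) \tag{7}
\end{align}
and
\begin{align}
\bar{x}_j = \frac{1}{I} \sum_{i=1}^{I} x_{ij}. \tag{8}
\end{align}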
There is no closed-form solution for the MLE. However, computationally efficient frequentist estimation is possible via a tree-search method based on the A* algorithm pearce2022unified.
In the remainder of this work, we will assume that in $p$, $p_j \neq p_k$ whenever $j \neq k$. Furthermore, we assume that each $p_j \in (0, 1)$, and that $\theta > 0$. Under these conditions, pearce2022unified proved that the maximum likelihood estimator $(\hat{p}, \hat{\theta})$ is consistent for $(p, \theta)$ as the number of judges, $I$, grows to infinity.
3 Asymptotic Bootstrap Validity
We would like to show that bootstrap uncertainty estimates are asymptotically valid for the MLE in the Mallows-Binomial model. A sufficient condition for bootstrap validity is local asymptotic normality of the MLE hall2013bootstrap,bickel1981some. As such, we show in the following subsections that the $(J+1)$-dimensional MLE $(\hat{p}, \hat{\theta})$ is coordinatewise, locally asymptotically normal.
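3.1 Local Asymptotic Normality of $\hat{p}$
We begin by considering each $\hat{p}_j$, $j = 1, \dots, J$. Note that $\hat{p}_j$ is the solution to the following equation:
\begin{align}
0 &= \frac{\partial}{\partial p_j} \log L(p, \theta \mid \Pi, X) \tag{9} \\
&= -\theta \sum_{i=1}^{I} \frac{\partial}{\partial p_j} d(\pi_i, \mathrm{Order}(p)) + \sum_{i=1}^{I} \left[ \frac{x_{ij}}{p_j} - \frac{M - x_{ij}}{1 - p_j} \right] \tag{10}
\end{align}
A key challenge in calculating $\hat{p}_j$ is the derivative $\frac{\partial}{\partial p_j} d(\pi_i, \mathrm{Order}(p))$, which is a function of the $J$-dimensional vector $p$. However, as long as each $p_j \neq p_k$ whenever $j \neq k$, then $\mathrm{Order}(p)$ will remain constant in small perturbations around $p_j$ and the derivative with respect to $p_j$ will thus be 0.
Generalizing from $p_j$ to $p$, we require there to exist an $\epsilon$-ball around the $J$-dimensional vector $p$ such that $\mathrm{Order}(p')$ remains constant for all $p'$ in the ball. Such a ball exists when the entries of $p$ are distinct: any radius smaller than half the smallest gap between two entries suffices. In such cases, we have $\frac{\partial}{\partial p_j} d(\pi_i, \mathrm{Order}(p)) = 0$ and thus $\hat{p}_j$ is defined by the standard Binomial MLE, $\hat{p}_j = \frac{1}{IM} \sum_{i=1}^{I} x_{ij}$, which is asymptotically normal as it is a function of the mean of i.i.d. random variables.
Specifically, this means that in a local region defined by the order of $p$, we have the standard Binomial result,
\begin{align}
\sqrt{I} \, (\hat{p}_j - p_j) \xrightarrow{d} N\left( 0, \frac{p_j (1 - p_j)}{M} \right) \tag{11}
\end{align}
which establishes the coordinatewise local asymptotic normality of $\hat{p}_j$, $j = 1, \dots, J$.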
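The local argument can be checked by simulation. The sketch below assumes hypothetical values for the qualities, the maximum rating, and the number of judges, and uses only the Binomial part of the model, since within the local region the ranking term does not affect $\hat{p}_j$. It compares the empirical variance of $\sqrt{I}(\hat{p}_j - p_j)$ with $p_j(1 - p_j)/M$ and records how often the order of the estimates matches the order of the true qualities.

```python
import numpy as np

rng = np.random.default_rng(0)
p_true = np.array([0.2, 0.5, 0.7])   # hypothetical distinct qualities
M, I, n_rep = 10, 400, 4000          # max rating, judges, replicates

# ratings[r, i, j]: rating given to object j by judge i in replicate r.
ratings = rng.binomial(M, p_true, size=(n_rep, I, p_true.size))

# Standard Binomial MLE for each object in each replicate.
p_hat = ratings.sum(axis=1) / (I * M)

# Empirical variance of sqrt(I) * (p_hat - p) against p(1 - p) / M.
scaled_var = I * p_hat.var(axis=0)
theory_var = p_true * (1 - p_true) / M

# Share of replicates in which Order(p_hat) equals Order(p).
same_order = np.all(np.argsort(p_hat, axis=1) == np.argsort(p_true), axis=1)
order_stable = float(np.mean(same_order))
```

With well-separated qualities the estimated order almost never changes, which is the situation in which the derivative of the distance term vanishes. Qualities that are close together would need many more judges before the same behavior appears.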
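3.2 Local Asymptotic Normality of $\hat{\theta}$
We now show that $\hat{\theta}$ is coordinatewise a locally asymptotically normal estimator of $\theta$. Note that $\hat{\theta}$ is the solution to the following equation:
\begin{align}
0 &= \frac{\partial}{\partial \theta} \log L(\hat{p}, \theta \mid \Pi, X) \tag{12} \\
&= \frac{\partial}{\partial \theta} \left[ -\theta \sum_{i=1}^{I} d(\pi_i, \hat{\pi}_0) - I \log \psi(\theta) \right] \tag{13} \\
&= -\sum_{i=1}^{I} d(\pi_i, \hat{\pi}_0) - I \, \frac{\psi'(\theta)}{\psi(\theta)} \tag{14}
\end{align}
where $\hat{\pi}_0 = \mathrm{Order}(\hat{p})$. For simplicity, we define $\bar{D} = \frac{1}{I} \sum_{i=1}^{I} d(\pi_i, \hat{\pi}_0)$, the average Kendall’s distance between the observed rankings and the estimated consensus ranking, and $g(\theta) = -\psi'(\theta) / \psi(\theta)$. Thus, we have
\begin{align}
g(\hat{\theta}) = \bar{D}. \tag{15}
\end{align}
No simple expression for $g^{-1}$ exists. However, it can be seen from Equation 3 that when $\theta > 0$ and for any $J \geq 2$, $\psi(\theta)$ is a continuous and positive function with a continuous and strictly negative first derivative and a continuous and strictly positive second derivative. Thus, $g(\theta)$ is a smooth and positive function. Furthermore, $g(\theta)$ is monotone decreasing fligner1986distance. As a result, its inverse is well-defined and so is $\hat{\theta} = g^{-1}(\bar{D})$ given $\bar{D}$.
For reasons which will be made clear later, we also write out an expression for $g(\theta)$:
\begin{align}
g(\theta) &= -\frac{\psi'(\theta)}{\psi(\theta)} \tag{16} \\
&= -\frac{\partial}{\partial \theta} \log \psi(\theta) \tag{17} \\
&= -\frac{\partial}{\partial \theta} \sum_{j=1}^{J} \left[ \log(1 - e^{-j\theta}) - \log(1 - e^{-\theta}) \right] \tag{18} \\
&= -\sum_{j=1}^{J} \left[ \frac{j e^{-j\theta}}{1 - e^{-j\theta}} - \frac{e^{-\theta}}{1 - e^{-\theta}} \right] \tag{19} \\
&= \frac{J e^{-\theta}}{1 - e^{-\theta}} - \sum_{j=1}^{J} \frac{j e^{-j\theta}}{1 - e^{-j\theta}} \tag{20}
\end{align}
Next, note that in the local neighborhood in which $\hat{\pi}_0 = \pi_0$, $\bar{D}$ is a random variable given a fixed consensus ranking $\pi_0$, due to the randomness in the collection of rankings $\Pi_1, \dots, \Pi_I$. According to fligner1986distance, $\bar{D}$ is asymptotically normal with mean and variance depending on the true $\theta$. Specifically, writing $D = d(\Pi_i, \pi_0)$ for the distance of a single ranking,
\begin{align}
E_\theta[D] &= \frac{J e^{-\theta}}{1 - e^{-\theta}} - \sum_{j=1}^{J} \frac{j e^{-j\theta}}{1 - e^{-j\theta}} \tag{21} \\
\mathrm{Var}_\theta[D] &= \frac{J e^{-\theta}}{(1 - e^{-\theta})^2} - \sum_{j=1}^{J} \frac{j^2 e^{-j\theta}}{(1 - e^{-j\theta})^2} \tag{22} \\
\sqrt{I} \, (\bar{D} - E_\theta[D]) &\xrightarrow{d} N\left( 0, \mathrm{Var}_\theta[D] \right) \tag{23}
\end{align}
Interestingly, we see from comparing Equations 20 and 21 that $g(\theta) = E_\theta[D]$, which implies $\theta = g^{-1}(E_\theta[D])$. Since $(g^{-1})'$ is a real-valued function that does not equal 0, by the Delta method,
\begin{align}
\sqrt{I} \left( g^{-1}(\bar{D}) - g^{-1}(E_\theta[D]) \right) &\xrightarrow{d} N\left( 0, \; \mathrm{Var}_\theta[D] \left[ (g^{-1})'(E_\theta[D]) \right]^2 \right) \tag{24} \\
\implies \sqrt{I} \, (\hat{\theta} - \theta) &\xrightarrow{d} N\left( 0, \; \mathrm{Var}_\theta[D] \left[ (g^{-1})'(E_\theta[D]) \right]^2 \right) \tag{25}
\end{align}
Although $g^{-1}$ cannot be written in closed form, the asymptotic variance is positive and finite. Therefore, in a local neighborhood of $\pi_0$, $\hat{\theta}$ is coordinatewise an asymptotically normal estimator of $\theta$.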
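Because the mean distance is smooth and monotone decreasing in $\theta$, the estimate can be recovered from the average distance by one-dimensional root finding. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes the closed-form mean and variance of Kendall's distance for full rankings under a Mallows model, uses plain bisection on a hypothetical bracket `[lo, hi]`, and all function names are invented for the example. Differentiating the mean term by term suggests that its slope equals minus the variance of the distance; if so, the Delta-method variance would reduce to the reciprocal of that variance, and the final lines compute the quantities needed to check this numerically.

```python
import math

def _ratio(k, theta):
    """e^{-k*theta} / (1 - e^{-k*theta}), computed stably via expm1."""
    return math.exp(-k * theta) / -math.expm1(-k * theta)

def mean_distance(theta, n_objects):
    """Expected Kendall distance to the consensus ranking."""
    return n_objects * _ratio(1, theta) - sum(
        j * _ratio(j, theta) for j in range(1, n_objects + 1)
    )

def var_distance(theta, n_objects):
    """Variance of the Kendall distance to the consensus ranking."""
    def term(k):
        return math.exp(-k * theta) / math.expm1(-k * theta) ** 2
    return n_objects * term(1) - sum(
        j**2 * term(j) for j in range(1, n_objects + 1)
    )

def solve_theta(d_bar, n_objects, lo=1e-3, hi=50.0, tol=1e-12):
    """Invert the mean distance by bisection (it decreases in theta)."""
    if not mean_distance(hi, n_objects) < d_bar < mean_distance(lo, n_objects):
        raise ValueError("d_bar is outside the range attainable on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_distance(mid, n_objects) > d_bar:
            lo = mid   # mean still too large, so theta must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Round trip with hypothetical values: J = 5 objects, theta = 0.8.
theta_true, J = 0.8, 5
theta_hat = solve_theta(mean_distance(theta_true, J), J)

# Numerical slope of the mean, to compare with minus the variance.
h = 1e-5
slope = (mean_distance(theta_true + h, J) - mean_distance(theta_true - h, J)) / (2 * h)
variance = var_distance(theta_true, J)
```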
3.3 Asymptotic Normality of $(\hat{p}, \hat{\theta})$
We showed in Sections 3.1 and 3.2 that in a local neighborhood of the true consensus ranking $\pi_0$, the estimators $\hat{p}_j$, $j = 1, \dots, J$, and $\hat{\theta}$ are coordinatewise asymptotically normal. We add that the MLE is consistent as the number of observations, $I$, grows to infinity pearce2022unified. Thus, $\hat{p} \to p$ and $\hat{\theta} \to \theta$ in probability. As a result, as $I \to \infty$ the MLE will be in a local neighborhood of the true $(p, \theta)$ with probability tending to 1, and the MLE $(\hat{p}, \hat{\theta})$ will be coordinatewise an asymptotically normal estimator of $(p, \theta)$ in that local neighborhood. This satisfies a sufficient condition for asymptotic bootstrap validity hall2013bootstrap,bickel1981some.
4 Conclusion
In this note, we demonstrate that bootstrap uncertainty estimates are asymptotically valid for the MLE in the Mallows-Binomial distribution. Our work only proves coordinatewise, local asymptotic normality of the vector-valued MLE. This result technically guarantees only asymptotically accurate marginal coverage. As a result, the confidence intervals may suffer from either overcoverage or undercoverage when applied jointly. To understand why, we can think of the marginal confidence intervals jointly creating a confidence region that is a $(J+1)$-dimensional hypercube, as opposed to a $(J+1)$-dimensional confidence ellipsoid that could be created via a true joint analysis. Overcoverage may occur if the confidence hypercube contains as a subset the (theoretical) confidence ellipsoid. On the other hand, if the coordinates of the MLE are independent, joint coverage becomes a multiple testing problem and may result in undercoverage. That said, the present results do not preclude the possibility that bootstrap uncertainty estimates provide asymptotically correct coverage in the joint setting. We find no evidence to suggest that joint bootstrap intervals are asymptotically invalid. Further research may demonstrate proper coverage in the joint setting by establishing joint asymptotic normality of the $(J+1)$-dimensional MLE in a local neighborhood of $(p, \theta)$.
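The undercoverage concern can be illustrated with simple arithmetic. The sketch below is not a result from this work: it assumes a hypothetical setting with $J = 5$ objects (six parameters), nominal 95% marginal intervals, and, for the first function, fully independent coordinates. The second function gives the union-bound lower limit on joint coverage, which holds whatever the dependence between coordinates.

```python
def joint_coverage_if_independent(marginal_level, n_params):
    """Joint coverage of the hypercube when all coordinates are independent."""
    return marginal_level ** n_params

def union_bound_lower(marginal_level, n_params):
    """Lower bound on joint coverage under arbitrary dependence."""
    return max(0.0, 1.0 - n_params * (1.0 - marginal_level))

# Hypothetical example: J = 5 objects plus theta gives 6 parameters.
independent_case = joint_coverage_if_independent(0.95, 6)   # about 0.735
worst_case = union_bound_lower(0.95, 6)                     # about 0.70
```

Under these assumptions, six marginal 95% intervals cover all parameters at once only about 73.5% of the time, which is why marginal validity alone does not settle the joint question.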
The present work does not address the computational challenges of frequentist estimation of the Mallows-Binomial model, which is an NP-hard problem pearce2022unified,Meila2012. As a result, forming an appropriate empirical distribution of the MLE via repeated estimation on bootstrap samples may be intractably slow. Bayesian estimation presents a natural alternative for estimating uncertainty in a unified framework, which may ultimately speed up the process of parameter estimation and inference and make moot the question of bootstrap validity.
Acknowledgements
The authors would like to thank Yen-Chi Chen for his helpful feedback and advice while assembling this work. This work was funded by NSF Grant No. 2019901.