1 Introduction
Suppose a group of four friends want to choose one of the four restaurants for dinner. The first person ranks all four restaurants as , where means that “ is strictly preferred to ”. The second person says “ and are my top two choices, among which I prefer to ”. The third person ranks but has no idea about . The fourth person has no idea about , and would choose among . How should they aggregate their preferences to choose the best restaurant?
Similar rank aggregation problems exist in social choice, crowdsourcing Mao et al. (2013); Chen et al. (2013), recommender systems Candès and Recht (2009); Baltrunas et al. (2010); Keshavan et al. (2010); Negahban and Wainwright (2012), information retrieval Altman and Tennenholtz (2005); Liu (2011), etc. Rank aggregation can be cast as the following statistical parameter estimation problem: given a statistical model for rank data and the agents’ preferences, the parameter of the model is estimated to make decisions. Among the most widely applied statistical models for rank aggregation are the Plackett-Luce model Luce (1959); Plackett (1975) and its mixtures Gormley and Murphy (2008, 2009); Liu (2011); Mollica and Tardella (2017); Tkachenko and Lauw (2016). In a Plackett-Luce model over a set of alternatives $\mathcal{A}$, each alternative is parameterized by a strictly positive number that represents its probability to be ranked higher than other alternatives. A mixture of $k$ Plackett-Luce models, denoted by $k$-PL, combines $k$ component Plackett-Luce models via the mixing coefficients $\vec{\alpha} = (\alpha_1, \ldots, \alpha_k)$ with $\vec{\alpha} \cdot \vec{1} = 1$, such that for any $r \le k$, with probability $\alpha_r$, a data point is generated from the $r$-th Plackett-Luce component.
One critical limitation of the Plackett-Luce model and its mixtures is that their sample space consists of linear orders over $\mathcal{A}$. In other words, each data point must be a full ranking of all alternatives in $\mathcal{A}$. However, this is rarely the case in practice, because agents are often not able to rank all alternatives due to lack of information Pini et al. (2011), as illustrated in the example at the beginning of the Introduction.
In general, each rank datum is a partial order, which can be seen as a collection of pairwise comparisons among alternatives that satisfy transitivity. However, handling partial orders is more challenging than it appears. In particular, the pairwise comparisons of the same agent cannot be seen as independently generated due to transitivity.
Consequently, most previous works focused on structured partial orders, where agents’ preferences share some common structures. For example, given $l \le m$, in ranked-top-$l$ preferences Mollica and Tardella (2017); Huang et al. (2011), agents submit a linear order over their top $l$ choices; in $l$-way preferences Marden (1995); Hunter (2004); Maystre and Grossglauser (2015), agents submit a linear order over a set of $l$ alternatives, which are not necessarily their top $l$ alternatives; in choice-$l$ preferences (a.k.a. choice sets) Train (2009), agents only specify their top choice among a set of $l$ alternatives. In particular, pairwise comparisons can be seen as $2$-way preferences or choice-$2$ preferences.
However, as far as we know, most previous works assumed that the rank data share the same structure for their algorithms and theoretical guarantees to apply. It is unclear how rank aggregation can be done effectively and efficiently from structured partial orders of different kinds, as in the example in the beginning of Introduction. This is the key question we address in this paper.
How can we effectively and efficiently learn Plackett-Luce and its mixtures from structured partial orders of different kinds?
Successfully addressing this question faces two challenges. First, to address the effectiveness concern, we need a statistical model that combines various structured partial orders to prove desirable statistical properties, and we are unaware of an existing one. Second, to address the efficiency concern, we need to design new algorithms as either previous algorithms cannot be directly applied, or it is unclear whether the theoretical guarantee such as consistency will be retained.
1.1 Our Contributions
Our contributions in addressing the key question are threefold.
Modeling Contributions. We propose a class of statistical models to model the coexistence of the following three types of structured partial orders mentioned in the Introduction: ranked-top-$l$, $l$-way, and choice-$l$, by leveraging mixtures of Plackett-Luce models. Our models can be easily generalized to include other types of structured partial orders.
Theoretical Contributions. Our main theoretical results characterize the identifiability of the proposed models. Identifiability, which is fundamental in parameter estimation, states that different parameters of the model should give different distributions over data. Clearly, if a model is non-identifiable, then no parameter estimation algorithm can be consistent.
We prove that when only ranked top-$l_1$ and $l_2$-way ($l_2$ is set to $1$ if there are no $l_2$-way orders) orders are available, the mixture of $k$ Plackett-Luce models is not identifiable if $k \ge (l_1 + l_2 + 1)/2$ (Theorem 1). We also prove that mixtures of two Plackett-Luce models are identifiable under the following combinations of structures: ranked top-$3$ (Theorem 2 (a), extended from Zhao et al. (2016)), ranked top-$2$ plus $2$-way (Theorem 2 (b)), choice-$2, 3, 4$ (Theorem 2 (c)), and 4-way (Theorem 2 (d)). For the case of mixtures of $k$ Plackett-Luce models over $m$ alternatives, we prove that if there exists $m' \le m$ s.t. the mixture of $k$ Plackett-Luce models over $m'$ alternatives is identifiable, we can learn the parameter using ranked top-$l_1$ and $l_2$-way orders where $l_1 + l_2 \ge m'$ (Theorem 3). This theorem, combined with Theorem 3 in Zhao et al. (2016), which provides a condition for mixtures of $k$ Plackett-Luce models to be generically identifiable, can guide the algorithm design for mixtures of arbitrary $k$ Plackett-Luce models.
Algorithmic Contributions. We propose efficient generalized-method-of-moments (GMM) algorithms for parameter estimation of the proposed model based on PL. On datasets with large numbers of structured partial orders, our algorithm runs much faster and provides better statistical efficiency than the EM algorithm proposed by Liu et al. (2019) (see Section 6 for more details). We also compare our algorithms with the GMM algorithm by Zhao et al. (2016) under two different settings. When full rankings are available, our algorithms outperform the GMM algorithm by Zhao et al. (2016) in terms of MSE. When only structured partial orders are available, the GMM algorithm by Zhao et al. (2016) is the best. We believe this difference is caused by the intrinsic information in the data.
1.2 Related Work and Discussions
Modeling. We are not aware of a previous model targeting rank data that consists of different types of structured partial orders. We believe that modeling the coexistence of different types of structured partial orders is highly important and practical, as it is more convenient, efficient, and accurate for an agent to report her preferences as a structured partial order of her choice. For example, some voting websites allow users to use different UIs to submit structured partial orders Brandt and Geist (2015).
There are two major lines of research in rank aggregation from partial orders: learning from structured partial orders and EM algorithms for general partial orders. Popular structured partial orders investigated in the literature are pairwise comparisons Jang et al. (2016); Jamieson and Nowak (2011), top-$l$ Mollica and Tardella (2017); Huang et al. (2011), $l$-way Marden (1995); Hunter (2004); Maystre and Grossglauser (2015), and choice-$l$ Train (2009). Khetan and Oh (2016) focused on partial orders with “separators”, which is a broader class of partial orders than top-$l$. Still, Khetan and Oh (2016) assume the same structure for everyone. Our model is more general as it allows the coexistence of different types of structured partial orders in the dataset. EM algorithms have been designed for learning mixtures of Mallows’ model Lu and Boutilier (2014) and mixtures of random utility models including the Plackett-Luce model Liu et al. (2019), from general partial orders. Our model is less general, but as EM algorithms are often slow and it is unclear whether they are consistent, our model allows for theoretically and practically more efficient algorithms. We believe that our approach provides a principled balance between the flexibility of modeling and the efficiency of algorithms.
Theoretical results. Several previous works provided theoretical guarantees such as identifiability and sample complexity of mixtures of Plackett-Luce models and their extensions to structured partial orders. For linear orders, Zhao et al. (2016) proved that the mixture of $k$ Plackett-Luce models over $m$ alternatives is not identifiable when $m \le 2k - 1$ and this bound is tight for $k = 2$. We extend their results to the case of structured partial orders of various types. Ammar et al. (2014, Theorem 1) proved that when , where is a nonnegative integer power of , there exist two different parameters of mixtures of Plackett-Luce models that have the same distribution over way orders. Our Theorem 1 significantly extends this result in the following aspects: (i) our results include all possible values of rather than powers of ; (ii) we show that the model is not identifiable even under way (in contrast to way) orders; (iii) we allow for combinations of ranked top and way structures. Oh and Shah (2014) showed that mixtures of Plackett-Luce models are in general not identifiable given partial orders, but under some conditions on the data, the parameter can be learned using pairwise comparisons. We consider many more structures than pairwise comparisons.
Recently, Chierichetti et al. (2018) proved that at least random marginal probabilities of partial orders are required to identify the parameter of uniform mixture of two Plackett-Luce models. We show that a carefully chosen set of marginal probabilities can be sufficient to identify the parameter of nonuniform mixtures of Plackett-Luce models, which is a significant improvement. Further, our proposed algorithm can be easily modified to handle the case of uniform mixtures. Zhao et al. (2018b) characterized the conditions when mixtures of random utility models are generically identifiable. We focus on strict identifiability, which is stronger.
Algorithms. Several learning algorithms for mixtures of Plackett-Luce models have been proposed, including a tensor-decomposition-based algorithm Oh and Shah (2014), a polynomial system solving algorithm Chierichetti et al. (2018), a GMM algorithm Zhao et al. (2016), and EM-based algorithms Gormley and Murphy (2008); Tkachenko and Lauw (2016); Mollica and Tardella (2017); Liu et al. (2019). In particular, Liu et al. (2019) proposed an EM-based algorithm to learn from general partial orders. However, it is unclear whether their algorithm is consistent (as for most EM algorithms), and their algorithm is significantly slower than ours. Our algorithms for linear orders are similar to the one proposed by Zhao et al. (2016), but we consider different sets of marginal probabilities and our algorithms significantly outperform the one by Zhao et al. (2016) w.r.t. MSE while taking similar running time.
2 Preliminaries
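Let $\mathcal{A} = \{a_1, \ldots, a_m\}$ denote a set of $m$ alternatives and $\mathcal{L}(\mathcal{A})$ denote the set of all linear orders (full rankings) over $\mathcal{A}$, which are antisymmetric, transitive and total binary relations. A linear order $R \in \mathcal{L}(\mathcal{A})$ is denoted as $a_{i_1} \succ a_{i_2} \succ \cdots \succ a_{i_m}$, where $a_{i_1}$ is the most preferred alternative and $a_{i_m}$ is the least preferred alternative. A partial order $O$ is an antisymmetric and transitive binary relation. In this paper, we consider three types of strict partial orders: ranked-top-$l$ (top-$l$ for short), $l$-way, and choice-$l$, where $l \le m$. A top-$l$ order is denoted by $O^{\text{top-}l} = [a_{i_1} \succ \cdots \succ a_{i_l} \succ \text{others}]$; an $l$-way order is denoted by $O^{l\text{-way}} = [a_{i_1} \succ \cdots \succ a_{i_l}]$, which means that the agent does not have preferences over unranked alternatives; and a choice-$l$ order is denoted by $O^{\text{choice-}l} = (\mathcal{A}', a)$, where $\mathcal{A}' \subseteq \mathcal{A}$, $|\mathcal{A}'| = l$, and $a \in \mathcal{A}'$, which means that the agent chooses $a$ from $\mathcal{A}'$. We note that the three types of partial orders are not mutually exclusive. For example, a pairwise comparison is a $2$-way order as well as a choice-$2$ order. Let $\mathcal{P}(\mathcal{A})$ denote the set of all partial orders of the three structures: ranked top-$l$, $l$-way, and choice-$l$ ($l \le m$) over $\mathcal{A}$. It is worth noting that $\mathcal{L}(\mathcal{A}) \subseteq \mathcal{P}(\mathcal{A})$. Let $P = (O_1, \ldots, O_n) \in \mathcal{P}(\mathcal{A})^n$ denote the data, also called a preference profile. Let $O^{s}_{\mathcal{A}'}$ denote a partial order over a subset $\mathcal{A}'$ whose structure is $s$. When $s$ is top-$l$, $\mathcal{A}'$ is set to be $\mathcal{A}$. Let $[k]$ denote the set $\{1, \ldots, k\}$.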
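Definition 1 (Plackett-Luce model).
The parameter space is $\Theta = \{\vec{\theta} = (\theta_1, \ldots, \theta_m) : 0 < \theta_i < 1 \text{ for all } i, \ \sum_{i=1}^m \theta_i = 1\}$. The sample space is $\mathcal{L}(\mathcal{A})^n$. Given a parameter $\vec{\theta} \in \Theta$, the probability of any linear order $R = [a_{i_1} \succ a_{i_2} \succ \cdots \succ a_{i_m}]$ is
$$\Pr\nolimits_{\text{PL}}(R \mid \vec{\theta}) = \prod_{p=1}^{m-1} \frac{\theta_{i_p}}{\sum_{q=p}^{m} \theta_{i_q}}.$$
Under the Plackett-Luce model, a partial order $O$ can be viewed as a marginal event which consists of all linear orders that extend $O$, that is, for any extension $R$, $a \succ_O b$ implies $a \succ_R b$. The probabilities of the aforementioned three types of partial orders are as follows Xia (2019).

Top-$l$. For any top-$l$ order $O^{\text{top-}l} = [a_{i_1} \succ \cdots \succ a_{i_l} \succ \text{others}]$, we have
$$\Pr\nolimits_{\text{PL}}(O^{\text{top-}l} \mid \vec{\theta}) = \prod_{p=1}^{l} \frac{\theta_{i_p}}{\sum_{q=p}^{m} \theta_{i_q}}.$$

$l$-way. For any $l$-way order $O^{l\text{-way}}_{\mathcal{A}'} = [a_{i_1} \succ \cdots \succ a_{i_l}]$, where $\mathcal{A}' = \{a_{i_1}, \ldots, a_{i_l}\}$, we have
$$\Pr\nolimits_{\text{PL}}(O^{l\text{-way}}_{\mathcal{A}'} \mid \vec{\theta}) = \prod_{p=1}^{l-1} \frac{\theta_{i_p}}{\sum_{q=p}^{l} \theta_{i_q}}.$$

Choice-$l$. For any choice order $O^{\text{choice-}l}_{\mathcal{A}'} = (\mathcal{A}', a_i)$, we have
$$\Pr\nolimits_{\text{PL}}(O^{\text{choice-}l}_{\mathcal{A}'} \mid \vec{\theta}) = \frac{\theta_i}{\sum_{a_j \in \mathcal{A}'} \theta_j}.$$
In this paper, we assume that data points are i.i.d. generated from the model.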
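The marginal probabilities of the three structures can be checked numerically. The following is a minimal sketch, not the authors' code: the function names and the parameter values are hypothetical, and exact rational arithmetic is used so that the marginal-event identity can be tested for equality.

```python
from fractions import Fraction
from itertools import permutations

def pl_top(theta, top):
    """Probability that the alternatives in `top` occupy the first positions, in this order."""
    prob, remaining = Fraction(1), sum(theta.values())
    for a in top:
        prob *= theta[a] / remaining   # a is picked among the alternatives still unranked
        remaining -= theta[a]
    return prob

def pl_way(theta, order):
    """Probability of an l-way order: a full ranking under the model restricted to `order`."""
    return pl_top({a: theta[a] for a in order}, order)

def pl_choice(theta, subset, chosen):
    """Probability that `chosen` is selected from `subset`."""
    return theta[chosen] / sum(theta[a] for a in subset)

# Hypothetical parameter over four alternatives (not taken from the paper).
theta = {"a1": Fraction(4, 10), "a2": Fraction(3, 10),
         "a3": Fraction(2, 10), "a4": Fraction(1, 10)}

top2 = pl_top(theta, ("a1", "a2"))
way3 = pl_way(theta, ("a2", "a3", "a4"))
choice3 = pl_choice(theta, ("a1", "a2", "a3"), "a1")

# A top-2 order is the marginal event of all full rankings that extend it.
extensions = sum(pl_top(theta, r) for r in permutations(theta)
                 if r[:2] == ("a1", "a2"))
```

Here `extensions` sums the probabilities of the two full rankings that start with `a1`, `a2`, which is how a partial order is treated as a marginal event.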
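Definition 2 (Mixtures of $k$ Plackett-Luce models for linear orders ($k$-PL)).
Given $m \ge 2$ and $k \in \mathbb{N}_+$, the sample space of $k$-PL is $\mathcal{L}(\mathcal{A})^n$. The parameter space is $\Theta = \{\vec{\theta} = (\vec{\alpha}, \vec{\theta}^{(1)}, \ldots, \vec{\theta}^{(k)})\}$, where $\vec{\alpha} = (\alpha_1, \ldots, \alpha_k)$ is the vector of mixing coefficients. For all $r \le k$, $\alpha_r \ge 0$ and $\sum_{r=1}^k \alpha_r = 1$. For all $1 \le r \le k$, $\vec{\theta}^{(r)}$ is the parameter of the $r$-th Plackett-Luce component. The probability of a linear order $R$ is:
$$\Pr\nolimits_{k\text{-PL}}(R \mid \vec{\theta}) = \sum_{r=1}^{k} \alpha_r \Pr\nolimits_{\text{PL}}(R \mid \vec{\theta}^{(r)}).$$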
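As a worked instance of Definition 2, the sketch below evaluates the mixture probability of one ranking for a two-component mixture. The parameter values and function names are illustrative assumptions, not values from the paper.

```python
from fractions import Fraction
from itertools import permutations

def pl_prob(theta, order):
    """Plackett-Luce probability of a full ranking `order`."""
    prob, remaining = Fraction(1), sum(theta[a] for a in order)
    for a in order:
        prob *= theta[a] / remaining
        remaining -= theta[a]
    return prob

def mixture_prob(alpha, thetas, order):
    """Mixture probability: weighted sum of the component probabilities."""
    return sum(w * pl_prob(t, order) for w, t in zip(alpha, thetas))

# Hypothetical 2-component mixture over three alternatives.
alpha = [Fraction(1, 2), Fraction(1, 2)]
thetas = [{"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)},
          {"a": Fraction(1, 6), "b": Fraction(1, 3), "c": Fraction(1, 2)}]

p_abc = mixture_prob(alpha, thetas, ("a", "b", "c"))
total = sum(mixture_prob(alpha, thetas, r) for r in permutations("abc"))
```

The first component assigns $1/3$ to $a \succ b \succ c$ and the second assigns $1/15$, so the equal-weight mixture gives $1/5$; `total` confirms that the six rankings together carry probability one.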
We now recall the definition of identifiability of statistical models.
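Definition 3 (Identifiability).
Let $\mathcal{M} = \{\Pr(\cdot \mid \vec{\theta}) : \vec{\theta} \in \Theta\}$ be a statistical model, where $\Theta$ is the parameter space and $\Pr(\cdot \mid \vec{\theta})$ is the distribution over the sample space associated with $\vec{\theta}$. $\mathcal{M}$ is identifiable if for all $\vec{\theta}, \vec{\gamma} \in \Theta$, we have
$$\Pr(\cdot \mid \vec{\theta}) = \Pr(\cdot \mid \vec{\gamma}) \implies \vec{\theta} = \vec{\gamma}.$$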
A mixture model is generally not identifiable due to the label switching problem (Redner and Walker, 1984), which means that labeling the components differently leads to the same distribution over data. In this paper, we consider identifiability of mixture models modulo label switching. That is, in Definition 3, we further require that $\vec{\theta}$ and $\vec{\gamma}$ cannot be obtained from each other by label switching.
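Label switching is easy to see numerically. The sketch below, with hypothetical parameter values, swaps the two components of a mixture and checks that every ranking keeps the same probability even though the parameter tuples differ.

```python
from fractions import Fraction
from itertools import permutations

def pl_prob(theta, order):
    """Plackett-Luce probability of a full ranking `order`."""
    prob, remaining = Fraction(1), sum(theta[a] for a in order)
    for a in order:
        prob *= theta[a] / remaining
        remaining -= theta[a]
    return prob

def mixture_prob(alpha, thetas, order):
    return sum(w * pl_prob(t, order) for w, t in zip(alpha, thetas))

# Hypothetical 2-component mixture and its relabeled copy.
alpha = [Fraction(1, 4), Fraction(3, 4)]
thetas = [{"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)},
          {"a": Fraction(1, 6), "b": Fraction(1, 3), "c": Fraction(1, 2)}]
swapped_alpha, swapped_thetas = alpha[::-1], thetas[::-1]

same_distribution = all(
    mixture_prob(alpha, thetas, r) == mixture_prob(swapped_alpha, swapped_thetas, r)
    for r in permutations("abc")
)
```

This is why identifiability is only meaningful up to a permutation of the component labels.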
3 Mixtures of Plackett-Luce Models for Partial Orders
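We propose the class of mixtures of Plackett-Luce models for the aforementioned structures of partial orders. To this end, each such model should be described by the collection of allowable types of structured partial orders, denoted by $\Phi$. More precisely, $\Phi$ is a set of $u$ structures $\Phi = \{(s_1, \mathcal{A}_1), \ldots, (s_u, \mathcal{A}_u)\}$, where for any $t \in [u]$, $(s_t, \mathcal{A}_t)$ means structure $s_t$ over $\mathcal{A}_t$. For the case of top-$l$, $\mathcal{A}_t$ is set to be $\mathcal{A}$. Since the three structures considered in this paper are not mutually exclusive, we require that $\Phi$ does not include any pair of overlapping structures simultaneously for the model to be identifiable. There are two types of pairs of overlapping structures: (1) (top-$(m-1)$, $\mathcal{A}$) and ($m$-way, $\mathcal{A}$); and (2) for any subset of two alternatives $\mathcal{A}'$, ($2$-way, $\mathcal{A}'$) and (choice-$2$, $\mathcal{A}'$). Each structure corresponds to a number $\phi^{s_t}_{\mathcal{A}_t} > 0$ and we require $\sum_{t=1}^{u} \phi^{s_t}_{\mathcal{A}_t} = 1$. A partial order is generated in two stages as illustrated in Figure 1: (i) a linear order $R$ is generated by $k$-PL given $\vec{\alpha}, \vec{\theta}^{(1)}, \ldots, \vec{\theta}^{(k)}$; (ii) with probability $\phi^{s_t}_{\mathcal{A}_t}$, $R$ is projected to the randomly generated partial order structure $(s_t, \mathcal{A}_t)$, to obtain a partial order $O$. Formally, the model is defined as follows.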
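Definition 4 (Mixtures of $k$ Plackett-Luce models for partial orders by $\Phi$ ($k$-PL-$\Phi$)).
Given $m \ge 2$, $k \in \mathbb{N}_+$, and the set of structures $\Phi = \{(s_1, \mathcal{A}_1), \ldots, (s_u, \mathcal{A}_u)\}$, the sample space is all structured partial orders defined by $\Phi$. Given $\Phi$, the parameter space is $\Theta = \{\vec{\theta} = (\vec{\phi}, \vec{\alpha}, \vec{\theta}^{(1)}, \ldots, \vec{\theta}^{(k)})\}$. The first part is a vector $\vec{\phi} = (\phi^{s_1}_{\mathcal{A}_1}, \ldots, \phi^{s_u}_{\mathcal{A}_u})$, whose entries are all positive and $\sum_{t=1}^{u} \phi^{s_t}_{\mathcal{A}_t} = 1$. The second part is $\vec{\alpha} = (\alpha_1, \ldots, \alpha_k)$ where for all $r \le k$, $\alpha_r \ge 0$, and $\sum_{r=1}^{k} \alpha_r = 1$. The remaining part is $(\vec{\theta}^{(1)}, \ldots, \vec{\theta}^{(k)})$, where $\vec{\theta}^{(r)}$ is the parameter of the $r$-th Plackett-Luce component. Then the probability of any partial order $O$, whose structure is defined by $(s, \mathcal{A}')$, is
$$\Pr\nolimits_{k\text{-PL-}\Phi}(O \mid \vec{\theta}) = \phi^{s}_{\mathcal{A}'} \sum_{R \in \mathcal{L}(\mathcal{A}) :\, R \text{ extends } O} \Pr\nolimits_{k\text{-PL}}(R \mid \vec{\theta}).$$
For any partial order $O$ whose structure is $(s, \mathcal{A}')$, we can also write
$$\Pr\nolimits_{k\text{-PL-}\Phi}(O \mid \vec{\theta}) = \phi^{s}_{\mathcal{A}'} \sum_{r=1}^{k} \alpha_r \Pr\nolimits_{\text{PL}}(O \mid \vec{\theta}^{(r)}), \qquad (1)$$
where $\Pr_{\text{PL}}(O \mid \vec{\theta}^{(r)})$ is the marginal probability of $O$ under PL. This is a class of models because the sample space is different when $\Phi$ is different.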
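The two-stage generative process can be sketched as a sampler. This is an illustration under stated assumptions rather than the authors' implementation: the encoding of a structure as a `(kind, arg)` tuple, the function names, and all parameter values are hypothetical.

```python
import random

def sample_ranking(theta, rng):
    """Draw a full ranking from one Plackett-Luce component by repeated choice."""
    remaining, ranking = dict(theta), []
    while remaining:
        alts = list(remaining)
        pick = rng.choices(alts, weights=[remaining[a] for a in alts])[0]
        ranking.append(pick)
        del remaining[pick]
    return ranking

def project(ranking, structure):
    """Project a full ranking onto a structure (kind, arg)."""
    kind, arg = structure
    if kind == "top":                                   # arg is l
        return kind, tuple(ranking[:arg])
    restricted = tuple(a for a in ranking if a in arg)  # arg is the subset A'
    if kind == "way":
        return kind, restricted
    return kind, (arg, restricted[0])                   # choice: (A', chosen alternative)

def sample_partial_order(phi, alpha, thetas, rng):
    theta = rng.choices(thetas, weights=alpha)[0]       # stage (i): pick a component,
    ranking = sample_ranking(theta, rng)                # then draw a linear order from it
    structure = rng.choices(list(phi), weights=list(phi.values()))[0]  # stage (ii)
    return project(ranking, structure)

# Hypothetical model: three structures over four alternatives, two components.
phi = {("top", 2): 0.5,
       ("way", ("a1", "a2", "a3")): 0.3,
       ("choice", ("a2", "a3", "a4")): 0.2}
alpha = [0.4, 0.6]
thetas = [{"a1": 0.4, "a2": 0.3, "a3": 0.2, "a4": 0.1},
          {"a1": 0.1, "a2": 0.2, "a3": 0.3, "a4": 0.4}]

rng = random.Random(0)
draws = [sample_partial_order(phi, alpha, thetas, rng) for _ in range(200)]
```

Because the structure is drawn independently of the ranking, the probability of an observed partial order factors into the structure probability times the mixture marginal, which is the form of Equation (1).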
Example 1.
Let the set of alternatives be . Consider the 2PL where . , , , , , , . Now we compute the probabilities of the following partial orders given the model: (top), (top2), (3way), and (choice3 over ). We first compute for all combinations of and , shown in Table 1.
Let denote the probability of under model , we have
4 (Non-)Identifiability of $k$-PL-$\Phi$
Let and . Given a set of partial orders , we denote a column vector of probabilities of each partial order in for a Plackett-Luce component with parameter by . Given , we define a matrix , which is heavily used in the proofs of this paper, by . The following theorem shows that under some conditions on , , and , $k$-PL-$\Phi$ is not identifiable.
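Theorem 1.
Given a set of $m$ alternatives $\mathcal{A}$ and any $0 \le l_1 \le m - 1$, $1 \le l_2 \le m$. Let $\Phi^* = \{(\text{top-}l, \mathcal{A}) : 1 \le l \le l_1\} \cup \{(l\text{-way}, \mathcal{A}') : 2 \le l \le l_2, \mathcal{A}' \subseteq \mathcal{A}, |\mathcal{A}'| = l\}$. Given any $\Phi \subseteq \Phi^*$, and for any $k \ge (l_1 + l_2 + 1)/2$, $k$-PL-$\Phi$ is not identifiable.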
Proof.
It suffices to prove that the theorem holds when . Given , it suffices to prove that the model is not identifiable even if the parameter is unique given the distribution of data.
The proof is constructive. By Lemma 1 of Zhao et al. (2016), for any and , we only need to find and such that (1) , where consists of all ranked top and way orders, and (2) has positive elements and negative elements.
We consider the case where the parameter for first alternative of th component is , where . All other alternatives have the same parameters .
Table 2 lists some probabilities (constant factors may be omitted). We can see the probabilities from the two classes have similar structures.
[Table 2. Rows: top; second; at position $i$; not in top $l_1$; $l_2$-way top; $l_2$-way second; $l_2$-way at position $i$; $l_2$-way at position $l_2$.]
It is not hard to check that the probability for to be ranked at the th position in the th component is
(2) 
where . The probability for to be ranked out of top position is .
And the probability for to be ranked at the th position in the th component for way rankings is
(3) 
where .
Then can be reduced to a matrix. We now define a new matrix obtained from by performing the following linear operations on row vectors. (i) Make the first row of to be ; (ii) for any , the th row of is the probability for to be ranked at the th position according to (2); (iii) for any , the th row of is the probability for to be ranked at the th position in an way order according to (3) ; (iv) the th row is the probability that is not ranked within top ; (v) remove all constant factors.
More precisely, for any we define the following function.
Then we define .
For any , let
(4) 
Note that the numerator of is always positive. W.l.o.g. let , then half of the denominators are positive and the other half are negative. Note that the degree of the numerator of is . By Lemma 6 of Zhao et al. (2016), we have . ∎
Considering that any way order implies a choice order, we have the following corollary.
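Corollary 1.
Given a set of $m$ alternatives $\mathcal{A}$ and any $0 \le l_1 \le m - 1$, $1 \le l_2 \le m$. Let $\Phi^* = \{(\text{top-}l, \mathcal{A}) : 1 \le l \le l_1\} \cup \{(\text{choice-}l, \mathcal{A}') : 2 \le l \le l_2, \mathcal{A}' \subseteq \mathcal{A}, |\mathcal{A}'| = l\}$. Given any $\Phi \subseteq \Phi^*$, and for any $k \ge (l_1 + l_2 + 1)/2$, $k$-PL-$\Phi$ is not identifiable.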
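The observation behind the corollary, that a way order over a subset determines the choice order over the same subset, can be checked on a small instance. The sketch below uses hypothetical parameter values and function names; it sums the probabilities of all 3-way orders with a given top alternative and compares the result with the choice probability.

```python
from fractions import Fraction
from itertools import permutations

def pl_way(theta, order):
    """Probability of an l-way order under the model restricted to `order`."""
    prob, remaining = Fraction(1), sum(theta[a] for a in order)
    for a in order:
        prob *= theta[a] / remaining
        remaining -= theta[a]
    return prob

def pl_choice(theta, subset, chosen):
    """Probability that `chosen` is selected from `subset`."""
    return theta[chosen] / sum(theta[a] for a in subset)

# Hypothetical parameter over four alternatives.
theta = {"a1": Fraction(4, 10), "a2": Fraction(3, 10),
         "a3": Fraction(2, 10), "a4": Fraction(1, 10)}
subset = ("a1", "a2", "a3")

# Marginalize the 3-way orders over `subset` that rank a2 first.
marginal = sum(pl_way(theta, r) for r in permutations(subset) if r[0] == "a2")
choice = pl_choice(theta, subset, "a2")
```

Since choice probabilities are linear combinations of way-order probabilities, two parameters that agree on all way orders also agree on all choice orders.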
Given any $k$, these results show which structures of data we cannot use if we want to interpret the learned parameter. Next, we characterize conditions for 2-PL-$\Phi$ to be identifiable.
Theorem 2.
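Let $\Phi$ be one of the four combinations of structures below. For any $m \ge 4$, 2-PL-$\Phi$ over $m$ alternatives is identifiable.
(a) $\Phi = \{(\text{top-}3, \mathcal{A})\}$,
(b) $\Phi = \{(\text{top-}2, \mathcal{A})\} \cup \{(2\text{-way}, \mathcal{A}') : \mathcal{A}' \subseteq \mathcal{A}, |\mathcal{A}'| = 2\}$,
(c) $\Phi = \{(\text{choice-}l, \mathcal{A}') : 2 \le l \le 4, \mathcal{A}' \subseteq \mathcal{A}, |\mathcal{A}'| = l\}$, or
(d) $\Phi = \{(4\text{-way}, \mathcal{A}') : \mathcal{A}' \subseteq \mathcal{A}, |\mathcal{A}'| = 4\}$.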
Proof.
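The proof has two steps. The first step is the same across (a), (b), (c), and (d). We show that for any $k$-PL-$\Phi$ with any parameter $\vec{\theta}$ whose structure probabilities are $\vec{\phi}$, there does not exist a parameter $\vec{\theta}'$ with structure probabilities $\vec{\phi}' \ne \vec{\phi}$ s.t. the distribution over the sample space is exactly the same. For the purpose of contradiction suppose such a $\vec{\theta}'$ exists. Since $\vec{\phi}' \ne \vec{\phi}$, there exists a structure $(s, \mathcal{A}')$ s.t. $\phi^{s}_{\mathcal{A}'} \ne \phi'^{s}_{\mathcal{A}'}$. Now we consider the total probability of all possible partial orders of this structure, whose set is denoted by $\mathcal{O}^{s}_{\mathcal{A}'}$. Each Plackett-Luce component assigns total probability one to these partial orders, so by (1) we have
$$\phi^{s}_{\mathcal{A}'} = \sum_{O \in \mathcal{O}^{s}_{\mathcal{A}'}} \Pr\nolimits_{k\text{-PL-}\Phi}(O \mid \vec{\theta}) = \sum_{O \in \mathcal{O}^{s}_{\mathcal{A}'}} \Pr\nolimits_{k\text{-PL-}\Phi}(O \mid \vec{\theta}') = \phi'^{s}_{\mathcal{A}'},$$
which is a contradiction.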
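The first step relies on the fact that the partial orders of one structure carry total probability equal to that structure's weight, whatever the mixture parameters are. A minimal numerical check for top-$2$ orders follows; the parameter values, the weight `phi_s`, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def pl_top(theta, top):
    """Probability that the alternatives in `top` occupy the first positions, in this order."""
    prob, remaining = Fraction(1), sum(theta.values())
    for a in top:
        prob *= theta[a] / remaining
        remaining -= theta[a]
    return prob

def structure_mass(phi_s, alpha, thetas, l):
    """Total probability of all top-l orders: structure weight times mixture marginals."""
    alternatives = list(thetas[0])
    return sum(phi_s * sum(w * pl_top(t, top) for w, t in zip(alpha, thetas))
               for top in permutations(alternatives, l))

phi_s = Fraction(2, 5)  # hypothetical weight of the structure (top-2, A)
alpha = [Fraction(1, 4), Fraction(3, 4)]
thetas = [{"a1": Fraction(4, 10), "a2": Fraction(3, 10),
           "a3": Fraction(2, 10), "a4": Fraction(1, 10)},
          {"a1": Fraction(1, 10), "a2": Fraction(2, 10),
           "a3": Fraction(3, 10), "a4": Fraction(4, 10)}]

mass = structure_mass(phi_s, alpha, thetas, 2)
mass_other = structure_mass(phi_s, alpha[::-1], thetas, 2)
```

Both `mass` and `mass_other` equal `phi_s`, so two parameters with different structure weights cannot induce the same distribution.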
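In the second step, we show that for any 2-PL-$\Phi$ with any parameter $\vec{\theta}$, there does not exist a different parameter $\vec{\theta}'$ with the same $\vec{\phi}$ s.t. $\Pr(O \mid \vec{\theta}) = \Pr(O \mid \vec{\theta}')$ for any partial order $O$ in the sample space. We prove this for each of the cases (a), (b), (c), and (d).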
(a) This step for (a) is exactly the same as the proof for (Zhao et al., 2016, Theorem 2).
(b) We focus on . The case for is very similar. Let consist of all ranked top and way orders ( marginal probabilities). We will show that for all nondegenerate , rank. Then this part is proved by applying (Zhao et al., 2016, Lemma 1).
For simplicity we use to denote the parameter of th PlackettLuce model for respectively, i.e.,
We define and the following row vectors.
We have . Therefore, if there exist three ’s such that and are linearly independent, then . The proof is done. Because is nondegenerate, at least one of is linearly independent of . W.l.o.g. suppose is linearly independent of . This means that not all of are equal. Following Zhao et al. (2016), we prove the theorem in the following two cases.
Case 1. , , and are all linear combinations of and .
Case 2. There exists a (where ) that is linearly independent of and .
Case 2 was proved by Zhao et al. (2016) using only ranked top orders, as well as most of Case 1. The only remaining case is as follows. For all ,
(5) 
We first show a claim, which is useful to the proof.
Claim 1.
Under the settings of (5), and there exists in s.t. .
Proof.
If , then , which is a contradiction. Since and , we have . If (or ), then (or ), which means that the parameters corresponding to all other alternatives are zero or negative. This is a contradiction. ∎
So if , we switch the role of and . Then we have .
In this case, we construct in the following way.
Moments  
