# Learning Mixtures of Plackett-Luce Models from Structured Partial Orders

Mixtures of ranking models have been widely used for heterogeneous preferences. However, learning a mixture model is highly nontrivial, especially when the dataset consists of partial orders. In such cases, the parameter of the model may not even be identifiable. In this paper, we focus on three popular structures of partial orders: ranked top-l_1, l_2-way, and choice data over a subset of alternatives. We prove that when the dataset consists of combinations of ranked top-l_1 and l_2-way (or choice data over up to l_2 alternatives), a mixture of k Plackett-Luce models is not identifiable when l_1+l_2 ≤ 2k-1 (l_2 is set to 1 when there are no l_2-way orders). We also prove that under some combinations, including ranked top-3, ranked top-2 plus 2-way, and choice data over up to 4 alternatives, mixtures of two Plackett-Luce models are identifiable. Guided by our theoretical results, we propose efficient generalized method of moments (GMM) algorithms to learn mixtures of two Plackett-Luce models, which are proven to be consistent. Our experiments demonstrate the efficacy of our algorithms. Moreover, we show that when full rankings are available, learning from different marginal events (partial orders) provides tradeoffs between statistical efficiency and computational efficiency.


## 1 Introduction

Suppose a group of four friends want to choose one of the four restaurants for dinner. The first person ranks all four restaurants as , where means that “ is strictly preferred to ”. The second person says “ and are my top two choices, among which I prefer to ”. The third person ranks but has no idea about . The fourth person has no idea about , and would choose among . How should they aggregate their preferences to choose the best restaurant?

Similar rank aggregation problems exist in social choice, crowdsourcing Mao et al. (2013); Chen et al. (2013), recommender systems Candès and Recht (2009); Baltrunas et al. (2010); Keshavan et al. (2010); Negahban and Wainwright (2012), information retrieval Altman and Tennenholtz (2005); Liu (2011), etc. Rank aggregation can be cast as the following statistical parameter estimation problem: given a statistical model for rank data and the agents’ preferences, the parameter of the model is estimated and then used to make decisions. Among the most widely applied statistical models for rank aggregation are the Plackett-Luce model Luce (1959); Plackett (1975) and its mixtures Gormley and Murphy (2008, 2009); Liu (2011); Mollica and Tardella (2017); Tkachenko and Lauw (2016).

In a Plackett-Luce model over a set of alternatives $\mathcal{A}=\{a_1,\ldots,a_m\}$, each alternative $a_i$ is parameterized by a strictly positive number $\theta_i$: the larger $\theta_i$ is, the more likely $a_i$ is to be ranked above other alternatives (when the parameters sum to 1, $\theta_i$ is exactly the probability that $a_i$ is ranked first). A mixture of $k$ Plackett-Luce models, denoted by $k$-PL, combines $k$ component Plackett-Luce models via the mixing coefficients $\vec\alpha=(\alpha_1,\ldots,\alpha_k)$ with $\alpha_r\ge 0$ and $\sum_{r=1}^{k}\alpha_r=1$, such that for any $r\le k$, with probability $\alpha_r$, a data point is generated from the $r$-th Plackett-Luce component.

One critical limitation of the Plackett-Luce model and its mixtures is that their sample space consists of linear orders over $\mathcal{A}$. In other words, each data point must be a full ranking of all alternatives in $\mathcal{A}$. However, this is rarely the case in practice, because agents are often unable to rank all alternatives due to lack of information Pini et al. (2011), as illustrated by the example at the beginning of the Introduction.

In general, each rank datum is a partial order, which can be seen as a collection of pairwise comparisons among alternatives that satisfy transitivity. However, handling partial orders is more challenging than it appears. In particular, the pairwise comparisons of the same agent cannot be seen as independently generated due to transitivity.

Consequently, most previous works focused on structured partial orders, where agents’ preferences share some common structures. For example, given $l\in\mathbb{N}$, in ranked-top-$l$ preferences Mollica and Tardella (2017); Huang et al. (2011), agents submit a linear order over their top $l$ choices; in $l$-way preferences Marden (1995); Hunter (2004); Maystre and Grossglauser (2015), agents submit a linear order over a set of $l$ alternatives, which are not necessarily their top $l$ alternatives; in choice-$l$ preferences (a.k.a. choice sets) Train (2009), agents only specify their top choice among a set of $l$ alternatives. In particular, pairwise comparisons can be seen as 2-way preferences or choice-2 preferences.

However, as far as we know, most previous works assumed that all rank data share the same structure in order for their algorithms and theoretical guarantees to apply. It is unclear how rank aggregation can be done effectively and efficiently from structured partial orders of different kinds, as in the example at the beginning of the Introduction. This is the key question we address in this paper.

How can we effectively and efficiently learn Plackett-Luce and its mixtures from structured partial orders of different kinds?

Addressing this question successfully involves two challenges. First, to address the effectiveness concern, we need a statistical model that combines various structured partial orders and for which desirable statistical properties can be proved; we are not aware of an existing one. Second, to address the efficiency concern, we need to design new algorithms, because either previous algorithms cannot be directly applied, or it is unclear whether their theoretical guarantees, such as consistency, are retained.

### 1.1 Our Contributions

Our contributions in addressing the key question are three-fold.

Modeling Contributions. We propose a class of statistical models for the co-existence of the three types of structured partial orders mentioned in the Introduction: ranked-top-$l$, $l$-way, and choice-$l$, by leveraging mixtures of Plackett-Luce models. Our models can be easily generalized to include other types of structured partial orders.

Theoretical Contributions. Our main theoretical results characterize the identifiability of the proposed models. Identifiability, which requires that different parameters of the model give different distributions over data, is fundamental to parameter estimation. Clearly, if a model is non-identifiable, then no parameter estimation algorithm can be consistent.

We prove that when only ranked top-$l_1$ and $l_2$-way orders are available ($l_2$ is set to 1 if there are no $l_2$-way orders), the mixture of $k$ Plackett-Luce models is not identifiable if $l_1+l_2\le 2k-1$ (Theorem 1). We also prove that mixtures of two Plackett-Luce models are identifiable under the following combinations of structures: ranked top-3 (Theorem 2 (a), extended from Zhao et al. (2016)), ranked top-2 plus 2-way (Theorem 2 (b)), choice data over up to 4 alternatives (Theorem 2 (c)), and 4-way (Theorem 2 (d)). For the case of mixtures of $k$ Plackett-Luce models over $m$ alternatives, we prove that if there exist s.t. the mixture of $k$ Plackett-Luce models over alternatives is identifiable, we can learn the parameter using ranked top-$l_1$ and $l_2$-way orders where (Theorem 3). This theorem, combined with Theorem 3 in Zhao et al. (2016), which provides a condition for mixtures of Plackett-Luce models to be generically identifiable, can guide the algorithm design for mixtures of arbitrary Plackett-Luce models.

Algorithmic Contributions. We propose efficient generalized-method-of-moments (GMM) algorithms for parameter estimation of the proposed model based on 2-PL. Our algorithms run much faster while providing better statistical efficiency than the EM algorithm proposed by Liu et al. (2019) on datasets with large numbers of structured partial orders; see Section 6 for more details. Our algorithms are compared with the GMM algorithm by Zhao et al. (2016) under two different settings. When full rankings are available, our algorithms outperform the GMM algorithm by Zhao et al. (2016) in terms of MSE. When only structured partial orders are available, the GMM algorithm by Zhao et al. (2016) is the best. We believe this difference is caused by the intrinsic information in the data.

### 1.2 Related Work and Discussions

Modeling. We are not aware of a previous model targeting rank data that consists of different types of structured partial orders. We believe that modeling the coexistence of different types of structured partial orders is highly important and practical, as it is more convenient, efficient, and accurate for an agent to report her preferences as a structured partial order of her choice. For example, some voting websites allow users to use different UIs to submit structured partial orders Brandt and Geist (2015).

There are two major lines of research in rank aggregation from partial orders: learning from structured partial orders and EM algorithms for general partial orders. Popular structured partial orders investigated in the literature are pairwise comparisons Jang et al. (2016); Jamieson and Nowak (2011), top-$l$ Mollica and Tardella (2017); Huang et al. (2011), $l$-way Marden (1995); Hunter (2004); Maystre and Grossglauser (2015), and choice-$l$ Train (2009). Khetan and Oh (2016) focused on partial orders with “separators”, which form a broader class of partial orders than top-$l$. Still, Khetan and Oh (2016) assume the same structure for everyone. Our model is more general as it allows the coexistence of different types of structured partial orders in the dataset. EM algorithms have been designed for learning mixtures of Mallows’ model Lu and Boutilier (2014) and mixtures of random utility models including the Plackett-Luce model Liu et al. (2019), from general partial orders. Our model is less general, but because EM algorithms are often slow and it is unclear whether they are consistent, our model allows for algorithms that are more efficient both theoretically and practically. We believe that our approach provides a principled balance between the flexibility of modeling and the efficiency of algorithms.

Theoretical results. Several previous works provided theoretical guarantees such as identifiability and sample complexity of mixtures of Plackett-Luce models and their extensions to structured partial orders. For linear orders, Zhao et al. (2016) proved that the mixture of $k$ Plackett-Luce models over $m$ alternatives is not identifiable when $m\le 2k-1$, and this bound is tight for $k=2$. We extend their results to structured partial orders of various types. Ammar et al. (2014, Theorem 1) proved that when , where is a nonnegative integer power of , there exist two different parameters of mixtures of Plackett-Luce models that have the same distribution over -way orders. Our Theorem 1 significantly extends this result in the following aspects: (i) our results include all possible values of rather than powers of ; (ii) we show that the model is not identifiable even under -way (in contrast to -way) orders; (iii) we allow for combinations of ranked top-$l_1$ and $l_2$-way structures. Oh and Shah (2014) showed that mixtures of Plackett-Luce models are in general not identifiable given partial orders, but under some conditions on the data, the parameter can be learned using pairwise comparisons. We consider many more structures than pairwise comparisons.

Recently, Chierichetti et al. (2018) proved that at least random marginal probabilities of partial orders are required to identify the parameter of uniform mixture of two Plackett-Luce models. We show that a carefully chosen set of marginal probabilities can be sufficient to identify the parameter of nonuniform mixtures of Plackett-Luce models, which is a significant improvement. Further, our proposed algorithm can be easily modified to handle the case of uniform mixtures. Zhao et al. (2018b) characterized the conditions when mixtures of random utility models are generically identifiable. We focus on strict identifiability, which is stronger.

Algorithms. Several learning algorithms for mixtures of Plackett-Luce models have been proposed, including a tensor-decomposition-based algorithm Oh and Shah (2014), a polynomial-system-solving algorithm Chierichetti et al. (2018), a GMM algorithm Zhao et al. (2016), and EM-based algorithms Gormley and Murphy (2008); Tkachenko and Lauw (2016); Mollica and Tardella (2017); Liu et al. (2019). In particular, Liu et al. (2019) proposed an EM-based algorithm to learn from general partial orders. However, it is unclear whether their algorithm is consistent (as for most EM algorithms), and it is significantly slower than ours. Our algorithms for linear orders are similar to the one proposed by Zhao et al. (2016), but we consider different sets of marginal probabilities, and our algorithms significantly outperform the one by Zhao et al. (2016) w.r.t. MSE while taking similar running time.

## 2 Preliminaries

Let $\mathcal{A}=\{a_1,a_2,\ldots,a_m\}$ denote a set of $m$ alternatives and $\mathcal{L}(\mathcal{A})$ denote the set of all linear orders (full rankings) over $\mathcal{A}$, which are antisymmetric, transitive and total binary relations. A linear order $R\in\mathcal{L}(\mathcal{A})$ is denoted as $a_{i_1}\succ a_{i_2}\succ\cdots\succ a_{i_m}$, where $a_{i_1}$ is the most preferred alternative and $a_{i_m}$ is the least preferred alternative. A partial order is an antisymmetric and transitive binary relation. In this paper, we consider three types of strict partial orders: ranked-top-$l$ (top-$l$ for short), $l$-way, and choice-$l$, where $l\le m$. A top-$l$ order is denoted by $O^{\text{top-}l}$; an $l$-way order over $\mathcal{A}'\subseteq\mathcal{A}$ is denoted by $O^{l\text{-way}}_{\mathcal{A}'}$, which means that the agent does not have preferences over unranked alternatives; and a choice-$l$ order is denoted by , where $\mathcal{A}'\subseteq\mathcal{A}$, $|\mathcal{A}'|=l$, and $a_i\in\mathcal{A}'$, which means that the agent chooses $a_i$ from $\mathcal{A}'$. We note that the three types of partial orders are not mutually exclusive. For example, a pairwise comparison is a 2-way order as well as a choice-2 order. Let denote the set of all partial orders of the three structures: ranked top-$l$, $l$-way, and choice-$l$ () over $\mathcal{A}$. It is worth noting that . Let denote the data, also called a preference profile. Let $O^{s}_{\mathcal{A}'}$ denote a partial order over a subset $\mathcal{A}'\subseteq\mathcal{A}$ whose structure is $s$. When $s$ is top-$l$, $\mathcal{A}'$ is set to be $\mathcal{A}$. Let denote the set .

###### Definition 1.

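(Plackett-Luce model). The parameter space is $\Theta=\{\vec\theta=(\theta_1,\ldots,\theta_m):\theta_i>0\text{ for all }i,\ \sum_{i=1}^{m}\theta_i=1\}$. The sample space is $\mathcal{L}(\mathcal{A})$. Given a parameter $\vec\theta\in\Theta$, the probability of any linear order $R=a_{i_1}\succ a_{i_2}\succ\cdots\succ a_{i_m}$ is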

$$\Pr\nolimits_{\text{PL}}(R\mid\vec\theta)=\prod_{p=1}^{m-1}\frac{\theta_{i_p}}{\sum_{q=p}^{m}\theta_{i_q}}.$$

Under the Plackett-Luce model, a partial order $O$ can be viewed as a marginal event which consists of all linear orders that extend $O$, that is, for any extension $R$, $a\succ_O b$ implies $a\succ_R b$. The probabilities of the aforementioned three types of partial orders are as follows Xia (2019).

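• Top-$l$. For any top-$l$ order $O^{\text{top-}l}$ that ranks $a_{i_1}\succ a_{i_2}\succ\cdots\succ a_{i_l}$ ahead of all other alternatives, we have

$$\Pr\nolimits_{\text{PL}}(O^{\text{top-}l}\mid\vec\theta)=\prod_{p=1}^{l}\frac{\theta_{i_p}}{\sum_{q=p}^{m}\theta_{i_q}}.$$

• $l$-way. For any $l$-way order $O^{l\text{-way}}_{\mathcal{A}'}=a_{i_1}\succ a_{i_2}\succ\cdots\succ a_{i_l}$, where $\mathcal{A}'=\{a_{i_1},\ldots,a_{i_l}\}$, we have

$$\Pr\nolimits_{\text{PL}}(O^{l\text{-way}}_{\mathcal{A}'}\mid\vec\theta)=\prod_{p=1}^{l-1}\frac{\theta_{i_p}}{\sum_{q=p}^{l}\theta_{i_q}}.$$

• Choice-$l$. For any choice-$l$ order $O$ in which $a_i$ is chosen from $\mathcal{A}'$ with $|\mathcal{A}'|=l$, we have

$$\Pr\nolimits_{\text{PL}}(O\mid\vec\theta)=\frac{\theta_i}{\sum_{a_j\in\mathcal{A}'}\theta_j}.$$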
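Because each partial order is a marginal event, the three closed forms can be checked by summing the probabilities of all linear orders that extend the partial order. The following is a minimal Python sketch, not the authors' code: alternatives are 0-based indices (index 0 stands for $a_1$), and the helper names (`seq_prob`, `pr_top`, `pr_way`, `pr_choice`, `marginal`) and the parameter values are hypothetical.

```python
from itertools import permutations
from math import isclose


def seq_prob(theta, prefix, pool):
    """Plackett-Luce probability that the alternatives in `prefix` are drawn,
    in this order, as the first len(prefix) picks from `pool`."""
    remaining = set(pool)
    prob = 1.0
    for a in prefix:
        prob *= theta[a] / sum(theta[b] for b in remaining)
        remaining.remove(a)
    return prob


def pr_linear(theta, ranking):
    # Definition 1: product over positions p = 1, ..., m-1.
    return seq_prob(theta, ranking[:-1], range(len(theta)))


def pr_top(theta, top):
    # Top-l: the listed alternatives occupy the first l positions, in order.
    return seq_prob(theta, top, range(len(theta)))


def pr_way(theta, order):
    # l-way: only the alternatives in A' = set(order) compete.
    return seq_prob(theta, order[:-1], order)


def pr_choice(theta, chosen, subset):
    # Choice-l: `chosen` is the top alternative within `subset`.
    return theta[chosen] / sum(theta[j] for j in subset)


def marginal(theta, extends):
    """Total probability of all linear orders R for which extends(R) holds."""
    m = len(theta)
    return sum(pr_linear(theta, list(r))
               for r in permutations(range(m)) if extends(list(r)))


theta = [0.1, 0.2, 0.3, 0.4]  # hypothetical parameter, sums to 1
top, way, subset = [3, 1], [2, 0, 3], [0, 1, 2]
checks = [
    (pr_top(theta, top), marginal(theta, lambda r: r[:2] == top)),
    (pr_way(theta, way),
     marginal(theta, lambda r: [a for a in r if a in way] == way)),
    (pr_choice(theta, 1, subset),
     marginal(theta, lambda r: [a for a in r if a in subset][0] == 1)),
]
for closed_form, brute_force in checks:
    assert isclose(closed_form, brute_force)
```

The brute-force sum over all $m!$ linear orders is only practical for small $m$; it is used here purely to confirm that the closed forms are the marginal probabilities.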

In this paper, we assume that data points are i.i.d. generated from the model.

###### Definition 2 (Mixtures of k Plackett-Luce models for linear orders (k-PL)).

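Given $m$ and $k$, the sample space of $k$-PL is $\mathcal{L}(\mathcal{A})$. The parameter space is $\Theta=\{\vec\theta=(\vec\alpha,\vec\theta^{(1)},\ldots,\vec\theta^{(k)})\}$, where $\vec\alpha=(\alpha_1,\ldots,\alpha_k)$ is the vector of mixing coefficients. For all $r\le k$, $\alpha_r\ge 0$ and $\sum_{r=1}^{k}\alpha_r=1$. For all $r\le k$, $\vec\theta^{(r)}$ is the parameter of the $r$th Plackett-Luce component. The probability of a linear order $R$ is: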

$$\Pr\nolimits_{k\text{-PL}}(R\mid\vec\theta)=\sum_{r=1}^{k}\alpha_r\Pr\nolimits_{\text{PL}}(R\mid\vec\theta^{(r)}).$$

We now recall the definition of identifiability of statistical models.

###### Definition 3 (Identifiability).

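Let $\mathcal{M}=\{\Pr(\cdot\mid\vec\theta):\vec\theta\in\Theta\}$ be a statistical model, where $\Theta$ is the parameter space and $\Pr(\cdot\mid\vec\theta)$ is the distribution over the sample space associated with $\vec\theta\in\Theta$. $\mathcal{M}$ is identifiable if for all $\vec\theta,\vec\gamma\in\Theta$, we have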

$$\Pr(\cdot\mid\vec\theta)=\Pr(\cdot\mid\vec\gamma)\implies\vec\theta=\vec\gamma.$$

A mixture model is generally not identifiable due to the label switching problem (Redner and Walker, 1984), which means that labeling the components differently leads to the same distribution over data. In this paper, we consider identifiability of mixture models modulo label switching. That is, in Definition 3, we further require that $\vec\theta$ and $\vec\gamma$ cannot be obtained from each other by label switching.
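Label switching can be seen directly from Definition 2: permuting the components together with their mixing coefficients leaves every probability unchanged. The Python sketch below assumes a hypothetical 2-PL over four alternatives (0-based indices) with made-up parameter values; `pr_pl`, `pr_kpl`, and `distribution` are illustrative helper names.

```python
from itertools import permutations
from math import isclose


def pr_pl(theta, ranking):
    """Plackett-Luce probability of a full ranking (Definition 1)."""
    prob, rest = 1.0, sum(theta)
    for a in ranking[:-1]:
        prob *= theta[a] / rest
        rest -= theta[a]
    return prob


def pr_kpl(alpha, thetas, ranking):
    """Definition 2: sum over components r of alpha_r * Pr_PL(R | theta^(r))."""
    return sum(a * pr_pl(t, ranking) for a, t in zip(alpha, thetas))


def distribution(alpha, thetas, m):
    """Full distribution of k-PL over all m! linear orders."""
    return {r: pr_kpl(alpha, thetas, list(r)) for r in permutations(range(m))}


alpha = [0.3, 0.7]                                     # hypothetical values
thetas = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]]
dist = distribution(alpha, thetas, 4)
# Swap the labels of the two components (alpha and theta move together).
dist_swapped = distribution(alpha[::-1], thetas[::-1], 4)
assert isclose(sum(dist.values()), 1.0)
assert all(isclose(dist[r], dist_swapped[r]) for r in dist)
```

Swapping only the mixing coefficients, without the component parameters, generally changes the distribution; only the joint relabeling is invisible in the data, which is why identifiability is considered modulo label switching.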

## 3 Mixtures of Plackett-Luce Models for Partial Orders

We propose the class of mixtures of Plackett-Luce models for the aforementioned structures of partial orders. To this end, each such model is described by the collection of allowable types of structured partial orders, denoted by $\Phi$. More precisely, $\Phi$ is a set of structures $s_{\mathcal{A}'}$, where $s_{\mathcal{A}'}$ means structure $s$ over $\mathcal{A}'\subseteq\mathcal{A}$. For the case of top-$l$, $\mathcal{A}'$ is set to be $\mathcal{A}$. Since the three structures considered in this paper are not mutually exclusive, we require that $\Phi$ does not include any pair of overlapping structures simultaneously, so that the model can be identifiable. There are two types of pairs of overlapping structures: (1) and ; and (2) for any subset of two alternatives $\mathcal{A}'$, $2\text{-way}_{\mathcal{A}'}$ and $\text{choice-}2_{\mathcal{A}'}$. Each structure $s_{\mathcal{A}'}$ corresponds to a number $\phi^{s}_{\mathcal{A}'}>0$, and we require $\sum_{s_{\mathcal{A}'}\in\Phi}\phi^{s}_{\mathcal{A}'}=1$. A partial order is generated in two stages as illustrated in Figure 1: (i) a linear order $R$ is generated by $k$-PL given $\vec\alpha$ and $\vec\theta^{(1)},\ldots,\vec\theta^{(k)}$; (ii) with probability $\phi^{s}_{\mathcal{A}'}$, $R$ is projected to the randomly generated partial order structure $s_{\mathcal{A}'}$ to obtain a partial order $O$. Formally, the model is defined as follows.
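The two-stage generative process can be sketched as a sampler. This is a minimal Python illustration under stated assumptions, not the paper's implementation: a structure $s_{\mathcal{A}'}$ is encoded as a hypothetical tuple `(kind, l, subset)`, alternatives are 0-based indices, $\Phi$ is a dictionary mapping structures to their probabilities (here the $\phi$ values and $\vec\alpha$ used in Example 1), and the component parameters are hypothetical.

```python
import random


def sample_pl(theta, rng):
    """Draw a full ranking from Plackett-Luce: repeatedly pick the next
    alternative with probability proportional to its parameter."""
    remaining = list(range(len(theta)))
    ranking = []
    while remaining:
        a = rng.choices(remaining, weights=[theta[b] for b in remaining])[0]
        ranking.append(a)
        remaining.remove(a)
    return ranking


def project(ranking, structure):
    """Project a linear order onto a structure encoded as (kind, l, subset)."""
    kind, l, subset = structure
    if kind == "top":                 # ranked top-l over all alternatives
        return tuple(ranking[:l])
    restricted = [a for a in ranking if a in subset]
    if kind == "way":                 # l-way: linear order over the subset
        return tuple(restricted)
    if kind == "choice":              # choice-l: top alternative in the subset
        return restricted[0]
    raise ValueError(f"unknown structure kind: {kind}")


def sample_kpl_phi(alpha, thetas, phi, rng):
    """Stage (i): pick component r w.p. alpha_r and draw R from PL(theta^(r)).
    Stage (ii): pick a structure w.p. phi[structure] and project R onto it."""
    r = rng.choices(range(len(alpha)), weights=alpha)[0]
    ranking = sample_pl(thetas[r], rng)
    structures = list(phi)
    s = rng.choices(structures, weights=[phi[x] for x in structures])[0]
    return s, project(ranking, s)


rng = random.Random(0)
alpha = [0.2, 0.8]                                     # as in Example 1
thetas = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]]  # hypothetical values
phi = {
    ("top", 3, None): 0.2,
    ("top", 2, None): 0.1,
    ("way", 2, (2, 3)): 0.3,          # 2-way over {a3, a4}
    ("choice", 3, (0, 1, 2)): 0.4,    # choice-3 over {a1, a2, a3}
}
samples = [sample_kpl_phi(alpha, thetas, phi, rng) for _ in range(5)]
```

Keeping the projection separate from the sampler mirrors the two stages: any other structure type could be supported by adding a branch to `project` without touching the Plackett-Luce part.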

###### Definition 4 (Mixtures of k Plackett-Luce models for partial orders by Φ (k-PL-Φ)).

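Given $m$, $k$, and the set of structures $\Phi$, the sample space is the set of all structured partial orders defined by $\Phi$. Given $\Phi$, the parameter space is $\Theta=\{\vec\theta=(\vec\phi,\vec\alpha,\vec\theta^{(1)},\ldots,\vec\theta^{(k)})\}$. The first part is a vector $\vec\phi=(\phi^{s}_{\mathcal{A}'}:s_{\mathcal{A}'}\in\Phi)$, whose entries are all positive and $\sum_{s_{\mathcal{A}'}\in\Phi}\phi^{s}_{\mathcal{A}'}=1$. The second part is $\vec\alpha=(\alpha_1,\ldots,\alpha_k)$, where for all $r\le k$, $\alpha_r\ge 0$ and $\sum_{r=1}^{k}\alpha_r=1$. The remaining part is $(\vec\theta^{(1)},\ldots,\vec\theta^{(k)})$, where $\vec\theta^{(r)}$ is the parameter of the $r$th Plackett-Luce component. Then the probability of any partial order $O$, whose structure is defined by $s_{\mathcal{A}'}$, is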

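$$\Pr\nolimits_{k\text{-PL-}\Phi}(O\mid\vec\theta)=\phi^{s}_{\mathcal{A}'}\sum_{r=1}^{k}\alpha_r\Pr\nolimits_{\text{PL}}(O^{s}_{\mathcal{A}'}\mid\vec\theta^{(r)}).$$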

For any partial order $O$ whose structure is $s_{\mathcal{A}'}$, we can also write

$$\Pr\nolimits_{k\text{-PL-}\Phi}(O\mid\vec\theta)=\phi^{s}_{\mathcal{A}'}\Pr\nolimits_{k\text{-PL}}(O\mid\vec\theta) \tag{1}$$

where $\Pr_{k\text{-PL}}(O\mid\vec\theta)$ is the marginal probability of $O$ under $k$-PL. This is a class of models because the sample space differs for different $\Phi$.

###### Example 1.

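Let the set of alternatives be $\mathcal{A}=\{a_1,a_2,a_3,a_4\}$. Consider the 2-PL-$\Phi$ where $\Phi=\{\text{top-3}_{\mathcal{A}},\ \text{top-2}_{\mathcal{A}},\ 2\text{-way}_{\{a_3,a_4\}},\ \text{choice-3}_{\{a_1,a_2,a_3\}}\}$, with $\phi^{\text{top-3}}_{\mathcal{A}}=0.2$, $\phi^{\text{top-2}}_{\mathcal{A}}=0.1$, $\phi^{2\text{-way}}_{\{a_3,a_4\}}=0.3$, $\phi^{\text{choice-3}}_{\{a_1,a_2,a_3\}}=0.4$, $\vec\alpha=(0.2,0.8)$, and component parameters $\vec\theta^{(1)}$ and $\vec\theta^{(2)}$. Now we compute the probabilities of the following partial orders given the model: $O_1$ (top-3), $O_2$ (top-2), $O_3$ (2-way over $\{a_3,a_4\}$), and $O_4$ (choice-3 over $\{a_1,a_2,a_3\}$). We first compute $\Pr_{\text{PL}}(O_j\mid\vec\theta^{(r)})$ for all combinations of $j\in\{1,2,3,4\}$ and $r\in\{1,2\}$, shown in Table 1.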

Let $\Pr_{\mathcal{M}}(O)$ denote the probability of $O$ under model $\mathcal{M}$; we have

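$$\begin{aligned}
\Pr\nolimits_{\mathcal{M}}(O_1)&=\phi^{\text{top-3}}_{\mathcal{A}}\sum_{r=1}^{2}\alpha_r\Pr(O_1\mid\vec\theta^{(r)})=0.2\times(0.2\times0.06+0.8\times0.045)=0.0096\\
\Pr\nolimits_{\mathcal{M}}(O_2)&=\phi^{\text{top-2}}_{\mathcal{A}}\sum_{r=1}^{2}\alpha_r\Pr(O_2\mid\vec\theta^{(r)})=0.1\times(0.2\times0.2+0.8\times0.13)=0.0144\\
\Pr\nolimits_{\mathcal{M}}(O_3)&=\phi^{2\text{-way}}_{\{a_3,a_4\}}\sum_{r=1}^{2}\alpha_r\Pr(O_3\mid\vec\theta^{(r)})=0.3\times(0.2\times0.3+0.8\times0.225)=0.072\\
\Pr\nolimits_{\mathcal{M}}(O_4)&=\phi^{\text{choice-3}}_{\{a_1,a_2,a_3\}}\sum_{r=1}^{2}\alpha_r\Pr(O_4\mid\vec\theta^{(r)})=0.4\times(0.2\times0.5+0.8\times0.43)=0.1776
\end{aligned}$$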
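The arithmetic of Example 1 is a direct application of (1). The Python sketch below reuses only numbers that appear in the example (the per-component probabilities from Table 1, $\vec\alpha$, and the $\phi$ values); the dictionary layout and the function name `pr_k_pl_phi` are illustrative assumptions.

```python
from math import isclose

alpha = (0.2, 0.8)  # mixing coefficients alpha_1, alpha_2
# For each partial order: (phi of its structure,
#                          (Pr_PL(O | theta^(1)), Pr_PL(O | theta^(2)))),
# with the Table 1 entries as they appear in the computation above.
example = {
    "O1": (0.2, (0.06, 0.045)),  # top-3
    "O2": (0.1, (0.2, 0.13)),    # top-2
    "O3": (0.3, (0.3, 0.225)),   # 2-way over {a3, a4}
    "O4": (0.4, (0.5, 0.43)),    # choice-3 over {a1, a2, a3}
}


def pr_k_pl_phi(phi, pl_probs, alpha):
    """Eq. (1): phi_s * sum_r alpha_r * Pr_PL(O | theta^(r))."""
    return phi * sum(a * p for a, p in zip(alpha, pl_probs))


probs = {o: pr_k_pl_phi(phi, ps, alpha) for o, (phi, ps) in example.items()}
for o, p in probs.items():
    print(o, round(p, 4))
```

It prints 0.0096, 0.0144, 0.072 and 0.1776, i.e., the four products above evaluated without rounding. Note also that the four $\phi$ values sum to 1, as required by Definition 4.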

## 4 (Non-)identifiability of k-PL-Φ

Let and . Given a set of partial orders $E$, we denote the column vector of probabilities of each partial order in $E$ for a Plackett-Luce component with parameter $\vec\theta^{(r)}$ by $\vec f_E(\vec\theta^{(r)})$. Given , we define a matrix $F_E$, which is heavily used in the proofs of this paper, by . The following theorem shows that under some conditions on , , and , $k$-PL-$\Phi$ is not identifiable.

###### Theorem 1.

Given a set of alternatives and any , . Let . Given any , and for any , -PL- is not identifiable.

###### Proof.

It suffices to prove that the theorem holds when . Given , it suffices to prove that the model is not identifiable even if the parameter is unique given the distribution of data.

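The proof is constructive. By Lemma 1 of Zhao et al. (2016), for any and , we only need to find $\vec\theta^{(1)},\ldots,\vec\theta^{(2k)}$ and $\vec\beta$ such that (1) $F_E\vec\beta=\vec 0$, where $E$ consists of all ranked top-$l_1$ and $l_2$-way orders, and (2) $\vec\beta$ has $k$ positive elements and $k$ negative elements.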

We consider the case where the parameter of the first alternative $a_1$ in the $r$-th component is $e_r$, where $0<e_r<1$. All other alternatives have the same parameter $\frac{1-e_r}{m-1}$.

Table 2 lists some probabilities (constant factors may be omitted). We can see the probabilities from the two classes have similar structures.

It is not hard to check that the probability for $a_1$ to be ranked at the $i$-th position in the $r$-th component is

$$\frac{(m-1)!}{(m-i)!}\cdot\frac{e_r(b_r)^{i-1}}{\prod_{p=0}^{i-1}(1-pb_r)} \tag{2}$$

where $b_r=\frac{1-e_r}{m-1}$. The probability for $a_1$ to be ranked outside the top $l_1$ positions is one minus the sum of (2) over $i=1,\ldots,l_1$.

And the probability for $a_1$ to be ranked at the $i$-th position in the $r$-th component for $l_2$-way rankings (over a subset of $l_2$ alternatives that contains $a_1$) is

$$\frac{(l_2-1)!}{(l_2-i)!}\cdot\frac{e_r(b_r)^{i-1}}{\prod_{p=l_2-i}^{l_2-1}(e_r+pb_r)} \tag{3}$$

where, as in (2), $b_r=\frac{1-e_r}{m-1}$.
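Formulas (2) and (3) can be verified against brute-force enumeration with exact rational arithmetic. In this hedged Python sketch, index 0 plays the role of $a_1$, the helper names (`pl_order`, `brute_position`, `eq2`, `eq3`) are hypothetical, and the values $m=5$, $e_r=1/3$, $l_2=3$ are arbitrary choices for illustration.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial, prod


def pl_order(theta, order):
    """Plackett-Luce probability of a linear order over the alternatives in `order`."""
    prob, rest = Fraction(1), sum(theta[a] for a in order)
    for a in order:
        prob *= theta[a] / rest
        rest -= theta[a]
    return prob


def brute_position(theta, pool, i):
    """Probability that alternative 0 (i.e. a_1) is ranked i-th within `pool`."""
    return sum(pl_order(theta, r) for r in permutations(pool) if r[i - 1] == 0)


def eq2(e, m, i):
    """Eq. (2): a_1 has weight e, the other m-1 alternatives weight b each."""
    b = (1 - e) / (m - 1)
    return (Fraction(factorial(m - 1), factorial(m - i)) * e * b ** (i - 1)
            / prod(1 - p * b for p in range(i)))


def eq3(e, m, l2, i):
    """Eq. (3): position i of a_1 in an l2-way order over a subset containing a_1."""
    b = (1 - e) / (m - 1)
    return (Fraction(factorial(l2 - 1), factorial(l2 - i)) * e * b ** (i - 1)
            / prod(e + p * b for p in range(l2 - i, l2)))


m, e = 5, Fraction(1, 3)
theta = [e] + [(1 - e) / (m - 1)] * (m - 1)
for i in range(1, m + 1):                  # full rankings, every position
    assert eq2(e, m, i) == brute_position(theta, range(m), i)
l2 = 3
for i in range(1, l2 + 1):                 # 3-way orders over {a_1, a_2, a_3}
    assert eq3(e, m, l2, i) == brute_position(theta, range(l2), i)
```

Using `Fraction` rather than floats makes the comparisons exact, so any mismatch would indicate an error in the closed forms rather than rounding noise.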

Then can be reduced to a matrix. We now define a new matrix obtained from by performing the following linear operations on row vectors. (i) Make the first row of to be ; (ii) for any , the -th row of is the probability for to be ranked at the -th position according to (2); (iii) for any , the -th row of is the probability for to be ranked at the -th position in an -way order according to (3) ; (iv) the th row is the probability that is not ranked within top ; (v) remove all constant factors.

More precisely, for any $e_r$ we define the following function.

$$\vec f^*_E(e_r)=\begin{pmatrix}
1\\
e_r\\
\frac{e_r(1-e_r)}{e_r+m-2}\\
\vdots\\
\frac{e_r(1-e_r)^{l_1-1}}{\prod_{p=1}^{l_1-1}(pe_r+m-1-p)}\\
\frac{e_r}{(m-l_2)e_r+(l_2-1)}\\
\vdots\\
\frac{e_r(1-e_r)^{l_2-2}}{\prod_{p=0}^{l_2-2}\left((m-l_2+p)e_r+(l_2-1-p)\right)}\\
\frac{(1-e_r)^{l_1}}{\prod_{p=1}^{l_1-1}(pe_r+m-1-p)}
\end{pmatrix}$$

Then we define .

For any $r\le 2k$, let

$$\beta^*_r=\frac{\prod_{p=1}^{l_1-1}(pe_r+m-1-p)\prod_{p=0}^{l_2-2}\left((m-l_2+p)e_r+l_2-1-p\right)}{\prod_{q\neq r}(e_r-e_q)} \tag{4}$$

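Note that the numerator of $\beta^*_r$ is always positive. W.l.o.g. let $e_1<e_2<\cdots<e_{2k}$; then the denominators $\prod_{q\neq r}(e_r-e_q)$ alternate in sign, so half of them are positive and the other half are negative. Note that the degree of the numerator of $\beta^*_r$ is $l_1+l_2-2$, so every entry of $\beta^*_r\vec f^*_E(e_r)$ is a polynomial in $e_r$ of degree at most $l_1+l_2-1\le 2k-2$, divided by $\prod_{q\neq r}(e_r-e_q)$. By Lemma 6 of Zhao et al. (2016), we have $\sum_{r=1}^{2k}\beta^*_r\vec f^*_E(e_r)=\vec 0$. ∎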
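The cancellation at the heart of this proof can be probed with exact rational arithmetic. The Python sketch below is illustrative, not from the paper: it assumes the rows of $\vec f^*_E$ are the constant row, the $l_1$ top positions, the $l_2-1$ positions in an $l_2$-way order, and the "not in the top $l_1$" row (steps (i)-(iv)), and the function names and the points $e_r$ are hypothetical. It computes $\beta^*_r$ from (4) and evaluates $\sum_r\beta^*_r\vec f^*_E(e_r)$.

```python
from fractions import Fraction
from math import prod


def f_star(e, m, l1, l2):
    """Column f*_E(e): constant row, top-l1 positions, l2-way positions,
    and 'not in the top l1' (constant factors removed)."""
    top = [e * (1 - e) ** (i - 1) / prod(p * e + m - 1 - p for p in range(1, i))
           for i in range(1, l1 + 1)]
    way = [e * (1 - e) ** (i - 1)
           / prod((m - l2 + p) * e + l2 - 1 - p for p in range(i))
           for i in range(1, l2)]
    tail = (1 - e) ** l1 / prod(p * e + m - 1 - p for p in range(1, l1))
    return [Fraction(1)] + top + way + [tail]


def beta_star(es, r, m, l1, l2):
    """Eq. (4) for the r-th point (0-based index)."""
    e = es[r]
    num = (prod(p * e + m - 1 - p for p in range(1, l1))
           * prod((m - l2 + p) * e + l2 - 1 - p for p in range(l2 - 1)))
    den = prod(e - q for j, q in enumerate(es) if j != r)
    return num / den


def combination(m, k, l1, l2, es):
    """Return beta* and the vector sum_r beta*_r f*_E(e_r)."""
    betas = [beta_star(es, r, m, l1, l2) for r in range(2 * k)]
    cols = [f_star(e, m, l1, l2) for e in es]
    combo = [sum(b * col[row] for b, col in zip(betas, cols))
             for row in range(len(cols[0]))]
    return betas, combo


k = 3
es = [Fraction(j, 10) for j in range(1, 2 * k + 1)]   # 0.1 < 0.2 < ... < 0.6
betas, combo = combination(m=6, k=k, l1=3, l2=2, es=es)  # l1 + l2 = 2k - 1
assert all(x == 0 for x in combo)
assert sum(b > 0 for b in betas) == k and sum(b < 0 for b in betas) == k
```

With $l_1+l_2=2k$ instead (for example $m=5$, $k=2$, $l_1=3$, $l_2=1$), some entries reach degree $2k-1$ and the sum no longer vanishes, which matches the role of the bound $l_1+l_2\le 2k-1$.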

Considering that any $l$-way order implies a choice-$l$ order, we have the following corollary.

###### Corollary 1.

Given a set of alternatives and any , . Let . Given any , and for any , -PL- is not identifiable.

Given any $k$, these results show which structures of data we cannot use if we want to interpret the learned parameter. Next, we characterize conditions for 2-PL-$\Phi$ to be identifiable.

###### Theorem 2.

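Let $\Phi$ be one of the four combinations of structures below. For any $m\ge 4$, 2-PL-$\Phi$ over $m$ alternatives is identifiable.
(a) ranked top-3, (b) ranked top-2 plus 2-way, (c) choice data over up to 4 alternatives, or (d) 4-way.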

###### Proof.

The proof has two steps. The first step is the same across (a), (b), (c), and (d). We show that for any $k$-PL-$\Phi$ with any parameter $\vec\theta$, there does not exist $\vec\theta'$ with $\vec\phi'\neq\vec\phi$ such that the distributions over the sample space are exactly the same. For the purpose of contradiction, suppose such $\vec\theta'$ exists. Since $\vec\phi\neq\vec\phi'$, there exists a structure $s_{\mathcal{A}_s}\in\Phi$ s.t. $\phi^{s}_{\mathcal{A}_s}\neq\phi'^{s}_{\mathcal{A}_s}$. Now we consider the total probability of all possible partial orders of this structure, denoted by $O_1,\ldots,O_w$. Then we have

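$$\sum_{j=1}^{w}\Pr\nolimits_{k\text{-PL-}\Phi}(O_j\mid\vec\theta)=\phi^{s}_{\mathcal{A}_s}\neq\phi'^{s}_{\mathcal{A}_s}=\sum_{j=1}^{w}\Pr\nolimits_{k\text{-PL-}\Phi}(O_j\mid\vec\theta'),$$

which contradicts the assumption that the two distributions are the same.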
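The first step relies on the fact that, by (1), the probabilities of all partial orders of one structure sum to that structure's weight $\phi$, whatever the mixture parameters are. A minimal Python check of this fact, assuming a hypothetical 2-PL-$\Phi$ over four alternatives in which ranked top-2 has weight $1/4$ (all parameter values and helper names are made up for illustration):

```python
from fractions import Fraction
from itertools import permutations


def pr_top(theta, top):
    """Pr_PL of a ranked top-l order: the listed alternatives come first, in order."""
    prob, rest = Fraction(1), sum(theta)
    for a in top:
        prob *= theta[a] / rest
        rest -= theta[a]
    return prob


def pr_k_pl_phi(phi_s, alpha, thetas, order):
    """Eq. (1) for a top-l order: phi_s * sum_r alpha_r * Pr_PL(order | theta^(r))."""
    return phi_s * sum(a * pr_top(t, order) for a, t in zip(alpha, thetas))


alpha = [Fraction(1, 3), Fraction(2, 3)]                  # hypothetical
thetas = [[Fraction(x, 10) for x in (1, 2, 3, 4)],
          [Fraction(x, 10) for x in (4, 3, 2, 1)]]
phi_top2 = Fraction(1, 4)                                 # weight of ranked top-2
# O_1, ..., O_w: all ranked top-2 orders over 4 alternatives (w = 12).
total = sum(pr_k_pl_phi(phi_top2, alpha, thetas, list(o))
            for o in permutations(range(4), 2))
assert total == phi_top2
```

Because the total equals $\phi$ exactly for any $\vec\alpha$ and $\vec\theta^{(r)}$, two parameters with different $\phi$ for some structure must induce different distributions.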

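In the second step, we show that for any 2-PL-$\Phi$ with any parameter $\vec\theta$, there does not exist $\vec\theta'$ with $\vec\phi'=\vec\phi$ whose mixture parameters differ from those of $\vec\theta$ (modulo label switching) s.t. $\Pr_{2\text{-PL-}\Phi}(O\mid\vec\theta)=\Pr_{2\text{-PL-}\Phi}(O\mid\vec\theta')$ for any $O$. We will prove this for each of the cases (a), (b), (c), and (d).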

(a) This step for (a) is exactly the same as the proof for (Zhao et al., 2016, Theorem 2).

(b) We focus on . The case for is very similar. Let consist of all ranked top- and -way orders ( marginal probabilities). We will show that for all non-degenerate , rank. Then this part is proved by applying (Zhao et al., 2016, Lemma 1).
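The rank condition in case (b) can be probed numerically for a concrete parameter. The Python sketch below is illustrative only: the helper names and the four parameter vectors are hypothetical, and the rows of $F_E$ are assumed to be all ranked top-2 orders followed by all 2-way orders over $m=4$ alternatives (one column per component). It computes the exact rank of $F_E$ with rational arithmetic; a single example is of course not a proof.

```python
from fractions import Fraction
from itertools import combinations, permutations


def seq_prob(theta, prefix, pool):
    """PL probability that `prefix` is drawn first, in order, from `pool`."""
    prob, rest = Fraction(1), sum(theta[a] for a in pool)
    for a in prefix:
        prob *= theta[a] / rest
        rest -= theta[a]
    return prob


def f_E(theta):
    """Probabilities of all top-2 orders, then all 2-way orders (both directions)."""
    m = len(theta)
    top2 = [seq_prob(theta, o, range(m)) for o in permutations(range(m), 2)]
    way2 = [seq_prob(theta, o[:1], o)
            for pair in combinations(range(m), 2) for o in (pair, pair[::-1])]
    return top2 + way2


def F_E(thetas):
    """Matrix whose r-th column is f_E(theta^(r))."""
    cols = [f_E(t) for t in thetas]
    return [list(row) for row in zip(*cols)]


def rank(matrix):
    """Exact rank via Gauss-Jordan elimination over the rationals."""
    rows = [list(row) for row in matrix]
    rk = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][c] != 0:
                factor = rows[i][c] / rows[rk][c]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[rk])]
        rk += 1
    return rk


thetas = [[Fraction(x, 10) for x in v]
          for v in ((1, 2, 3, 4), (4, 3, 2, 1), (2, 5, 1, 2), (3, 1, 4, 2))]
print(rank(F_E(thetas)))
```

For these four parameter vectors the rank is 4 (full column rank). Repeating a component, e.g. `thetas[:3] + [thetas[0]]`, makes two columns identical, so the rank drops to at most 3, an obviously degenerate configuration.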

For simplicity, we use $e_r$, $b_r$, $c_r$, $d_r$ to denote the parameters of $a_1$, $a_2$, $a_3$, $a_4$ in the $r$th Plackett-Luce model, respectively, i.e.,

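$$\begin{bmatrix}\vec\theta^{(1)} & \vec\theta^{(2)} & \vec\theta^{(3)} & \vec\theta^{(4)}\end{bmatrix}=\begin{bmatrix}e_1&e_2&e_3&e_4\\ b_1&b_2&b_3&b_4\\ c_1&c_2&c_3&c_4\\ d_1&d_2&d_3&d_4\end{bmatrix}$$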

We define and the following row vectors.

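$$\begin{aligned}
\vec 1&=[1,1,1,1]\\
\vec\omega^{(1)}&=[e_1,e_2,e_3,e_4]\\
\vec\omega^{(2)}&=[b_1,b_2,b_3,b_4]\\
\vec\omega^{(3)}&=[c_1,c_2,c_3,c_4]\\
\vec\omega^{(4)}&=[d_1,d_2,d_3,d_4]
\end{aligned}$$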

We have . Therefore, if there exist three ’s such that and are linearly independent, then . The proof is done. Because the parameter is non-degenerate, at least one of $\vec\omega^{(1)},\ldots,\vec\omega^{(4)}$ is linearly independent of $\vec 1$. W.l.o.g. suppose $\vec\omega^{(1)}$ is linearly independent of $\vec 1$. This means that not all of $e_1,e_2,e_3,e_4$ are equal. Following Zhao et al. (2016), we prove the theorem in the following two cases.

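Case 1. $\vec\omega^{(2)}$, $\vec\omega^{(3)}$, and $\vec\omega^{(4)}$ are all linear combinations of $\vec 1$ and $\vec\omega^{(1)}$.
Case 2. There exists a $\vec\omega^{(i)}$ (where $i\in\{2,3,4\}$) that is linearly independent of $\vec 1$ and $\vec\omega^{(1)}$.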

Case 2 was proved by Zhao et al. (2016) using only ranked top- orders, as well as most of Case 1. The only remaining case is as follows. For all ,

$$\vec\theta^{(r)}=\begin{bmatrix}e_r\\ b_r\\ c_r\\ d_r\end{bmatrix}=\begin{bmatrix}e_r\\ p_2e_r-p_2\\ p_3e_r-p_3\\ -(1+p_2+p_3)e_r+(1+p_2+p_3)\end{bmatrix} \tag{5}$$

We first show a claim, which is useful to the proof.

###### Claim 1.

Under the settings of (5), and there exists in s.t. .

###### Proof.

If , then , which is a contradiction. Since and , we have . If (or ), then (or ), which means that the parameters corresponding to all other alternatives are zero or negative. This is a contradiction. ∎

So if , we switch the role of and . Then we have .

In this case, we construct in the following way.