
Label Ranking with Abstention: Predicting Partial Orders by Thresholding Probability Distributions (Extended Abstract)

We consider an extension of the setting of label ranking, in which the learner is allowed to make predictions in the form of partial instead of total orders. Predictions of that kind are interpreted as a partial abstention: If the learner is not sufficiently certain regarding the relative order of two alternatives, it may abstain from this decision and instead declare these alternatives as being incomparable. We propose a new method for learning to predict partial orders that improves on an existing approach, both theoretically and empirically. Our method is based on the idea of thresholding the probabilities of pairwise preferences between labels as induced by a predicted (parameterized) probability distribution on the set of all rankings.



1 Introduction

In the setting of label ranking, a special type of preference learning problem, each instance $x$ from an instance space $\mathbb{X}$ is associated with a total order of a fixed set of class labels $\mathcal{Y} = \{y_1, \ldots, y_M\}$, that is, a complete, transitive, and asymmetric relation $\succ_x$ on $\mathcal{Y}$, where $y_i \succ_x y_j$ indicates that, for instance $x$, $y_i$ precedes $y_j$ in the order. Since a ranking can be considered as a special type of preference relation, we shall also say that $y_i \succ_x y_j$ indicates that $y_i$ is preferred to $y_j$ given the instance $x$.

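Formally, a total order $\succ_x$ can be identified with a permutation $\pi_x$ of the set $[M] = \{1, \ldots, M\}$, such that $\pi_x(i)$ is the index $j$ of the class label $y_j$ on the $i$-th position in the order (and hence $\pi_x^{-1}(j) = i$ the position of the $j$-th label). This permutation thus encodes the (ground truth) order relation

$$y_{\pi_x(1)} \succ_x y_{\pi_x(2)} \succ_x \cdots \succ_x y_{\pi_x(M)}.$$

We denote the class of permutations of $[M]$ (the symmetric group of order $M$) by $\Omega$.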
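As a small illustration of this encoding, the following Python sketch stores a ranking as a list that maps positions to label indices and derives the inverse map from labels to positions. It uses 0-based indices instead of the 1-based indices of the text, and the label names and the helper names `inverse` and `prefers` are hypothetical.

```python
# Hypothetical label set with four labels (0-based indices 0..3).
labels = ["a", "b", "c", "d"]

# pi[i] is the index of the label placed on position i,
# so this list encodes the order c > a > d > b.
pi = [2, 0, 3, 1]


def inverse(perm):
    """Return inv with inv[j] = position of label j in the order."""
    inv = [0] * len(perm)
    for position, label_index in enumerate(perm):
        inv[label_index] = position
    return inv


def prefers(perm, i, j):
    """True if label i precedes label j in the order encoded by perm."""
    inv = inverse(perm)
    return inv[i] < inv[j]


# Read off the ranking as label names, best first.
ranking = [labels[k] for k in pi]
```

Here `prefers(pi, 2, 1)` checks whether label `c` precedes label `b`, which is the pairwise relation the text writes as a preference between two labels.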

The goal in label ranking is to learn a “label ranker” in the form of a mapping $\mathbb{X} \to \Omega$. As training data, a label ranker uses a set of instances $x_n$ ($n = 1, \ldots, N$), together with preference information in the form of pairwise comparisons $y_i \succ_{x_n} y_j$ of some labels in $\mathcal{Y}$, suggesting that instance $x_n$ prefers label $y_i$ to $y_j$.

Motivated by the idea of a reject option in classification, the authors in [3] introduced a variant of the above setting in which the label ranker is allowed to partially abstain from a prediction. More specifically, it is allowed to make predictions in the form of partial instead of total orders: If the ranker is not sufficiently certain regarding the relative order of two alternatives and, therefore, cannot reliably decide whether the former should precede the latter or the other way around, it may abstain from this decision and instead declare these alternatives as being incomparable. Abstaining in a consistent way, it should of course still produce an asymmetric and transitive relation, hence a partial order.

The approach in [3], despite being the first to address the problem of learning to predict partial orders, still exhibits some disadvantages (see next section). In this paper, we therefore propose an alternative method, or rather a modification, which is based on the idea of predicting partial orders by thresholding parameterized probability distributions on rankings. Roughly speaking, by making stronger model assumptions, this approach is able to avoid inconsistencies that may occur in [3], and hence simplifies the construction of consistent partial order relations; see Section 3 for details.

Of course, despite being interesting from a theoretical point of view, these properties do not guarantee a practical advantage in terms of prediction performance, especially in cases where the model assumptions might be violated. Therefore, we complement our theoretical results by an experimental study in which we compare our new method with the original approach of [3].

2 Previous Work

The method in [3] consists of two main steps and can be considered as a pairwise approach in the sense that, as a point of departure, a valued preference relation $P: \mathcal{Y} \times \mathcal{Y} \to [0,1]$ is produced, where $P(y_i, y_j)$ is interpreted as a measure of support of the pairwise preference $y_i \succ y_j$. Support is commonly interpreted in terms of probability, hence $P$ is assumed to be reciprocal, that is, $P(y_i, y_j) = 1 - P(y_j, y_i)$ for all $y_i, y_j \in \mathcal{Y}$. Then, in a second step, a partial order $Q$ is derived from $P$ via thresholding: $Q(y_i, y_j) = 1$ if $P(y_i, y_j) > q$ and $Q(y_i, y_j) = 0$ otherwise, where $1/2 \le q < 1$ is a threshold. Thus, the idea is to predict only those pairwise preferences that are sufficiently likely, while abstaining on pairs for which the probability is too close to $1/2$.
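The thresholding step can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in [3]: the matrix `P`, the function name `threshold_relation`, and the numeric values are assumptions chosen for the example.

```python
def threshold_relation(P, q):
    """Return Q with Q[i][j] = 1 if P[i][j] > q, and 0 otherwise."""
    if not 0.5 <= q < 1.0:
        raise ValueError("threshold q must lie in [1/2, 1)")
    m = len(P)
    return [[1 if i != j and P[i][j] > q else 0 for j in range(m)]
            for i in range(m)]


# Hypothetical reciprocal preference degrees for three labels:
# P[i][j] + P[j][i] = 1 for all i != j (the diagonal is unused).
P = [
    [0.50, 0.90, 0.80],
    [0.10, 0.50, 0.55],
    [0.20, 0.45, 0.50],
]

Q_total = threshold_relation(P, 0.5)    # every pair is decided
Q_partial = threshold_relation(P, 0.6)  # labels 1 and 2 become incomparable
```

With the smallest threshold every pair is decided and a total order results, whereas the larger threshold leaves the weakly supported pair (labels 1 and 2) undecided.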

The first step of deriving the relation $P$ is realized in [3] by means of an ensemble learning technique: After training an ensemble of standard label rankers, each of which provides a prediction in the form of a total order, $P(y_i, y_j)$ is defined by the fraction of ensemble members voting for $y_i \succ y_j$. Other possibilities are of course conceivable, and indeed, the only important point to notice here is that the preference degrees $P(y_i, y_j)$ are essentially independent of each other. Or, stated differently, they do not guarantee any specific properties of the relation $P$ except being reciprocal. For the relation $Q$ derived from $P$ via thresholding, this has two important consequences:

  • If the threshold $q$ is not large enough, then $Q$ may have cycles. Thus, not all thresholds in $[1/2, 1)$ are actually feasible. In particular, if $q = 1/2$ cannot be chosen, this also implies that the method may not be able to predict a total order as a special case.

  • Even if $Q$ does not have cycles, it is not guaranteed to be transitive.

To overcome these problems, the authors devise an algorithm that finds the smallest feasible threshold and “repairs” a non-transitive relation by replacing it with its transitive closure. The complexity of this algorithm is $O(|\mathcal{Y}|^3)$.
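Both problems, and the kind of repair just described, can be reproduced on a three-label example. The sketch below is a naive illustration and not the algorithm of [3]: it simply tries the candidate thresholds in increasing order and tests each for cycles via Warshall's transitive closure, which is more expensive than necessary. All function names and the values in `P` are assumptions.

```python
def threshold_relation(P, q):
    """Q[i][j] = 1 if P[i][j] > q, and 0 otherwise."""
    m = len(P)
    return [[1 if i != j and P[i][j] > q else 0 for j in range(m)]
            for i in range(m)]


def transitive_closure(Q):
    """Warshall's algorithm on a 0/1 matrix."""
    m = len(Q)
    R = [row[:] for row in Q]
    for k in range(m):
        for i in range(m):
            for j in range(m):
                if R[i][k] and R[k][j]:
                    R[i][j] = 1
    return R


def has_cycle(Q):
    """A relation has a cycle iff its closure relates some label to itself."""
    R = transitive_closure(Q)
    return any(R[i][i] for i in range(len(R)))


def smallest_feasible_threshold(P):
    """Smallest q in [1/2, 1) whose thresholded relation is acyclic.

    The thresholded relation only changes at the values taken by P, so it
    suffices to test 1/2 and the entries of P above 1/2.
    """
    m = len(P)
    candidates = sorted({0.5} | {P[i][j] for i in range(m) for j in range(m)
                                 if i != j and P[i][j] > 0.5})
    for q in candidates:
        if q < 1.0 and not has_cycle(threshold_relation(P, q)):
            return q
    return None  # no feasible threshold below 1


# Hypothetical reciprocal but cyclic preference degrees: 0 > 1 > 2 > 0.
P = [
    [0.5, 0.8, 0.4],
    [0.2, 0.5, 0.7],
    [0.6, 0.3, 0.5],
]

q_min = smallest_feasible_threshold(P)
Q = threshold_relation(P, q_min)   # acyclic, but not transitive
Q_repaired = transitive_closure(Q)  # adds the missing pair (0, 2)
```

In this example the threshold 1/2 yields a cycle, the smallest feasible threshold keeps only the pairs (0, 1) and (1, 2), and the closure then adds the pair (0, 2) even though its own support in `P` is below 1/2.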

3 Predicting Partial Orders based on Probabilistic Models

In order to tackle the above problems, our idea is to restrict the relation $P$ so as to exclude the possibility of cycles and violations of transitivity from the very beginning. To this end, we take advantage of methods for label ranking that produce (parameterized) probability distributions over $\Omega$ as predictions. Our main theoretical result is to show that thresholding pairwise preferences induced by such distributions yields preference relations with the desired properties, that is, partial order relations $Q$.

In [2], a label ranking method was proposed that produces predictions expressed in terms of the Mallows model [5], a distance-based probability model belonging to the family of exponential distributions. The standard Mallows model

$$P(\pi \mid \theta, \pi_0) = \frac{\exp(-\theta \, D(\pi, \pi_0))}{\phi(\theta)} \qquad (1)$$

is determined by two parameters: The ranking $\pi_0 \in \Omega$ is the location parameter (mode, center ranking) and $\theta \ge 0$ is a spread parameter. Moreover, $D$ is a distance measure on rankings, and the constant $\phi = \phi(\theta)$ is a normalization factor that depends on the spread (but, provided the right-invariance of $D$, not on $\pi_0$). Obviously, the Mallows model assigns the maximum probability to the center ranking $\pi_0$. The larger the distance $D(\pi, \pi_0)$, the smaller the probability of $\pi$ becomes. The spread parameter $\theta$ determines how quickly the probability decreases, i.e., how peaked the distribution is around $\pi_0$. For $\theta = 0$, the uniform distribution is obtained, while for $\theta \to \infty$, the distribution converges to the one-point distribution that assigns probability 1 to $\pi_0$ and 0 to all other rankings.
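For a handful of labels, the Mallows model can be tabulated by brute-force enumeration. The sketch below assumes the Kendall distance (number of discordant label pairs) as the distance measure, which is one common choice and not necessarily the one used in [2]; rankings are tuples of label indices listed from the first to the last position, and the function names are hypothetical.

```python
from itertools import combinations, permutations
from math import exp


def kendall_distance(pi, sigma):
    """Number of label pairs that the two rankings order differently."""
    pos_pi = {label: k for k, label in enumerate(pi)}
    pos_sigma = {label: k for k, label in enumerate(sigma)}
    return sum(
        1
        for a, b in combinations(pi, 2)
        if (pos_pi[a] - pos_pi[b]) * (pos_sigma[a] - pos_sigma[b]) < 0
    )


def mallows_distribution(center, theta):
    """Map every ranking to exp(-theta * D(ranking, center)) / normalizer."""
    rankings = list(permutations(center))
    weights = [exp(-theta * kendall_distance(r, center)) for r in rankings]
    normalizer = sum(weights)  # computed by enumeration, feasible for small M
    return {r: w / normalizer for r, w in zip(rankings, weights)}


center = (0, 1, 2)
uniform = mallows_distribution(center, theta=0.0)  # spread 0: all rankings equal
peaked = mallows_distribution(center, theta=2.0)   # mass concentrates on center
```

Enumeration costs time factorial in the number of labels, so this is only meant to make the definition concrete for small label sets.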

Alternatively, the Plackett-Luce (PL) model was used in [1]. This is a stagewise model, which is specified by a parameter vector $v = (v_1, v_2, \ldots, v_M) \in \mathbb{R}_+^M$:

$$P(\pi \mid v) = \prod_{i=1}^{M} \frac{v_{\pi(i)}}{v_{\pi(i)} + v_{\pi(i+1)} + \cdots + v_{\pi(M)}} \qquad (2)$$

This model is a generalization of the well-known Bradley-Terry model for the pairwise comparison of alternatives, which specifies the probability that “$y_i$ wins against $y_j$” in terms of $P(y_i \succ y_j) = v_i / (v_i + v_j)$. Obviously, the larger $v_i$ in comparison to $v_j$, the higher the probability that $y_i$ is chosen. Likewise, the larger the parameter $v_i$ in (2) in comparison to the parameters $v_j$, $j \neq i$, the higher the probability that the label $y_i$ appears on a top rank. An intuitively appealing explanation of the PL model can be given in terms of a vase model: If $v_i$ corresponds to the relative frequency of the $i$-th label in a vase filled with labeled balls, then $P(\pi \mid v)$ is the probability to produce the ranking $\pi$ by randomly drawing balls from the vase in a sequential way and putting the label drawn in the $k$-th trial on position $k$ (unless the label was already chosen before, in which case the trial is annulled).
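The stagewise structure and the vase interpretation translate directly into code. The following sketch computes the PL probability of a ranking and samples rankings by drawing labels sequentially; drawing only among the labels not yet chosen has the same effect as annulling repeated draws. Rankings are tuples of 0-based label indices, best first, and the function names and parameter values are assumptions.

```python
import random


def pl_probability(ranking, v):
    """Probability of `ranking` under a PL model with positive parameters v."""
    prob = 1.0
    remaining = sum(v[label] for label in ranking)
    for label in ranking:
        prob *= v[label] / remaining  # chance of picking `label` at this stage
        remaining -= v[label]         # the label leaves the vase
    return prob


def sample_ranking(v, rng):
    """Draw one ranking by sequentially picking among the remaining labels."""
    labels = list(range(len(v)))
    ranking = []
    while labels:
        weights = [v[label] for label in labels]
        chosen = rng.choices(labels, weights=weights, k=1)[0]
        ranking.append(chosen)
        labels.remove(chosen)
    return tuple(ranking)


v = [3.0, 2.0, 1.0]                   # hypothetical parameter vector
p_best = pl_probability((0, 1, 2), v)  # (3/6) * (2/3) * (1/1)
draw = sample_ranking(v, random.Random(0))
```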

Given a probability distribution $P$ on the set of rankings $\Omega$, the probability of a pairwise preference $y_i \succ y_j$ (and hence the corresponding entry in the preference relation $P$) can be derived through marginalization:

$$P(y_i, y_j) = P(y_i \succ y_j) = \sum_{\pi \in E(y_i, y_j)} P(\pi) \qquad (3)$$

where $E(y_i, y_j)$ denotes the set of linear extensions of the incomplete ranking $y_i \succ y_j$, i.e., the set of all rankings $\pi \in \Omega$ in which $y_i$ precedes $y_j$. Our main theoretical result states that thresholding (3) yields a proper partial order relation $Q$, both for the Mallows and the PL model.
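The marginalization can be checked by brute force on a small label set: sum the probabilities of all rankings in which one label precedes the other. The sketch below does this for a PL distribution and compares the result with the Bradley-Terry form of the pairwise probability, which is the known closed-form marginal of the PL model. The function names and parameter values are assumptions, and the enumeration is only practical for a few labels.

```python
from itertools import permutations


def pl_probability(ranking, v):
    """Probability of `ranking` (label indices, best first) under PL."""
    prob = 1.0
    remaining = sum(v[label] for label in ranking)
    for label in ranking:
        prob *= v[label] / remaining
        remaining -= v[label]
    return prob


def pairwise_marginal(dist, i, j):
    """Sum the probability of all rankings in which label i precedes label j."""
    return sum(p for ranking, p in dist.items()
               if ranking.index(i) < ranking.index(j))


v = [3.0, 2.0, 1.0]  # hypothetical PL parameters
dist = {r: pl_probability(r, v) for r in permutations(range(len(v)))}

by_enumeration = pairwise_marginal(dist, 0, 2)
closed_form = v[0] / (v[0] + v[2])  # Bradley-Terry form, equals 3/4 here
```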

Theorem 1.

Let $P$ in (3) be the Mallows model (1), with a distance $D$ having the so-called transposition property, or the PL model (2). Moreover, let $Q$ be defined by the thresholded relation $Q(y_i, y_j) = 1$ if $P(y_i, y_j) > q$ and $Q(y_i, y_j) = 0$ otherwise. Then $Q$ defines a proper partial order relation for all $q \in [1/2, 1)$.

A distance $D$ on rankings is said to have the transposition property, if the following holds: Let $\pi$ and $\pi'$ be rankings so that, in both of them, $y_i$ precedes $y_j$. Moreover, consider a third ranking $\pi''$ identical to $\pi'$, except for a transposition of $y_i$ and $y_j$. Then, $D(\pi, \pi') \le D(\pi, \pi'')$. Of course, this property is intuitively plausible, and indeed, it is satisfied by most of the commonly used distance measures (see, e.g., [4]).

While the proof of the above theorem is rather straightforward for the PL model, it is less obvious for the Mallows model. In either case, the theorem guarantees that a proper partial order relation can be predicted by simple thresholding, without the need for any subsequent repair. Moreover, the whole spectrum of threshold parameters $q \in [1/2, 1)$ can be used.
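For the PL model, one way to see the claim (a sketch, not the authors' proof) is that a pairwise probability of the Bradley-Terry form exceeds the threshold exactly when the first parameter is more than c times the second, with c = q / (1 - q) at least 1, and this condition chains across labels. The following Python check illustrates the statement numerically for random PL parameters. It uses exact rational arithmetic to avoid rounding effects at the threshold; all names, the parameter ranges, and the threshold grid are assumptions.

```python
import random
from fractions import Fraction


def pl_pairwise(v):
    """Exact PL pairwise marginals v_i / (v_i + v_j) for integer parameters."""
    m = len(v)
    return [[Fraction(v[i], v[i] + v[j]) if i != j else Fraction(1, 2)
             for j in range(m)] for i in range(m)]


def threshold_relation(P, q):
    """Q[i][j] = 1 if P[i][j] > q, and 0 otherwise."""
    m = len(P)
    return [[1 if i != j and P[i][j] > q else 0 for j in range(m)]
            for i in range(m)]


def is_strict_partial_order(Q):
    """Check irreflexivity, asymmetry, and transitivity of a 0/1 relation."""
    m = len(Q)
    idx = range(m)
    irreflexive = all(Q[i][i] == 0 for i in idx)
    asymmetric = all(not (Q[i][j] and Q[j][i]) for i in idx for j in idx)
    transitive = all(Q[i][k] for i in idx for j in idx for k in idx
                     if Q[i][j] and Q[j][k])
    return irreflexive and asymmetric and transitive


rng = random.Random(0)
thresholds = [Fraction(k, 20) for k in range(10, 20)]  # 0.50, 0.55, ..., 0.95

# Fresh random integer parameters for every (trial, threshold) combination.
all_ok = all(
    is_strict_partial_order(
        threshold_relation(pl_pairwise([rng.randint(1, 50) for _ in range(6)]), q)
    )
    for _ in range(200)
    for q in thresholds
)
```

A passing check is of course no substitute for the proof; it only illustrates that no repair step is needed for any threshold in the tested grid.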

4 Experiments

As mentioned earlier, the alternative approach outlined above does not automatically imply a practical advantage, especially since it makes strong model assumptions (in terms of the Mallows or PL model) that are not necessarily satisfied. Therefore, we complement our theoretical results by an empirical study, in which we analyze the tradeoff between correctness and completeness achieved by different methods.

If a model is allowed to abstain from making predictions, it is expected to reduce its error rate. In fact, it can trivially do so, namely by rejecting all predictions, in which case it avoids any mistake. Clearly, this is not a desirable solution. Indeed, in the setting of prediction with reject option, there is always a trade-off between two criteria: correctness on the one side and completeness on the other side. An ideal learner is correct in the sense of making few mistakes, but also complete in the sense of abstaining rarely. The two criteria are conflicting: increasing completeness typically comes along with reducing correctness and vice versa, at least if the learner is effective in the sense that it abstains from those decisions that are indeed most uncertain.

As measures of correctness and completeness, we use those that were proposed in [3]. Correctness is measured by the gamma rank correlation (between the true ranking and the predicted partial order), and completeness is defined by one minus the (relative) number of pairwise comparisons on which the model abstains.
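The two measures can be computed from the true ranking and a predicted relation as follows. This is a sketch under the usual definition of the gamma coefficient, (C - D) / (C + D), where C and D count the predicted pairwise preferences that agree and disagree with the true ranking; the matrix representation of the prediction, the function name, and the choice to return `None` when the model abstains on every pair are assumptions.

```python
def correctness_completeness(true_ranking, Q):
    """Return (gamma correlation, fraction of label pairs that are decided).

    true_ranking: tuple of label indices, best first.
    Q: 0/1 matrix of an asymmetric relation, Q[i][j] = 1 if i is predicted
       to precede j; pairs with Q[i][j] = Q[j][i] = 0 are abstentions.
    """
    position = {label: k for k, label in enumerate(true_ranking)}
    m = len(true_ranking)
    concordant = discordant = 0
    for i in range(m):
        for j in range(m):
            if i != j and Q[i][j]:
                if position[i] < position[j]:
                    concordant += 1
                else:
                    discordant += 1
    decided = concordant + discordant
    completeness = decided / (m * (m - 1) / 2)
    # Gamma is undefined if nothing is predicted; None marks that case here.
    correctness = (concordant - discordant) / decided if decided else None
    return correctness, completeness


# Hypothetical prediction on four labels: 0 > 1, 0 > 2, 2 > 1, rest abstained.
Q = [
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
correctness, completeness = correctness_completeness((0, 1, 2, 3), Q)
```

In this example three of the six label pairs are decided, two of them correctly and one incorrectly.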

Figure 1: Trade-off between completeness and correctness for a label ranking variant of the UCI benchmark data set VOWEL: Existing pairwise method (solid line) versus new approach based on probabilistic models (dashed line).

The main conclusion that can be drawn from our results is that, as expected, our probabilistic approach does indeed achieve a better trade-off between completeness and correctness, especially in the sense that it spans a wider range of values for the former. Besides, we often observe that the level of correctness is increased, too. A typical example of the completeness/correctness trade-off is shown in Figure 1.


References

  • [1] Weiwei Cheng, Krzysztof Dembczyński, and Eyke Hüllermeier. Label ranking methods based on the Plackett-Luce model. In Proceedings of ICML 2010: International Conference on Machine Learning, pages 215–222. Omnipress, 2010.
  • [2] Weiwei Cheng, Jens Hühn, and Eyke Hüllermeier. Decision tree and instance-based learning for label ranking. In Proceedings of ICML 2009: International Conference on Machine Learning, pages 161–168. Omnipress, 2009.
  • [3] Weiwei Cheng, Michaël Rademaker, Bernard De Baets, and Eyke Hüllermeier. Predicting partial orders: Ranking with abstention. In Proceedings of ECMLPKDD 2010: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, pages 215–230. Springer, 2010.
  • [4] Douglas Critchlow, Michael Fligner, and Joseph Verducci. Probability models on rankings. Journal of Mathematical Psychology, 35:294–318, 1991.
  • [5] John Marden. Analyzing and Modeling Rank Data. Chapman and Hall, 1995.