## I Introduction

Extensive studies have been carried out in social science to measure group cohesion, in order to gain insight into the factors affecting group cohesion and, further, to promote higher group consistency (see, e.g., [1, 2, 3, 4]). In artificial intelligence, rankings have been widely used to represent the preferences of agents (humans or systems) over a set of candidates in many information systems, such as group decision making [5, 6, 7] and information retrieval [8, 9, 10]. Since there is no ground truth about the actual ranking of candidates for many problems, it is important to evaluate the degree to which the rankings obtained by different agents agree, as this helps to understand the obtained rankings and to decide on the general preferences. However, to the best of our knowledge, only a few existing studies evaluate the overall consensus degree of a set of rankings. Quantifying the consensus of the obtained rankings provides an accurate measure of the overall agreement. It is also a quantitative indicator for comparing consensus between groups (e.g., two sets of rankings) [11] or for further improving ranking systems. For example, in group decision making, if the consensus score is extremely low, the experts need to adjust their rankings in order to reach an agreement.

Existing work uses correlation or distance functions to measure the correlation or disagreement of two rankings. Two commonly used rank correlation functions are Kendall’s $\tau$ [12] and Spearman’s $\rho$ [13]. Kendall’s $\tau$ measures the correlation of two rankings by considering their concordant and discordant pairs. Spearman’s $\rho$ evaluates the rank correlation by taking into account the positions of the items in the two rankings. One typical ranking distance metric is the Kemeny distance [14], which counts the pairs of items whose preferences disagree in the two rankings. Although extensive research has addressed the pairwise comparison of two rankings, studies on the evaluation of the overall consensus degree of a ranking set are far from sufficient. In the literature, cohesiveness and consensus are used interchangeably to represent the similarity of preferences in a group.
The most common existing approaches to measuring the similarity of preferences in a set of rankings calculate the similarity for each pair of rankings based on correlation functions and then aggregate the results [15]. Diversity and cohesiveness are considered two opposite concepts of rankings in social choice theory [16]. Research has also been carried out to measure the diversity of a ranking set based on distance functions (see, e.g., [17]).

This paper studies the consensus degree of a ranking set from a different perspective in order to provide a full picture of the degree to which a set of rankings mutually agree. This work proposes a novel framework to analyse the consensus of rankings by considering the common patterns embedded in a ranking set. A new concept of $k$-support patterns is introduced to represent how common patterns are embedded in rankings, by which the preferences of a group over candidates can be expressed at a subtle and fine-grained level. A pattern is regarded as a $k$-support pattern if it is included in at least $k$ rankings of the obtained ranking set. Thus, a $k$-support pattern represents the partial coverage of patterns by rankings, where the integer $k$ can be specified as needed when a ranking system is evaluated. The consensus of rankings is quantified based on the number of $k$-support patterns. Compared with existing work based on correlation or distance functions, this approach gives a subtler characterization and quantification of the commonalities embedded in the rankings.

The contributions of this paper are as follows: (1) a new representation of the commonality within a set of rankings, the $k$-support pattern, is proposed; (2) a new framework for quantifying consensus with $k$-support patterns, relying on neither distance nor correlation functions, is introduced; (3) an efficient algorithm is developed to calculate consensus scores and characterize the set of $k$-support patterns; (4) consensus scores are defined for each ranking to reflect the relationship of an individual ranking to the other rankings, which can be used to detect outliers in a ranking set; (5) extensive experiments are conducted to show the effectiveness and usefulness of the proposed approach.

The rest of the paper is organized as follows. In Section II, related work on the pairwise comparison of rankings and the measurement of consensus and diversity of rankings is reviewed. In Section III, the $k$-support pattern of rankings is formulated and consensus scores are defined based on it; an algorithm is then introduced to quantify ranking consensus. In Section IV, an outlier detection method is developed. In Section V, weighted consensus scores are defined. Section VI presents experimental studies to verify the proposed approach. Section VII concludes this paper.

## II Related work

Rank correlation and distance functions. Historically developed by Maurice Kendall in 1938 [12], Kendall’s $\tau$ measures the correlation between two rankings by considering the numbers of item pairs ranked in the same order and in opposite orders. Suppose that we consider rankings over candidates $C = \{c_1, c_2, \ldots, c_n\}$. A ranking is an ordered list in which items in higher positions are more preferred than items in lower positions. Let $\sigma$ be the position function: $\sigma_r(c)$ returns the position of item $c$ in ranking $r$. The Kendall’s $\tau$ for two rankings $r_1$ and $r_2$ is

$$\tau(r_1, r_2) = \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} \operatorname{sgn}\big(\sigma_{r_1}(c_i) - \sigma_{r_1}(c_j)\big)\, \operatorname{sgn}\big(\sigma_{r_2}(c_i) - \sigma_{r_2}(c_j)\big).$$

This coefficient is in the range $[-1, 1]$, where the value $1$ corresponds to the case that the two rankings are in the same order and the value $-1$ indicates that one ranking is in the reverse order of the other.
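As a quick check of the definition, the following Python sketch computes Kendall’s $\tau$ for two complete rankings of the same items by summing the sign agreement of every item pair. It assumes that each ranking is a list with the most preferred item first and contains no ties; `kendall_tau` is an illustrative name.

```python
from itertools import combinations


def kendall_tau(r1, r2):
    """Kendall's tau for two complete rankings of the same n >= 2 items.

    Each ranking is a list of items, most preferred first (no ties).
    """
    if len(r1) != len(r2) or set(r1) != set(r2):
        raise ValueError("both rankings must order the same items")
    n = len(r1)
    pos1 = {item: i for i, item in enumerate(r1)}
    pos2 = {item: i for i, item in enumerate(r2)}
    total = 0
    for a, b in combinations(r1, 2):
        # sign of the pair in each ranking: +1 if a precedes b, else -1
        s1 = 1 if pos1[a] < pos1[b] else -1
        s2 = 1 if pos2[a] < pos2[b] else -1
        total += s1 * s2  # +1 for a concordant pair, -1 for a discordant pair
    return 2 * total / (n * (n - 1))
```

For example, swapping the top two of three items leaves two concordant pairs and one discordant pair, giving $\tau = 1/3$.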

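Spearman’s $\rho$, proposed by Charles Spearman in 1904 [13], is defined based on the position of each item in the two rankings as follows:

$$\rho(r_1, r_2) = \frac{\sum_{i=1}^{n} \big(\sigma_{r_1}(c_i) - \bar{\sigma}_{r_1}\big)\big(\sigma_{r_2}(c_i) - \bar{\sigma}_{r_2}\big)}{\sqrt{\sum_{i=1}^{n} \big(\sigma_{r_1}(c_i) - \bar{\sigma}_{r_1}\big)^2 \sum_{i=1}^{n} \big(\sigma_{r_2}(c_i) - \bar{\sigma}_{r_2}\big)^2}},$$

where $\bar{\sigma}_{r_1} = \frac{1}{n}\sum_{i=1}^{n} \sigma_{r_1}(c_i)$ and $\bar{\sigma}_{r_2} = \frac{1}{n}\sum_{i=1}^{n} \sigma_{r_2}(c_i)$. Similarly, this coefficient satisfies $-1 \le \rho(r_1, r_2) \le 1$.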
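Spearman’s $\rho$ can be sketched in Python as the Pearson correlation between the positions of the items in the two rankings; for complete rankings without ties this coincides with the familiar $1 - 6\sum_i d_i^2 / (n(n^2-1))$ form, where $d_i$ is the position difference of item $c_i$. The sketch assumes list-based rankings with the most preferred item first; `spearman_rho` is an illustrative name.

```python
import math


def spearman_rho(r1, r2):
    """Spearman's rho as the Pearson correlation of item positions (no ties)."""
    if len(r1) != len(r2) or set(r1) != set(r2) or len(r1) < 2:
        raise ValueError("need two rankings of the same n >= 2 items")
    pos1 = {item: i + 1 for i, item in enumerate(r1)}  # 1-based positions
    pos2 = {item: i + 1 for i, item in enumerate(r2)}
    items = list(pos1)
    mean1 = sum(pos1[c] for c in items) / len(items)
    mean2 = sum(pos2[c] for c in items) / len(items)
    cov = sum((pos1[c] - mean1) * (pos2[c] - mean2) for c in items)
    var1 = sum((pos1[c] - mean1) ** 2 for c in items)
    var2 = sum((pos2[c] - mean2) ** 2 for c in items)
    return cov / math.sqrt(var1 * var2)
```

Swapping the top two of three items gives $\sum_i d_i^2 = 2$ and hence $\rho = 1 - 12/24 = 0.5$, which the sketch reproduces.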

These rank correlation functions do not take into account the varying relevance of ranked items in different positions. They are not suitable for evaluating the rankings where items at the top of a ranking are much more important than those at the bottom [18]. Further studies on weighted rank correlation were carried out extensively based on these two functions [19, 20, 21, 22, 23, 24, 25]. More reasonable variants of rank correlation functions were also proposed in the literature [26, 27, 28, 29].

Distance metrics have been used to analyze ranking data. One of the most widely used distance functions to measure rankings is the Kemeny distance [14]. It is defined as the sum of pairs where the ranking preferences disagree. One can refer to [30, 31, 32] for more information about the commonly used distance metrics.
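For complete rankings of the same items without ties, the Kemeny distance described above reduces to counting the item pairs on which the two rankings disagree. The following Python sketch assumes exactly that setting; `kemeny_distance` is an illustrative name, and some authors scale this count differently.

```python
from itertools import combinations


def kemeny_distance(r1, r2):
    """Number of item pairs ordered differently by two complete rankings."""
    if len(r1) != len(r2) or set(r1) != set(r2):
        raise ValueError("both rankings must order the same items")
    pos2 = {item: i for i, item in enumerate(r2)}
    # combinations(r1, 2) yields (a, b) with a ranked above b in r1;
    # the pair disagrees if r2 ranks b above a
    return sum(1 for a, b in combinations(r1, 2) if pos2[a] > pos2[b])
```

Under these assumptions the distance $d$ and Kendall’s $\tau$ are related by $\tau = 1 - 4d/(n(n-1))$.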

Measuring consensus and diversity of rankings. Existing studies measure consensus or diversity by making pairwise comparisons of the rankings and aggregating the comparison results. Thus, the two key issues with these approaches are the choice of proper comparison metrics and of aggregation methods. A consensus measure was first proposed in [11] with simple axioms including unanimity, anonymity and neutrality. The work in [17] improved the study of [11] by considering the weighted Kemeny distance. Extended work with more reasonable distance metrics was carried out in [33, 34, 35, 36]. In [16], a generalization of the work in [15] was developed with a geometric mean aggregator and the leximax comparison.

## III Quantifying consensus with $k$-support patterns

This section first defines the $k$-support patterns and the consensus scores of a ranking set. Then, an algorithm is presented to calculate the consensus scores by utilizing matrices to represent the $k$-support patterns.

### III-A $k$-support patterns

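Let $C = \{c_1, c_2, \ldots, c_n\}$ be a set of candidates to be ranked. A ranking $r = (x_1, x_2, \ldots, x_m)$, with distinct items $x_i \in C$ and $m \le n$, is an ordered list in which item $x_i$ is more preferred than item $x_j$ for $i < j$. Given two items $a$ and $b$, if there exist $i \le j$ such that $x_i = a$ and $x_j = b$, we write $(a, b) \sqsubseteq r$; otherwise $(a, b) \not\sqsubseteq r$. Specially, if $a = b$, $(a, a) \sqsubseteq r$ simply means that item $a$ is included in ranking $r$, also written as $a \in r$.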

In reality, most of the rankings obtained for a task often share certain commonalities. For example, in a set of rankings $R$, a particular item and a particular ordered pair of items may be common patterns for most of the rankings, but not for all of them. These patterns, partially included in a set of rankings, show the extent to which the rankings agree. Therefore, it is necessary to consider these patterns to understand the consensus level of a set of rankings. As such, we define the following $k$-support patterns for a ranking set.

###### Definition 1 ($k$-support patterns).

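Consider a set of rankings $R = \{r_1, r_2, \ldots, r_N\}$ over candidate set $C$. For $a, b \in C$, we have the following subset

$$R_{(a,b)} = \{\, r \in R : (a, b) \sqsubseteq r \,\}. \tag{1}$$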

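Let $k$ be an integer with $1 \le k \le N$. The pattern $(a, b)$ is a $k$-support pattern of $R$, denoted by $(a, b) \sqsubseteq_k R$, if the size of $R_{(a,b)}$ satisfies $|R_{(a,b)}| \ge k$; otherwise $(a, b) \not\sqsubseteq_k R$. If $a = b$, $(a, a) \sqsubseteq_k R$ indicates that item $a$ is a single $k$-support item of $R$, also written as $a \in_k R$.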

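The notation $(a, b) \sqsubseteq_k R$ means that $(a, b)$ occurs in at least $k$ rankings in $R$. We use $P_k(R)$ to denote the set of all the $k$-support patterns, i.e.,

$$P_k(R) = \{\, (a, b) : a, b \in C,\ (a, b) \sqsubseteq_k R \,\}. \tag{2}$$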
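The sizes of the subsets in Eq. (1), and the set of all $k$-support patterns in Eq. (2), can be obtained by enumerating, for every ranking, each item together with every item ranked at or below it. The following Python sketch assumes that each ranking is a list of distinct, hashable items with the most preferred first; `pattern_support` and `k_support_patterns` are illustrative names, not taken from the paper.

```python
from collections import Counter


def pattern_support(rankings):
    """Count, for each pattern (a, b), how many rankings contain it.

    (a, a) records that item a is ranked; (a, b) with a != b records that
    a is ranked above b. Items within a ranking are assumed distinct.
    """
    support = Counter()
    for r in rankings:
        for i, a in enumerate(r):
            for b in r[i:]:  # j >= i, so the single item (a, a) is included
                support[(a, b)] += 1
    return support


def k_support_patterns(rankings, k):
    """All patterns contained in at least k rankings."""
    return {p for p, count in pattern_support(rankings).items() if count >= k}
```

For the three rankings `["a","b","c"]`, `["a","c","b"]` and `["b","a"]` with $k = 2$, the pattern `("b","c")` is dropped because only one ranking contains it, while `("a","b")` survives with support 2.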

### III-B Consensus scores

The $k$-support patterns describe how common patterns are embedded in rankings. In this section, we first define individual consensus scores for a ranking $r \in R$ based on the $k$-support patterns. Then, we introduce the overall consensus scores for the ranking set $R$. From the individual consensus scores, we can learn the relative degree of consensus that a ranking shares with the others. In Section IV, we show that this enables us to detect outliers in a ranking set.

The following individual consensus scores are defined for a ranking $r \in R$.

###### Definition 2 (Individual consensus scores).

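For an arbitrary ranking $r \in R$, the set of $k$-support patterns with respect to $r$ is defined as

$$P_k(r) = \{\, (a, b) \in P_k(R) : (a, b) \sqsubseteq r \,\}. \tag{3}$$

The individual consensus scores of $r$ are

$$S_1(r) = \frac{1}{\bar{m}_1} \sum_{a \in r} w_r(a, a), \tag{4}$$

$$S_2(r) = \frac{1}{\bar{m}_2} \sum_{(a, b) \sqsubseteq r,\ a \ne b} w_r(a, b), \tag{5}$$

where

$$w_r(a, b) = \begin{cases} \dfrac{|R_{(a,b)}|}{N}, & (a, b) \in P_k(r), \\ 0, & \text{otherwise}, \end{cases} \tag{6}$$

denotes the ratio of the rankings in $R$ containing the pattern $(a, b)$ of $r$, and $\bar{m}_1 = \frac{1}{N}\sum_{r \in R} |r|$ and $\bar{m}_2 = \frac{1}{N}\sum_{r \in R} \frac{|r|(|r|-1)}{2}$ respectively represent the average number of ranked items and the average number of pairwise patterns of the rankings in $R$.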
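A minimal Python sketch of the individual scores in Eqs. (4) to (6), assuming list-based rankings of distinct items with the most preferred first. The name `individual_scores` is hypothetical, and the sketch returns $0$ for the pairwise score when no ranking has two or more items, a case the definition does not cover.

```python
from collections import Counter


def individual_scores(rankings, k):
    """Return a list of (S1(r), S2(r)) pairs, one per ranking."""
    n = len(rankings)
    # support[(a, b)] = number of rankings containing the pattern (a, b)
    support = Counter()
    for r in rankings:
        for i, a in enumerate(r):
            for b in r[i:]:
                support[(a, b)] += 1

    def w(pattern):
        # ratio of rankings containing the pattern, or 0 if it is not k-support
        count = support[pattern]
        return count / n if count >= k else 0.0

    m1 = sum(len(r) for r in rankings) / n                     # average items
    m2 = sum(len(r) * (len(r) - 1) / 2 for r in rankings) / n  # average pairs
    scores = []
    for r in rankings:
        s1 = sum(w((a, a)) for a in r) / m1
        pairs = sum(w((a, b)) for i, a in enumerate(r) for b in r[i + 1:])
        s2 = pairs / m2 if m2 > 0 else 0.0
        scores.append((s1, s2))
    return scores
```

With the rankings `["a","b","c"]`, `["a","c","b"]`, `["b","a"]` and $k = 2$, the first ranking obtains $S_1 = 1$ and $S_2 = 4/7$, while the short ranking `["b","a"]` obtains $S_1 = 0.75$ and $S_2 = 0$, because the order of `b` before `a` is supported by only one ranking.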

The individual consensus scores measure the consensus degree of a ranking to the others, where $S_1(r)$ shows the consensus in terms of single $k$-support items and $S_2(r)$ is for pairwise $k$-support patterns.

###### Definition 3 (Overall consensus scores).

The overall consensus scores of $R$ are

$$S_1(R) = \frac{1}{N} \sum_{r \in R} S_1(r), \tag{7}$$

$$S_2(R) = \frac{1}{N} \sum_{r \in R} S_2(r). \tag{8}$$

The overall consensus scores will be used to evaluate the consensus degree of a whole ranking set. They have the following property.

Property 1. The overall consensus scores satisfy

$$0 \le S_1(R) \le 1, \tag{9}$$

$$0 \le S_2(R) \le 1. \tag{10}$$

$S_1(R) = 0$ if and only if all the rankings have no single $k$-support item. $S_1(R) = 1$ only when all the rankings include the same items. $S_2(R) = 0$ if and only if all the rankings have no pairwise $k$-support patterns. $S_2(R) = 1$ only when all the rankings are the same.

###### Proof.

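It is clear from Eq. (6) that $0 \le w_r(a, b) \le 1$, where $w_r(a, b) = 0$ if and only if $(a, b)$ is not a $k$-support pattern of $r$, i.e., $(a, b) \notin P_k(r)$, and $w_r(a, b) = 1$ only when $(a, b)$ is a full-support pattern (i.e., an $N$-support pattern) of $r$. From Eq. (4) and Eq. (5), we have

$$0 \le S_1(r) \le \frac{|r|}{\bar{m}_1}, \tag{11}$$

$$0 \le S_2(r) \le \frac{|r|(|r|-1)}{2\bar{m}_2}. \tag{12}$$

$S_1(r) = 0$ if and only if $r$ has no $k$-support items, and $S_1(r) = |r|/\bar{m}_1$ only when all the items in $r$ are full-support items. Similarly, $S_2(r) = 0$ if and only if $r$ has no pairwise $k$-support patterns, and $S_2(r) = |r|(|r|-1)/(2\bar{m}_2)$ only when all the pairwise patterns of $r$ are full-support patterns.

Therefore, the overall consensus score satisfies $0 \le S_1(R) \le \frac{1}{N}\sum_{r \in R} \frac{|r|}{\bar{m}_1}$. Since $\frac{1}{N}\sum_{r \in R} |r| = \bar{m}_1$, we have $0 \le S_1(R) \le 1$, where $S_1(R) = 0$ if and only if all the rankings have no $k$-support items, and $S_1(R) = 1$ only when the single items of all the rankings are full-support items, which implies that all the rankings include the same items. Similarly, the consensus score $S_2(R)$ satisfies $0 \le S_2(R) \le \frac{1}{N}\sum_{r \in R} \frac{|r|(|r|-1)}{2\bar{m}_2}$. Since $\frac{1}{N}\sum_{r \in R} \frac{|r|(|r|-1)}{2} = \bar{m}_2$, we further obtain $0 \le S_2(R) \le 1$, where $S_2(R) = 0$ if and only if all the rankings have no pairwise $k$-support patterns, and $S_2(R) = 1$ if and only if the pairwise patterns of all the rankings are full-support patterns, which implies that all the rankings are the same. ∎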
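The bounds in Property 1 can be checked numerically. The following Python sketch assumes, as in Eqs. (7) and (8), that the overall scores average the individual scores, so each overall score equals the total of the numerators of Eqs. (4) or (5) divided by $N$ times the corresponding average size. The name `overall_scores` is illustrative, and rankings are lists of distinct items with the most preferred first.

```python
from collections import Counter


def overall_scores(rankings, k):
    """(S1(R), S2(R)) as the averages of the individual consensus scores."""
    n = len(rankings)
    # number of rankings containing each pattern, including single items (a, a)
    support = Counter((a, b) for r in rankings
                      for i, a in enumerate(r) for b in r[i:])

    def w(pattern):
        count = support[pattern]
        return count / n if count >= k else 0.0

    m1 = sum(len(r) for r in rankings) / n
    m2 = sum(len(r) * (len(r) - 1) / 2 for r in rankings) / n
    # averaging S1(r) over r = summing the numerators, then dividing by n * m1
    s1 = sum(w((a, a)) for r in rankings for a in r) / (n * m1)
    pair_total = sum(w((a, b)) for r in rankings
                     for i, a in enumerate(r) for b in r[i + 1:])
    s2 = pair_total / (n * m2) if m2 > 0 else 0.0
    return s1, s2
```

Identical rankings reach the upper bound $(1, 1)$, whereas rankings over pairwise disjoint items with $k = 2$ share no $k$-support pattern and score $(0, 0)$.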

### III-C An efficient algorithm to quantify consensus

In this section, a matrix representation of the $k$-support patterns is introduced in Theorem 1, which leads to an algorithm for calculating the consensus scores.

###### Theorem 1.

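Consider a set of rankings $R = \{r_1, \ldots, r_N\}$ over candidates $C$. For a ranking $r = (x_1, \ldots, x_m) \in R$ and a ranking $r' \in R$, with the position function

$$\sigma_{r'}(c) = \begin{cases} i, & c \text{ is the } i\text{-th item of } r', \\ 0, & c \notin r', \end{cases} \tag{13}$$

and the Heaviside function

$$H(z) = \begin{cases} 1, & z \ge 0, \\ 0, & z < 0, \end{cases} \tag{14}$$

we define

$$\delta_{r'}(a, b) = H\big(\sigma_{r'}(a) - 1\big)\, H\big(\sigma_{r'}(b) - \sigma_{r'}(a)\big) \tag{15}$$

and the $m \times m$ matrix $M_r$ as

$$M_r[i, j] = \begin{cases} \dfrac{1}{N} \displaystyle\sum_{r' \in R} \delta_{r'}(x_i, x_j), & i \le j \text{ and } \displaystyle\sum_{r' \in R} \delta_{r'}(x_i, x_j) \ge k, \\ 0, & \text{otherwise}. \end{cases} \tag{16}$$

Then, we have

$$S_1(r) = \frac{\operatorname{tr}(M_r)}{\bar{m}_1}, \tag{17}$$

$$S_2(r) = \frac{\mathbf{1}^\top M_r \mathbf{1} - \operatorname{tr}(M_r)}{\bar{m}_2}, \tag{18}$$

where $\mathbf{1}$ is an $m$-dimensional column vector of all ones.

###### Proof.

By Eq. (13), $\sigma_{r'}(c)$ gives the position of item $c$ in $r'$, and $\sigma_{r'}(c) = 0$ if $c$ does not appear in $r'$. From the definition of $\delta_{r'}$, $\delta_{r'}(a, b) = 1$ exactly when $a$ is ranked in $r'$ and $b$ is ranked at or below $a$, i.e., when $(a, b) \sqsubseteq r'$; hence $\sum_{r' \in R} \delta_{r'}(x_i, x_j)$ counts the number of rankings satisfying $(x_i, x_j) \sqsubseteq r'$. Thus, the nonzero entry $M_r[i, j]$ represents the ratio of the size of the $k$-support subset $R_{(x_i, x_j)}$ to the size of the ranking set $R$, that is, $M_r[i, j] = w_r(x_i, x_j)$ for $i \le j$. The diagonal entries correspond to single items and sum to $\operatorname{tr}(M_r)$. Moreover, $\mathbf{1}^\top M_r \mathbf{1}$ gives the sum of all the entries in matrix $M_r$, so the strictly upper-triangular entries, which correspond to the pairwise patterns, sum to $\mathbf{1}^\top M_r \mathbf{1} - \operatorname{tr}(M_r)$. Therefore, we obtain (17) and (18) from Definition 2. ∎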
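The matrix form of Theorem 1 translates almost line by line into Python. This sketch assumes a 1-based position function that returns 0 for absent items and a Heaviside function with $H(0) = 1$, as in Eqs. (13) to (15); the function names are hypothetical, and rankings are lists of distinct items with the most preferred first.

```python
def position(ranking, c):
    """1-based position of c in the ranking, or 0 if c is absent (Eq. (13))."""
    return ranking.index(c) + 1 if c in ranking else 0


def heaviside(z):
    """H(z) = 1 for z >= 0, else 0 (Eq. (14))."""
    return 1 if z >= 0 else 0


def contains(ranking, a, b):
    """1 iff a is ranked and b is ranked at or below a (Eq. (15))."""
    pa, pb = position(ranking, a), position(ranking, b)
    return heaviside(pa - 1) * heaviside(pb - pa)


def pattern_matrix(r, rankings, k):
    """Upper-triangular M_r holding the support ratio of each k-support pattern."""
    n, m = len(rankings), len(r)
    M = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i, m):
            count = sum(contains(rp, r[i], r[j]) for rp in rankings)
            if count >= k:
                M[i][j] = count / n
    return M


def scores_from_matrix(r, rankings, k):
    """S1(r) from the trace of M_r, S2(r) from its off-diagonal entries."""
    n = len(rankings)
    M = pattern_matrix(r, rankings, k)
    m1 = sum(len(rp) for rp in rankings) / n
    m2 = sum(len(rp) * (len(rp) - 1) / 2 for rp in rankings) / n
    trace = sum(M[i][i] for i in range(len(r)))
    total = sum(sum(row) for row in M)  # equals 1^T M_r 1
    s2 = (total - trace) / m2 if m2 > 0 else 0.0
    return trace / m1, s2
```

The product of the two Heaviside terms is what makes an absent item contribute nothing: $H(\sigma_{r'}(a) - 1)$ vanishes when $a$ is missing, and $H(\sigma_{r'}(b) - \sigma_{r'}(a))$ vanishes when $b$ is missing but $a$ is present.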

The matrix $M_r$ provides a proper representation of the $k$-support patterns in $r$. This representation further facilitates the analysis of the commonality that individual rankings share with the others. Based on Theorem 1, we develop Algorithm 1 to calculate the consensus scores and characterize the $k$-support patterns more efficiently.

*Algorithm 1: Calculating the consensus scores and characterizing the $k$-support patterns.*

Suppose that $k = \lceil 2N/3 \rceil$, which means that we consider $(a, b)$ a common pattern if it is contained in at least two thirds of the rankings. For a pattern $(a, b)$ of $r_i$, if $(a, b)$ is not included in any of the first $N - k + 1$ rankings, we do not need to calculate the corresponding entry of $M_{r_i}$: it must be zero. Indeed, if $(a, b)$ is a $k$-support pattern, it is contained in at least $k$ rankings and therefore in at least one of the rankings $r_1, \ldots, r_{N-k+1}$. Thus, we do not need to calculate the entries by always checking all the rankings. Line 7 in Algorithm 1 checks whether $(a, b)$ of $r_i$ is included in a ranking for which the matrix has already been constructed. If fewer than $N - k + 1$ matrices have been constructed, we look for $(a, b)$ in the rankings $r_1, \ldots, r_{i-1}$ already considered; otherwise, we only check whether $(a, b)$ appears in the first $N - k + 1$ rankings. As shown in Lines 8 and 9, if $(a, b)$ has been considered in a constructed matrix $M_{r_j}$, it is not necessary to recalculate the corresponding entry of the current matrix, and the entry is equal to the entry of $M_{r_j}$ corresponding to the pattern. Otherwise, as in Line 10, $(a, b)$ can be a $k$-support pattern only when the number of remaining rankings $r_i, \ldots, r_N$ is no less than $k$. In this way, the computation cost can be significantly reduced. From Lines 11 to 17, a counter accumulates the number of rankings containing $(a, b)$. To further improve the computation efficiency, the sum of the counter and the number of remaining rankings is checked during the accumulation process. If it is less than $k$, then $(a, b)$ has no chance to be a $k$-support pattern, and the counter is set to zero, which makes the corresponding entry of $M_{r_i}$ zero.
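The pruning ideas described above (reusing entries for patterns that were already seen, skipping patterns that too few remaining rankings could support, and stopping a count once $k$ becomes unreachable) can be sketched in Python as follows. This is a hypothetical reconstruction under stated assumptions, not the paper's Algorithm 1: `consensus_matrices` is an illustrative name, entries are reused through a dictionary cache rather than by searching earlier rankings, and rankings are lists of distinct items with the most preferred first.

```python
def consensus_matrices(rankings, k):
    """Build the matrix M_r for every ranking in order, reusing and pruning counts."""
    n = len(rankings)
    # position lookup per ranking: item -> 0-based index
    pos = [{c: i for i, c in enumerate(r)} for r in rankings]

    def contained(t, a, b):
        # (a, b) is in ranking t iff both items are ranked and a is not below b
        p = pos[t]
        return a in p and b in p and p[a] <= p[b]

    cache = {}  # pattern -> matrix entry, shared by all matrices
    matrices = []
    for i, r in enumerate(rankings):
        m = len(r)
        M = [[0.0] * m for _ in range(m)]
        for x in range(m):
            for y in range(x, m):
                pattern = (r[x], r[y])
                if pattern not in cache:
                    # Not cached => absent from rankings[:i], so only the
                    # n - i rankings in rankings[i:] can contain it.
                    count = 0
                    if n - i >= k:
                        for t in range(i, n):
                            count += contained(t, *pattern)
                            if count + (n - t - 1) < k:
                                count = 0  # k is unreachable: prune
                                break
                    cache[pattern] = count / n if count >= k else 0.0
                M[x][y] = cache[pattern]
        matrices.append(M)
    return matrices
```

Because every pattern of an already processed ranking is cached, a cache miss certifies that the pattern is absent from all earlier rankings, which is what justifies counting only over the remaining ones.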

The following example shows how the matrix representation can be used to evaluate the ranking consensus.

###### Example 1.

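Consider a set of four rankings $R = \{r_1, r_2, r_3, r_4\}$ over candidates $C$ and a given $k$. The matrices $M_{r_1}, \ldots, M_{r_4}$ are constructed by Eq. (16).

By Eq. (17) and Eq. (18), we obtain the following individual consensus scores:

| | $r_1$ | $r_2$ | $r_3$ | $r_4$ |
|---|---|---|---|---|
| $S_1(r)$ | 0.92 | 0.92 | 0.67 | 0.92 |
| $S_2(r)$ | 0.55 | 0.55 | 0.30 | 0.60 |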

The overall consensus scores $S_1(R)$ and $S_2(R)$ then follow from the individual scores by Definition 3.

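Since each nonzero entry of $M_r$ represents the ratio of the rankings containing the corresponding $k$-support pattern, the set $P_k(R)$ of $k$-support patterns can be read off from the nonzero entries of the matrices.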

## IV Detecting outliers

Detecting outliers is of great importance in many scenarios. One obvious application of the detection result is to improve rank aggregation, the task of aggregating the preferences of different agents into a final ranking. Outlier rankings (or agents) have a negative effect on drawing a consensus ranking. Although many studies have been carried out on rank aggregation [37, 38, 39], there is still room to improve aggregated rankings so that the aggregated result is as close to the ground truth as possible. Existing work on consensus evaluation does not provide a solution for detecting outliers in a ranking set.

The individual consensus scores $S_1(r)$ and $S_2(r)$ directly depend on the $k$-support patterns that $r$ shares with the other rankings in $R$. For instance, in Example 1, $r_3$ shares fewer $k$-support patterns with the others, and thus it has much lower consensus scores. The following outlier detection method is developed from this consensus quantification approach.

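Consider a ranking set $R$ with overall consensus scores $S_1(R)$ and $S_2(R)$ for a given $k$. Define the relative deviations of the individual consensus scores of ranking $r$ from the overall consensus scores as

$$D_1(r) = \frac{S_1(r) - S_1(R)}{S_1(R)}, \tag{19}$$

$$D_2(r) = \frac{S_2(r) - S_2(R)}{S_2(R)}. \tag{20}$$

Note that $D_1(r) > 0$ and $D_2(r) > 0$ imply that the ranking $r$ has higher consensus scores than the average. For given constants $\theta_1 > 0$ and $\theta_2 > 0$, if $D_1(r) < -\theta_1$ or $D_2(r) < -\theta_2$, we regard $r$ as an outlier of the ranking set. The values of $\theta_1$ and $\theta_2$ depend on the specific needs of a system.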
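The outlier rule based on relative deviations can be sketched in Python as follows. The sketch takes the individual scores as input and assumes that the overall scores are their averages (Definition 3); `detect_outliers` and the threshold values in the usage note are illustrative, not prescribed by the method.

```python
def detect_outliers(individual, theta1, theta2):
    """Indices of rankings whose relative deviation falls below -theta.

    individual: list of (S1(r), S2(r)) pairs, one per ranking. The overall
    scores are taken as their averages (an assumption of this sketch).
    """
    n = len(individual)
    s1_all = sum(s1 for s1, _ in individual) / n
    s2_all = sum(s2 for _, s2 in individual) / n

    def rel_dev(score, overall):
        # relative deviation; if overall is 0, every non-negative score is 0 too
        return (score - overall) / overall if overall > 0 else 0.0

    return [idx for idx, (s1, s2) in enumerate(individual)
            if rel_dev(s1, s1_all) < -theta1 or rel_dev(s2, s2_all) < -theta2]
```

Feeding in the rounded scores of Example 1 with $\theta_1 = \theta_2 = 0.2$ flags only index 2, i.e. $r_3$, whose relative deviations are roughly $-0.22$ and $-0.4$; with $\theta_1 = \theta_2 = 0.5$ no ranking is flagged.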

## V Quantifying consensus with consideration of positions and position gaps

The rank positions of an item and the position gaps of pairwise items may differ significantly across a ranking set; in Example 1, for instance, the positions of a given item and the gaps between two given items vary from one ranking to another. These differences influence the ranking consensus. However, the consensus scores defined in Section III only involve the existence of $k$-support patterns. To reflect position and gap information, the following definition extends Eqs. (4) and (5) to quantify the consensus of a ranking set more effectively.

###### Definition 4 (Weighted individual consensus scores).

The weighted consensus scores of ranking $r$ are

(21) | |||||

(22) |

where two constants serve as the weights, the first deviation term is the average deviation of the position of item $a$ from its average position in the ranking set, and the second is the average deviation of the position gaps between items $a$ and $b$.

We can calculate these two deviation terms as follows. For ranking $r$, we have the set of $k$-support patterns defined in Eq. (3), the function in the form of Eq. (15), and the subset of $R$ containing pattern $(a, b)$ as in Eq. (1). We define the average position of item $a$ in the ranking set as

$$\bar{\sigma}(a) = \frac{1}{|R_{(a,a)}|} \sum_{r' \in R_{(a,a)}} \sigma_{r'}(a) \tag{23}$$

and

(24) |

The position gap between $a$ and $b$ in ranking $r'$ is

$$g_{r'}(a, b) = \sigma_{r'}(b) - \sigma_{r'}(a). \tag{25}$$

We also define the average position gap of $a$ and $b$ in the ranking set as

$$\bar{g}(a, b) = \frac{1}{|R_{(a,b)}|} \sum_{r' \in R_{(a,b)}} g_{r'}(a, b) \tag{26}$$

and

From the definition, smaller values of the two weights give the deviations of item positions and position gaps a greater impact on the consensus scores. Note that the consensus scores defined in Section III are a special case of the weighted consensus scores for a particular choice of the weights. The overall consensus scores defined in Definition 3 need no change.
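The position statistics that enter the weighted scores can be sketched in Python. This assumes that average positions and average gaps are taken only over the rankings that contain the item or the ordered pair (our reading of Eqs. (23), (25) and (26)); the sketch stops short of Eqs. (21) and (22), whose exact weighting form is not assumed here. Function names are illustrative.

```python
def average_position(rankings, a):
    """Mean 1-based position of item a over the rankings that contain it."""
    positions = [r.index(a) + 1 for r in rankings if a in r]
    return sum(positions) / len(positions) if positions else None


def average_gap(rankings, a, b):
    """Mean gap pos(b) - pos(a) over the rankings that rank a at or above b."""
    gaps = [r.index(b) - r.index(a) for r in rankings
            if a in r and b in r and r.index(a) <= r.index(b)]
    return sum(gaps) / len(gaps) if gaps else None
```

For the rankings `["a","b","c"]`, `["a","c","b"]` and `["b","a"]`, item `a` has average position $4/3$ and the ordered pair (`a`, `b`) has average gap $1.5$, since only the first two rankings place `a` above `b`.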

To calculate the weighted consensus scores with the matrix representation, only a small change to Algorithm 1 is needed. We follow the steps of Algorithm 1 and change the calculation of the matrix entries in Line 19 to the following form

###### Remark 1 (Rankings with ties).

Rankings with ties are used in the case that the preferences over some items are identical. Let
