Unsupervised Ranking of Multi-Attribute Objects Based on Principal Curves

02/19/2014, by Chun-Guo Li, et al.

Unsupervised ranking faces one critical challenge in evaluation applications: no ground truth is available. While PageRank and its variants offer a good solution in related settings, they are applicable only to ranking from link-structure data. In this work, we focus on unsupervised ranking from multi-attribute data, which is also common in evaluation tasks. To overcome the challenge, we propose five essential meta-rules for the design and assessment of unsupervised ranking approaches: scale and translation invariance, strict monotonicity, linear/nonlinear capacities, smoothness, and explicitness of parameter size. These meta-rules are regarded as high-level knowledge for unsupervised ranking tasks. Inspired by the works in [8] and [14], we propose a ranking principal curve (RPC) model, which learns a one-dimensional manifold function to perform unsupervised ranking tasks on multi-attribute observations. The RPC is modeled as a cubic Bézier curve with control points restricted to the interior of a hypercube, thereby complying with all five meta-rules and inferring a reasonable ranking list. With the control points as the model parameters, one is able to understand the learned manifold and to interpret the ranking list semantically. Numerical experiments with the presented RPC model are conducted on open datasets from different ranking applications. In comparison with state-of-the-art approaches, the new model produces more reasonable ranking lists.


1 Introduction

Fig. 1: Hierarchical diagram of ranking approaches. RPC is an unsupervised ranking approach based on multi-attribute observations for objects.

From the viewpoint of machine learning, ranking can be performed in either a supervised or an unsupervised way, as shown in the hierarchical structure in Fig. 1. While supervised ranking [1] can evaluate ranking performance against given ground truth, unsupervised ranking is more challenging because no ground-truth label is available. Modelers and users then face a difficult question:

How can we ensure that the ranking list produced by unsupervised ranking is reasonable or proper?

From the viewpoint of given data types, ranking approaches can be further divided into two categories: ranking based on link structure and ranking based on multi-attribute data. PageRank [2] is one of the representative unsupervised approaches to rank items which have a linking network (e.g. websites). But PageRank and its variants do not work for ranking candidates which have no links. In this paper, we focus on unsupervised ranking approaches on a set of objects with multi-attribute numerical observations.

To rank multi-attribute objects, a weighted summation of attributes is widely used to provide a scalar score for each object. But different weight assignments give different ranking lists, so the results are not convincing enough. The first principal component analysis (PCA) provides a weight-learning approach [5], by which the score for each object is determined by its principal component on the skeleton of the data distribution. However, it encounters problems when the data distribution is nonlinearly shaped. Although kernel PCA [5] was proposed to attack this problem, the mapping to the kernel space is not order-preserving, which is the basic requirement for a ranking function. Neither dimension reduction methods [6] nor vector quantization [9] can assign scores to multi-attribute observations.

As the nonlinear extension of the first PCA, principal curves can be used to perform a ranking task [10, 8]. A principal curve provides an ordering of data points through the order in which it threads through their projections on the curve (illustrated by Fig. 2), which can be regarded as the "ranking skeleton". However, not all principal curve models are capable of performing a ranking task. Polyline approximation of a principal curve [11] fails to provide a consistent ranking rule due to non-smoothness at the connecting points; it also fails to guarantee order preservation. Nor can order preservation be guaranteed by a general principal curve model (e.g. [19]) that is not designed specially for ranking tasks. The problem can be tackled by the constraint of strict monotonicity, one of the constraints we present for ranking functions in this paper. Example 1 shows that strict monotonicity is a necessary condition for a ranking function, yet it has been neglected in other investigations.

Example 1.

Suppose we want to evaluate the life qualities of countries with a principal curve based on two attributes: LEB (Life Expectancy at Birth, years) and GDP (Gross Domestic Product per capita by Purchasing Power Parities, K$/person). Each country is a data point in the two-dimensional plane of LEB and GDP. If the principal curve is approximated by a polyline as in Fig. 2(a), the horizontal piece of the polyline is not strictly monotone: two countries projected onto it receive the same rank, although the one with the higher GDP should be ranked higher. For a general, non-monotone principal curve like that in Fig. 2(b), two pairs of points are ordered unreasonably. One pair is projected to the same point of the curve, where the tangent line is vertical, so the two countries share one place in the ranking list even though the one with the higher LEB should be ranked higher. The other pair is likewise projected to a single point although one country apparently dominates the other. With strict monotonicity, these points would be ranked in their proper order.

(a) Polyline Approximation (non-strict monotonicity)
(b) A General Principal Curve (non-monotonicity)
Fig. 2: Examples on a monotonicity property for ranking with principal curves.

Following the principle of "let the data speak for themselves" [12], this work attacks the problems of unsupervised ranking of multi-attribute objects with principal curves. First, ranking performance is taken into account in the design of ranking functions. It is known that knowledge of a given task can always improve learning performance [13]. The reason why PageRank produces commonly acceptable search results for a query lies in the fact that the PageRank algorithm integrates knowledge about backlinks [2]. For multi-attribute objects with no linking network, knowledge about ranking functions can be incorporated so that they produce reasonable ranking lists. In this work, we present five essential meta-rules for ranking rules (Fig. 3). These meta-rules are capable of assessing the reasonableness of ranking lists in unsupervised ranking.

Second, principal curves should be modeled so that they can serve as ranking functions. As noted in [8], ranking with a principal curve is performed on the learned skeleton of the data distribution. But not all principal curve models are capable of producing reasonable ranking lists when no ranking knowledge is embedded in them. Motivated by [14], the principal curve can be parametrically designed as a cubic Bézier curve. We will show in Section 4 that the parameterized principal curve satisfies all five meta-rules under constraints on its control points, and that its existence and the convergence of its learning algorithm are proved theoretically. Therefore, the parameterized principal curve is capable of producing a reasonable ranking list.

Fig. 3: Motivation of RPC model for unsupervised ranking.

The following points highlight the main contributions of this paper:

  • We propose five meta-rules for unsupervised ranking, which serve as high-level guidance in the design and assessment of unsupervised ranking approaches for multi-attribute objects. We justify that the five meta-rules are essential in applications, although most ranking approaches unfortunately overlook some or all of them.

  • A ranking principal curve (RPC) model is presented for unsupervised ranking from multi-attribute numerical observations of objects, in contrast to PageRank, which ranks from link structure [2]. The presented model satisfies all five meta-rules for ranking tasks, which existing approaches such as [8] overlook.

  • We develop the RPC learning algorithm, and theoretically prove the existence of an RPC and the convergence of the learning algorithm for given multi-attribute objects. Reasonable ranking lists learned from openly accessible data illustrate the good performance of the proposed unsupervised ranking approach.

1.1 Related Works

Domain knowledge can be integrated into learning models to improve learning performance. By coupling domain knowledge as prior information with network constructions, Hu et al. [13] and Daniels et al. [15] improved the prediction accuracy of neural networks. Recently, Kotłowski et al. [16] took monotonicity into consideration as a constraint to improve ordinal classification performance. For unsupervised ranking, the domain knowledge of monotonicity can likewise be taken into account, and it is capable of assessing ranking performance rather than merely evaluating side effects [17].

Ranking on manifolds provides a new ranking framework [3, 8, 4, 18], different from general ranking functions such as ranking aggregation [7]. As one-dimensional manifolds, principal curves are able to perform unsupervised ranking tasks from multi-attribute numerical observations of objects [8]. But not all principal curve models can serve as ranking functions. For example, Elmap can portray the contour of a molecular surface well [19], but it would produce a biased ranking list because it does not guarantee order preservation [8]. What's more, Elmap is hardly interpretable since its parameter size is not explicitly known.

A Bézier curve is a parametric one-dimensional curve widely used in fitting [20]. Hu et al. [14] proved that in two-dimensional space a cubic Bézier curve is strictly monotone when its end points lie at opposite corners of a square box and its control points lie in the interior, as shown in Fig. 4. To avoid confusion, in this paper end points refer to the points at both ends of the control polyline (also the end points of the curve), and control points refer to the other vertices of the control polyline.

Fig. 4: For an increasing monotone function, there are four basic nonlinear shapes [14] of cubic Bézier curves (in blue) which mimic shapes of the control polylines (in red). Curve shapes are determined by the locations of control points.

1.2 Paper Organization

The rest of this paper is organized as follows. The background is formalized in the next section. In Section 3, five meta-rules for ranking functions are elaborated. In Section 4, a ranking model, namely the ranking principal curve (RPC) model, is defined and formulated with a cubic Bézier curve, which is proved to follow all five meta-rules for ranking functions. The RPC learning algorithm is designed to learn the control points of the cubic Bézier curve in Section 5. To illustrate the performance of the proposed RPC model, applications on real-world datasets are carried out in Section 6, before the paper is summarized in Section 7.

2 Backgrounds

Consider ranking a set of N objects according to d real-valued attributes (or indicators, features). The numerical observations of one object on all the attributes comprise an item, denoted as a vector x = (x_1, …, x_d)^T in the d-dimensional space R^d. Ranking the objects is equivalent to ranking the data points {x_i, i = 1, …, N}. That is, the ordering of the objects can be achieved by discovering the ordering x_{π(1)} ⪯ x_{π(2)} ⪯ … ⪯ x_{π(N)}, where π is a permutation of {1, …, N} and x ⪯ y means that x precedes y. As there is no label to help with ranking, it is an unsupervised ranking problem from multi-attribute data.

Mathematically, a ranking task is to provide a list of totally ordered points. A total order is a special partial order which requires comparability in addition to the reflexivity, antisymmetry and transitivity required of a partial order [21]. Let x and y be one pair of points in R^d. For ranking, if x and y are different, they have the ordinal relation of either x ⪯ y or y ⪯ x. If x ⪯ y and y ⪯ x, then x = y, which infers that x and y are the same object.

Remembering that a partial order is associated with a proper cone and that the nonnegative orthant R^d_+ is a self-dual proper cone [21], the order for ranking tasks on R^d is defined in this paper to be

x ⪯ y ⟺ y − x ∈ K,   (1)

where x, y ∈ R^d, and

K = {z ∈ R^d : z_j ≥ 0 for j ∈ I_+, z_j ≤ 0 for j ∈ I_−}.   (2)

It is easy to verify that the order defined by Eq.(1) is a total order with the properties of comparability, reflexivity, antisymmetry and transitivity. In Eq.(2), I_+ and I_− are two subsets of {1, …, d} such that I_+ ∪ I_− = {1, …, d} and I_+ ∩ I_− = ∅. If we let

λ = (λ_1, …, λ_d)^T with λ_j = +1 for j ∈ I_+ and λ_j = −1 for j ∈ I_−,   (3)

then λ is unique for one given ranking task and varies from task to task. For a given ranking task with defined λ, x precedes y when λ_j(y_j − x_j) ≥ 0 for j = 1, …, d.
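For illustration, the comparison defined by Eqs.(1)-(3) amounts to a componentwise test with the sign vector λ. The Python sketch below is a hypothetical helper, with invented indicator values:

    import numpy as np

    def precedes(x, y, lam):
        """True if x precedes y under Eqs.(1)-(3): every sign-adjusted
        component of (y - x) is nonnegative."""
        x, y, lam = map(np.asarray, (x, y, lam))
        return bool(np.all(lam * (y - x) >= 0))

    # Life-quality style example with (LEB, GDP, IMR, Tuberculosis):
    # lam = (1, 1, -1, -1), since higher LEB/GDP and lower IMR/TB are better.
    lam = [1, 1, -1, -1]
    x = [68.0, 3.2, 25.0, 37.0]   # invented observations
    y = [80.3, 47.6, 3.0, 3.0]
    print(precedes(x, y, lam))    # True: y dominates x on every indicator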

As the set of points is totally ordered, we prefer to grade each point with a real value to help with ranking. Assume r = f(x) is the ranking function assigning the score r that provides the ordering of x. f is required to be order-preserving, so that the scores {f(x_i)} have the same ordering in R as the points {x_i} have in R^d. In order theory, an order-preserving function is also called isotone or monotone [22].

Definition 1 ([22]).

A function f : R^d → R is called monotone (or, alternatively, order-preserving) if

x ⪯ y ⟹ f(x) ≤ f(y),   (4)

and strictly monotone if

x ≺ y ⟹ f(x) < f(y).   (5)

Order preservation is the basic requirement for a ranking function. For a partially ordered set, f should assign x a score no greater than the score of y if x ⪯ y. Moreover, if x ≠ y also holds, the score assigned to x must be smaller than the score of y. As the data are totally ordered and different points should be assigned different scores, the ranking function is required to be strictly monotone, as stated by Eq.(5). Otherwise, the ranking rule would be meaningless, since it would break the ordering of the original data space R^d.

Example 2.

In addition to the two indicators in Example 1, another two indicators are taken to evaluate the life qualities of countries: IMR (Infant Mortality Rate, per 1000 born) and Tuberculosis (new cases of infectious Tuberculosis per 100,000 of population). It is easily seen that the life quality of a country is higher if it has a higher LEB and GDP together with a lower IMR and Tuberculosis. Let x_1, x_2, x_3, x_4 be the numerical observations of four countries on these attributes. By Eq.(1), they can be ordered with λ = (1, 1, −1, −1)^T; in this case, I_+ indexes LEB and GDP while I_− indexes IMR and Tuberculosis. Any strictly monotone mapping f with respect to this λ strictly preserves the ordering in R.

Recall that a differentiable function g : R → R is nondecreasing if and only if g′(x) ≥ 0 for all x, and increasing if g′(x) > 0 for all x (though the converse is not true) [23]. These facts readily extend to the monotonicity of Definition 1 with respect to the order defined by Eq.(1).

Theorem 1 ([21]).

Let f : R^d → R be differentiable and Λ = diag(λ). f is monotone if and only if

Λ∇f(x) ⪰ 0 for all x,   (6)

where 0 is the zero vector. f is strictly monotone if

Λ∇f(x) ≻ 0 for all x.   (7)

Theorem 1 provides first-order conditions for monotonicity. Note that '≻' denotes a strict partial order [21]. Let

∇f(x) = (∂f(x)/∂x_1, …, ∂f(x)/∂x_d)^T.   (8)

Λ∇f(x) ⪰ 0 infers ∂f(x)/∂x_j ≥ 0 for j ∈ I_+ and ∂f(x)/∂x_j ≤ 0 for j ∈ I_−. Λ∇f(x) ≻ 0 infers, in addition, that no component of ∇f(x) equals zero. By the case of strict monotonicity in Theorem 1, Λ∇f(x) ≻ 0 infers not only that f is strictly monotone from R^d to R, but also that the value f(x) is increasing with respect to x_j for j ∈ I_+ and decreasing with respect to x_j for j ∈ I_−. Vice versa, if ∂f(x)/∂x_j is greater than zero for j ∈ I_+ and smaller than zero for j ∈ I_−, then Λ∇f(x) ≻ 0 holds, which infers that f is a strictly monotone mapping. Lemma 1 can be concluded immediately.
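The first-order condition can also be checked numerically. The sketch below, an illustration under the reconstruction of Eqs.(6)-(8) with an invented scoring function, samples the unit cube and tests the sign of λ_j ∂f/∂x_j by central differences; a finite sample cannot prove strict monotonicity, but it quickly falsifies it:

    import numpy as np

    def looks_strictly_monotone(f, lam, n_samples=1000, h=1e-6, seed=0):
        rng = np.random.default_rng(seed)
        d = len(lam)
        for x in rng.uniform(h, 1 - h, size=(n_samples, d)):
            for j in range(d):
                e = np.zeros(d); e[j] = h
                deriv = (f(x + e) - f(x - e)) / (2 * h)   # ~ df/dx_j
                if lam[j] * deriv <= 0:
                    return False                          # violates Eq.(7)
        return True

    lam = [1, 1, -1, -1]
    f = lambda x: x[0] + np.sqrt(x[1]) - x[2] - 0.5 * x[3]  # toy scorer
    print(looks_strictly_monotone(f, lam))                  # True on (0,1)^4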

Lemma 1.

f(x) is strictly monotone if and only if f is strictly monotone along each coordinate x_j with the other coordinates {x_k, k ≠ j} fixed.

Furthermore, a strictly monotone mapping f induces a one-to-one mapping: for each value r in the range of f there is exactly one point x such that r = f(x). If this point is denoted by x = g(r), g is called the inverse mapping of f and inherits the strict monotonicity of its origin f.

Theorem 2.

Assume Λ∇f(x) ≻ 0 for all x. Then there exists an inverse mapping of f, denoted by g, such that g(f(x)) = x holds for all x; that is, for r_1 = f(x_1) and r_2 = f(x_2),

r_1 < r_2 ⟺ x_1 ≺ x_2.   (9)

The proof of Theorem 2 can be found in Appendix B. The theorem also holds in the other direction: assuming g is strictly monotone, there exists an inverse mapping f of g, and f(g(r)) = r holds for all r. Because of the one-to-one correspondence, f and g share the same geometric properties, such as scale and translation invariance, smoothness and strict monotonicity [23].

3 Meta-Rules

As a ranking function on R^d, f outputs a real value f(x) as the ranking score of a given point x. The ranking list of objects is provided by sorting their ranking scores in ascending or descending order. Since unsupervised ranking has no label information with which to verify the ranking list, we restrict ranking functions with five essential features to guarantee that a reasonable ranking list is provided. These features are capable of serving as high-level guidance for modeling ranking functions. They are also capable of serving as high-level assessments of unsupervised ranking performance, in contrast to assessments of supervised ranking performance, which rely on the quality of ranking labels. Any function from R^d to R with all five features can serve as a ranking function and provide a reasonable ranking list. These features are rules for ranking rules, namely meta-rules.

3.1 Scale and Translation Invariance

Definition 2 ([24]).

A ranking rule f is invariant to scale and translation if, for all x_1, x_2 ∈ R^d,

f(x_1) ≤ f(x_2) ⟺ f(T(x_1)) ≤ f(T(x_2)),   (10)

where T(x) = Ax + b performs the scale and translation.

Numerical observations on different indicators are taken on different scales. In Example 1, GDP is measured in thousands of dollars while LEB ranges from 40 to 90 years; they are not on the same scale. As a general data preprocessing technique, scaling and translation can bring them onto the same scale (e.g. [0,1]) while preserving their original ordering. If we let T(x) = Ax + b be such a linear transformation on R^d, with A a diagonal matrix of positive scaling factors and b a translation vector, then x_1 ⪯ x_2 ⟺ T(x_1) ⪯ T(x_2) [24]. Therefore, a ranking function should produce the same ranking list before and after scaling and translation.
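As a quick illustration (with a toy scorer, not the RPC itself): composing min-max normalization, the transform later used in Eq.(29), with any monotone scorer yields a ranking that is invariant to per-attribute positive scaling and translation, up to floating-point rounding:

    import numpy as np

    def min_max(X):
        """Columnwise min-max normalization into [0,1] (cf. Eq.(29))."""
        lo, hi = X.min(axis=0), X.max(axis=0)
        return (X - lo) / (hi - lo)

    rng = np.random.default_rng(1)
    X = rng.uniform(0, 100, size=(8, 3))
    score = lambda X: min_max(X).sum(axis=1)     # toy monotone scorer

    a = np.array([3.0, 0.5, 10.0])               # positive scales
    b = np.array([-2.0, 40.0, 7.0])              # translations
    print(np.array_equal(np.argsort(score(X)),
                         np.argsort(score(X * a + b))))  # True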

3.2 Strict Monotonicity

Definition 3 ([22]).

f is strictly monotone if f(x) < f(y) for x ≺ y, x, y ∈ R^d.

Strict monotonicity as in Definition 1 is specified here as one of the meta-rules for ranking. For the ordinal classification problem, plain monotonicity is the usual constraint, since two different objects may be classified into the same class [16]. But the ranking problem discussed in this paper requires strict monotonicity, since different objects should receive different scores: f(x) = f(y) holds if and only if x = y. In Example 1, each unreasonably ordered pair shows that the country which dominates on one indicator, with the other fixed, should be assigned the higher score. Therefore, the ranking function is required to be a strictly monotone mapping; otherwise, the ranking list would not be convincing. The mapping f in Example 2 is to the point here.

3.3 Linear/Nonlinear Capacities

Definition 4.

f has the capacities of linearity and nonlinearity if f is able to depict both linear and nonlinear relationships.

Taking the ranking task in Example 1 for illustration, one has no knowledge of the relationship between LEB and the score: the score might be either a linear or a nonlinear function of LEB, and similarly for GDP. Therefore, f should be able to embody both linear and nonlinear relationships between x and r. For the ranking task in Example 1, the ranking function should be a linear function of LEB at fixed GDP if LEB is linear with r; meanwhile, f should also be a nonlinear function of GDP at fixed LEB if GDP is nonlinear with r.

3.4 Smoothness

Definition 5 ([23]).

f is smooth if f is of class C^1.

In mathematical analysis, a function is called smooth if it has derivatives of all orders [23]. Yet a ranking function is only required to be of class C^1: f is continuous and has a continuous first-order derivative. The first-order derivative guarantees that f exerts a consistent ranking rule on all objects, with no abrupt change of the rule at any object. Taking the polyline in Fig. 2 for illustration, it is of class C^0 but not of class C^1, because it is continuous but not differentiable at the connecting vertex of the two line segments. This leads to an unreasonable ranking for the points projected onto that vertex.

3.5 Explicitness of Parameter Size

Definition 6.

f has the property of explicitness if f has a known parameter size, enabling a fair comparison among ranking models.

Hu et al. [13] considered nonparametric approaches a class of "black-box" approaches, since they cannot be interpreted intuitively. As a ranking function, f should be semantically interpretable so that the ranking has systematic meaning. For example, f(x) = w^T x gives an explicit linear expression with parameter size d, the dimension of the parameter w. It can be interpreted as saying that the score of x is linear in x and that w is the allocation proportion vector of the indicators for ranking. Moreover, between two ranking models with the same characteristics, the one with the smaller parameter size is the more applicable.
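As a tiny illustration of explicit parameter size (the weights below are invented, not taken from any model in the paper):

    import numpy as np

    # f(x) = w^T x: the parameter size equals the dimension d, and each
    # weight reads directly as the proportion allocated to an indicator.
    w = np.array([0.4, 0.3, 0.2, 0.1])   # hypothetical allocation proportions
    x = np.array([0.9, 0.5, 0.3, 0.8])   # one normalized observation
    print(float(w @ x))                  # its ranking score: 0.65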

The five meta-rules above are the guidance for designing a reasonable and practical ranking function. To perform a ranking task, a ranking function should satisfy all five meta-rules to produce a convincing ranking list; a ranking function that breaks any of them may produce a biased and unreasonable list. In this sense, they can be regarded as high-level assessments of unsupervised ranking performance.

4 Ranking Principal Curves

In this section, we propose a ranking principal curve (RPC) model to perform an unsupervised ranking task with a principal curve that has all five meta-rules. The RPC is parametrically designed as a cubic Bézier curve with control points restricted to the interior of a hypercube.

4.1 RPC Model

Fig. 5: Schematic plots of ranking skeletons (heavy solid lines or curves in red). Circle points: observations of countries on two indicators: LEB and GDP.

The simplest ranking rule is the first PCA, which summarizes the data in d-dimensional space with the largest principal component line [25]. The first PCA seeks the direction w that explains the maximal variance of the data cloud. Each point x is then orthogonally projected onto the line passing through the mean μ along w; this line can be regarded as the ranking skeleton. The projected points take an ordering along the ranking skeleton, which is just the ordering of their first principal components r = w^T(x − μ), and the ordering of the scores {r_i} gives the ordering of the points {x_i}. As a ranking function, the first PCA is smooth, explicitly expressed, and invariant to scale and translation. It works well when the data distribute as a slender ellipse. However, the first PCA can hardly depict the skeleton of data distributions shaped like crescents (Fig. 5(a)), so the produced ranking list is not convincing. What's more, the first PCA may be non-strictly monotone when the direction w is parallel to a coordinate axis: it cannot discriminate points like those in Example 1, which are projected to the same point when w is parallel to the horizontal axis. These problems hinder the first PCA from extensive application in comprehensive evaluation.

Recalling that principal curves are nonlinear extensions of the first PCA [10], we try to summarize the multiple indicators in the data space with a principal curve (Appendix A gives a brief review of principal curves). Assuming P(t) is the principal curve of a given data cloud, it provides an ordering of the projected points on the curve, in a way similar to the first PCA. Intuitively, the principal curve is a good choice for performing ranking tasks. On the one hand, unsupervised ranking can rely only on the numerical observations of the ranking candidates on the given attributes. For a dataset with a linking network, PageRank can calculate a score from the backlinks of each point [2]; when there are no links between points, a score can still be calculated according to the ranking skeleton instead of the link structure. On the other hand, the principal curve reconstructs x from the one-dimensional parameter, instead of projecting as the first PCA does. To perform a ranking task, a ranking function assigns a score to x by r = f(x). In practice, noise is inevitable, due to measurement errors and the influence of indicators excluded from x. Thus the latent score should be produced after removing the noise ε from x, that is, r = f(x − ε). As a ranking function, f is assumed to be strictly monotone. Thus, data points and scores are in one-to-one correspondence, and there exists an inverse function g for f such that

x = g(r) + ε,   (11)

which is the very model of a principal curve [10]. The inverse function g can be taken as the generating function producing the numerical observations from the score, which can be regarded as pre-existing.

As stated in Section 3, a function must obey five meta-rules to serve as a ranking rule. As f is required to be strictly monotone, there exists an inverse function g which is also strictly monotone by Theorem 2. Correspondingly, f and its inverse g share the other properties of scale and translation invariance, smoothness, capacities of linearity and nonlinearity, and explicitness of parameter size. A principal curve should likewise follow all five meta-rules to serve as a ranking function. However, polyline approximations of the principal curve can violate smoothness and strict monotonicity (e.g. Fig. 5(b)), and a smooth principal curve can still violate strict monotonicity (e.g. Fig. 5(c)); both would produce unreasonable ranking solutions, as illustrated in Example 1. Within the framework of Fig. 3, all five meta-rules can be modeled as constraints on the ranking function. Since a principal curve is defined to be smooth and invariant to scale and translation [10], the constraint of strict monotonicity makes it capable of performing ranking tasks (e.g. Fig. 5(d)). Naturally, the principal curve should also have a known parameter size for the sake of interpretability. We present Definition 7 for unsupervised ranking with a principal curve.

Definition 7.

A curve P(t) in d-dimensional space is called a ranking principal curve (RPC) if P(t) is a strictly monotone principal curve of the given data cloud and is explicitly expressed with known parameters of limited size.

4.2 RPC Formulation with Bézier Curves

To perform a ranking task, a principal curve model should follow all five meta-rules (Section 3), which can be defined for P(t) in the same way as for f. However, not all principal curve models can perform ranking tasks. The models in [10, 26, 27, 29] lack explicitness and cannot guarantee a monotone mapping (Fig. 5(c)). Polyline approximations [11, 28, 19] miss the requirements of smoothness and strict monotonicity (Fig. 5(b)). A new principal curve model is needed that performs ranking while following all five meta-rules.

In this paper, an RPC is parametrically modeled with a Bézier curve

P(t) = Σ_{i=0}^{n} B_{i,n}(t) P_i, t ∈ [0, 1],   (12)

which is formulated in terms of the Bernstein polynomials [31]

B_{i,n}(t) = C(n, i) t^i (1 − t)^{n−i},   (13)
C(n, i) = n! / (i! (n − i)!).   (14)

In Eq.(12), the P_i ∈ R^d are the control and end points of the Bézier curve, which take the place of the function parameters of g in Eq.(11). In particular, when n = 3, Eq.(12) has the matrix form

P(t) = [1  t  t²  t³] M [P_0, P_1, P_2, P_3]^T,   (15)

where

M = [ 1  0  0  0;  −3  3  0  0;  3  −6  3  0;  −1  3  −3  1 ].

In the case n > 3, the model becomes more complex and brings about the overfitting problem. In the case n < 3, the model is too simple to represent all the possible monotone curve shapes. Hence n = 3 is the most suitable degree for the ranking task.
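Under the matrix form reconstructed in Eq.(15), evaluating the cubic curve is a couple of matrix products. The sketch below is illustrative Python, with invented control points:

    import numpy as np

    # Cubic Bezier basis matrix of Eq.(15): rows give the coefficients of
    # 1, t, t^2, t^3 in the Bernstein polynomials B_{0,3}..B_{3,3}.
    M = np.array([[ 1,  0,  0, 0],
                  [-3,  3,  0, 0],
                  [ 3, -6,  3, 0],
                  [-1,  3, -3, 1]], dtype=float)

    def bezier(t, P):
        """Evaluate P(t) for a 4 x d array P of points P0..P3; t may be a vector."""
        t = np.atleast_1d(np.asarray(t, dtype=float))
        T = np.stack([np.ones_like(t), t, t**2, t**3], axis=1)  # N x 4
        return T @ M @ P                                        # N x d

    P = np.array([[0.0, 0.0],   # P0: end point at one corner
                  [0.2, 0.7],   # P1: interior control point
                  [0.8, 0.3],   # P2: interior control point
                  [1.0, 1.0]])  # P3: end point at the opposite corner
    print(bezier([0.0, 0.5, 1.0], P))  # P(0) = P0, P(1) = P3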

A cubic Bézier curve with constraints on its control points can be proved to have all five meta-rules. First of all, the formulation Eq.(12) is a nonlinear interpolation of the control and end points in terms of Bernstein polynomials [31]. These points are the determining parameters, of total size 4d. Different locations of these points produce different shapes of nonlinear curves besides straight lines [14]. Scale and translation of a Bézier curve are applied directly to these points, without changing the ranking score t contained in

T(P(t)) = A P(t) + b = Σ_{i=0}^{3} B_{i,3}(t) (A P_i + b),   (16)

where A is a diagonal matrix of scaling factors for the d dimensions and b is the translation vector. This property allows us to map all the data into [0,1]^d to facilitate ranking. What's more, the derivative of a Bézier curve is a lower-order Bézier curve

P′(t) = 3 Σ_{i=0}^{2} B_{i,2}(t) (P_{i+1} − P_i),   (17)

which involves only the end points and control points. Derivatives of all orders exist for all t, and thus Eq.(12) is smooth enough. Last but not least, it has been proved that a cubic Bézier curve can produce the four basic shapes of strict monotonicity in two-dimensional space [14]. Let the end points after scale and translation be denoted P_0 = 0 and P_3 = 1. The control points P_1 and P_2 are the determinants of the nonlinearity of the cubic Bézier curve (Fig. 4). In two-dimensional space, P(t) is proved to be increasing along each coordinate if the control points are restricted to the interior of the unit square [14]. Thus, a proposition can be deduced with Lemma 1.

Proposition 1.

P(t) is strictly monotone for t ∈ [0, 1] if P_0 = 0, P_3 = 1, and P_1, P_2 ∈ (0, 1)^d.
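Proposition 1 can be probed numerically through the derivative curve of Eq.(17): with P_0 = 0, P_3 = 1 and interior control points, every component of P′(t) should stay positive on [0,1]. The grid check below is only an illustration of the sign condition, not a proof:

    import numpy as np

    def bezier_deriv(t, P):
        """P'(t) of a cubic Bezier curve, Eq.(17): a quadratic Bezier
        curve in the differences P1-P0, P2-P1, P3-P2."""
        t = np.atleast_1d(np.asarray(t, dtype=float))[:, None]
        D = np.diff(P, axis=0)                                  # 3 x d
        B2 = np.concatenate([(1 - t)**2, 2 * t * (1 - t), t**2], axis=1)
        return 3 * B2 @ D                                       # N x d

    P = np.array([[0.0, 0.0], [0.2, 0.7], [0.8, 0.3], [1.0, 1.0]])
    t = np.linspace(0, 1, 101)
    print(bool((bezier_deriv(t, P) > 0).all()))  # True: strictly monotone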

Most importantly, for any group of numerical observations there always exists an RPC parameterized by a strictly monotone cubic Bézier curve. Such existence has not been established for many other principal curve models [10, 28, 19].

Theorem 3.

Assume that x ∈ [0,1]^d is the numerical observation of a ranking candidate and that x = P(t) + ε. Then there exists a curve P* of the form Eq.(12), satisfying the constraints of Proposition 1, such that P* is strictly monotone and

Δ(P*) = inf_P Δ(P),   (18)

where Δ(P) = E‖x − P(t_P(x))‖² is the expected squared distance of Eq.(A-3) and the infimum is taken over the constrained cubic Bézier curves.

Proof of Theorem 3 can be found in Appendix C.

5 RPC Learning Algorithm

To perform unsupervised ranking from the numerical observations of ranking candidates , we should first learn control points of the curve in Eq.(12

). The optimal points achieve the infimum of the estimation of

in Eq.(18). By the principal curve definition proposed by Hastie et al.[10], the RPC is the curve which minimizes the summed residual . Therefore, the ranking task is formulated as a nonlinear optimization problem

(19)

where Eq.(5) determines to find the point on the curve which has the minimum residual to reconstruct by . Obviously, a local minimizer can be achieved in an alternating minimization way

(21)
(22)

where means the th iteration.

The optimal solution of Eq.(21) has an explicit expression. Collect the scores into t = (t_1, …, t_N)^T and associate with them the Bernstein design matrix

B = [B_{j−1,3}(t_i)] ∈ R^{N×4},   (23)

so that Eq.(19) can be rewritten in the matrix form

min_P ‖X − B P‖_F²,   (24)

where X = [x_1, …, x_N]^T ∈ R^{N×d} and P = [P_0, P_1, P_2, P_3]^T ∈ R^{4×d}. Setting the derivative with respect to P to zero,

∂‖X − B P‖_F² / ∂P = −2 B^T (X − B P) = 0,   (25)

and remembering the matrix derivative rules [35], we get an explicit expression for the minimizer of Eq.(19):

P = B† X,   (26)

where † denotes the pseudo-inverse. Based on the k-th iterative result t^(k), the optimal solution could be given by substituting the corresponding B into Eq.(26). However, the pseudo-inverse is computationally expensive in numerical experiments, and B^T B is often ill-conditioned, with a high condition number, so that a very small change in t produces a tremendous change in P. Moreover, t^(k) is not the optimal solution of Eq.(19) but an intermediate result of the iteration, so the computed P could land far away from the optimal solution. To settle this problem, we employ the Richardson iteration [37] with a preconditioner Q, a diagonal matrix holding the norms of the columns of B as its diagonal elements. P is then updated according to

P^(k+1) = P^(k) + ω Q^{−1} B^T (X − B P^(k)),   (27)

where ω is a scalar parameter chosen so that the sequence {P^(k)} converges. In practice, we set

ω = 2 / (σ_min + σ_max),   (28)

where σ_min and σ_max are the minimum and maximum eigenvalues of Q^{−1} B^T B, respectively [38].
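Since the exact expressions of Eqs.(23)-(28) were lost in extraction, the following Python sketch implements the reconstruction given above; the design-matrix layout, the diagonal preconditioner of column norms, and the eigenvalue-based step size are assumptions of this sketch:

    import numpy as np

    def bernstein_design(t):
        """N x 4 matrix of cubic Bernstein values at the scores t (Eq.(23))."""
        t = np.asarray(t, dtype=float)[:, None]
        return np.concatenate([(1 - t)**3, 3 * t * (1 - t)**2,
                               3 * t**2 * (1 - t), t**3], axis=1)

    def richardson_step(P, t, X):
        """One preconditioned Richardson update of the control points (Eq.(27))."""
        B = bernstein_design(t)
        q = np.linalg.norm(B, axis=0)              # diagonal of Q
        G = (B.T @ B) / q[:, None]                 # Q^{-1} B^T B
        eig = np.linalg.eigvals(G).real
        omega = 2.0 / (eig.min() + eig.max())      # step size as in Eq.(28)
        return P + omega * (B.T @ (X - B @ P)) / q[:, None]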

After obtaining P^(k+1), the score vector t^(k+1) is calculated as the solution of Eq.(22), which reduces to a quintic polynomial equation that rarely has explicitly expressible roots. In [20], the projection was approximated by gradient and Gauss-Newton methods; the Jenkins-Traub method [32] has also been considered, finding the roots of the polynomial equation directly. As Eq.(20) is designed to find the minimum distance from a point to the curve, we adopt Golden Section Search (GSS) [33] to find a local approximate solution of Eq.(22).
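A sketch of the projection step via Golden Section Search, reusing the bezier() evaluator from the earlier sketch (the tolerance and the unit bracketing interval are assumptions):

    import numpy as np

    def project_gss(x, P, eps=1e-6):
        """Approximate argmin over t in [0,1] of ||x - P(t)||^2, Eq.(22)."""
        g = (np.sqrt(5) - 1) / 2                   # golden ratio conjugate
        dist = lambda t: float(np.sum((x - bezier(t, P)[0])**2))
        a, b = 0.0, 1.0
        c, d = b - g * (b - a), a + g * (b - a)
        while b - a > eps:
            if dist(c) < dist(d):
                b, d = d, c
                c = b - g * (b - a)
            else:
                a, c = c, d
                d = a + g * (b - a)
        return 0.5 * (a + b)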

0:  Input: X: data matrix; ε: a small positive value;
0:  Output: P: control points of the learned Bézier curve; t: the score vector of the objects in the set.
1:  Normalize X into [0,1]^d;
2:  Initialize P^(0);
3:  while Δ^(k) > ε do
4:     Adopt GSS to find the approximate solution t^(k+1);
5:     Compute P^(k+1) using the preconditioner;
6:     if Δ^(k+1) > Δ^(k) then
7:        break;
8:     end if
9:  end while
Algorithm 1: Algorithm to learn an RPC.

Algorithm 1 summarizes the alternating optimization procedure. Before the ranking task is performed, the numerical observations of the objects are normalized into [0,1]^d by

x̃ = (x − x_min) ⊘ (x_max − x_min),   (29)

where x̃ is the normalized vector of x, x_min and x_max are the vectors of attribute-wise minima and maxima, and ⊘ denotes elementwise division. The grading scores are unchanged, as scaling and translation act only on the control and end points (Eq.(16)) without changing the interpolation parameter. In Step 2, we initialize the end points as P_0 = 0 and P_3 = 1, and randomly select samples as the control points. During the learning procedure, P is learned automatically, making the Bézier curve an RPC in the numerical experiments. In Step 6, Δ^(k+1) > Δ^(k) occurs when the residual begins to increase; in this case, the algorithm stops updating and returns a local minimum. Proposition 2 guarantees the convergence of the sequence found by the RPC learning algorithm (the proof can be found in Appendix D). Therefore, the RPC learning algorithm finds a convergent sequence of control points that approaches the infimum in Eq.(18).
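Putting the pieces together, here is a compact end-to-end sketch of Algorithm 1 that reuses the min_max, bezier, project_gss and richardson_step helpers sketched earlier; the random initialization of the interior control points and the stopping tolerance are assumptions:

    import numpy as np

    def learn_rpc(X, eps=1e-4, max_iter=200, seed=0):
        rng = np.random.default_rng(seed)
        Xn = min_max(X)                             # normalization, Eq.(29)
        d = Xn.shape[1]
        P = np.vstack([np.zeros(d),                 # P0 = 0
                       rng.uniform(0.25, 0.45, d),  # P1: random interior point
                       rng.uniform(0.55, 0.75, d),  # P2: random interior point
                       np.ones(d)])                 # P3 = 1
        delta_prev = np.inf
        for _ in range(max_iter):
            t = np.array([project_gss(x, P) for x in Xn])  # Eq.(22)
            P = richardson_step(P, t, Xn)                  # Eq.(27)
            delta = float(np.sum((Xn - bezier(t, P))**2))
            if delta_prev - delta < eps:                   # Step 6: residual
                break                                      # stopped decreasing
            delta_prev = delta
        return P, t  # scores t in [0,1] rank the objects (1 = best corner)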

Proposition 2.

If P^(k) → P* as k → ∞, then {Δ^(k)} is a decaying sequence which converges to Δ(P*) as k → ∞.

Algorithm 1 converges in a limited number of steps. In each step, P is updated and the scores of all N points are recalculated, so the cost per iteration grows linearly with N. When the iteration stops, the ranking scores are produced along with P. Compared to the ranking rule of weighted summation, ranking with the RPC model costs a little more. However, weighted summation needs weight assignments from a domain expert, which makes it more subjective, because the weights vary from expert to expert. The RPC model needs no expert to assign weight proportions to the indicators: the learning procedure does the whole work for ranking.

The RPC learning algorithm learns a ranking function in a completely different way from traditional methods. On the one hand, the ranking function is constrained by the five meta-rules for ranking rules. Integrating the meta-rules into the ranking function makes the ranking rule more consistent with human knowledge about ranking problems; as high-level knowledge, the meta-rules are capable of evaluating ranking performance. On the other hand, ranking is carried out following the principle of unsupervised ranking, "let the data speak for themselves". In unsupervised ranking there are no ranking labels to guide the system toward a ranking function. As a matter of fact, the structure of the dataset contains the ordinal information between objects. If all the factors determining the ordinal relations were included, the RPC could thread through all the objects successively. In practice, the most influential indicators are selected to estimate the order of the objects, but the remaining factors still affect the numerical observations. Since we know nothing about those remaining factors, we do best to minimize their effect, which we formulate as the error ε. Therefore, minimizing the errors is adopted as the learning objective when no ranking label is available.

6 Experiments

6.1 Comparisons with Ranking Aggregation

Fig. 6: Three objects to rank in two dimensions, with scores given by the S-shaped RPC (in green). A different observation of the second object would give a different RPC (in pink) and thus a different ordering of the objects.

(a) A group of observations and the ranking lists given by different rules
Object | Order along attribute 1 | Order along attribute 2 | RankAgg value | RPC score | RPC order
1 | 2 | 1 | 1.5 | 0.2329 | 1
2 | 1 | 2 | 1.5 | 0.3304 | 2
3 | 3 | 3 | 3 | 0.7300 | 3

(b) Another group of observations (the second object changed) and the ranking lists
Object | Order along attribute 1 | Order along attribute 2 | RankAgg value | RPC score | RPC order
1 | 2 | 1 | 1.5 | 0.3708 | 2
2 | 1 | 2 | 1.5 | 0.3431 | 1
3 | 3 | 3 | 3 | 0.7318 | 3

TABLE I: The RPC model can detect ordinal information contained in the numerical observations of Fig. 6.

For a ranking task, some researchers prefer to aggregate several different ranking lists of the same set of objects in order to get a "better" order. For example, median rank aggregation [34] aggregates different orderings into a median rank by

r̄(i) = median(τ_1(i), …, τ_m(i)),   (30)

where τ_j(i) is the position of object i in the j-th ranking list, and the aggregated ordering is obtained by sorting the median ranks r̄(i). However, ranking aggregation approaches suffer from the difficulties of strict monotonicity and smoothness, so the resulting ranking list is not very convincing. What's more, aggregation merely combines the orderings and ignores the information delivered by the numerical observations themselves.
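A minimal sketch of median rank aggregation as described above (a hypothetical helper; the two input lists are the attribute-wise orders of Table I):

    import numpy as np

    def median_rank(rank_lists):
        """Each object's aggregated value is the median of its positions
        across the m input ranking lists; ties stay indistinguishable."""
        R = np.asarray(rank_lists, dtype=float)    # m lists x n objects
        return np.median(R, axis=0)

    print(median_rank([[2, 1, 3],
                       [1, 2, 3]]))                # [1.5 1.5 3.] -> tie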

In contrast, the RPC is modeled to follow all five meta-rules, which infers a reasonable ranking list. Moreover, the RPC can detect the ordinal information embedded in the numerical observations, as illustrated in Fig. 6. Consider ranking the three objects of Fig. 6 in a two-dimensional space, and let their orders along the two attributes be as summarized in Table I(a). Median rank aggregation [34] produces an ordering which cannot distinguish the first two objects, since they occupy the same place of the ranking list. In contrast, the RPC model produces an order in which the two objects are clearly distinguished, since the RPC ranks objects from their original observation data. Furthermore, if a different observation is obtained for one of the objects, a different RPC produces a different ranking list while RankAgg remains the same. In Table I(b), a different observation of the second object is obtained; a different RPC is learned (the pink curve in Fig. 6), which gives the order in the last column of Table I(b), different from the order in Table I(a). In summary, the RPC is able to capture the ordinal information contained not only among the ranking candidates but also in each individual observation.

6.2 Applications

Unsupervised ranking of multi-attribute observations of objects has a wide range of applications; the most prominent is the ranking of countries, journals and universities. Taking journal ranking for illustration, there are many indices for ranking journals, such as the impact factor (IF) [39] and the Eigenfactor [40]. Different indices reflect different aspects of journals and provide different ranking lists, so evaluating journals comprehensively becomes a tough problem. The RPC model is proposed as a new framework to attack this problem: it provides an ordering along the "ranking skeleton" of the data distribution. In this paper, we perform ranking tasks with RPCs to produce comprehensive evaluations on open-access datasets of countries and journals, using the open source software Scilab (version 5.4.1) on an Ubuntu 12.04 system with 4 GB of memory. Due to space limitations, we list only parts of the ranking lists (the full lists will be available when the paper is published).

6.2.1 Results on Life Qualities of Countries

Country | GDP¹ | LEB² | IMR³ | Tuberculosis⁴ | Elmap [8] (Score, Order) | RPC (Score, Order)
Luxembourg | 70014 | 79.56 | 6 | 4 | 0.892, 1 | 1.0000, 1
Norway | 47551 | 80.29 | 3 | 3 | 0.647, 2 | 0.8720, 2
Kuwait | 44947 | 77.258 | 11 | 10 | 0.608, 3 | 0.8483, 3
Singapore | 41479 | 79.627 | 12 | 2 | 0.578, 4 | 0.8305, 4
United States | 41674 | 77.93 | 2 | 7 | 0.575, 5 | 0.8275, 5
Moldova | 2362 | 67.923 | 63 | 17 | 0.002, 97 | 0.5139, 96
Vanuatu | 3477 | 69.257 | 37 | 31 | 0.011, 96 | 0.5135, 97
Suriname | 7234 | 68.425 | 53 | 30 | 0.011, 95 | 0.5133, 98
Morocco | 3547 | 70.443 | 44 | 36 | 0.002, 98 | 0.5106, 99
Iraq | 3200 | 68.495 | 25 | 37 | -0.002, 100 | 0.5032, 100
South Africa | 8477 | 51.803 | 349 | 55 | -0.652, 167 | 0.0786, 167
Sierra Leone | 790 | 46.365 | 219 | 160 | -0.664, 169 | 0.0541, 168
Djibouti | 1964 | 54.456 | 330 | 88 | -0.655, 168 | 0.0524, 169
Zimbabwe | 538 | 41.681 | 311 | 68 | -0.680, 170 | 0.0462, 170
Swaziland | 4384 | 44.99 | 422 | 110 | -0.876, 171 | 0, 171
P3⁵ | 44713 | 81.218 | 2 | 0 | - | -
P2⁵ | 330 | 80.4 | 2 | 0 | - | -
P1⁵ | 330 | 59.7 | 33 | 43 | - | -
P0⁵ | 1581.824 | 41.68 | 290 | 151 | - | -

  • ¹ Gross Domestic Product per capita by Purchasing Power Parities, $ per person;

  • ² Life Expectancy at Birth, years;

  • ³ Infant Mortality Rate (per 1000 born);

  • ⁴ Infectious Tuberculosis, new cases per 100,000 of population, estimated;

  • ⁵ P0-P3 are the control and end points of the RPC, given in the original data space.

TABLE II: Part of the ranking list for the life qualities of countries.
Fig. 7: Two-dimensional displays of the data points and the RPC for the life qualities of countries. Green points are numerical observations and red curves are two-dimensional projections of the RPC.

Gorban et al. [8] ranked 171 countries by the life quality of their people, with data drawn from GAPMINDER (http://www.gapminder.org/) on the four indicators of Example 2. For comparison, we use the same four GAPMINDER indicators as [8]. The RPC learned by Algorithm 1 is shown in two-dimensional visualizations in Fig. 7, and part of the ranking list is given in Table II.

From Fig. 7, the RPC portrays the trends of the data distribution with different shapes, both linear and nonlinear. For this task λ = (1, 1, −1, −1)^T, just as in Example 2. The RPC also discovers the relationships between the indicators: GDP moves in the same direction as LEB, but in the opposite direction to IMR and Tuberculosis. At first, a small increase of GDP brings a tremendous increase of LEB and tremendous decreases of IMR and Tuberculosis. Once GDP exceeds $14300 per person (0.2 as a normalized value in Fig. 7), increasing GDP results in only a small increase of LEB, and likewise only small decreases of IMR and Tuberculosis. As a matter of fact, it is hard to further improve LEB, IMR and Tuberculosis when they are close to the limits of human evolution.

In Table II, the control and end points provided by the RPC learning algorithm (Algorithm 1) are listed at the bottom, given in the original data space. Although the number of control points is fixed at two in addition to the two end points, the number actually needed for each indicator is adapted automatically by learning. From Table II, the points P2 and P3 overlap on IMR and Tuberculosis, which means that three points are enough for a Bézier curve to depict the skeleton along those indicators. The two-dimensional visualizations in Fig. 7 tally with this statement.

Gorban et al. [8] provided centered scores for the countries, similar to the first PCA. But the zero score is assigned to no country, so no country serves as the ranking reference. In addition, users will have trouble understanding the ranking principle because the parameter size is unknown; the ranking list is therefore hard to interpret. Compared with Elmap [8], the presented RPC model follows all five meta-rules. With these meta-rules as constraints, it achieves a better fit in terms of explained variance. It produces scores in [0,1], where 0 and 1 are the worst and the best reference respectively; Luxembourg, with the best life quality, provides a developing direction for the countries below it. Additionally, the RPC model is interpretable and easy to apply in practice, since just four points determine the ranking list.

6.2.2 Results on Journal Ranking

Title | IF (Score, Order) | 5-Year IF (Score, Order) | Immediacy Index (Score, Order) | Eigenfactor (Score, Order) | Influence Score (Score, Order) | RPC (Score, Order)
IEEE T PATTERN ANAL | 4.795, 7 | 6.144, 5 | 0.625, 26 | 0.05237, 3 | 3.235, 6 | 1.0000, 1
ENTERP INF SYST UK | 9.256, 1 | 4.771, 10 | 2.682, 2 | 0.00173, 230 | 0.907, 86 | 0.9505, 2
J STAT SOFTW | 4.910, 4 | 5.907, 6 | 0.753, 18 | 0.01744, 20 | 3.314, 4 | 0.9162, 3
MIS QUART | 4.659, 8 | 7.474, 2 | 0.705, 21 | 0.01036, 49 | 3.077, 7 | 0.9105, 4
ACM COMPUT SURV | 3.543, 21 | 7.854, 1 | 0.421, 56 | 0.00640, 80 | 4.097, 1 | 0.9092, 5
DECIS SUPPORT SYST | 2.201, 51 | 3.037, 43 | 0.196, 169 | 0.00994, 52 | 0.864, 93 | 0.4701, 65
COMPUT STAT DATA AN | 1.304, 156 | 1.449, 180 | 0.415, 61 | 0.02601, 11 | 0.918, 83 | 0.4665, 66
IEEE T KNOWL DATA EN | 1.892, 82 | 2.426, 72 | 0.217, 152 | 0.01256, 37 | 1.129, 55 | 0.4616, 67
MACH LEARN | 1.467, 133 | 2.143, 96 | 0.373, 70 | 0.00638, 81 | 1.528, 20 | 0.4490, 68
IEEE T SYST MAN CY A | 2.183, 53 | 2.44, 68 | 0.465, 46 | 0.00728, 69 | 0.767, 111 | 0.4466, 69
TABLE III: Part of the ranking list for JCR2012 journals of computer sciences.

We also apply the RPC model to rank journals, with data accessible from the Web of Knowledge (http://wokinfo.com/), which is affiliated with Thomson Reuters. Thomson Reuters annually publishes the Journal Citation Reports (JCR), which provide information about academic journals in the sciences and social sciences. JCR2012 reports citation information through the indicators Impact Factor, 5-Year Impact Factor, Immediacy Index, Eigenfactor Score, and Article Influence Score. After the journals with missing data are removed from the data table (58 out of 451), the RPC model provides a comprehensive ranking list of the journals in the computer science categories: artificial intelligence, cybernetics, information systems, interdisciplinary applications, software engineering, and theory and methods. Table III illustrates the ranking list produced by the RPC model based on JCR2012, and a two-dimensional visualization of the RPC is shown in Fig. 8.

For this ranking task, a journal ranks higher with a higher value on every indicator, that is, λ = (1, 1, 1, 1, 1)^T. Among the indicators, the 5-Year Impact Factor shows an almost linear relationship with the others. But the Eigenfactor presents no clear relationship with them, which suggests that it is calculated in a very different way: the Eigenfactor works like PageRank [2], while the other indicators take frequency counts.

From Table III, IEEE Transactions on Knowledge and Data Engineering (TKDE) is ranked higher than IEEE Transactions on Systems, Man, and Cybernetics-Part A (SMCA), although SMCA has a higher IF (2.183) than TKDE (1.892). The lower Influence Score of SMCA (0.767 vs. 1.129 for TKDE) brings it down the ranking list, so TKDE gets the higher comprehensive evaluation score and the higher place. This shows that one indicator does not tell the whole story: the RPC produces a ranking list that takes into account several indicators covering different aspects.

Fig. 8: Two-dimensional displays of the data points and the RPC for JCR2012. Green points are numerical observations and red curves are two-dimensional projections of the RPC. (IF: Impact Factor, 5IF: 5-Year IF, ImmInd: Immediacy Index, IS: Influence Score)

7 Conclusions

Ranking and its tools have, and will continue to have, an increasing impact on human behavior, either positively or negatively. However, ranking activities still face many challenges which have greatly restrained the rational design and utilization of ranking tools. Generally, ranking in practice is an unsupervised task, and it faces the critical challenge that there is no ground truth against which to evaluate the provided lists. PageRank [2] is an effective unsupervised ranking model for candidates with a link structure, but it does not work for numerical observations on multiple attributes of objects.

It is well known that domain knowledge can improve data mining performance, and we attack unsupervised ranking problems with domain knowledge about ranking. Motivated by [13, 16], five meta-rules are presented as ranking knowledge and regarded as constraints on ranking models: scale and translation invariance, strict monotonicity, linear/nonlinear capacities, smoothness, and explicitness of parameter size. They are also capable of assessing the ranking performance of different models. Enlightened by [14, 8], we propose a ranking model with a principal curve parametrically formulated as a cubic Bézier curve whose control points are restricted to the interior of the hypercube [0,1]^d. The control points are learned from the data distribution without human intervention. Applications to the life qualities of countries and to computer science journals show that the proposed RPC model produces reasonable ranking lists.

From an application viewpoint, there are many indicators available for ranking objects. The RPC can also be used for feature selection, which is one part of our future work.

Appendix A Principal Curves

Given a dataset {x_i, i = 1, …, N}, x_i ∈ R^d, a principal curve summarizes the data with a smooth curve instead of the straight line of the first PCA,

f(t) = μ + t w,   (A-1)

where μ is the data mean and w is the first principal direction. The principal curve was originally defined by Hastie and Stuetzle [10] as a smooth (C^∞) unit-speed one-dimensional manifold in R^d satisfying the self-consistency condition f(t) = E[x | t_f(x) = t], where the projection index t_f(x) is the largest value of t for which f(t) has the minimum distance from x. Mathematically, t_f(x) is formulated as [10]

t_f(x) = sup{ t : ‖x − f(t)‖ = inf_τ ‖x − f(τ)‖ }.   (A-2)

In other words, a curve is called a principal curve if it minimizes the expected squared distance between x and the curve, denoted by Δ(f) [11]:

Δ(f) = E‖x − f(t_f(x))‖².   (A-3)

As a one-dimensional principal manifold, the principal curve has wide applications (e.g. [36]) owing to its simplicity. Following Hastie and Stuetzle [10], later researchers proposed a variety of principal curve definitions and learning algorithms for different tasks [26, 11, 29, 19]. But most of them first approximate the principal curve with a polyline [11] and then smooth it to meet the smoothness requirement [10]; the expression of the principal curve is then not explicit, resulting in a "black box" that is hard to interpret. Other definitions of principal curves [30, 27] employ a Gaussian mixture model to formulate the principal curve, which introduces model bias and makes interpretation even harder. When a principal curve is used to perform a ranking task, it should instead be modeled as a "white box" whose ranking lists can be well interpreted.

Appendix B Proof of Theorem 2

If Λ∇f(x) ≻ 0, f is strictly monotone by Theorem 1. Since the set of ranking candidates is totally ordered, there is a one-to-one correspondence between the ranking items x and the scores r = f(x): otherwise f(x_1) = r and f(x_2) = r would both hold for some x_1 ≠ x_2, in which case f(x_1) = f(x_2), contradicting strict monotonicity for distinct ordered points.

By Lemma 1 and the one-to-one correspondence, there exists an inverse mapping g such that g(f(x)) = x. By strict monotonicity (Definition 1) and the one-to-one correspondence, we have

x_1 ≺ x_2 ⟺ f(x_1) < f(x_2).   (B-1)

Thus, g(r_1) ≺ g(r_2) holds for r_1 < r_2, i.e., g inherits strict monotonicity.

Appendix C Proof of RPC Existence (Theorem 3)

Proof. Assume x ∈ [0,1]^d, and let C([0,1]) denote the set of continuous curves on [0,1] embracing all possible observations of x. The uniform metric is defined as

ρ(P, Q) = sup_{t ∈ [0,1]} ‖P(t) − Q(t)‖.   (C-1)

It is easy to see that (C([0,1]), ρ) is a complete metric space [21].

Let S denote the set of admissible parameter matrices [P_0, P_1, P_2, P_3]^T with P_0 = 0, P_3 = 1 and P_1, P_2 in the closed convex hull of the data. With the Frobenius norm, S is a sequentially compact set, so any sequence {Q_k} in S has a subsequence {Q_{k_l}} converging to some Q* ∈ S [21]:

‖Q_{k_l} − Q*‖_F → 0 as l → ∞.   (C-2)

Since a Bézier curve depends linearly on its parameter matrix and the Bernstein basis is bounded on [0,1], the corresponding curves converge uniformly to the curve P* determined by Q*:

ρ(P_{k_l}, P*) ≤ c‖Q_{k_l} − Q*‖_F → 0,   (C-3)

where c is a constant determined by the Bernstein basis. By Proposition 1, every P_{k_l} is a strictly monotone curve, and the limit P* satisfies the same constraints on its control points.

Now choose the sequence {Q_k} such that Δ(P_k) approaches the infimum of Δ over the admissible set. By the uniform convergence above,

Δ(P*) = lim_{l → ∞} Δ(P_{k_l}) = inf_P Δ(P).   (C-4)

This completes the proof.

Appendix D Proof of Convergence (Proposition 2)

Proof: First of all, the sequence {P^(k)} generated by the Richardson method has been proved to converge [37]. Assume P^(k) → P* as k → ∞, and