How to choose the most appropriate centrality measure?

03/02/2020
by   Pavel Chebotarev, et al.

We propose a new method to select the most appropriate network centrality measure based on the user's opinion on how such a measure should work on a set of simple graphs. The method consists in: (1) forming a set F of candidate measures; (2) generating a sequence of sufficiently simple graphs that distinguish all measures in F on some pairs of nodes; (3) compiling a survey with questions on comparing the centrality of test nodes; (4) completing this survey, which provides a centrality measure consistent with all user responses. The developed algorithms make it possible to implement this approach for any finite set F of measures. This paper presents its realization for a set of 40 centrality measures. The proposed method, called culling, can be used for rapid analysis or combined with a normative approach by compiling a survey on the subset of measures that satisfy certain normative conditions (axioms). In the present study, the latter was done for the subsets determined by the Self-consistency or Bridge axioms.

1 Introduction

Within the last decades, more than 250 network centrality measures have been proposed [1]. This gave rise to the problem of choosing the most appropriate centrality measures for specific applications.

In some cases, this problem has a straightforward solution. These are the cases where scholars have a detailed mathematical model of the process of interest that drives changes in the network and depends on the comparative influence of nodes. Then a certain measure of node influence may appear naturally from the equations of the model. Such a measure can often be interpreted in terms of centrality [71, 41, 67].

However, in many cases, no detailed model is available, while there is a need to measure centrality of network nodes. Then centrality measures can be compared experimentally by studying correlations between them and important external characteristics of real or simulated networks [16, 63, 12, 59, 4]. The results of such studies can vary from one dataset to another, and they do not reveal the underlying causes of the correlation.

That is why a good supplement to this perspective is another one that focuses on the inherent properties of centrality measures and the conditions they satisfy. A quintessence of this is the axiomatic approach, which allows one, in some cases, to determine a unique measure that satisfies a series of axioms that seem desirable (we refer to [44, 70, 56, 42, 49, 36, 72, 68] for several examples).

Despite the strength of the axiomatic approach, it has certain limitations. Let us discuss some of them.

1. At the moment, only a minority of measures are equipped with their axiomatics. The rate at which new measures appear is much faster than providing them with axiomatics. The reason is that constructing a set of axioms that characterizes a specific measure of centrality is not an easy task at all.

2. In some cases, we have a parametric family of measures with a real parameter (see, e.g., [3, 34, 35]). Even if such a family has been characterized, the problem of choosing the value of the parameter still remains. This problem can rarely be solved axiomatically.

3. In many axiomatics, there is at least one technical axiom, which is not attractive on its own and rather determines the functional form of a centrality measure (e.g., Closeness, Degree, or Decay axioms in [42], Linearity in [36], Neighbor separability in [53]). It can be argued that adopting such an axiom does not fundamentally differ from adopting a centrality measure itself.

4. Some axioms have rather sophisticated formulations, which makes it difficult to assess their desirability.

5. A user who compares several axiomatics may find them equally attractive, although they can lead to quite different measures. Studying and comparing, say, thirty axiomatics can take a lot of time, with no guarantee that the user will ultimately prefer one of them to all the others.

6. It is not always obvious from a set of axioms how the unique measure that satisfies them ranks the nodes in a simple network by their centrality. Such rankings may turn out to be actually counterintuitive for the user, even though the corresponding axioms looked attractive.

For example, the recently proposed ‘Weighted degree centrality’ [10] implements the rational idea of measuring the centrality of a node $v$ by the sum of the degrees $d_u$ of all other nodes $u$, with weights that decrease with the shortest path distance $d(u,v)$ from $v$. This measure satisfies four out of six axioms considered in [10] (while ‘Betweenness’ or ‘Eigenvector centrality’ satisfy three axioms and no measure under study satisfies five or six), and its characterization can be completed by adding a proper arithmetic axiom. At the same time, for any sufficiently large star (a graph with $n$ nodes and $n-1$ edges incident to the same node, called the center), this index makes the center less central than the leaves. This means that (1) the ‘Weighted degree’ measures something other than centrality and (2) it is dangerous to choose a centrality measure based on heuristics and the number of conditions it meets.

The initial assumption of this study is that a specialist is able to rank the nodes of very simple graphs by their centrality from the point of view of a specific application. In this case, we can try to offer him or her a centrality measure that ranks nodes according to the specialist’s preferences. In some cases, it can be selected from a set of measures that meet the normative conditions the user considers most important.

As another example, consider the well-known ‘PageRank centrality’ [20, 52, 58], which has several axiomatizations [61, 5, 72]. Nevertheless, for one of the graphs in Fig. 1 it ranks node 4 as more central than node 5. Some other results it provides on the graphs of Fig. 1 also seem rather counterintuitive, and it is not easy to specify a real-world application that would require such results. Such peculiarities of centrality measures are likely to go unnoticed in an axiomatic study, but they appear immediately if we test measures on a series of simple graphs.

Taking into account the above limitations of the axiomatic strategy, in this paper we propose an alternative approach, which is based on the user’s opinion on how a centrality measure should work on a set of test graphs.

2 Selection of measures using test graphs

Suppose we have a finite set $F$ of centrality measures. Given an undirected graph $G$ with set of nodes $V = V(G)$ and set of edges $E = E(G)$, every measure $f \in F$ attaches a real number $f(v)$ to each node $v \in V$. The number $f(v)$ is interpreted as the centrality of $v$ in $G$ assigned by $f$: the greater $f(v)$, the more central $v$ is considered. In what follows, $f$ denotes any centrality measure; the graph $G$ is indicated explicitly in the notation whenever there is a need to specify it. Since there are centrality measures that are only defined for connected graphs, we restrict our consideration to such graphs.

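We say that a graph $G$ distinguishes measures $f_i$ and $f_j$ on a pair of nodes $u, v \in V(G)$ if and only if

$\operatorname{sign}\bigl(f_i(u) - f_i(v)\bigr) \ne \operatorname{sign}\bigl(f_j(u) - f_j(v)\bigr)$, (1)

where $\operatorname{sign}(x)$ equals $-1$, $0$, or $1$ for negative, zero, and positive $x$, respectively.

In words, (1) means that $f_i$ and $f_j$ disagree in comparing the centrality of $u$ and $v$.

Centrality measures $f_i$ and $f_j$ are rank equivalent if and only if there is no graph that distinguishes $f_i$ and $f_j$. Suppose that there are no distinct rank equivalent measures in $F$. This assumption reflects our interest in the order of centrality values rather than in the numerical values themselves.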
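The notion of a distinguishing graph can be illustrated with a small sketch. It assumes that two measures disagree on a pair of nodes exactly when the signs of their centrality differences differ; the function names and the second measure (`neighbor_degree_sum`) are hypothetical and serve only as an example, they are not taken from the test set of the paper.

```python
from itertools import combinations

def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def degree(graph):
    """Degree centrality: number of neighbors of each node."""
    return {u: len(nbrs) for u, nbrs in graph.items()}

def neighbor_degree_sum(graph):
    """Illustrative second measure: sum of the degrees of a node's neighbors."""
    deg = degree(graph)
    return {u: sum(deg[w] for w in nbrs) for u, nbrs in graph.items()}

def distinguishes(graph, f, g, u, v):
    """True if measures f and g order nodes u and v differently in graph."""
    fs, gs = f(graph), g(graph)
    return sign(fs[u] - fs[v]) != sign(gs[u] - gs[v])

def distinguishing_pairs(graph, f, g):
    """All unordered node pairs on which graph distinguishes f and g."""
    return [(u, v) for u, v in combinations(sorted(graph), 2)
            if distinguishes(graph, f, g, u, v)]

# A small tree with edges 0-1, 1-2, 2-3, 2-4, stored as an adjacency dictionary.
tree = {0: [1], 1: [0, 2], 2: [1, 3, 4], 3: [2], 4: [2]}
pairs = distinguishing_pairs(tree, degree, neighbor_degree_sum)
print(pairs)
```

On this tree the two measures agree on most pairs but disagree, for instance, on nodes 1 and 2: the degrees differ while the neighbor degree sums coincide.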

We consider the following task. Propose a tool that allows the user to choose a centrality measure based on their intuitive understanding (related to a particular application) of how such a measure should work on a collection of simple graphs.

To do so, we need:

  • a collection of simple graphs that contains a distinguishing graph for every pair of distinct measures in $F$;

  • a questionnaire such that every set of answers to its questions determines a unique measure.

Ideally, this collection and this questionnaire should be generated (or updated) automatically whenever a new set $F$ is given or an existing $F$ is updated.

The distinguishing graphs should be simple enough for users to tell which pairwise centrality order, i.e., $f(u) < f(v)$, $f(u) = f(v)$, or $f(u) > f(v)$, is most consistent with their intuitive understanding.

Selection of measures using test graphs will be called culling.

3 Generating graphs that distinguish measures

Let a set $F$ of centrality measures be given. The goal is a sequence of graphs that distinguishes all measures in $F$.

For our first survey, we opted for unicyclic graphs (also called unicycles, 1-trees, or augmented trees). These are connected graphs with the number of edges equal to the number of nodes. Such a graph can be obtained from a tree by adding one edge.

The advantage of 1-trees is that they are very simple and perhaps even more comfortable to perceive than trees, thanks to an eye-catching cycle (see Fig. 1).

At the beginning, the current sequence is empty.

Now we generate unlabeled trees, starting from the tree with three nodes and increasing the number of nodes. For each tree, we produce various 1-trees by adding different edges. For each 1-tree $G$, we look for the pairs of measures $\{f_i, f_j\}$ such that the current sequence $(G_1, \ldots, G_k)$ still contains no graph that distinguishes $f_i$ and $f_j$, while $G$ distinguishes them on some pair $(u, v)$ of its nodes. In this case, we add $G$ to the sequence as $G_{k+1}$ and associate the triple $(G_{k+1}, u, v)$ with the pair $\{f_i, f_j\}$. We stop the generation process when such a triple is associated with every pair of distinct centrality measures in $F$.

It should be noted that the procedure for generating distinguishing graphs allows for variations. Say, for the first survey, we actually considered only 1-trees with node degrees less than five, and this restriction can be removed. Moreover, one can also include trees in the sequence, as well as some graphs with the number of edges exceeding the number of nodes. In the latter case, the graphs should remain simple enough for the user to visually compare the centrality of the nodes based on expert opinion.
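The generation loop described in this section can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: trees are supplied by hand as labeled edge lists rather than enumerated as unlabeled trees, isomorphic 1-trees are not filtered out, and the two measures and all function names are hypothetical.

```python
from itertools import combinations

def sign(x):
    return (x > 0) - (x < 0)

def to_adjacency(n, edges):
    """Adjacency dictionary of a graph on nodes 0..n-1."""
    adj = {u: set() for u in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def one_trees(n, tree_edges):
    """Yield every 1-tree obtained from the given tree by adding one edge."""
    present = {frozenset(e) for e in tree_edges}
    for u, v in combinations(range(n), 2):
        if frozenset((u, v)) not in present:
            yield to_adjacency(n, list(tree_edges) + [(u, v)])

def degree(graph):
    return {u: len(nbrs) for u, nbrs in graph.items()}

def neighbor_degree_sum(graph):
    deg = degree(graph)
    return {u: sum(deg[w] for w in nbrs) for u, nbrs in graph.items()}

def build_sequence(candidates, measures):
    """Collect graphs until every pair of measures has a distinguishing triple.

    Returns (sequence, triples); triples maps a pair of measure names to
    (index of the graph in sequence, u, v).
    """
    sequence, triples = [], {}
    todo = set(combinations(sorted(measures), 2))
    for graph in candidates:
        if not todo:
            break
        scores = {name: f(graph) for name, f in measures.items()}
        added = False
        for a, b in sorted(todo):
            for u, v in combinations(sorted(graph), 2):
                if sign(scores[a][u] - scores[a][v]) != sign(scores[b][u] - scores[b][v]):
                    if not added:  # store the graph once, on its first use
                        sequence.append(graph)
                        added = True
                    triples[(a, b)] = (len(sequence) - 1, u, v)
                    todo.discard((a, b))
                    break
    return sequence, triples

measures = {"degree": degree, "neighbor_degree_sum": neighbor_degree_sum}
candidates = (list(one_trees(3, [(0, 1), (1, 2)]))
              + list(one_trees(4, [(0, 1), (1, 2), (2, 3)])))
sequence, triples = build_sequence(candidates, measures)
print(triples)
```

The triangle obtained from the three-node path distinguishes nothing because all its nodes are equivalent, so the first graph kept in the sequence is a four-node 1-tree.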

The next task is constructing a questionnaire that allows the user to choose the centrality measure that suits them best.

4 Constructing a decision tree

In this section, we present an algorithm for constructing a questionnaire, which takes the form of a decision tree.

In each question, the user is asked to compare the centrality of two nodes, $u$ and $v$, in some graph $G_k$. This graph distinguishes (on some pair of its nodes) certain measures that satisfy all the conditions (answers) obtained earlier. The user’s answer is $f(u) < f(v)$, $f(u) = f(v)$, or $f(u) > f(v)$, and it narrows the set of suitable measures by preserving only those for which the answer is true. The survey continues until only one measure remains.

The questions in the survey generally depend on the answers to the previous ones. So the questionnaire has the structure of a rooted directed tree. The user navigates the tree answering the questions attached to the root and intermediate vertices. Each answer is identified with an arc directed from the vertex corresponding to the question. The leaves, i.e., the vertices of the decision tree that have no outgoing arcs, are identified with the resulting measures.

We now present an algorithm for constructing this directed tree. We have a set $F$ of centrality measures, a sequence of distinguishing graphs $(G_1, \ldots, G_r)$ and, for every pair of distinct measures $f_i, f_j \in F$, a triple $(G_k, u, v)$ associated with this pair such that $G_k$ distinguishes $f_i$ and $f_j$ on the pair of its nodes $u, v$.

At the initial step of the algorithm, we are at the root of the tree. There are no other vertices in the tree, and no question is attached to the root.

On each step of the algorithm, we are at some vertex of the tree. A standard step of the algorithm is any step, except for finish.

The standard step consists of the following actions.

Suppose we are at vertex $x$ of the rooted directed tree. Consider the path from the root to $x$. Each arc of this path corresponds to some condition: it is an answer to the question attached to the vertex this arc is directed from. Let $F_x$ be the set of measures satisfying all conditions in this path (if $x$ is the root, then $F_x = F$).

  • If no question has yet been attached to $x$ and $|F_x| > 1$, then take $G_k$ with the smallest $k$ such that $G_k$ distinguishes some measures in $F_x$ on a certain pair of its nodes $u, v$. Choose a suitable pair $u, v$ arbitrarily and attach the question ‘‘What inequality (equality) holds true for $f(u)$ and $f(v)$ in $G_k$?’’ to $x$. Draw two or three arcs directed from $x$ to newly created vertices, depending on which of the three conditions $f(u) < f(v)$, $f(u) = f(v)$, or $f(u) > f(v)$ are met by some measures in $F_x$. Assign the conditions that are satisfied by some measures to these arcs. Mark these arcs ‘‘new.’’

  • If there is at least one ‘‘new’’ arc directed from $x$, choose any such arc, move to the vertex this arc is directed to, and mark this arc ‘‘old.’’

  • If $|F_x| = 1$, then attach to $x$ the unique centrality measure that belongs to $F_x$.

  • If (a) there is no ‘‘new’’ arc directed from $x$, while a question is attached to $x$ and $x$ is not the tree root, or (b) $|F_x| = 1$, then make one move back from $x$ toward the root.

  • If there is no ‘‘new’’ arc directed from $x$, while a question is attached to $x$ and $x$ is the root, then finish: the decision tree is built. Otherwise, make another standard step.

A high-level pseudocode of the standard step is shown in Algorithm 1.

Data: a set $F$ of centrality measures; a list of distinguishing graphs; a set of distinguishing triples
Result: decision tree for choosing a centrality measure
Function standard_step(x)
    if |F_x| > 1 then
        (G, u, v) ← distinguish_measures(F_x)
        x.attach_question(‘‘u vs v in G’’)
        x.add_new_arcs_and_childs()
        while x.has_new_arcs() do
            arc ← x.get_any_new_arc()
            mark_arc_as_old(arc)
            standard_step(arc.target)
        end while
    else
        x.attach_unique_measure()
    end if
end
Algorithm 1 Recursive construction of a decision tree
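A compact Python sketch of the same recursion is given below. It assumes that the centrality values of all measures on all distinguishing graphs are precomputed; the nested-dictionary representation of the tree, the function names, and the toy scores are hypothetical. Recursion over the answer groups replaces the explicit marking of arcs as ‘‘new’’ and ‘‘old.’’

```python
from itertools import combinations

def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def build_tree(names, graphs, scores):
    """Recursively build the questionnaire as nested dictionaries.

    graphs[k] is the node list of graph k; scores[name][k][u] is the
    centrality that measure `name` gives to node u in graph k.
    A leaf is {"measure": name}; an inner vertex is
    {"question": (k, u, v), "answers": {"<": ..., "=": ..., ">": ...}}.
    """
    if len(names) == 1:
        return {"measure": names[0]}
    for k, nodes in enumerate(graphs):  # smallest k first
        for u, v in combinations(nodes, 2):
            groups = {}
            for name in names:
                s = sign(scores[name][k][u] - scores[name][k][v])
                groups.setdefault("<=>"[s + 1], []).append(name)
            if len(groups) > 1:  # this pair splits the current set
                return {"question": (k, u, v),
                        "answers": {ans: build_tree(group, graphs, scores)
                                    for ans, group in groups.items()}}
    raise ValueError("some measures are not distinguished by the given graphs")

def count_questions(tree):
    """Number of inner vertices (questions) of the decision tree."""
    if "measure" in tree:
        return 0
    return 1 + sum(count_questions(sub) for sub in tree["answers"].values())

# Hypothetical centrality values of three measures on one 3-node graph.
graphs = [[0, 1, 2]]
scores = {
    "A": [{0: 1, 1: 2, 2: 2}],
    "B": [{0: 1, 1: 2, 2: 3}],
    "C": [{0: 2, 1: 1, 2: 1}],
}
tree = build_tree(["A", "B", "C"], graphs, scores)
print(tree["question"], count_questions(tree))
```

Only the answers that are supported by at least one remaining measure receive a subtree, which mirrors drawing two or three arcs in the standard step.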

5 Extending a decision tree

Suppose now that several new measures (for example, those that appeared in the literature over the past year) have been added to $F$. Do we have to rebuild the questionnaire from the very beginning? The answer is no.

Consider the simplest case where one measure $f'$ is added. There are two possibilities:

  (a) the new measure satisfies the same set of conditions (answers to the questions of the current questionnaire) as one of the ‘‘old’’ measures;

  (b) in the decision tree of the current survey applied to the extended set $F \cup \{f'\}$, there is a vertex $x$ such that $f' \in F_x$, but only two arcs are directed from $x$, while the third possible answer to the question attached to $x$ is true for $f'$.

In case (b), the only thing to do is add the third arc directed from $x$, i.e., add the third possible answer to the question attached to $x$, after which the new questionnaire is ready.

In case (a), we need to distinguish $f'$ and the ‘‘old’’ measure $f$ that satisfies the same set of conditions. This means that in the current survey, $f'$ and $f$ end up on the same leaf of the decision tree. Therefore, we have to make this leaf an intermediate vertex by asking an additional question. To formulate this question, we first check if the current sequence contains a graph $G_k$ that distinguishes $f'$ and $f$ on some pair of its nodes $u, v$. If such a graph exists, then the required question is: ‘‘What inequality (equality) holds true for $f(u)$ and $f(v)$ in $G_k$?’’ Otherwise, we first have to generate a new graph that distinguishes $f'$ and $f$, so we call the graph generation procedure described in Section 3.

Thus, adding one measure leads to adding one question or one answer to an existing question. This proves, among other things, that the total number of questions in the questionnaire is no more than $m - 1$, where $m$ is the number of measures. If three arcs are directed from some vertices of the tree, then the number of questions is smaller by the number of such vertices. Of course, the number of questions a user has to answer in a particular survey is usually much smaller.
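The bound on the number of questions can also be checked by counting arcs. The derivation below is a sketch that assumes every leaf carries exactly one measure and every question has two or three answers; the symbols $q$ and $t$ are introduced here for illustration only.

```latex
% m: number of measures (leaves); q: number of questions (inner vertices);
% t: number of questions with three answers, so q - t questions have two.
% Every vertex except the root has exactly one incoming arc, hence
\[
  (m + q) - 1 \;=\; 2\,(q - t) + 3\,t \;=\; 2q + t
  \quad\Longrightarrow\quad
  q \;=\; m - 1 - t \;\le\; m - 1 .
\]
```

For example, a questionnaire with $m = 40$ measures and $q = 33$ questions would contain $t = 6$ questions with three answers.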

Suppose now that several measures have been added to $F$. Then we need to:

  1. For each new measure, check the above condition (b); if it is satisfied, then add the lacking arc to the decision tree as in case (b);

  2. Attach the new measures to the existing leaves, as in case (a) above;

  3. Check whether the graphs in the current sequence distinguish the measures attached to the same leaf and add the lacking questions whenever such graphs exist;

  4. For the pairs of measures that cannot be distinguished by any graph in the current sequence, generate new graphs as described in Section 3 and add the required questions.

6 A test survey

Before presenting a test survey, we introduce several new centralities. They are based on similarity/dissimilarity measures for network nodes.

Figure 1: Distinguishing graphs for a test set of 40 centrality measures.

6.1 Centralities based on graph kernels

Two popular centrality measures are ‘Closeness’ [13, 14]

$f(u) = \Bigl[\sum_{v \in V} d(u,v)\Bigr]^{-1}$ (2)

and ‘Eccentricity’ [13, 43]

$f(u) = \Bigl[\max_{v \in V} d(u,v)\Bigr]^{-1}$, (3)

where $d(u,v)$ is the shortest path distance [21] between nodes $u$ and $v$ in $G$.

Footnote 3: Observe that in the recent study [4], the authors come to the conclusion that in the infection source identification problem ‘‘a combination of eccentricity and closeness… generally performs better than several state-of-the-art source identification techniques, with higher accuracy and lower average hop error.’’
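Both measures are easy to compute by breadth-first search. The sketch below assumes the usual reciprocal forms, i.e., Closeness as the inverse of the sum of shortest path distances and Eccentricity as the inverse of the maximum distance; the function names are illustrative.

```python
from collections import deque

def distances_from(graph, source):
    """Shortest path (hop) distances from source, by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def closeness(graph):
    """Inverse of the sum of distances to all other nodes."""
    return {u: 1 / sum(distances_from(graph, u).values()) for u in graph}

def eccentricity(graph):
    """Inverse of the distance to the farthest node."""
    return {u: 1 / max(distances_from(graph, u).values()) for u in graph}

# A star with center 0 and four leaves.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
c, e = closeness(star), eccentricity(star)
print(c[0], c[1], e[0], e[1])
```

On the star, both measures rank the center above the leaves, as one would expect from a centrality measure.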

Consider the centrality measures obtained by substituting other graph distances and dissimilarity measures [26, 7] into (2) and (3). These measures are defined via graph kernels.

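The parametric family of Katz kernels [47] (also referred to as Walk proximities [32] and Neumann diffusion kernels [38]) is defined as

$K^{\mathrm{Walk}}(t) = \sum_{k=0}^{\infty} (tA)^k = (I - tA)^{-1}$

with $0 < t < \rho^{-1}$, where $A$ is the adjacency matrix of $G$ and $\rho$ is the spectral radius of $A$.

The Communicability kernels [37, 39] are

$K^{\mathrm{Comm}}(t) = \exp(tA) = \sum_{k=0}^{\infty} \frac{(tA)^k}{k!}, \quad t > 0.$

Two other families of kernels are defined similarly via the Laplacian matrix of $G$:

$L = \operatorname{diag}(A\mathbf{1}) - A,$

where $\mathbf{1} = (1, \ldots, 1)^{T}$ and $\operatorname{diag}(A\mathbf{1})$ is the diagonal matrix with vector $A\mathbf{1}$ on the diagonal.

The regularized Laplacian kernels, or Forest kernels [28], are

$K^{\mathrm{For}}(t) = (I + tL)^{-1}, \quad t > 0.$

The Heat kernels are the Laplacian exponential diffusion kernels [50]

$K^{\mathrm{Heat}}(t) = \exp(-tL), \quad t > 0.$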

By Schoenberg’s theorem [65, 66], if matrix $K = (k_{xy})$ is positive semidefinite (i.e., $K$ is a kernel), then it generates a squared Euclidean distance by means of the transformation

$d(x,y) = \tfrac{1}{2}\,(k_{xx} + k_{yy}) - k_{xy}$. (4)

On the other hand, if $K$ produces a proximity measure (viz., for any $x, y, z \in V$, $k_{xy} + k_{xz} - k_{yz} \le k_{xx}$, and the inequality is strict whenever $z = y$ and $y \ne x$), then [31] transformation (4) generates a distance function that satisfies the axioms of a metric. All the Forest kernels produce proximity measures, while the kernels in the remaining three families do so when $t$ is sufficiently small [7].

Moreover, if $K$ represents a strictly positive transitional measure on $G$ [24, 26] (i.e., $k_{xy}\,k_{yz} \le k_{xz}\,k_{yy}$ for all nodes $x$, $y$, and $z$, while $k_{xy}\,k_{yz} = k_{xz}\,k_{yy}$ if and only if every path in $G$ from $x$ to $z$ visits $y$), then the transformation

$\hat{k}_{xy} = \ln k_{xy}$

produces a proximity measure. In this case, (4) applied to $\hat{K} = (\hat{k}_{xy})$ generates [26] a cutpoint additive distance, viz., such a distance that $d(x,y) + d(y,z) = d(x,z)$ if and only if $y$ is a cutpoint between $x$ and $z$ in $G$ (which means that all paths connecting $x$ and $z$ visit $y$).

The Walk/Katz and Forest kernels represent [24] strictly positive transitional measures on the corresponding connected graph. Therefore, (4) applied to the logarithmically transformed Walk and Forest kernels (the so-called logarithmic kernels) produces cutpoint additive distances.

Based on the above results, we define ‘Closeness’ and ‘Eccentricity’ centralities obtained by substituting the:

  • Forest kernel;

  • Heat kernel;

  • logarithmic Forest kernel;

  • logarithmic Walk kernel;

  • logarithmic Heat kernel;

  • logarithmic Communicability kernel

transformed by (4) and then square-rooted into (2) and (3). These centralities are included in the test survey discussed below. The kernel parameter is fixed at one value for the Forest, Heat, and Communicability kernels and at another value for the Walk kernel.
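As an illustration of this construction, the following sketch computes ‘Closeness (Forest)’ for a small graph with exact rational arithmetic. It rests on several assumptions that are not taken from the source: the Forest kernel is taken as $(I + tL)^{-1}$ with $t = 1$, the kernel-to-distance transformation is normalized as $\tfrac{1}{2}(k_{xx} + k_{yy}) - k_{xy}$ (a different positive scaling would not change the resulting ranking), and all function names are hypothetical.

```python
from fractions import Fraction
from math import sqrt

def laplacian(graph, nodes):
    """Laplacian matrix L = diag(degrees) - A as a list of rows of Fractions."""
    index = {u: i for i, u in enumerate(nodes)}
    n = len(nodes)
    lap = [[Fraction(0)] * n for _ in range(n)]
    for u in nodes:
        i = index[u]
        lap[i][i] = Fraction(len(graph[u]))
        for w in graph[u]:
            lap[i][index[w]] -= 1
    return lap

def inverse(matrix):
    """Gauss-Jordan inverse of a nonsingular square matrix."""
    n = len(matrix)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def forest_kernel(graph, nodes, t=1):
    """Forest (regularized Laplacian) kernel (I + tL)^{-1}."""
    lap = laplacian(graph, nodes)
    n = len(nodes)
    return inverse([[Fraction(int(i == j)) + t * lap[i][j] for j in range(n)]
                    for i in range(n)])

def forest_closeness(graph, t=1):
    """Closeness with the square-rooted kernel distance in place of d(u, v)."""
    nodes = sorted(graph)
    k = forest_kernel(graph, nodes, t)
    n = len(nodes)
    result = {}
    for i, u in enumerate(nodes):
        total = sum(sqrt((k[i][i] + k[j][j]) / 2 - k[i][j]) for j in range(n))
        result[u] = 1 / total
    return result

# Path graph 0 - 1 - 2.
path = {0: [1], 1: [0, 2], 2: [1]}
kernel = forest_kernel(path, [0, 1, 2])
fc = forest_closeness(path)
print(kernel[0], fc)
```

For the three-node path, the middle node receives the highest value, and the two end nodes are tied by symmetry.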

It should be stressed that the above measures are just examples of kernel-based centralities. There are other kernels and transformations [38, 7] that can also be used in a similar way. Say, every distance can be integrated in the $p$-Means framework [35].

On the other hand, kernels and similarity/proximity measures can produce centralities directly, without transforming them into distances. An example is the Estrada subgraph centrality [37], which is simply the Communicability kernel diagonal entry corresponding to a node. Along with this measure, we consider the diagonal entry of the Walk kernel, as a centrality of node (this measure is referred to as ‘Walk ()’) and also as centralities of node  (‘Walk ()’, ‘Communicability ()’). The ‘Total communicability’ [15] is

The potentially unlimited variety of types and modifications of centrality measures, together with the continuum of measures within each parametric family, emphasizes the need for a powerful tool for comparing and discriminating centralities.

6.2 A survey with 40 centralities

To test the proposed approach, let us consider a set of 40 centrality measures. These are:

  1. Betweenness [40]; 21. Bonacich [18];
  2. Closeness [13, 14]; 22. Total communicability [15];
  3. Connectivity [48]; 23. Communicability  [37];
  4. Connectedness power [53]; 24. Walk  [47];
  5. Degree [13, 62]; 25. Walk  [47];
  6. Coreness [9]; 26. Estrada [37];
  7. Bridging [19]; 27. Eigencentrality based on Jaccard dissimilarity [6];
  8. PageRank [20]; 28. Eigencentrality based on Dice dissimilarity [6];
  9. Harmonic closeness [54]; 29. Closeness (Forest [27]);
10. Eccentricity [13, 43]; 30. Closeness (Heat [50]);
11. $p$-Means  [35]; 31. Closeness (logarithmic Forest [23]);
12. $p$-Means  [35]; 32. Closeness (logarithmic Walk [25]);
13. $p$-Means  [35]; 33. Closeness (logarithmic Heat [50, 26]);
14. Beta current flow [8]; 34. Closeness (logarithmic Communicability [46]);
15. Weighted degree [10]; 35. Eccentricity (Forest [27]);
16. Decaying degree [10]; 36. Eccentricity (Heat [50]);
17. Decay [69]; 37. Eccentricity (logarithmic Forest [23]);
18. Generalized degree [34]; 38. Eccentricity (logarithmic Walk [25]);
19. Katz [47]; 39. Eccentricity (logarithmic Heat [50, 26]);
20. Eigenvector [51, 17]; 40. Eccentricity (logarithmic Communicability [46]).

The list of distinguishing graphs consists of 13 graphs presented in Fig. 1.

The questionnaire consists of 33 questions in total. The length of a particular survey ranges from 1 to 12 questions.

The survey begins with the question about the centrality of vertices 1 and 2 in (Fig. 2). If the user chooses option then the second question is on the centrality of vertices 0 and 1 in the same graph. If the answer is then the third question is about the centrality of vertices 0 and 4 in . Alternatively, in case of answer the user is asked to compare and in whereas in case of answer the third question is on the centrality of vertices 0 and 3 in . If the answer to the first question is then the survey completes with the result ‘Weighted degree’ [10] centrality measure: it is the only measure in that meets this condition.

Thus, the beginning of the decision tree is the rooted tree shown in Fig. 2. The whole tree is presented in Fig. 3 and Fig. 4.

Figure 2: The beginning of the decision tree for the test set of 40 measures.
Figure 3: The decision tree for the test set of 40 measures.
Figure 4: Subfigures A to D of Fig. 3.

A survey based on this decision tree was implemented using Google Forms [2], which allows anyone to choose the most appropriate centrality measure. One of the questions is shown in Fig. 5. The message attached to each question is: ‘‘Choose the answer that suits your feeling or matches your professional judgment best… It would be optimal if you keep in mind your specific application while making your choice.’’ Statistics on selection results from a significant number of users will be of particular interest.

Figure 5: A sample survey form in [2].

7 Combining culling with a normative approach

It can be observed that some questions in the above test survey are quite simple. Say, comparison of the centrality of nodes 1 and 2 in (Fig. 2) should be clearly in favor of 2. Therefore, the first question along with the measure ‘Weighted degree’ assigning equal centrality to nodes 1 and 2 can be eliminated, which reduces the number of questions in each particular survey by one.

The second question regarding the comparison of and is also fairly simple. Indeed, the answer

can hardly be imagined, and most users would probably prefer

Based on this, the measures ‘Eigencentrality based on Dice dissimilarity’ and ‘Eigencentrality based on Jaccard dissimilarity’ can be eliminated as they set The answer is unlikely to be very popular, however, the list of measures contains nine centralities that support it. These are the well-known ‘Bridging’ and ‘Betweenness’ measures and also the ‘Eccentricity’ measures based on the Shortest path, Forest, Heat, logarithmic Forest, logarithmic Heat, logarithmic Walk, and logarithmic Communicability proximities. The present survey makes the user realize that choosing any of these measures implies adopting that nodes 0 and 1 in have equal centrality values. Furthermore, the subsequent question (comparing nodes 0 and 2 in ) reveals that ‘Bridging’ states that which is not very easy to accept; leads to ‘Eccentricity’; the remaining seven measures set

The probably most popular answer, to the second question leads to the comparison of and in  The rather specific answer leads to the famous ‘PageRank’ measure (this was mentioned444 contains ‘PageRank ,’ however, the properties listed in Section 1 hold for all in Section 1), to ‘Degree’ or ‘Coreness’, while to the remaining 25 measures in the list (which are distinguished by the answers to the next questions).

The presence of fairly simple questions in the survey suggests formulating several normative conditions on (desirable properties of) centrality measures. Such a list of conditions can be presented to the user, and if he or she agrees with some of them, this can dramatically reduce the set of measures. After that, one can offer the user an individual survey that only involves the ‘‘surviving’’ measures.

This leads to a combination of culling with a normative approach. In the following section, we discuss a couple of conditions that can be used in such a combination.

8 Normative conditions and abbreviated surveys

Among the axioms used in the literature to characterize centralities, the most attractive are conditions of ordinal nature. Such axioms allow us to compare the centrality of certain nodes, but they do not provide any suggestions on how to calculate centrality in the general case. Therefore, these axioms are not fingerprints of specific centrality measures.

In this section, we consider two such axioms and demonstrate how they can help replace the complete survey with a short one.

First, one can believe that the centrality of a node in a fixed graph should be largely determined by the vector of centrality values of its neighbors (adjacent nodes); cf. Consistency in [36]. A refinement of this requirement is that the greater the centrality values of the neighbors of a node the greater the centrality of the node itself.

The following axiom is based on this idea. In the case of directed graphs, it appeared in [29, 33]; for undirected graphs, in [10, 11] under the name of Structural consistency. It is a strengthening of Preservation of neighborhood-inclusion [64], whose directed version goes back to Preservation of cover relation [55].

Let $N_u$ denote the set of neighbors of node $u$ in $G$.

Self-consistent monotonicity.  If there is an injection from $N_u$ to $N_v$ such that every element of $N_u$ is, according to $f$, no more central than the corresponding element of $N_v$, then $f(u) \le f(v)$. If additionally $|N_u| < |N_v|$ or ‘‘no more’’ is actually ‘‘less’’ at least once, then $f(u) < f(v)$.

Let us consider a weaker axiom, which also turns out to be quite strong (cf. [30]).

Self-consistency.  If there is a bijection from $N_u$ to $N_v$ such that every element of $N_u$ is, according to $f$, no more central than the corresponding element of $N_v$, then $f(u) \le f(v)$. If ‘‘no more’’ is actually ‘‘less’’ at least once, then $f(u) < f(v)$.

The idea of the second axiom [68] is quite different. It allows us to compare the centralities of two endpoints of a bridge.

Bridge axiom.  If edge $\{u, v\}$ is a bridge in $G$, i.e., the removal of $\{u, v\}$ from $G$ separates $G$ into two connected components (with node sets $V_u \ni u$ and $V_v \ni v$), then $f(u) < f(v) \Leftrightarrow |V_u| < |V_v|$.

A strengthening of this axiom is the Ratio property in [48], which, under the same premise, prescribes the value of the ratio $f(u)/f(v)$.
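A direct check of the Bridge axiom on a given graph can be sketched as follows. The sketch reads the axiom as an equivalence, so that the endpoint of a bridge lying in the larger component must be strictly more central and equal component sizes require equal centralities; this reading, the function names, and the example tree are assumptions made for illustration.

```python
from collections import deque

def sign(x):
    return (x > 0) - (x < 0)

def reachable(graph, source, removed=None):
    """Hop distances from source by BFS, ignoring the edge `removed` (a set)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if {u, w} != removed and w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def closeness(graph):
    return {u: 1 / sum(reachable(graph, u).values()) for u in graph}

def degree(graph):
    return {u: len(nbrs) for u, nbrs in graph.items()}

def bridge_violations(graph, scores):
    """Bridges (u, v) whose endpoint ranking disagrees with component sizes."""
    violations = []
    for u in sorted(graph):
        for v in sorted(graph[u]):
            if u < v:
                side_u = reachable(graph, u, removed={u, v})
                if v in side_u:
                    continue  # still connected, so {u, v} is not a bridge
                size_u, size_v = len(side_u), len(graph) - len(side_u)
                if sign(scores[u] - scores[v]) != sign(size_u - size_v):
                    violations.append((u, v))
    return violations

# A tree: node 0 has leaves 1 and 2, and a path 3 - 4 - 5 - 6 hangs off node 0.
tree = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 4], 4: [3, 5], 5: [4, 6], 6: [5]}
print(bridge_violations(tree, degree(tree)))
print(bridge_violations(tree, closeness(tree)))
```

On this tree, ‘Degree’ violates the condition on three bridges, whereas ‘Closeness’ based on the shortest path distance violates it on none.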

Figure 6: (a) Caterpillar ; (b) the Renaissance Florentine families marriage network [60].
  Centrality measure | Self-consistency | Bridge axiom
  1. Betweenness | no | yes
  2. Closeness | no | yes
  3. Connectivity | no | yes
  4. Connectedness power | no | no
  5. Degree | no | no
  6. Coreness | no | no
  7. Bridging | no | no
  8. PageRank | no | no
  9. Harmonic closeness | no | no
10. Eccentricity | no | no
11. $p$-Means | no | no
12. $p$-Means | no | no
13. $p$-Means | no | no
14. Beta current flow | no | no
15. Weighted degree | no | no
16. Decaying degree | no | no
17. Decay | no | no
18. Generalized degree | yes | no
19. Katz | yes | no
20. Eigenvector | yes | no
21. Bonacich | yes | no
22. Total communicability | no | no
23. Communicability | no | no
24. Walk | no | no
25. Walk | no | no
26. Estrada | no | no
27. Eigencentrality (Jaccard dissimilarity) | no | no
28. Eigencentrality (Dice dissimilarity) | no | no
29. Closeness (Forest) | no | no
30. Closeness (Heat) | no | no
31. Closeness (log Forest) | no | yes*
32. Closeness (log Walk) | no | yes*
33. Closeness (log Heat) | no | no
34. Closeness (log Communicability) | no | no
35. Eccentricity (Forest) | no | no
36. Eccentricity (Heat) | no | no
37. Eccentricity (log Forest) | no | no
38. Eccentricity (log Walk) | no | no
39. Eccentricity (log Heat) | no | no
40. Eccentricity (log Communicability) | no | no
Table 1: The 40 centrality measures and the axioms they satisfy: ‘yes’ if a measure satisfies an axiom, ‘yes*’ if it satisfies the axiom after a slight modification of the measure, and ‘no’ if not. In the latter case, the axiom is violated for an ordered pair of nodes in one of the graphs shown in Fig. 1 or Fig. 6.

Table 1 presents the results of verification of Self-consistency and the Bridge axiom for the 40 centrality measures under consideration. It turns out that only four measures in our set satisfy Self-consistency. Five other measures satisfy the Bridge axiom, including two measures that do so after a slight modification. When a measure violates an axiom, the violation is witnessed by an ordered pair of nodes in one of the 13 graphs shown in Fig. 1 or in one of the two additional graphs presented in Fig. 6.

Consider the positive results in Table 1.

Proposition 1 ([22]).

The centrality measures ‘Generalized degree,’ ‘Katz,’ ‘Eigenvector,’ and ‘Bonacich’ satisfy Self-consistency.

To formulate the second result, consider a slight modification of ‘Closeness (log Forest)’ and ‘Closeness (log Walk).’ As described in Subsection 6.1, these centralities are obtained by substituting the logarithmic Forest kernel and the logarithmic Walk kernel, respectively, transformed by (4) and then square-rooted into (2). Let ‘Closeness* (log Forest)’ and ‘Closeness* (log Walk)’ be similar measures for which the results of transformation (4) are substituted into (2) without square-rooting.

As the Forest kernel and Walk kernel represent [24] strictly positive transitional measures on connected graphs, (4) transforms the corresponding logarithmic kernels into cutpoint additive distances, viz., such distances $d$ that $d(x,y) + d(y,z) = d(x,z)$ if and only if $y$ is a cutpoint between $x$ and $z$ in $G$ [26] (which means that all paths connecting $x$ and $z$ visit $y$). The latter turns out to be the key property that implies the fulfilment of the Bridge axiom. Similarly, other strictly positive transitional measures [24] and cutpoint additive distances produce centralities that satisfy the Bridge axiom.

Proposition 2 ([22]).

The centrality measures ‘Betweenness,’ ‘Closeness,’ ‘Connectivity,’ ‘Closeness* (log Forest),’ and ‘Closeness* (log Walk)’ satisfy the Bridge axiom.

Suppose that the user believes that a good centrality measure must satisfy Self-consistency or the Bridge axiom. This leads to a dramatic reduction of the set of centralities and to a corresponding reduction of the culling survey needed to determine the most appropriate centrality measure.

The surveys on the sets of centralities that satisfy Self-consistency or the Bridge axiom are shown in Fig. 7 and Fig. 8, respectively.

Figure 7: A survey for the four measures satisfying Self-consistency.
Figure 8: A survey for the five measures satisfying the Bridge axiom.

The first one can be extracted from the general survey shown in Figures 3 and 4. The second survey involves graphs that distinguish ‘Closeness* (log Walk)’ and ‘Closeness* (log Forest)’ from each other and from the other measures that satisfy the Bridge axiom.

Some questions of the survey in Fig. 7 may not be easy to answer. This is because the measures satisfying Self-consistency are close relatives, so the differences between them are quite subtle.

(Footnote 5: The last question ‘‘0 vs 1 in ’’ can be replaced by ‘‘2 vs 6 in ’’ with leading to ‘Katz’ and leading to ‘Bonacich.’)

9 Discussion

In this paper, we proposed a new method called culling to select the most appropriate network centrality measure based on the user’s opinion on how such a measure should work on a set of simple test graphs. The method consists of the following steps:

  • Forming a finite set F of candidate measures, no two of which are rank equivalent.

  • Generating a sequence of sufficiently simple graphs that distinguish all measures in F on some pairs of nodes. For the test survey on 40 centralities, we opted for 1-trees, and they distinguished all measures; however, distinguishing additional measures may require graphs of other types.

  • Compiling a survey with questions on comparing the centrality of test nodes.

  • Filling out this survey. Being a decision tree, it provides a centrality measure consistent with all user responses.
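The steps above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the algorithm developed in the paper: the candidate set contains only three measures (degree, closeness, and an eccentricity-based score), the two test graphs are hypothetical, and instead of compiling the whole decision tree in advance the sketch filters the candidates on the fly, asking a question only when the remaining measures disagree on a pair of nodes. All function and variable names are introduced here for illustration.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations


def distances_from(adj, source):
    """Shortest-path distances from `source` by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


# Three illustrative candidate measures; a larger score means "more central".
def degree(adj):
    return {v: len(adj[v]) for v in adj}


def closeness(adj):
    return {v: Fraction(1, sum(distances_from(adj, v).values())) for v in adj}


def eccentricity(adj):
    return {v: Fraction(1, max(distances_from(adj, v).values())) for v in adj}


def verdict(measure, adj, u, v):
    """+1 if the measure ranks u above v, -1 if below, 0 if tied."""
    scores = measure(adj)
    return (scores[u] > scores[v]) - (scores[u] < scores[v])


def cull(measures, graphs, ask):
    """Keep only the measures consistent with the user's answers."""
    candidates = dict(measures)
    for name, adj in graphs:  # graphs are assumed to be ordered simplest first
        for u, v in combinations(sorted(adj), 2):
            if len(candidates) <= 1:
                return sorted(candidates)
            verdicts = {m: verdict(f, adj, u, v) for m, f in candidates.items()}
            if len(set(verdicts.values())) > 1:  # the pair distinguishes some measures
                answer = ask(name, u, v)
                candidates = {m: f for m, f in candidates.items() if verdicts[m] == answer}
    return sorted(candidates)


# Hypothetical test graphs: a path on four nodes and a star with a tail 0-4-5-6.
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
tailed_star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0, 5], 5: [4, 6], 6: [5]}
graphs = [("path4", path4), ("tailed_star", tailed_star)]
measures = {"degree": degree, "closeness": closeness, "eccentricity": eccentricity}

# Scripted user: node 0 is more central than 4, and node 1 is more central than 6.
asked = []
scripted = {("tailed_star", 0, 4): 1, ("tailed_star", 1, 6): 1}


def ask(name, u, v):
    asked.append((name, u, v))
    return scripted[(name, u, v)]


result = cull(measures, graphs, ask)
```

In this run the path graph produces no questions because all three measures agree on it; the tailed star then separates the eccentricity-based score from the other two on the pair (0, 4) and degree from closeness on the pair (1, 6). An answer that no remaining measure supports leaves the candidate list empty.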

The developed algorithms make it possible to implement this approach for any finite set of measures. In this paper, it was realized for a set of 40 centrality measures, including several kernel based closeness and eccentricity measures introduced in Section 6.

It can be observed (see Fig. 4) that the resulting decision tree contains several subtrees whose leaves correspond to closely related centralities. The reason is that the survey begins (according to the algorithm in Section 4) with the simplest graphs, while subtle differences between closely related measures are distinguished by more complex graphs that appear in the survey later. As a result, the decision tree can be considered as a rough tool for hierarchical clustering of the set of centralities.

The most promising application of the proposed culling method is to combine it with a normative approach by compiling surveys on the subsets of measures that satisfy certain normative conditions (axioms). The paper presents such sub-surveys for the measures that obey the Self-consistency or Bridge axioms.

A culling survey implemented on the whole universe of centrality measures provides a method for rapid analysis. If the user only gives answers in which they are confident, then this narrows the set of measures. Such a reduced set can be subjected to experimental study on real networks or checked for compliance with certain normative conditions.

Using culling alone with a universal set of input measures and a unique output measure does not seem to be a very reliable approach. The reason is that the corresponding survey may contain, among others, non-trivial questions. If the user answers all of them, then the reliability of the answers can be insufficient, making the result unstable. It should be realized that the only information about a centrality measure that is involved in culling is how it works on the distinguishing graphs. This information may not be sufficient, because the results of the same measure on other graphs may be surprising.

Thus, the answer to the question in the title of the paper can be: use a combined approach that involves checking normative conditions, culling, experimental studies, and analysis of the heuristics that underlie the proposed measures. Among these components, culling may be the least demanding and time-consuming one. It is based on expert opinion, while its algorithmic part is simple and transparent.

Moreover, culling allows us to reveal unexpected features of certain centrality measures. We have already considered some peculiarities of the ‘PageRank’ centrality in Section 1. As another example consider now ‘Bridging’ centrality [19, 45]. The user who prefers it must adopt that this measure (see Fig. 1) sets in in in and in in in and in 

In general, a list of rankings a measure offers on the distinguishing graphs forms its profile, which helps the user make a decision regarding its approval or rejection.
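As a rough illustration of such a profile, the following sketch lists, for each test graph, the tie classes of nodes ordered from most to least central. The two graphs, the two measures (degree and a closeness score given by the negated sum of distances, which is rank equivalent to the usual reciprocal form), and all names are assumptions made for this example; they are not the distinguishing graphs of Fig. 1.

```python
from collections import deque
from itertools import groupby


def distances_from(adj, source):
    """Shortest-path distances from `source` by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def degree(adj):
    return {v: len(adj[v]) for v in adj}


def closeness(adj):
    # Negated total distance: a larger score means "more central".
    return {v: -sum(distances_from(adj, v).values()) for v in adj}


def ranking(scores):
    """Tie classes of nodes, most central class first."""
    ordered = sorted(scores, key=lambda v: (-scores[v], v))
    return [sorted(group) for _, group in groupby(ordered, key=lambda v: scores[v])]


def profile(measure, graphs):
    """Rankings produced by one measure on every test graph."""
    return {name: ranking(measure(adj)) for name, adj in graphs.items()}


# Hypothetical test graphs: a path on four nodes and a star with a tail 0-4-5-6.
graphs = {
    "path4": {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]},
    "tailed_star": {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0, 5], 5: [4, 6], 6: [5]},
}

degree_profile = profile(degree, graphs)
closeness_profile = profile(closeness, graphs)
```

On the path the two profiles coincide, while on the tailed star degree ties nodes 4 and 5 and also ties the end of the tail (node 6) with the leaves 1, 2, 3, whereas closeness ranks all of these apart. Differences of this kind are what a user would inspect when deciding whether to accept a measure.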

This study leaves a number of topics for future work. Let us list some of them.

  • Extend the culling approach to a larger set of measures.

  • Collect statistics of the survey results from various users. This would provide a kind of popularity rating for centrality measures.

  • Compile surveys for the subsets of measures that satisfy the most important ordinal normative conditions, such as monotonicity conditions and their combinations.

  • Consider a wider (than 1-trees) class of possible distinguishing graphs, for instance, the one including trees.

  • Transfer the culling approach to centrality measures for directed graphs.

  • Consider variations of the proposed algorithm for constructing culling trees based on the corresponding theory [57].

  • Make the culling surveys more flexible by (a) allowing some questions to be skipped; (b) involving not only one, but all graphs in that distinguish certain pairs of measures.

  • Create a web application that enables one to compile and complete culling surveys for any subclasses of centrality measures.

  • Apply the culling approach to the problem of choosing the scoring methods for unbalanced tournaments [29, 33].

Acknowledgement

The authors thank Anna Khmelnitskaya for helpful discussions.

References

  • [1] Centiserver: The most comprehensive centrality resource and web application for centrality measures calculation. https://www.centiserver.org/?q1=centrality.
  • [2] Choosing the most appropriate centrality measure: An interactive sample web survey. https://docs.google.com/forms/d/e/1FAIpQLSfhrzId4S0hQVvn96zfIpOko4_S4bdsgUKeHtYKFIq6JZ9DjQ/viewform.
  • [3] F. Agneessens, S. P. Borgatti, and M. G. Everett. Geodesic based centrality: Unifying the local and the global. Social Networks, 49:12–26, 2017.
  • [4] S. S. Ali, T. Anwar, and S. A. M. Rizvi. A revisit to the infection source identification problem under classical graph centrality measures. Online Social Networks and Media, 2020. Available online 11 February 2020. DOI: 10.1016/j.osnem.2020.100061.
  • [5] A. Altman and M. Tennenholtz. Ranking systems: the PageRank axioms. In Proceedings of the 6th ACM Conference on Electronic Commerce, pages 1–8. ACM, 2005.
  • [6] A. Alvarez-Socorro, G. Herrera-Almarza, and L. González-Díaz. Eigencentrality based on dissimilarity measures reveals central nodes in complex networks. Scientific Reports, 5:17095, 2015.
  • [7] K. Avrachenkov, P. Chebotarev, and D. Rubanov. Similarities on graphs: Kernels versus proximity measures. European Journal of Combinatorics, 80:47–56, 2019.
  • [8] K. E. Avrachenkov, V. V. Mazalov, and B. T. Tsynguev. Beta current flow centrality for weighted networks. In 4th International Conference on Computational Social Networks, CSoNet 2015, LNCS , pages 216–227. Springer, 2015.
  • [9] J. Bae and S. Kim. Identifying and ranking influential spreaders in complex networks by neighborhood coreness. Physica A: Statistical Mechanics and Its Applications, 395:549–559, 2014.
  • [10] S. Bandyopadhyay, M. N. Murty, and R. Narayanam. A generic axiomatic characterization of centrality measures in social network. arXiv preprint arXiv:1703.07580v1, 2017.
  • [11] S. Bandyopadhyay, R. Narayanam, and M. N. Murty. A generic axiomatic characterization for measuring influence in social networks. In 24th International Conference on Pattern Recognition ICPR–, pages 2606–2611. IEEE, 2018.
  • [12] K. Batool and M. A. Niazi. Towards a methodology for validation of centrality measures in complex networks. PloS ONE, 9(4), 2014.
  • [13] A. Bavelas. A mathematical model for group structures. Applied Anthropology, 7(3):16–30, 1948.
  • [14] A. Bavelas. Communication patterns in task-oriented groups. The Journal of the Acoustical Society of America, 22(6):725–730, 1950.
  • [15] M. Benzi and C. Klymko. Total communicability as a centrality measure. Journal of Complex Networks, 1(2):124–149, 2013.
  • [16] J. M. Bolland. Sorting out centrality: An analysis of the performance of four centrality models in real and simulated networks. Social Networks, 10(3):233–253, 1988.
  • [17] P. Bonacich. Factoring and weighting approaches to status scores and clique identification. Journal of Mathematical Sociology, 2(1):113–120, 1972.
  • [18] P. Bonacich. Power and centrality: A family of measures. American Journal of Sociology, 92(5):1170–1182, 1987.
  • [19] R. Breitling, P. Armengaud, A. Amtmann, and P. Herzyk. Rank products: A simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments. FEBS Letters, 573(1-3):83–92, 2004.
  • [20] S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30(1–7):107–117, 1998.
  • [21] F. Buckley and F. Harary. Distance in Graphs. Addison-Wesley, Redwood City, CA, 1990.
  • [22] P. Chebotarev. The classes of centralities characterized by the self-consistency or bridge axioms. Unpublished manuscript.
  • [23] P. Chebotarev. A class of graph-geodetic distances generalizing the shortest-path and the resistance distances. Discrete Applied Mathematics, 159(5):295–302, 2011.
  • [24] P. Chebotarev. The graph bottleneck identity. Advances in Applied Mathematics, 47(3):403–413, 2011.
  • [25] P. Chebotarev. The walk distances in graphs. Discrete Applied Mathematics, 160(10–11):1484–1500, 2012.
  • [26] P. Chebotarev. Studying new classes of graph metrics. In F. Nielsen and F. Barbaresco, editors, Proceedings of the SEE Conference ‘‘Geometric Science of Information’’ GSI–, LNCS 8085, pages 207–214, Berlin, 2013. Springer.
  • [27] P. Chebotarev and E. Shamis. The forest metrics for graph vertices. Electronic Notes in Discrete Mathematics, 11:98–107, 2002.
  • [28] P. Y. Chebotarev and E. Shamis. On the proximity measure for graph vertices provided by the inverse Laplacian characteristic matrix. In Abstracts of the Conference ‘‘Linear Algebra and its Applications’’, pages 6–7, Manchester, UK, 1995. University of Manchester. http://www.ma.man.ac.uk/~higham/laa95/abstracts.ps.
  • [29] P. Y. Chebotarev and E. Shamis. Constructing an objective function for aggregating incomplete preferences. In A. Tangian and J. Gruber, editors, Econometric Decision Models, Lecture Notes in Economics and Mathematical Systems, Vol. , pages 100–124. Springer, Berlin, 1997.
  • [30] P. Y. Chebotarev and E. Shamis. Characterizations of scoring methods for preference aggregation. Annals of Operations Research, 80:299–332, 1998.
  • [31] P. Y. Chebotarev and E. V. Shamis. On a duality between metrics and Σ-proximities. Automation and Remote Control, 59(4):608–612, 1998.
  • [32] P. Y. Chebotarev and E. V. Shamis. On proximity measures for graph vertices. Automation and Remote Control, 59(10):1443–1459, 1998.
  • [33] P. Y. Chebotarev and E. V. Shamis. Preference fusion when the number of alternatives exceeds two: Indirect scoring procedures. Journal of the Franklin Institute, 336:205–226, 1999. Erratum, J. Franklin Inst., 1999, vol. 336, pp. 747–748.
  • [34] L. Csató. Measuring centrality by a generalization of degree. Central European Journal of Operations Research, 25(4):771–790, 2017.
  • [35] R. L. de Andrade and L. C. Rêgo. p-means centrality. Communications in Nonlinear Science and Numerical Simulation, 68:41–55, 2019.
  • [36] V. Dequiedt and Y. Zenou. Local and consistent centrality measures in parameterized networks. Mathematical Social Sciences, 88:28–36, 2017.
  • [37] E. Estrada and J. A. Rodriguez-Velazquez. Subgraph centrality in complex networks. Physical Review E, 71(5):056103, 2005.
  • [38] F. Fouss, M. Saerens, and M. Shimbo. Algorithms and Models for Network Data and Link Analysis. Cambridge University Press, 2016.
  • [39] F. Fouss, L. Yen, A. Pirotte, and M. Saerens. An experimental investigation of graph kernels on a collaborative recommendation task. In Sixth International Conference on Data Mining (ICDM’), pages 863–868, 2006.
  • [40] L. C. Freeman. A set of measures of centrality based on betweenness. Sociometry, 40(1):35–41, 1977.
  • [41] N. E. Friedkin. Theoretical foundations for centrality measures. American Journal of Sociology, 96(6):1478–1504, 1991.
  • [42] M. Garg. Axiomatic foundations of centrality in networks. Technical report, Mimeo, Stanford University, 2009.
  • [43] F. Harary and R. Z. Norman. The dissimilarity characteristic of Husimi trees. Annals of Mathematics, pages 134–141, 1953.
  • [44] R. Holzman. An axiomatic approach to location on networks. Mathematics of Operations Research, 15(3):553–563, 1990.
  • [45] W. Hwang, T. Kim, M. Ramanathan, and A. Zhang. Bridging centrality: Graph mining from element level to group level. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 336–344, 2008.
  • [46] V. Ivashkin and P. Chebotarev. Do logarithmic proximity measures outperform plain ones in graph clustering? In V. A. Kalyagin, A. I. Nikolaev, P. M. Pardalos, and O. A. Prokopyev, editors, Models, Algorithms, and Technologies for Network Analysis, volume 197 of Springer Proceedings in Mathematics & Statistics, pages 87–105. Springer, 2016.
  • [47] L. Katz. A new status index derived from sociometric analysis. Psychometrika, 18(1):39–43, 1953.
  • [48] A. Khmelnitskaya, G. van der Laan, and D. Talman. Generalization of binomial coefficients to numbers on the nodes of graphs. Technical report, Tinbergen Institute Discussion Paper, No. 16-011/II, 2016.
  • [49] M. Kitti. Axioms for centrality scoring with principal eigenvectors. Social Choice and Welfare, 46(3):639–653, 2016.
  • [50] R. I. Kondor and J. Lafferty. Diffusion kernels on graphs and other discrete structures. In Proceedings of the 19th International Conference on Machine Learning, pages 315–322, 2002.
  • [51] E. Landau. Zur relativen Wertbemessung der Turnierresultate. Deutsches Wochenschach, 11:366–369, 1895.
  • [52] A. N. Langville and C. D. Meyer. Google’s PageRank and Beyond: The Science of Search Engine Rankings. Princeton University Press, 2006.
  • [53] M. L. Mágó. Power Values and Framing in Game Theory. CentER, Tilburg University, 2018. Ph.D. Dissertation.
  • [54] M. Marchiori and V. Latora. Harmony in the small-world. Physica A: Statistical Mechanics and its Applications, 285(3-4):539–546, 2000.
  • [55] N. R. Miller. A new solution set for tournaments and majority voting: Further graph-theoretical approaches to the theory of voting. American Journal of Political Science, 24(1):68–96, 1980.
  • [56] H. Monsuur and T. Storcken. Centers in connected undirected graphs: An axiomatic approach. Operations Research, 52(1):54–64, 2004.
  • [57] S. K. Murthy. Automatic construction of decision trees from data: A multi-disciplinary survey. Data Mining and Knowledge Discovery, 2(4):345–389, 1998.
  • [58] M. Newman. Networks. Oxford University Press, 2 edition, 2018.
  • [59] S. Oldham, B. Fulcher, L. Parkes, A. Arnatkevičiūtė, C. Suo, and A. Fornito. Consistency and differences between centrality measures across distinct classes of networks. PloS ONE, 14(7), 2019.
  • [60] J. F. Padgett and C. K. Ansell. Robust action and the rise of the Medici, 1400–1434. American Journal of Sociology, 98(6):1259–1319, 1993.
  • [61] I. Palacios-Huerta and O. Volij. The measurement of intellectual influence. Econometrica, 72(3):963–977, 2004.
  • [62] C. H. Proctor and C. P. Loomis. Analysis of sociometric data. Research Methods in Social Relations, 2:561–585, 1951.
  • [63] R. B. Rothenberg, J. J. Potterat, D. E. Woodhouse, W. W. Darrow, S. Q. Muth, and A. S. Klovdahl. Choosing a centrality measure: Epidemiologic correlates in the Colorado Springs study of social networks. Social Networks, 17(3-4):273–297, 1995.
  • [64] D. Schoch. Centrality without indices: Partial rankings and rank probabilities in networks. Social Networks, 54:50–60, 2018.
  • [65] I. J. Schoenberg. Remarks to M.Fréchet’s article ‘‘Sur la définition axiomatique d’une classe d’espaces vectoriels distanciés applicables vectoriellement sur l’espace de Hilbert’’. Annals of Mathematics, 36:724–732, 1935.
  • [66] I. J. Schoenberg. Metric spaces and positive definite functions. Transactions of the American Mathematical Society, 44:522–536, 1938.
  • [67] M. Siami, S. Bolouki, B. Bamieh, and N. Motee. Centrality measures in linear consensus networks with structured network uncertainties. IEEE Transactions on Control of Network Systems, 5(3):924–934, 2018.
  • [68] O. Skibski and J. Sosnowska. Axioms for distance-based centralities. In Thirty-Second AAAI Conference on Artificial Intelligence AAAI–, pages 1218–1225, 2018.
  • [69] N. Tsakas. On decay centrality. The B.E. Journal of Theoretical Economics, 19(1):1–18, 2018.
  • [70] R. Vohra. An axiomatic characterization of some locations in trees. European Journal of Operational Research, 90(1):78–84, 1996.
  • [71] Z. Wang, A. Scaglione, and R. J. Thomas. Electrical centrality measures for electric power grid vulnerability analysis. In 49th IEEE Conference on Decision and Control CDC, pages 5792–5797. IEEE, 2010.
  • [72] T. Wąs and O. Skibski. An axiomatization of the eigenvector and Katz centralities. In Thirty-Second AAAI Conference on Artificial Intelligence AAAI–, pages 1258–1265, 2018.