Identifying Influential Nodes in Two-mode Data Networks using Formal Concept Analysis

Identifying important actors (or nodes) in a two-mode network remains a crucial challenge in mining, analyzing, and interpreting real-world networks. While traditional bipartite centrality indices are often used to recognize key nodes that influence the network information flow, they frequently produce poor results in intricate situations, such as massive networks with complex local structures, or when complete knowledge of the network topology and certain properties is lacking. In this paper, we introduce Bi-face (BF), a new bipartite centrality measure for identifying important nodes in two-mode networks. Using the powerful mathematical formalism of Formal Concept Analysis, the BF measure exploits the faces of concept intents to identify nodes that have influential biclique connectivity and are not located in irrelevant bridges. Unlike off-the-shelf centrality indices, it quantifies how a node exerts a cohesive-substructure influence on its neighbour nodes via bicliques, while not lying on core-peripheral bridges of the network, i.e., being absent from non-influential bridges. Our experiments on several real-world and synthetic networks show the efficiency of BF over existing prominent bipartite centrality measures such as betweenness, closeness, eigenvector, and vote-rank, among others.


1 Introduction

In today’s world, complex real-life systems are ubiquitous. For example, mobile phone networks, as well as Facebook and Twitter, shape the way we interact with one another. Airline and railway networks provide efficient modes of transportation while greatly reducing travel times. Energy and electric power networks play a significant role in supplying our homes and industries. Many of these systems feature two types of nodes with complex substructures and can thus be represented as two-mode networks (also known as bipartite graphs or affiliation networks). Because of the complex structure of such networks, the way information spreads across them makes some nodes more important than others in certain contexts. Measuring the relative importance of nodes in a two-mode network has therefore become an increasingly challenging question in complex network analysis (CNA). Node centrality analysis, which is frequently used to understand the role of nodes within a network, can provide efficient answers to this question. A centrality measure ranks nodes according to how they influence, or are affected by, other nodes through their connection topology. Since there is no consensus on a unique definition of centrality for two-mode networks, which leaves the door open for new ones, various centrality measures have been proposed in the CNA literature (cf. (jackson2010social; jalili2015centiserver; oldham2019consistency) for a detailed survey), each of which takes into account a distinct aspect of a central node. In mainstream CNA research, bipartite centralities are frequently classified as local or global.

Local centrality metrics focus on the relative importance of a node in its neighbourhood within local cohesive communities. For example, degree centrality (borgatti1997network) is a basic local metric that counts the number of links each node has. In practice, however, it frequently captures irrelevant local information about a node. It implicitly assumes that the node with the highest degree (the most densely linked node, i.e., a hub) should be at the centre, but it does not account for the cascade effects of its neighbour nodes. Hence, it is sometimes necessary to remove nodes with high degree values because they provide no information. For example, Angelina Jolie has a high degree centrality in Facebook’s network because so many people follow her; however, if you explore your friends’ Facebook pages to find out what they are interested in, or who among them enjoys soccer the most, Angelina Jolie becomes completely irrelevant in that network.

The k-shell centrality (kitsak2010identification) is a community-based local centrality that refines the degree of a node by considering its neighbourhood connections through k-cores (a k-core of a graph $\mathcal{G}$ is a maximal connected subgraph of $\mathcal{G}$ in which every node has at least $k$ neighbours). Thus, the higher the k-core that contains a node, the more likely the node is to be a hub in the core of the network, and thus the more important it is. However, k-shell frequently produces inaccurate results when the network structure has a small number of k-cores, which is prevalent in two-mode networks. This is because, in that case, many nodes are assigned approximately equal k-core values. From the perspective of the topological graph of a two-mode network, k-bicliques may be more accurate graphical components than k-cores. That is, the number of k-bicliques among a node’s neighbours is counted in order to estimate its importance using the Cross k-bicliques connectivity measure, which quantifies how the node affects information propagation through the network. In general, however, its calculation requires exponential time and space and is often sensitive to the parameter $k$. To compute Cross k-bicliques connectivity for a given node, we must first extract from the network all k-bicliques containing this node, which is an NP-hard problem (chiba1985arboricity). Furthermore, determining the optimal value of $k$ may be problematic in many applications. Specifically, picking a large $k$ skips all bicliques smaller than the chosen size, leading to an underestimation of the influence of other nodes in local cohesive communities within the network, whereas a small $k$ may lead to overestimating the importance of other neighbour nodes, producing a behaviour similar to degree centrality.

Bipartite closeness (borgatti1997network; borgatti2011analyzing) is a common type of local centrality based on geodesics. It computes the reciprocal of the sum of the distances between the node and all other nodes in the network. Its basic form assumes that information flows efficiently from one node to every other node along shortest paths. The important node is therefore the one that is close to all other nodes in terms of shortest paths, and hence independent of intermediaries. At a high level, it can thus address the limitation of degree centrality in a few cases. However, on non-spatial networks, bipartite closeness frequently produces inaccurate results (rodrigues2019network), and its values on spatial networks tend to span a rather small dynamic range from smallest to largest. This is because many complex real-world networks may have a high average shortest-path length, as their largest distance grows quickly with the number of nodes. Assuming that the minimum distance is equal to one, the ratio between the minimum and the largest distances is the reciprocal of the largest distance, which therefore tends to zero as the network grows. This frequently implies that numerous nodes with diverse roles in the network’s information flow may have comparable closeness scores. In contrast, most non-spatial networks feature low geodesic distances among nodes, since the largest geodesic distances grow only logarithmically with network size. As a result, the dynamic range of closeness values, like the network diameter, is small, and even slight changes in the network structure can have a significant impact on nodal closeness values.

Bipartite betweenness (brandes2001faster; borgatti1997network; borgatti2011analyzing) is another common geodesic-based measure. To evaluate the importance of a node, it counts how often the node lies on the geodesic paths between other nodes in the network. It thus considers the dependence of other nodes on a given node and measures its control over the flow of information between them, whereas closeness captures connection efficiency, or independence from potential flow control by intermediary nodes (cf. (brandes2016maintaining) for a detailed study differentiating closeness from betweenness). In general, bipartite betweenness does not consider node connectivity, and its calculation is frequently time-consuming. The fundamental assumption of betweenness is that every pair of nodes exchanges information through shortest paths with equal probability. In many situations, however, this is not a realistic assumption, since information does not necessarily follow the shortest path (newman2005measure) (e.g., news about a friend might reach us not from another close friend but through other mutual friends). As a result, betweenness provides a fair approximation of the most influential nodes within such groups rather than a precise representation (see (newman2018networks) for a more detailed explanation). Furthermore, its exact computation on large or dense two-mode networks requires, with Brandes' algorithm for unweighted networks, $O((|U| + |V|)\,|E|)$ time, i.e., up to $O((|U| + |V|)\,|U|\,|V|)$ in the dense case, where $|U|$ and $|V|$ are the numbers of the two types of nodes.

Looking at local centrality from a different angle, bipartite percolation centrality (piraveenan2013percolation) estimates a node’s relative importance by counting the number of percolated paths that pass through it. A percolated path is a shortest path between two nodes in which the source node is percolated (e.g., infected), while the target node may not be. Percolation centrality captures the essential mechanics of contagion-mediated network spreading by associating percolation paths with weights that determine how much importance is given to potential percolation paths originating from given nodes. This indeed helps percolation centrality avoid the limitations of both betweenness and closeness, which rely solely on topology and on diffusion along shortest paths. It may, however, produce poor results when the spread of contagion has no effect on node states, and it is frequently expensive to compute. Because percolation through a network is affected by both the level of contagion and the network structure (meyers2007contact), the spread of contagion in a complex network (CN) may not change node states in some scenarios. From a theoretical standpoint, transmissibility may be zero, in which case the percolated contagion spreads over the edges of the network without changing the state of any node to either infected or recovered, leaving all nodes in their default state. Moreover, in the worst case, computing percolation centrality on large bipartite networks with complex local structures requires time cubic in the numbers of the two types of nodes.

Global measures, on the other hand, consider a node’s prominence in the context of the entire network. Their underlying hypothesis is that a few important neighbours can weigh more than a large number of unimportant ones; that is, a node is important if it is connected to other important nodes. For example, bipartite eigenvector centrality (borgatti1997network; borgatti2011analyzing) deems a node central based on its connections to other high-scoring nodes. It estimates how often each node is traversed by random walks of indefinite length. Intuitively, this implies that nodes in the network core are more accessible than other nodes. From a conceptual standpoint, a node’s eigenvector centrality can be thought of as the global extension of its local degree centrality, in which both count walks that begin and terminate at that node. Eigenvector centrality may, however, undergo a localization transition, which frequently results in inaccurate centrality scores. As demonstrated in (martin2014localization), eigenvector centrality undergoes a localization transition under common network conditions, causing most of the centrality weight to concentrate on a small number of nodes. This implies that when a network contains many hubs, the eigenvector weights are skewed toward a few nodes: a hub and its neighbours receive the highest eigenvector values, while the remaining nodes have near-identical centrality values (likely close to zero).

In this paper, we present Bi-face (BF), a new bipartite centrality that can be used to identify key nodes in complex networks. While we focus on two-mode networks here, its general framework, together with its formulation for one-mode networks presented in (ibrahim2020cross), can easily be adapted to other representations of CNs, such as multidimensional and multilayer networks (dickison2016multilayer). The guiding idea of BF is to use the framework of formal concept analysis to bring together three centrality aspects (cohesiveness via bicliques, network flow via bridges, and the influence of important neighbour nodes) for actionable node identification. Its conceptual hypothesis is that important nodes should be found in influential bridges and in overlapping bicliques with a large number of important nodes. That is, it quantifies how a node affects, and is affected by, its important neighbours via bicliques, while also connecting the dense substructures of a network through its presence in influential bridges. It thus differs from betweenness in that it considers only influential bridges rather than all bridges. Unlike closeness and eigenvector centrality, it can efficiently deal with the diverse topological structures of a network without being prone to a localization transition, thanks to this hybridization of the influential-bridge and overlapping-biclique aspects. Furthermore, it leverages the mathematical formalism of Formal Concept Analysis (FCA) to overcome the limitations of Cross k-bicliques connectivity: it uses the concept lattice of the network to efficiently extract concepts that capture bridges and bicliques while being insensitive to the parameter $k$. Technically, BF computation is based solely on the set of these extracted concepts, whose size is often quite small compared with polynomial functions of the numbers of nodes and edges. As a result, unlike percolation centrality, it is relatively quick to compute in practice.

The paper is organized as follows. Section 2 recalls some basic definitions of FCA and traditional bipartite centrality measures. Section 3 explains our proposed Bi-face centrality for identifying key nodes of two-mode networks in more detail. In Section 4, we conduct a thorough experimental study and discussion. Finally, Section 5 presents our conclusions.

2 Background

This section briefly reviews the main notions needed to understand our proposed centrality measure, using as an illustrative example a two-mode network of airline alliances and their flight destinations in the year 2000. As shown in Figure 1, the network is modeled as an undirected bipartite graph $\mathcal{G} = (U, V, E)$, where $U$ is a set of objects (also called type-I nodes) representing airline companies, $V$ is a set of attributes (type-II nodes) representing flight destinations, and $E \subseteq U \times V$ is a set of edges, where an edge $(u, v) \in E$ links the two nodes $u \in U$ and $v \in V$ if a flight of airline company $u$ landed at destination $v$.

Figure 1: A two-mode graph network representing flights from airline companies (in red) landing at destinations (in green) in Year 2000.

2.1 Formal Concept Analysis

In the following we recall notions of FCA (Ganter+1999) that will be used in this paper.

Definition 2.1 (Formal context).

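It is a triple $\mathbb{K} = (G, M, I)$, where $G$ is a set of objects, $M$ a set of attributes, and $I$ a binary relation between $G$ and $M$ with $I \subseteq G \times M$. For $g \in G$ and $m \in M$, $gIm$ holds (i.e., $(g, m) \in I$) iff the object $g$ has the attribute $m$; otherwise $(g, m) \notin I$.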

Figure 2 is the formal context equivalent to an adjacency matrix that expresses the two-mode network shown in Figure 1.

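Given arbitrary subsets $A \subseteq G$ and $B \subseteq M$, the following derivation operators are defined:

$$A' = \{m \in M \mid gIm \text{ for all } g \in A\}, \qquad B' = \{g \in G \mid gIm \text{ for all } m \in B\},$$

where $A'$ is the set of attributes common to all objects of $A$ and $B'$ is the set of objects sharing all attributes of $B$. The closure operator $''$ denotes the double application of $'$; it is extensive, idempotent and monotone. The subsets $A$ and $B$ are closed when $A'' = A$ and $B'' = B$, respectively.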
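A minimal Python sketch of the two derivation operators and of the closure may make them concrete. The toy context below (objects `a1` to `a3`, attributes `d1` to `d3`) is hypothetical and is not the context of Figure 2; the relation $I$ is stored as a set of (object, attribute) pairs.

```python
from typing import FrozenSet, Set, Tuple

# A formal context (G, M, I), with I stored as a set of (object, attribute) pairs.
Context = Tuple[Set[str], Set[str], Set[Tuple[str, str]]]


def up(ctx: Context, objs: Set[str]) -> FrozenSet[str]:
    """A': the attributes shared by every object in objs."""
    _, attrs, rel = ctx
    return frozenset(m for m in attrs if all((g, m) in rel for g in objs))


def down(ctx: Context, attrs: Set[str]) -> FrozenSet[str]:
    """B': the objects that have every attribute in attrs."""
    objs, _, rel = ctx
    return frozenset(g for g in objs if all((g, m) in rel for m in attrs))


def closure(ctx: Context, objs: Set[str]) -> FrozenSet[str]:
    """A'': double application of the derivation operators."""
    return down(ctx, up(ctx, objs))


# Hypothetical toy context: airlines a1..a3, destinations d1..d3.
G = {"a1", "a2", "a3"}
M = {"d1", "d2", "d3"}
I = {("a1", "d1"), ("a1", "d2"), ("a2", "d1"), ("a2", "d2"), ("a3", "d3")}
ctx = (G, M, I)

print(sorted(up(ctx, {"a1", "a2"})))  # ['d1', 'd2']
print(sorted(closure(ctx, {"a1"})))   # ['a1', 'a2']
```

Since the closure of `{a1}` is `{a1, a2}`, the set `{a1}` is not closed, whereas `{a1, a2}` is; together with its intent `{d1, d2}` it forms a formal concept of this toy context.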

Definition 2.2 (Formal concept).

The pair $(A, B)$ is called a formal concept of $\mathbb{K}$ with extent $A$ and intent $B$ if both $A$ and $B$ are closed, $A' = B$, and $B' = A$.

The object concept of $g \in G$ is expressed by $\gamma(g) = (\{g\}'', \{g\}')$, and the attribute concept of $m \in M$ is defined by $\mu(m) = (\{m\}', \{m\}'')$.

Definition 2.3 (Partial order relation ).

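A concept $(A_1, B_1)$ is smaller than or equal to a concept $(A_2, B_2)$, written $(A_1, B_1) \leq (A_2, B_2)$, if:

$$A_1 \subseteq A_2 \quad (\text{equivalently, } B_2 \subseteq B_1). \tag{1}$$

In this case, $(A_2, B_2)$ is called a superconcept (or successor) of $(A_1, B_1)$, and $(A_1, B_1)$ is called a subconcept (or predecessor) of $(A_2, B_2)$. The set of all concepts of the formal context $\mathbb{K}$ is denoted by $\mathfrak{B}(\mathbb{K})$ or simply $\mathfrak{B}$.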

Definition 2.4 (Concept Lattice).

The concept lattice of a formal context $\mathbb{K}$, denoted by $\mathcal{L}$, is the set $\mathfrak{B}(\mathbb{K})$ ordered by $\leq$; it is usually drawn as a Hasse diagram that represents all formal concepts together with the partial order between them. In $\mathcal{L}$, each node represents a concept with its extent and intent, while the edges represent the partial order between concepts.

Figure 3 is the Hasse diagram of the concept lattice that corresponds to the context of Figure 2. More precisely, it is a diagram with reduced labeling: the label of each object $g$ is written below its object concept $\gamma(g)$, and the label of each attribute $m$ above its attribute concept $\mu(m)$. The extent of a concept represented by a node is given by all object labels from that node downwards, and its intent by all attribute labels from that node upwards.

There are several methods (cf. (Ganter+1999; valtchev2002partition; choi2009faster)) that build the lattice, i.e., compute all the concepts together with the partial order.

Definition 2.5 (Lower and Upper covers).

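For any two formal concepts $C_1 = (A_1, B_1)$ and $C_2 = (A_2, B_2)$ in $\mathcal{L}$, if:

$$A_1 \subset A_2 \text{ and there is no concept } (A_3, B_3) \in \mathcal{L} \text{ with } A_1 \subset A_3 \subset A_2 \tag{2}$$

or, equivalently,

$$B_2 \subset B_1 \text{ and there is no concept } (A_3, B_3) \in \mathcal{L} \text{ with } B_2 \subset B_3 \subset B_1 \tag{3}$$

then $C_1$ is a lower cover of $C_2$, and $C_2$ is an upper cover of $C_1$; this is written $C_1 \prec C_2$ and $C_2 \succ C_1$, respectively.

We will use $\mathrm{UC}(C)$ and $\mathrm{LC}(C)$ to denote the sets of upper and lower covers of a formal concept $C$, respectively.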
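For very small contexts, the concepts and their covers can be computed by brute force. The Python sketch below is not one of the lattice-construction algorithms cited above and is only practical for tiny inputs: it builds every intent as an intersection of object intents (with $M$ as the empty intersection), derives the extents, and keeps as upper covers the strictly larger concepts with no concept strictly in between. The toy context is hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent, intent)


def all_concepts(G: Set[str], M: Set[str], I: Set[Tuple[str, str]]) -> List[Concept]:
    """Every intent is an intersection of object intents; M itself is the empty intersection."""
    intents: Set[FrozenSet[str]] = {frozenset(M)}
    for g in G:
        g_intent = frozenset(m for m in M if (g, m) in I)
        intents |= {b & g_intent for b in intents}
    concepts = []
    for b in intents:
        extent = frozenset(g for g in G if all((g, m) in I for m in b))
        concepts.append((extent, b))
    return concepts


def upper_covers(concepts: List[Concept]) -> Dict[Concept, List[Concept]]:
    """Superconcepts of C (strictly larger extent) with no concept strictly in between."""
    covers: Dict[Concept, List[Concept]] = {}
    for c in concepts:
        larger = [d for d in concepts if c[0] < d[0]]
        covers[c] = [d for d in larger if not any(c[0] < e[0] < d[0] for e in larger)]
    return covers


def lower_covers(concepts: List[Concept]) -> Dict[Concept, List[Concept]]:
    """Lower covers are the inverse of the upper-cover relation."""
    up = upper_covers(concepts)
    return {c: [d for d in concepts if c in up[d]] for c in concepts}


# Hypothetical toy context: airlines a1..a3, destinations d1..d3.
G = {"a1", "a2", "a3"}
M = {"d1", "d2", "d3"}
I = {("a1", "d1"), ("a1", "d2"), ("a2", "d1"), ("a2", "d2"), ("a3", "d3")}

cs = all_concepts(G, M, I)
print(len(cs))  # 4
```

The quadratic-to-cubic cover computation is the simplest correct choice here; the cited algorithms build the lattice far more efficiently on realistic networks.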

Definition 2.6 (Concept Intentional Face (pfaltz2002scientific)).

The intentional face of a concept $C = (A, B)$ w.r.t. its $d$-th upper cover concept $C_d = (A_d, B_d) \in \mathrm{UC}(C)$ is the difference between their intents: $F^{int}_d(C) = B \setminus B_d$.

Definition 2.7 (Concept Extensional Face).

The extensional face of a concept $C = (A, B)$ w.r.t. its $l$-th lower cover concept $C_l = (A_l, B_l) \in \mathrm{LC}(C)$ is the difference between their extents: $F^{ext}_l(C) = A \setminus A_l$.
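Definitions 2.6 and 2.7 amount to plain set differences once the covers are known. A minimal sketch, assuming the intents of the upper covers (resp. the extents of the lower covers) are given as lists; the example concept is hypothetical.

```python
from typing import FrozenSet, List


def intentional_faces(intent: FrozenSet[str], upper_intents: List[FrozenSet[str]]) -> List[FrozenSet[str]]:
    """Definition 2.6: B minus B_d, for every upper cover (A_d, B_d) of C = (A, B)."""
    return [intent - b_d for b_d in upper_intents]


def extensional_faces(extent: FrozenSet[str], lower_extents: List[FrozenSet[str]]) -> List[FrozenSet[str]]:
    """Definition 2.7: A minus A_l, for every lower cover (A_l, B_l) of C = (A, B)."""
    return [extent - a_l for a_l in lower_extents]


# Hypothetical concept with intent {a, b, c} whose upper covers have intents {a, b} and {b, c}.
print(intentional_faces(frozenset("abc"), [frozenset("ab"), frozenset("bc")]))
# [frozenset({'c'}), frozenset({'a'})]
```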

Definition 2.8 (Blocker (pfaltz2002scientific)).

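Given the family of faces $\mathcal{F} = \{F_1, \ldots, F_n\}$, a set $Y$ is said to be a blocker of $\mathcal{F}$ if $Y \cap F_i \neq \emptyset$ for all $i \in \{1, \ldots, n\}$, and the blocker is said to be minimal if no proper subset $Y_1 \subset Y$ is a blocker of $\mathcal{F}$.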

Definition 2.9 (Generator (bastide2000mining)).

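Given a concept $C = (A, B)$ in a formal context $\mathbb{K}$, a subset $Y \subseteq B$ is called a generator of $B$ iff $Y'' = B$, and it is a minimal generator when there is no $Y_1 \subset Y$ such that $Y_1'' = B$. Dually, a subset $X \subseteq A$ with $X'' = A$ is a generator of the extent $A$. We use $\mathrm{MG}_{ext}(C)$ and $\mathrm{MG}_{int}(C)$ to denote the sets of minimal generators of a concept w.r.t. its extent and intent, respectively.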
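Definition 2.9 can be checked directly on small contexts by enumerating subsets of the intent in order of increasing size. The sketch below is exponential and meant only as a reference for tiny examples; the context (objects `o1` to `o4`, attributes `a`, `b`, `c`) is hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple


def attr_closure(G: Set[str], M: Set[str], I: Set[Tuple[str, str]], attrs: FrozenSet[str]) -> FrozenSet[str]:
    """Y'' for a set of attributes Y."""
    objs = [g for g in G if all((g, m) in I for m in attrs)]
    return frozenset(m for m in M if all((g, m) in I for g in objs))


def minimal_generators(G: Set[str], M: Set[str], I: Set[Tuple[str, str]],
                       intent: FrozenSet[str]) -> List[FrozenSet[str]]:
    """All Y within B with Y'' = B such that no proper subset has the same closure."""
    found: List[FrozenSet[str]] = []
    items = sorted(intent)
    for k in range(len(items) + 1):  # increasing size: smaller generators are found first
        for combo in combinations(items, k):
            y = frozenset(combo)
            if any(prev <= y for prev in found):
                continue  # contains a smaller generator, hence not minimal
            if attr_closure(G, M, I, y) == intent:
                found.append(y)
    return found


# Hypothetical context.
G = {"o1", "o2", "o3", "o4"}
M = {"a", "b", "c"}
I = {("o1", "a"), ("o1", "b"), ("o1", "c"), ("o2", "a"), ("o2", "b"),
     ("o3", "b"), ("o3", "c"), ("o4", "c")}

print([sorted(y) for y in minimal_generators(G, M, I, frozenset("abc"))])  # [['a', 'c']]
```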

2.2 Social Network Analysis

Definition 2.10 (Biclique).

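Let $\mathcal{G} = (U, V, E)$ be an undirected bipartite graph defined over the objects $U$ and attributes $V$. A biclique is a complete subgraph of $\mathcal{G}$ induced by a pair of disjoint subsets $(A, B)$ such that $A \subseteq U$, $B \subseteq V$, $A \neq \emptyset$, $B \neq \emptyset$, and $(u, v) \in E$ for all $u \in A$ and $v \in B$.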

Any pair of subsets satisfying these conditions, such as the one whose concept is indicated by a red arrow in Figure 3, is an example of a biclique. In the sequel, we use this biclique as our illustrative example to support the understanding of the definitions and principles related to the Bi-face centrality.

Definition 2.11 (Bridge).

An edge $e \in E$ of a two-mode data network $\mathcal{G}$ is a bridge iff it is not contained in any cycle or, equivalently, iff its removal increases the number of connected components of $\mathcal{G}$.
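Bridges in the sense of Definition 2.11 can be found in linear time with the classical depth-first-search low-link technique due to Tarjan. The sketch below is a standard recursive version for small, simple undirected graphs (recursion depth grows with the graph, so an iterative DFS is preferable for large networks); the edge list is hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Hashable, Iterable, List, Optional, Set, Tuple


def find_bridges(edges: Iterable[Tuple[Hashable, Hashable]]) -> Set[FrozenSet[Hashable]]:
    """Return every bridge of a simple undirected graph as an unordered pair of endpoints."""
    adj: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    disc: Dict[Hashable, int] = {}  # DFS discovery time of each node
    low: Dict[Hashable, int] = {}   # lowest discovery time reachable from the node's DFS subtree
    bridges: Set[FrozenSet[Hashable]] = set()
    timer = 0

    def dfs(node: Hashable, parent: Optional[Hashable]) -> None:
        nonlocal timer
        disc[node] = low[node] = timer
        timer += 1
        for nxt in adj[node]:
            if nxt == parent:
                continue  # the tree edge back to the parent does not close a cycle
            if nxt in disc:
                low[node] = min(low[node], disc[nxt])  # back edge
            else:
                dfs(nxt, node)
                low[node] = min(low[node], low[nxt])
                if low[nxt] > disc[node]:  # nxt's subtree cannot reach node or above
                    bridges.add(frozenset((node, nxt)))

    for start in list(adj):
        if start not in disc:
            dfs(start, None)
    return bridges


# Hypothetical network: a 4-cycle a1-d1-a2-d2 plus a pendant path d2-a3-d3.
edges = [("a1", "d1"), ("a1", "d2"), ("a2", "d1"), ("a2", "d2"), ("a3", "d2"), ("a3", "d3")]
print(sorted(sorted(b) for b in find_bridges(edges)))  # [['a3', 'd2'], ['a3', 'd3']]
```

The edges of the 4-cycle are not reported, since each of them lies on a cycle.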

For instance, the edge represents a bridge in .

Definition 2.12 (Bipartite centrality measure).

The centrality measure of a type-I node $u \in U$ is a function that assigns to $u$ a positive real number quantifying its centrality w.r.t. all type-II nodes in the network (and vice versa).

Bipartite (also called two-mode) centrality measures are frequently used to identify and rank key nodes in two-mode networks. Although several centrality measures have been introduced, degree, closeness, betweenness and eigenvector centralities have been found to be the most prominent in several applications, and they are therefore commonly used.

Definition 2.13 (Degree centrality (borgatti1997network; tsugawa2015analysis)).

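The degree centrality of a node $u \in U$ (resp. $v \in V$) in a two-mode graph network $\mathcal{G} = (U, V, E)$ is defined as:

$$C_D(u) = \sum_{v \in V} a_{uv} \tag{4}$$
$$C_D(v) = \sum_{u \in U} a_{uv} \tag{5}$$

where $a_{uv}$ is equal to $1$ when a link exists between $u$ and $v$, and $0$ otherwise. Thus, the summation in Eq. (4) represents the number of edges (i.e., ties with neighbour nodes of the other type) involving the node.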
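With the formal context stored as a 0/1 matrix (type-I nodes as rows, type-II nodes as columns), Eqs. (4) and (5) reduce to row and column sums. A minimal sketch with a hypothetical 3 × 3 matrix:

```python
from typing import List, Tuple


def degree_centrality(a: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Unnormalized bipartite degrees from a 0/1 matrix (rows: type-I, columns: type-II)."""
    type1 = [sum(row) for row in a]        # Eq. (4): sum over the row of u
    type2 = [sum(col) for col in zip(*a)]  # Eq. (5): sum over the column of v
    return type1, type2


A = [[1, 1, 0],
     [1, 1, 0],
     [0, 1, 1]]
print(degree_centrality(A))  # ([2, 2, 2], [2, 3, 1])
```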

Definition 2.14 (Closeness centrality (borgatti1997network; borgatti2011analyzing)).

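The normalized closeness centrality of a node $u \in U$ (resp. $v \in V$), in a two-mode graph network $\mathcal{G} = (U, V, E)$, is defined as:

$$C_C(u) = \frac{|V| + 2(|U| - 1)}{\sum_{w \in (U \cup V) \setminus \{u\}} d(u, w)} \tag{6}$$
$$C_C(v) = \frac{|U| + 2(|V| - 1)}{\sum_{w \in (U \cup V) \setminus \{v\}} d(v, w)} \tag{7}$$

where $d(u, w)$ is the geodesic distance (shortest path length) between the nodes $u$ and $w$. The numerators are the smallest distance sums attainable in a connected bipartite graph: at best, a type-I node is at distance $1$ from all $|V|$ type-II nodes and at distance $2$ from the other $|U| - 1$ type-I nodes (and symmetrically for type-II nodes).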
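The following sketch evaluates Eqs. (6) and (7) with breadth-first search. It assumes a connected, unweighted network (on a disconnected network the distance sums would be undefined); the function names and the small path network are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Hashable, Iterable, Set, Tuple


def bfs_distances(adj: Dict[Hashable, Set[Hashable]], source: Hashable) -> Dict[Hashable, int]:
    """Unweighted shortest-path lengths from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def bipartite_closeness(edges: Iterable[Tuple[Hashable, Hashable]],
                        U: Set[Hashable], V: Set[Hashable]) -> Dict[Hashable, float]:
    """Eqs. (6)-(7), assuming the network is connected."""
    adj: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    scores = {}
    for node in U | V:
        same, other = (U, V) if node in U else (V, U)
        total = sum(bfs_distances(adj, node).values())  # distance to itself contributes 0
        best = len(other) + 2 * (len(same) - 1)          # smallest achievable distance sum
        scores[node] = best / total if total else 0.0
    return scores


# Hypothetical path u1 - v1 - u2 - v2.
scores = bipartite_closeness([("u1", "v1"), ("u2", "v1"), ("u2", "v2")], {"u1", "u2"}, {"v1", "v2"})
print(round(scores["u1"], 3), scores["v1"])  # 0.667 1.0
```

Here `v1` is adjacent to every type-I node and at distance 2 from the other type-II node, so it reaches the optimum of 1.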

Definition 2.15 (Betweenness centrality (brandes2001faster)).

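In a bipartite network $\mathcal{G} = (U, V, E)$, the betweenness centrality of a node $u \in U$ (resp. $v \in V$) is defined as in (borgatti2011analyzing):

$$C_B(u) = \sum_{\{x, y\} \subseteq (U \cup V) \setminus \{u\}} \frac{\sigma_{xy}(u)}{\sigma_{xy}} \tag{8}$$
$$C_B(v) = \sum_{\{x, y\} \subseteq (U \cup V) \setminus \{v\}} \frac{\sigma_{xy}(v)}{\sigma_{xy}} \tag{9}$$

where $\sigma_{xy}$ denotes the total number of shortest paths between nodes $x$ and $y$, and $\sigma_{xy}(u)$ is the number of those paths that traverse $u$. To normalize the betweenness, we simply divide $C_B(u)$ and $C_B(v)$ by the maximum value attainable by a node of the corresponding node set (borgatti2011analyzing):

$$C'_B(u) = \frac{C_B(u)}{\tfrac{1}{2}\left[|V|^2 (s+1)^2 + |V|(s+1)(2t - s - 1) - t(2s - t + 3)\right]} \tag{10}$$

where $s = \lfloor (|U| - 1) / |V| \rfloor$ and $t = (|U| - 1) \bmod |V|$,

$$C'_B(v) = \frac{C_B(v)}{\tfrac{1}{2}\left[|U|^2 (p+1)^2 + |U|(p+1)(2r - p - 1) - r(2p - r + 3)\right]} \tag{11}$$

where $p = \lfloor (|V| - 1) / |U| \rfloor$ and $r = (|V| - 1) \bmod |U|$.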
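A sketch of Eqs. (8) to (11): raw betweenness is accumulated with Brandes' algorithm for unweighted, undirected graphs (each unordered pair counted once), then divided by the bipartite maximum of the node's own mode. When a mode cannot lie on any shortest path (e.g., the leaves of a star), that maximum is 0 and we return 0 by convention, which is our assumption. Function names and the example networks are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def brandes(adj: Dict[Hashable, List[Hashable]]) -> Dict[Hashable, float]:
    """Unnormalized betweenness (Eqs. 8-9) of an unweighted, undirected graph."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order: List[Hashable] = []  # nodes in non-decreasing distance from s
        preds: Dict[Hashable, List[Hashable]] = defaultdict(list)
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            x = queue.popleft()
            order.append(x)
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
                if dist[y] == dist[x] + 1:
                    sigma[y] += sigma[x]
                    preds[y].append(x)
        delta = dict.fromkeys(adj, 0.0)  # dependency of s on each node
        for y in reversed(order):
            for x in preds[y]:
                delta[x] += sigma[x] / sigma[y] * (1.0 + delta[y])
            if y != s:
                bc[y] += delta[y]
    return {n: b / 2.0 for n, b in bc.items()}  # each unordered pair was seen from both ends


def bipartite_max(n_same: int, n_other: int) -> float:
    """Largest possible betweenness of a node whose own mode has n_same nodes (Eqs. 10-11)."""
    s, t = divmod(n_same - 1, n_other)
    return (n_other ** 2 * (s + 1) ** 2 + n_other * (s + 1) * (2 * t - s - 1) - t * (2 * s - t + 3)) / 2.0


def bipartite_betweenness(edges: Iterable[Tuple[Hashable, Hashable]],
                          U: Set[Hashable], V: Set[Hashable]) -> Dict[Hashable, float]:
    adj: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    result = {}
    for node, value in brandes(adj).items():
        top = bipartite_max(len(U), len(V)) if node in U else bipartite_max(len(V), len(U))
        result[node] = value / top if top > 0 else 0.0
    return result


# Hypothetical path u1 - v1 - u2 - v2.
scores = bipartite_betweenness([("u1", "v1"), ("u2", "v1"), ("u2", "v2")], {"u1", "u2"}, {"v1", "v2"})
print(scores)  # {'u1': 0.0, 'v1': 1.0, 'u2': 1.0, 'v2': 0.0}
```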

Definition 2.16 (Eigenvector centrality (borgatti1997network; borgatti2011analyzing)).

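The eigenvector centrality of a node $u \in U$ (resp. $v \in V$), in a graph network $\mathcal{G}$, can be iteratively computed as:

$$x_u = \frac{1}{\lambda} \sum_{v \in V} a_{uv}\, x_v \tag{12}$$
$$x_v = \frac{1}{\lambda} \sum_{u \in U} a_{uv}\, x_u \tag{13}$$

where the eigenvalue $\lambda$ is a constant (in practice, the largest eigenvalue of the adjacency matrix), and $a_{uv}$ is the adjacency element, which is equal to $1$ if node $u$ is linked to node $v$, and $0$ otherwise.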
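Eqs. (12) and (13) are usually solved by power iteration. One caveat specific to two-mode networks: the spectrum of a bipartite adjacency matrix is symmetric (if $\lambda$ is an eigenvalue, so is $-\lambda$), so plain power iteration can oscillate instead of converging. The sketch below therefore iterates with the shifted matrix $A + I$, which has the same eigenvectors; this shift is our implementation choice, not something prescribed by the cited works. Scores are scaled to unit Euclidean norm, and the example matrix is hypothetical.

```python
import math
from typing import List, Tuple


def bipartite_eigenvector(a: List[List[int]], iters: int = 10_000,
                          tol: float = 1e-12) -> Tuple[List[float], List[float]]:
    """Principal eigenvector of the bipartite adjacency matrix via power iteration on (A + I)."""
    n_u, n_v = len(a), len(a[0])
    x = [1.0] * (n_u + n_v)  # entries 0..n_u-1: type-I nodes; the rest: type-II nodes
    for _ in range(iters):
        new = x[:]  # the "+ I" part of the shift
        for i in range(n_u):
            for j in range(n_v):
                if a[i][j]:
                    new[i] += x[n_u + j]  # Eq. (12), up to the 1/lambda factor
                    new[n_u + j] += x[i]  # Eq. (13), up to the 1/lambda factor
        norm = math.sqrt(sum(val * val for val in new))
        new = [val / norm for val in new]
        converged = max(abs(p - q) for p, q in zip(new, x)) < tol
        x = new
        if converged:
            break
    return x[:n_u], x[n_u:]


# Hypothetical star: one airline serving two destinations.
xu, xv = bipartite_eigenvector([[1, 1]])
print([round(val, 4) for val in xu], [round(val, 4) for val in xv])  # [0.7071] [0.5, 0.5]
```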

3 Bi-face Framework

At a conceptual level, our overall Bi-face centrality approach contains the following basic steps.

  1. We construct the formal context associated with the network and then its corresponding concept lattice. The set of bicliques coincides with the set of formal concepts whose extent and intent are both non-empty.

  2. We refine the bicliques by removing their non-influential nodes in order to obtain face bicliques (see Definition 3.2).

  3. We identify what we call face bridges, which are the non-influential bridges in the network that contain terminal nodes.

  4. We compute the Bi-face centrality measure to identify key nodes.

3.1 Building the Formal context of a Two-mode Network

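We first build the formal context $\mathbb{K} = (U, V, E)$ of the two-mode network by computing its adjacency matrix $\mathbf{A} = (a_{ij})$ as follows:

$$a_{ij} = \begin{cases} 1 & \text{if } (u_i, v_j) \in E, \\ 0 & \text{otherwise.} \end{cases} \tag{14}$$

In Eq. (14), we assign $1$ to the element $a_{ij}$ of $\mathbf{A}$ in the $i$-th row and $j$-th column if the object (type-I node) $u_i$ is linked to the attribute (type-II node) $v_j$ in the network $\mathcal{G}$; otherwise, we assign $0$ to it. For example, the formal context constructed for our toy graph in Figure 1 is represented in the table of Figure 2.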
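Eq. (14) is a direct lookup in the edge set. A minimal sketch, assuming edges are given as (type-I node, type-II node) pairs and that the row and column orders are chosen by the caller; the airline and destination labels are hypothetical.

```python
from typing import Hashable, Iterable, List, Sequence, Tuple


def formal_context(edges: Iterable[Tuple[Hashable, Hashable]],
                   objects: Sequence[Hashable],
                   attributes: Sequence[Hashable]) -> List[List[int]]:
    """Eq. (14): a[i][j] = 1 iff (objects[i], attributes[j]) is an edge, else 0."""
    edge_set = set(edges)  # assumes each edge is oriented as (type-I, type-II)
    return [[1 if (u, v) in edge_set else 0 for v in attributes] for u in objects]


# Hypothetical airlines A1, A2 and destinations D1, D2, D3.
edges = [("A1", "D1"), ("A1", "D2"), ("A2", "D2"), ("A2", "D3")]
for row in formal_context(edges, ["A1", "A2"], ["D1", "D2", "D3"]):
    print(row)
# [1, 1, 0]
# [0, 1, 1]
```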

Figure 2: The formal context for the two-mode network of Figure 1.

We then construct the concept lattice $\mathcal{L}$ from the formal context, as shown in Figure 3, which depicts the Hasse diagram of $\mathcal{L}$ with reduced labelling as explained in Section 2.1.

Figure 3: The Hasse diagram of the concept lattice that corresponds to the context of the two-mode network in Figure 1. More precisely, it is a diagram with reduced labeling: each object label $g$ is written below its object concept and each attribute label $m$ above its attribute concept. The extent of a concept represented by a node is given by all object labels from the node downwards, and the intent by all attribute labels from the node upwards. The red downward arrow indicates the illustrative biclique cited after Definition 2.10.

3.2 Overlapping Biclique Extraction and Refinement

Using the constructed lattice $\mathcal{L}$, it is now possible to extract the concepts that capture the corresponding bicliques of the two-mode network as follows:

Proposition 3.1.

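Given a network $\mathcal{G} = (U, V, E)$ and its corresponding concept lattice $\mathcal{L}$, a concept $(A, B) \in \mathcal{L}$ with $A \neq \emptyset$ and $B \neq \emptyset$ represents a biclique in $\mathcal{G}$.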

Proof.

A concept $(A, B)$ with $A \neq \emptyset$ and $B \neq \emptyset$ corresponds to an $|A| \times |B|$ sub-matrix of the adjacency matrix consisting only of 1's, and hence to a biclique; moreover, this biclique is maximal, since a concept is a maximal rectangle in the formal context. Conversely, assume that $(A, B)$ is a maximal biclique of $\mathcal{G}$. Then, from Definition 2.10, for any two nodes $u \in A$ and $v \in B$, there exists an edge $(u, v) \in E$ that links them. Based on Eq. (14), the adjacency sub-matrix induced by $A$ and $B$ consists of all 1's, and by maximality it cannot be extended by any further row or column of 1's, i.e., $A' = B$ and $B' = A$. Such a sub-matrix therefore coincides with the concept $(A, B)$, whose extent and intent involve only the object and attribute nodes of the biclique, respectively. This entails that a maximal biclique is identical to a concept $(A, B)$ of $\mathcal{L}$. ∎

An interesting question that could be raised now is how to determine the non-influential nodes in a given concept (or biclique). To answer this question, let us define a non-influential node from the viewpoint of FCA.

Definition 3.1 (Non-influential node).

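For a formal concept (biclique) $C = (A, B)$, a type-I node $u \in A$ is non-influential if its removal from $A$ (and accordingly from the graph $\mathcal{G}$) does not violate the closure conditions of the other biclique concepts that involve it:

$$\forall (A_i, B_i) \in \mathcal{L} \text{ with } u \in A_i: \quad (A_i \setminus \{u\})' = B_i. \tag{15}$$

In a dual manner, a type-II node $v \in B$ is non-influential if:

$$\forall (A_i, B_i) \in \mathcal{L} \text{ with } v \in B_i: \quad (B_i \setminus \{v\})' = A_i. \tag{16}$$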

That is, the concepts (or bicliques) that contain node $u$ or node $v$ still maintain their local conceptual structure even after removing $u$ from their extents or $v$ from their intents. Intuitively, this means that node $u$ or $v$ is not important, since removing it from the graph does not affect the essential connectivity of the network (e.g., by collapsing other concepts). Definition 3.1 raises the practical question of how to find the non-influential nodes of a given biclique. Fortunately, the faces of its corresponding concept, w.r.t. its upper and lower covers, provide this information. An effective strategy is therefore to contrast the concept (biclique) with its lower and upper covers through extensional and intentional faces, in order to identify its potential non-influential type-I and type-II nodes, respectively. That is, the non-influential type-I (resp. type-II) nodes of a concept $C = (A, B)$ are those shared by all of its extensional (resp. intentional) faces:

$$\mathrm{NI}_{I}(C) = \bigcap_{C_l \in \mathrm{LC}(C)} F^{ext}_l(C) \tag{17}$$
$$\mathrm{NI}_{II}(C) = \bigcap_{C_d \in \mathrm{UC}(C)} F^{int}_d(C) \tag{18}$$

For instance, the concept corresponding to our illustrative biclique (red arrow in Figure 3) has two extensional faces whose intersection is empty, which means that this biclique has no non-influential type-I nodes. It also has only one intentional face, so the intersection over its intentional faces is that face itself; the destination it contains is therefore a non-influential type-II node of the biclique.

On the basis of Equations (17) and (18), we can leverage the faces of concepts to define a key biclique (a biclique is key when all of its nodes are influential) as follows:

Definition 3.2 (Face Biclique).

Given a two-mode network $\mathcal{G}$ and its corresponding concept lattice $\mathcal{L}$, a concept $C = (A, B)$ (representing a biclique) is called a face biclique if all of its type-I and type-II nodes are influential, i.e., none of them satisfies the conditions in Equations (17) and (18).

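Based on Definition 3.2, we can obtain the face biclique $\hat{C} = (\hat{A}, \hat{B})$ by refining the original biclique $C = (A, B)$ as follows:

$$\hat{A} = A \setminus \mathrm{NI}_{I}(C), \qquad \hat{B} = B \setminus \mathrm{NI}_{II}(C) \tag{19}$$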

In Equation (19), we remove the non-influential type-I nodes from the extent and the non-influential type-II nodes from the intent. It is worth noting that when the extent or intent contains only one node, no refinement is applied, because this node is influential by default: removing it would clearly violate the closure conditions in Equations (15) and (16).
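The refinement of Equations (17) to (19) can be sketched for a single concept, assuming its lower-cover extents and upper-cover intents are already known (e.g., from a lattice-construction routine). The function names are hypothetical, and the concept is a made-up example shaped like the one discussed above (two extensional faces with an empty intersection and a single intentional face). Treating a concept without covers as having no non-influential nodes is our assumption.

```python
from functools import reduce
from typing import FrozenSet, List, Tuple


def shared_by_all_faces(own: FrozenSet[str], cover_sets: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Nodes present in every face: own minus each cover's extent (or intent), intersected."""
    faces = [own - other for other in cover_sets]
    if not faces:
        return frozenset()  # assumption: no covers means nothing is flagged as non-influential
    return reduce(lambda p, q: p & q, faces)


def refine(extent: FrozenSet[str], intent: FrozenSet[str],
           lower_extents: List[FrozenSet[str]],
           upper_intents: List[FrozenSet[str]]) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """Drop non-influential nodes; a side holding a single node is kept as is."""
    if len(extent) > 1:
        extent = extent - shared_by_all_faces(extent, lower_extents)
    if len(intent) > 1:
        intent = intent - shared_by_all_faces(intent, upper_intents)
    return extent, intent


# Hypothetical concept: faces {a3} and {a1} (empty intersection) and a single intentional face {x}.
A = frozenset({"a1", "a2", "a3"})
B = frozenset({"x", "y"})
new_A, new_B = refine(A, B, [frozenset({"a1", "a2"}), frozenset({"a2", "a3"})], [frozenset({"y"})])
print(sorted(new_A), sorted(new_B))  # ['a1', 'a2', 'a3'] ['y']
```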

3.3 Face-Bridge Detection

Definition 3.3 (Face-I Bridge and Terminal type-I node).

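Given a 2-mode network $\mathcal{G}$ and its corresponding concept lattice $\mathcal{L}$, an edge $(u, v) \in E$ represents a non-influential (face-I) bridge containing a terminal type-I node $u$ when there is an attribute concept $\mu(v) = (\{v\}', \{v\}'')$ with $u \in \{v\}'$ that satisfies the following:

$$\{u\} \in \mathrm{MG}_{ext}(\mu(v)) \tag{20}$$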

For instance, the attribute concept that appears in blue/black in Figure 3 has the singleton {BritishMidland} among its extensional minimal generators. This implies that BritishMidland (in yellow in Figure 3) is a terminal (type-I) node and that the edge linking it to the attribute of this concept represents a non-influential (face-I) bridge. Similarly, we have:

Definition 3.4 (Face-II Bridge and Terminal type-II node).

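Given a 2-mode network $\mathcal{G}$ and its corresponding concept lattice $\mathcal{L}$, an edge $(u, v) \in E$ represents a non-influential (face-II) bridge containing a terminal type-II node $v$ when there is an object concept $\gamma(u) = (\{u\}'', \{u\}')$ with $v \in \{u\}'$ that satisfies the following:

$$\{v\} \in \mathrm{MG}_{int}(\gamma(u)) \tag{21}$$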

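Input: Concept intent $B$, set of upper covers $\mathrm{UC}(C)$.

Output: Set of minimal generators $\mathrm{MG}_{int}(C)$.

1: $\mathrm{MG}_{int}(C) \leftarrow \emptyset$;
2: for each $C_d = (A_d, B_d)$ in $\mathrm{UC}(C)$ do
3:    $F^{int}_d(C) \leftarrow B \setminus B_d$;
4:    if $\mathrm{MG}_{int}(C) = \emptyset$ then
5:       $\mathrm{MG}_{int}(C) \leftarrow \{\{m\} \mid m \in F^{int}_d(C)\}$;
6:    else
7:       $\mathcal{T} \leftarrow \emptyset$;
8:       for each $Y$ in $\mathrm{MG}_{int}(C)$ do
9:          if $Y \cap F^{int}_d(C) = \emptyset$ then
10:            $\mathcal{T} \leftarrow \mathcal{T} \cup \{Y \cup \{m\} \mid m \in F^{int}_d(C)\}$;
11:         else
12:            $\mathcal{T} \leftarrow \mathcal{T} \cup \{Y\}$;
13:         end if
14:      end for
15:      $\mathrm{MG}_{int}(C) \leftarrow \{Y \in \mathcal{T} \mid \nexists Z \in \mathcal{T}: Z \subset Y\}$;
16:   end if
17: end for
18: return $\mathrm{MG}_{int}(C)$;
Algorithm 1 Minigen() procedure for computing the intentional minimal generators of a concept intent.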

The question now is how to obtain the minimal generators of object and attribute concepts. We can efficiently compute the set of minimal generators $\mathrm{MG}_{int}(C)$ of a concept intent $B$ by applying the Minigen() procedure given in Algorithm 1, which exploits the fact that the minimal generators of an intent are the minimal blockers of the family of its intentional faces. It iteratively calculates the face $F^{int}_d(C)$ of $C$ w.r.t. each upper cover $C_d$ in $\mathrm{UC}(C)$ (Line 3). If the current set of minimal generators is empty, it assigns the individual attributes of this first face, as singletons, to $\mathrm{MG}_{int}(C)$ (Lines 4-5). Otherwise, it checks the intersection between the current face and each generator $Y$ in $\mathrm{MG}_{int}(C)$ (Line 8). If this intersection is empty (Line 9), then $Y$ does not block the current face, so it must be extended in order to belong to the family of minimal blockers: the new candidate generators are obtained by adding each element of the current face to $Y$ (Line 10). If the intersection is not empty, then $Y$, which is already a minimal blocker of the previous faces, also blocks the current face, so it is kept without modification (Line 12). The procedure then retains only the inclusion-minimal candidates (Line 15) and finally returns the set of minimal generators (Line 18). Note that, in a dual way, by using the set of lower covers $\mathrm{LC}(C)$ and extensional faces, the Minigen() procedure can compute the set of extensional minimal generators $\mathrm{MG}_{ext}(C)$ of a concept w.r.t. its extent $A$.
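A Python transcription of Algorithm 1 may help when checking small examples. The comments map statements to the line numbers described above. It assumes every face is non-empty, which holds for genuine upper covers since their intents are strict subsets. For a concept without upper covers (the top concept) it returns an empty family, whereas the empty set is then the unique minimal generator, so callers should treat that case separately. The function name is hypothetical.

```python
from typing import FrozenSet, List, Set


def minigen(intent: FrozenSet[str], upper_intents: List[FrozenSet[str]]) -> Set[FrozenSet[str]]:
    """Minimal generators of an intent, computed as minimal blockers of its intentional faces."""
    gens: Set[FrozenSet[str]] = set()                 # line 1
    for upper in upper_intents:                       # line 2
        face = intent - upper                         # line 3
        if not gens:                                  # lines 4-5: seed with singletons
            gens = {frozenset([m]) for m in face}
            continue
        candidates: Set[FrozenSet[str]] = set()       # line 7
        for y in gens:                                # line 8
            if not (y & face):                        # line 9: y misses this face
                candidates |= {y | {m} for m in face}  # line 10: extend y by each face element
            else:
                candidates.add(y)                     # line 12: y already blocks the face
        # line 15: keep only the inclusion-minimal candidates
        gens = {y for y in candidates if not any(z < y for z in candidates)}
    return gens                                       # line 18


# Hypothetical intent {a, b, c, d} with upper-cover intents {c, d} and {a, d}:
# the faces are {a, b} and {b, c}, whose minimal blockers are {b} and {a, c}.
print(sorted(sorted(g) for g in minigen(frozenset("abcd"), [frozenset("cd"), frozenset("ad")])))
# [['a', 'c'], ['b']]
```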

3.4 Bi-face Centrality

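Definition 3.5 (Bi-face Centrality $BF$).

The Bi-face centrality of nodes $u \in U$ and $v \in V$ of a given graph network $\mathcal{G} = (U, V, E)$ can be computed as:

$$BF(u) = \frac{\left|\{(\hat{A}, \hat{B}) \in \hat{\mathcal{B}} : u \in \hat{A},\ |\hat{A}| > 1,\ |\hat{B}| > 1\}\right|}{|\hat{\mathcal{B}}|} + \left(1 - \frac{\left|\{v' \in V : (u, v') \in \mathcal{E}_{I}\}\right|}{|\mathcal{E}_{I}|}\right) \tag{22}$$
$$BF(v) = \frac{\left|\{(\hat{A}, \hat{B}) \in \hat{\mathcal{B}} : v \in \hat{B},\ |\hat{A}| > 1,\ |\hat{B}| > 1\}\right|}{|\hat{\mathcal{B}}|} + \left(1 - \frac{\left|\{u' \in U : (u', v) \in \mathcal{E}_{II}\}\right|}{|\mathcal{E}_{II}|}\right) \tag{23}$$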

Here, $\hat{\mathcal{B}}$ stands for the set of face bicliques, while $\mathcal{E}_{I}$ and $\mathcal{E}_{II}$ represent the two sets of non-influential (face-I) and (face-II) bridges, respectively. In Eq. (22), the Bi-face centrality is the sum of a face-biclique term (the face-biclique of a node being the number of overlapping face bicliques to which it belongs) and a face-bridge term. The numerator of the face-biclique term simply counts the refined concepts, with extent and intent sizes greater than 1, that involve a type-I node $u$. It thus quantifies the portion of face bicliques in the graph network $\mathcal{G}$ to which the node belongs. From a conceptual perspective, this term can be considered an efficient way of computing the cross connectivity (faghani2013study; everett1998analyzing) of the node using refined overlapping bicliques that contain only influential nodes. In the face-bridge term, we first compute the ratio of the face bridges that involve the node $u$. This ratio is then subtracted from $1$ to approximate the portion of influential bridges in the graph that contain the node $u$. Note that the numerators of both the face-biclique and face-bridge terms are unnormalized quantities; the denominators in Eq. (22) thus serve as normalization constants that scale both terms between $0$ and $1$. In a similar manner, the Bi-face centrality in Eq. (23) can be interpreted and used to compute the centrality of type-II nodes in the graph.
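The verbal description of Eq. (22) above translates into the following sketch for type-I nodes. It takes the face bicliques and the face-I bridges as already computed and, as an assumption on our part, uses the total number of face bicliques and the total number of face-I bridges as the two normalizing denominators; when a denominator is zero, the corresponding share is taken as 0. All names and the small input are hypothetical.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, List, Tuple

Biclique = Tuple[FrozenSet[Hashable], FrozenSet[Hashable]]  # (refined extent, refined intent)


def bi_face_type1(nodes: Iterable[Hashable],
                  face_bicliques: List[Biclique],
                  face1_bridges: List[Tuple[Hashable, Hashable]]) -> Dict[Hashable, float]:
    """Face-biclique share plus (1 - face-bridge share) for each type-I node."""
    n_bicliques = len(face_bicliques)
    n_bridges = len(face1_bridges)
    scores = {}
    for u in nodes:
        # numerator of the face-biclique term: refined bicliques with |A|, |B| > 1 containing u
        in_bicliques = sum(1 for a, b in face_bicliques if u in a and len(a) > 1 and len(b) > 1)
        # face-I bridges (u, v) in which u is the terminal type-I node
        on_bridges = sum(1 for terminal, _ in face1_bridges if terminal == u)
        biclique_term = in_bicliques / n_bicliques if n_bicliques else 0.0
        bridge_term = 1.0 - (on_bridges / n_bridges if n_bridges else 0.0)
        scores[u] = biclique_term + bridge_term
    return scores


face_bicliques = [
    (frozenset({"u1", "u2"}), frozenset({"v1", "v2"})),
    (frozenset({"u2", "u3"}), frozenset({"v2", "v3"})),
    (frozenset({"u3"}), frozenset({"v3", "v4"})),  # single-node extent: not counted
]
scores = bi_face_type1(["u1", "u2", "u3", "u4"], face_bicliques, [("u4", "v4")])
print({u: round(s, 3) for u, s in scores.items()})
# {'u1': 1.333, 'u2': 1.667, 'u3': 1.333, 'u4': 0.0}
```

Node `u4`, which belongs to no counted face biclique and is the terminal node of the only face-I bridge, receives the lowest score, in line with the intent of the measure.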

Algorithm 2 gives the pseudo-code for computing the Bi-face centrality of all type-I nodes in the two-mode network. The algorithm takes as input the set of all extracted concepts. For each type-I node, it first iteratively refines the extents of the bicliques to obtain the face bicliques by removing all their non-influential type-I nodes (lines 4-5). It then counts the number of refined face bicliques in the graph that involve the node (lines 7-9). Next, it calculates the minimal generators of the attribute concepts w.r.t. their extents to identify the face-bridges that involve the node (lines 11-12), and counts the face-bridges that involve the node as a terminal (type-I) one (lines 13-15). Subsequently, it computes the Bi-face centrality of each node (lines 19-21). Finally, it returns a list containing the Bi-face centrality measures of all type-I nodes in the graph (line 22). Without loss of generality, Algorithm 2 can be applied in a dual manner to compute the Bi-face centrality of each type-II node as follows. It iteratively obtains the face bicliques by removing the non-influential type-II nodes from the intents of their corresponding concepts. It then identifies the face bicliques in the graph that involve the node. It then uses the minimal generators of object concepts to count the face-bridges that involve the node as a terminal (type-II) one. Finally, it returns a list containing the Bi-face centrality measures of all type-II nodes in the graph.

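Input: Set of bicliques (formal concepts) of the two-mode network.

Output: Bi-face centrality (BF) of all type-I nodes.

1:initialize the list BF of Bi-face scores;
2:for each type-I node u do
3:   initialize the face-biclique and face-bridge counters of u to 0;
4:   for each biclique (A, B) in the input set do
5:      A′ ← A with all its non-influential type-I nodes removed
6:      // Counting face bicliques that contain the node u
7:      if u ∈ A′ and min(|A′|, |B|) > 1 then
8:         increment the face-biclique counter of u;
9:      end if
10:      // Counting face-bridges that contain the node u
11:      if (A, B) is an attribute concept then
12:         compute the minimal generators of (A, B) w.r.t. its extent and lower covers (Minigen)
13:         if u is a terminal (type-I) node of a face-bridge given by these generators then
14:            increment the face-bridge counter of u;
15:         end if
16:      end if
17:   end for
18:end for
19:for each type-I node u do
20:   BF(u) ← face-biclique term + face-bridge term of u (Eq. 22);
21:end for
22:Return BF
Algorithm 2 Computing Bi-face centrality (BF) for all type-I nodes in a two-mode network.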
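The loop structure of Algorithm 2 can be sketched in Python as follows. This is only an illustration under assumptions the text does not fix: concepts are (extent, intent) pairs of frozensets, the set of non-influential type-I nodes is given, and the face-bridges have already been derived from the minimal generators of the attribute concepts and are represented by their sets of type-I terminal nodes. The function `bi_face_type1` and its arguments are hypothetical, and the normalization follows an assumed reading of Eq. 22 (counts divided by the total numbers of face bicliques and face-bridges).

```python
from typing import Dict, FrozenSet, Hashable, Iterable, List, Set, Tuple

Concept = Tuple[FrozenSet[Hashable], FrozenSet[Hashable]]  # (extent, intent)


def bi_face_type1(type1_nodes: Iterable[Hashable],
                  concepts: List[Concept],
                  non_influential: Set[Hashable],
                  face_bridges: List[FrozenSet[Hashable]]) -> Dict[Hashable, float]:
    """Sketch of Bi-face scores for type-I nodes (assumed inputs and normalization)."""
    # Lines 4-5: refine each extent by removing the non-influential type-I nodes.
    refined = [(extent - non_influential, intent) for extent, intent in concepts]
    # Face bicliques: refined concepts whose extent and intent both have size > 1.
    face_bicliques = [e for e, i in refined if len(e) > 1 and len(i) > 1]
    n_fb, n_br = len(face_bicliques), len(face_bridges)
    scores: Dict[Hashable, float] = {}
    for u in type1_nodes:
        # Lines 7-9: number of face bicliques that contain u.
        fb_u = sum(1 for e in face_bicliques if u in e)
        # Lines 13-15: number of face-bridges having u as a type-I terminal node.
        br_u = sum(1 for b in face_bridges if u in b)
        # Lines 19-21: combine the two normalized terms.
        biclique_term = fb_u / n_fb if n_fb else 0.0
        bridge_term = 1.0 - (br_u / n_br if n_br else 0.0)
        scores[u] = biclique_term + bridge_term
    return scores
```

The derivation of face-bridges from minimal generators is kept outside the sketch, since its exact construction is defined elsewhere in the paper; the dual computation for type-II nodes would swap extents for intents.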
Complexity Analysis

The calculation of the face-biclique term requires storing and iterating through the extents of all the bicliques to count the face bicliques that contain each node, so its time and space costs are governed by the number of bicliques and the sizes of their extents. The face-bridge term of a type-I node requires iterating through the attribute concepts and calculating their minimal generators w.r.t. their corresponding lower covers; its cost therefore depends on the number of attribute concepts, the largest size of an obtained set of minimal generators for an attribute concept, and the largest number of lower covers of an attribute concept. Under conditions that frequently hold in practice, the first term dominates the second one, so the time and space complexity of computing the Bi-face centrality of all type-I nodes is governed by the face-biclique term. In a dual way, the Bi-face centrality of all type-II nodes is computed from the intents of the bicliques and the minimal generators of the object concepts w.r.t. their upper covers, and the total time and space complexity of Bi-face centrality is the sum of the type-I and type-II costs.

4 Experimental Evaluation

The goal of our experimental evaluation is to investigate the following key questions.

  • (Q1) Is the Bi-face centrality more accurate than the state-of-the-art centrality measures?

  • (Q2) Is Bi-face centrality faster than prominent centrality measures?

To find robust answers, we first select the following four two-mode networks (real-life and synthetic), whose different configurations allow us to validate various scenarios.

4.1 Datasets

  • Norwegian Interlocking Directorates (seierstad2011few), which contains the interlocking boards of women directors in Norwegian public limited companies. A link represents a board membership connecting a woman director to a public limited company in Norway in August 2009.

  • PediaLanguages (morsey2012dbpedia), which involves the semantic web of official languages spoken by people living in different countries. An edge connects an official language to a country if people in that country speak that language.

  • Southern-Women-Davis (borgatti20092; Freeman2003), a two-mode social network of women reporting their participation in events (such as a meeting of a social club, a church event and a party) over a nine-month period. A woman is connected to an event if she attended that event.

  • CoinToss, a random bipartite network generated by the indirect Coin-Toss model generator (felde2020null).

A few statistics of the networks are summarized in Table 1. The datasets are publicly available at:
https://toreopsahl.com/datasets/
http://konect.cc/networks/opsahl-collaboration/
https://networkdata.ics.uci.edu/netdata/html/davis.html

Name
Norwegian-Directorate
PediaLanguages
Southern-Women-Davis
CoinToss
Table 1: Brief statistics of the networks, including the number of type-I nodes, the number of type-II nodes, the number of edges, and the density.

4.2 Methodology

We compare the results of the proposed Bi-face centrality with the following measures:

  • Bipartite closeness [Definition 2.14], a prominent diameter-based centrality

  • Bipartite Betweenness [Definition 2.15], a state-of-the-art geodesics-based centrality

  • Bipartite Eigenvector [Definition 2.13], a state-of-the-art centrality that assesses the importance of a node based on its connections to other highly influential nodes in a network.

  • Vote-Rank (zhang2016identifying), a well-known method for identifying decentralized spreaders. It ranks the nodes of the bipartite graph using a voting scheme: in each turn, all nodes vote for a spreader, the node with the most votes is elected, and the voting ability of the elected spreader's neighbours is decreased in the next turn.

  • Percolation (piraveenan2013percolation), which measures the proportion of percolated paths that go through a given node (a percolated path is a shortest path between two nodes whose source node is percolated, i.e., infected). It thus quantifies the relative impact of nodes in various percolation scenarios based on their topological connectivity over time. The percolation state of a node is commonly assigned a value between 0 and 1, and in our experiment we used the most common setting.

  • Bipartite Degree [Definition 2.13], which can serve as a good baseline for comparison.
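For readers unfamiliar with Vote-Rank, the voting scheme described above can be sketched on a plain adjacency dictionary as follows. We assume the common formulation in which every node starts with voting ability 1, a node's score is the sum of its neighbours' voting abilities, and the ability of each neighbour of an elected spreader is reduced by 1/⟨k⟩, where ⟨k⟩ is the average degree. The function name `vote_rank` is ours; NetworkX also provides a `voterank` function, which may differ in details.

```python
from typing import Dict, Hashable, Iterable, List, Mapping


def vote_rank(adj: Mapping[Hashable, Iterable[Hashable]], k: int) -> List[Hashable]:
    """Elect up to k spreaders with a Vote-Rank style voting scheme (sketch)."""
    neighbours = {u: list(vs) for u, vs in adj.items()}
    avg_degree = sum(len(vs) for vs in neighbours.values()) / len(neighbours)
    decay = 1.0 / avg_degree if avg_degree else 0.0
    ability: Dict[Hashable, float] = {u: 1.0 for u in neighbours}
    elected: List[Hashable] = []
    for _ in range(min(k, len(neighbours))):
        # Each not-yet-elected node collects the votes of its neighbours.
        scores = {u: sum(ability[v] for v in vs)
                  for u, vs in neighbours.items() if u not in elected}
        best = max(scores, key=scores.get)
        if scores[best] <= 0:
            break  # no remaining node can attract votes
        elected.append(best)
        ability[best] = 0.0  # an elected spreader no longer votes
        for v in neighbours[best]:
            # Weaken the voting ability of the elected spreader's neighbours.
            ability[v] = max(0.0, ability[v] - decay)
    return elected
```

On a five-node path 0-1-2-3-4, the sketch first elects node 1 and then node 3, because electing node 1 weakens the votes that nodes 0 and 2 can cast.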

To evaluate the lists of (type-I and type-II) nodes ranked by all the centrality measures, we need to compare them with the corresponding ranked lists obtained from the real spreading process of the nodes. Thus, we applied the following standard scheme to each type of nodes separately (chen2012identifying; zhao2019modeling; zhao2020ranking) to validate the performance of a tested centrality measure:

  1. Compute the centrality measure for all nodes, and then record the node ranking list

  2. Use the SIR model (chen2012identifying) to simulate the spreading ability of the nodes. In the SIR model, every node is in one of three states: susceptible, infected, or recovered. In each simulation, we set only one node as infected and all other nodes as susceptible, and then investigate how the information spreads in the network. Every infected node can infect its susceptible neighbours with a given spreading (also called infection) probability. Note that, instead of considering the recovered state of each node, we focus on the influence within a limited time, since spreading in the early stage has been found to be more important in practice. At the end of the SIR simulation process, we calculate the spreading efficiency of every node and record the resulting node influence ranking list

  3. Based on the centrality-based ranking list and the one generated by the SIR model, we record the joint score list $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, where $x_i$ and $y_i$ are the centrality-based and SIR-based measures of node $i$, respectively. Two pairs $(x_i, y_i)$ and $(x_j, y_j)$ are said to be concordant if both $x_i > x_j$ and $y_i > y_j$, or both $x_i < x_j$ and $y_i < y_j$. They are said to be discordant if both $x_i > x_j$ and $y_i < y_j$, or both $x_i < x_j$ and $y_i > y_j$. If $x_i = x_j$ or $y_i = y_j$, the pair is neither concordant nor discordant.
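A minimal sketch of the SIR-based spreading efficiency is given below. It assumes (our simplification, since the recovery rule and time horizon are not specified here) that each infected node tries once to infect each susceptible neighbour with probability `beta` and then recovers, and that the spreading efficiency of a seed node is the average number of nodes ever infected within `t` steps over several runs. The function `sir_spreading` and its parameters are hypothetical.

```python
import random
from typing import Hashable, Iterable, Mapping


def sir_spreading(adj: Mapping[Hashable, Iterable[Hashable]], seed_node: Hashable,
                  beta: float, t: int, runs: int = 100, rng_seed: int = 0) -> float:
    """Average number of nodes reached within t steps from a single infected seed."""
    rng = random.Random(rng_seed)
    total = 0
    for _ in range(runs):
        infected = {seed_node}  # currently infectious nodes
        reached = {seed_node}   # infected or recovered; all other nodes are susceptible
        for _ in range(t):
            newly_infected = set()
            for u in infected:
                for v in adj[u]:
                    if v not in reached and v not in newly_infected and rng.random() < beta:
                        newly_infected.add(v)
            reached |= newly_infected
            infected = newly_infected  # previously infected nodes recover
            if not infected:
                break
        total += len(reached)
    return total / runs
```

Sorting all nodes of one type by this value gives an SIR-based ranking list; a fixed `rng_seed` keeps the simulation reproducible across centrality measures.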

Consequently, we calculate the following Kendall’s tau rank correlation coefficient metric:

$$\tau = \frac{n_c - n_d}{\tfrac{1}{2}\, n (n-1)} \qquad (24)$$

where $n_c$ and $n_d$ are the numbers of concordant and discordant pairs in the joint list, respectively, and $n$ is the number of nodes in the list. A high $\tau$ value indicates that the centrality measure produces an accurate ranked list. The ideal case is $\tau = 1$, where the ranked list generated by the centrality measure is identical to the ranked list generated by the real spreading process. To evaluate the accuracy of the results, we now calculate the average Kendall's tau rank correlation coefficient as follows:

$$\bar{\tau} = \frac{\tau_{I} + \tau_{II}}{2} \qquad (25)$$

where $\tau_{I}$ and $\tau_{II}$ are the Kendall's tau correlation coefficients calculated using Eq. (24) for type-I and type-II nodes, respectively.
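Eqs. (24) and (25) can be computed directly from the joint score lists with the quadratic-time sketch below, which follows the standard definition: a tie in either coordinate counts as neither concordant nor discordant. The function names are ours. Note that `scipy.stats.kendalltau` computes the tau-b variant by default, which corrects for ties and can therefore differ from Eq. (24) when ties occur.

```python
from itertools import combinations
from typing import Sequence, Tuple

Joint = Sequence[Tuple[float, float]]


def kendall_tau(joint: Joint) -> float:
    """Kendall's tau of Eq. (24) for a joint list [(x_1, y_1), ..., (x_n, y_n)]."""
    n = len(joint)
    if n < 2:
        raise ValueError("need at least two nodes")
    concordant = discordant = 0
    for (xi, yi), (xj, yj) in combinations(joint, 2):
        sign = (xi - xj) * (yi - yj)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0: a tie in x or y, so the pair is neither concordant nor discordant
    return (concordant - discordant) / (0.5 * n * (n - 1))


def average_tau(joint_type1: Joint, joint_type2: Joint) -> float:
    """Average of the type-I and type-II coefficients, as in Eq. (25)."""
    return (kendall_tau(joint_type1) + kendall_tau(joint_type2)) / 2
```

For instance, the joint list [(1, 1), (2, 3), (3, 2)] has two concordant pairs and one discordant pair, giving a coefficient of 1/3.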

To assess the scalability, we consider the average elapsed time metric as:

(26)

where the two quantities are the elapsed times for computing the underlying centrality measure of a type-I node and of a type-II node, respectively.

All experiments were run on a computer with an Intel(R) Core-i7 CPU @2.6GHz and 16 GB of memory under macOS Mojave. We implemented all the considered indices as an extension of the NetworkX Python package. To extract formal concepts, we use the Concepts 0.7.11 Python package developed by Sebastian Bank (publicly available at https://pypi.python.org/pypi/concepts).

4.3 Results

4.3.1 Experiment I.

This experiment addresses Question 1. In the SIR model simulation, each infected node infects its susceptible neighbours with a given spreading probability. In accordance with the scheme described above, we iteratively increase the spreading probability over a fixed range in fixed increments. At each step, we compute the joint list of each centrality measure and the real spreading of the nodes, separately for each type of nodes. We then calculate the corresponding evaluation metric in Eq. (25).

Figure 4 displays the average Kendall's tau correlation coefficient between the seven tested centrality measures and the ranking list generated by the SIR model, for varying spreading probabilities at a fixed time step. Overall, Bi-face outperforms all the compared centrality measures, achieving the most accurate Kendall coefficient on the Norwegian-Directorate, PediaLanguages and CoinToss networks. On the Women-Davis network, Bi-face has the highest value only for some spreading probabilities; for the others, vote-rank, closeness, betweenness and degree come slightly close to Bi-face. Percolation comes close behind Bi-face on Women-Davis, but falls considerably further behind on the Norwegian-Directorate, PediaLanguages and CoinToss networks. Except on the Women-Davis network for some spreading probabilities, vote-rank is clearly less accurate than Bi-face on all the tested networks, but it is more accurate than percolation, betweenness, closeness, eigenvector and degree on the PediaLanguages and CoinToss networks. On the Norwegian-Directorate and Women-Davis networks, vote-rank and percolation compete with each other. For some spreading probabilities, percolation is clearly more accurate than betweenness and eigenvector on all the tested networks. Both betweenness and eigenvector dominate degree and closeness on the Norwegian-Directorate, PediaLanguages and CoinToss networks. Betweenness is more accurate than eigenvector on the PediaLanguages network for some spreading probabilities, but it is outperformed by eigenvector on the CoinToss network, and the two compete with each other on the Women-Davis and Norwegian-Directorate networks.

Figure 4: The average Kendall's tau coefficient between the tested centrality measures and the ranking list generated by the SIR model, for varying spreading probabilities at a fixed time step, on the four underlying datasets.

4.3.2 Experiment II.

The second experiment addresses Question 2. The goal here is to evaluate the computational efficiency of the centrality measures. That is, we rerun Experiment I while reporting their computational time as in Eq. 26. The average elapsed time of the seven centrality measures on the four underlying networks is depicted in Figure 5. On all the tested networks, Bi-face is faster than all the other centrality measures except degree. It finishes at least twenty-three times faster than betweenness, eleven times faster than percolation, nine times faster than eigenvector and eight times faster than closeness. Degree is very competitive with Bi-face on Women-Davis and CoinToss, but Bi-face clearly outperforms degree by a significant margin on the Norwegian-Directorate and PediaLanguages networks. Among the remaining measures, percolation is marginally faster than both closeness and vote-rank on all networks. In addition, closeness is considerably faster than betweenness and competes with eigenvector on the Norwegian-Directorate and CoinToss networks. Vote-rank is significantly faster than closeness on the Norwegian-Directorate, PediaLanguages and CoinToss networks, whereas closeness is slightly quicker than vote-rank on the Women-Davis network.

Figure 5: Average elapsed time (in secs) of the seven tested centrality measures: Bi-face, closeness, betweenness, degree, eigenvector, percolation and vote-rank on the four underlying datasets.

4.4 Discussion

Taking the identification of accurate node centrality into consideration, the results of Experiment I in Subsection 4.3.1 indicate that Bi-face outperforms traditional bipartite centrality measures such as vote-rank, percolation, degree, closeness, betweenness, and eigenvector. This is attributed to the use of its face biclique and face-bridge terms in tandem to leverage local and global aspects of network topology, respectively. That is, the face-biclique term quantifies the structural embeddedness of cohesive regions in a network involving each individual (type-I and type-II) node. From a conceptual perspective, this term considers the local information on how the node influences its immediate important neighbour nodes through the lens of its overlapping face bicliques. The face-bridge term quantifies a node’s global role based on how the information flows through influential (face) bridges (i.e., important geodesics).

In terms of computational efficiency, the results of Experiment II in Subsection 4.3.2 suggest that Bi-face is considerably faster than all other tested bipartite centrality measures except degree. In practice, this is because Bi-face calculates the centrality of all nodes primarily from the set of concepts, which is frequently small compared with the numbers of nodes and edges that determine the polynomial time complexity of the other tested centrality measures.

Besides that, several well-known observations are consistent with the results obtained in Subsection 4.3. First, in some real-world applications, several nodes may have approximately equal low or high degrees; in these cases, degree centrality cannot serve as a descriptive measure that distinguishes between nodes. Second, closeness can address this limitation of degree centrality in a few situations. For example, consider a node $a$ that is linked to a node $b$, and assume that $b$ is in close proximity to the other nodes in the network, resulting in a high closeness score for $b$. Node $a$ has a very low degree score but a reasonably high closeness score, because it can propagate information to all other nodes that $b$ reaches with just one extra step. However, closeness, like degree, is usually inappropriate for irregularly connected bipartite networks: because the shortest-path distance between two nodes is infinite when no path connects them, the closeness score is equal (or very close) to zero for nodes that do not reach all other nodes in the network. Third, since betweenness does not measure local nodal connectivity in any form, it is expected to produce relevant results only when the goal is to quantify influence on communication among local groups, which is not always the case when studying centrality in real-world networks. Finally, in practice, even with an efficient implementation adopted from the fastest algorithm proposed in (brandes2001faster), computing percolation centrality for all nodes has a time complexity that still seems to impose a computational bottleneck on fairly medium-sized networks.

5 Conclusion

The detection of influential nodes in a two-mode network is frequently an important task in scientific and industrial data analysis pipelines for explaining various behaviours and outcomes. Our work addresses a gap in the current CNA literature, namely the efficient identification of key nodes by combining the local cohesiveness and global network-flow aspects of centrality through the mathematical formalism of FCA. On this basis, we devised Bi-face, a new bipartite centrality measure that quantifies the prominence of a node in a two-mode network based on its presence in influential overlapping bicliques and bridges. While we focused on two-mode networks here, the approach can easily be adapted to other complex network representations such as multilayer networks.

From a conceptual perspective, Bi-face differs from existing centrality measures in three respects: (i) it uses the concept lattice formulation to efficiently extract overlapping bicliques and bridges, (ii) it leverages concept faces to remove non-influential nodes from bicliques and to detect influential bridges, and (iii) it exploits the fact that influential bridges and overlapping bicliques with a large number of important neighbour nodes are likely to contain key central nodes. As a result, it measures how a node affects, and is influenced by, its important neighbours through refined bicliques, while also linking the network's dense substructures through its presence in influential bridges. According to a thorough empirical study on several synthetic and real-life two-mode networks (see Section 4), Bi-face can identify key nodes more accurately and efficiently than other state-of-the-art centrality indices such as degree, betweenness, closeness, eigenvector, percolation, and vote-rank.

References