First Stretch then Shrink and Bulk: A Two-Phase Approach for Enumeration of Maximal (Δ, γ)-Cliques of a Temporal Network

04/09/2020 ∙ by Suman Banerjee, et al. ∙ IIT Kharagpur ∙ IIT Gandhinagar

A temporal network (also known as a link stream or time-varying graph) is often used to model a time-varying relationship among a group of agents. It is typically represented as a collection of triplets of the form (u, v, t), each denoting an interaction between the agents u and v at time t. To analyze the contact patterns of the agents forming a temporal network, the classical notion of a clique in a static graph has recently been generalized to the Δ-Clique of a temporal network. In the same direction, one of our previous studies introduced the notion of a (Δ, γ)-Clique, which is a vertex set and time interval pair in which every pair of clique vertices is linked at least γ times in every Δ duration of the time interval. In this paper, we propose a different methodology for enumerating all the maximal (Δ, γ)-Cliques of a given temporal network. The proposed methodology is broadly divided into two phases. In the first phase, each temporal link is processed to construct (Δ, γ)-Clique(s) with maximum duration. In the second phase, these initial cliques are expanded by vertex addition to form the maximal cliques. From experiments carried out on five real-world temporal network datasets, we observe that the proposed methodology enumerates all the maximal (Δ, γ)-Cliques efficiently, particularly when the dataset is sparse. As a special case (γ = 1), the proposed methodology is also able to enumerate (Δ, 1) ≡ Δ-Cliques in much less time than the existing methods.

1 Introduction

A network (also called a graph) is a mathematical object used extensively to represent a binary relation among a group of agents. Analyzing such networks for different structural patterns remains an active area of study in domains including Computational Biology (hulovatyy2015exploring), Social Network Analysis, Computational Epidemiology (masuda2017temporal) and many more. One such structural pattern is a subgraph in which every pair of vertices is adjacent, popularly called a clique. Finding the maximum cardinality clique in a given network is a well-known NP-hard problem whose decision version is NP-complete (garey2002computers). However, from a network analysis perspective, the more general problem is not just to find the maximum size clique but to enumerate all the maximal cliques present in the network. Bron and Kerbosch (bron1973algorithm) proposed an early enumeration algorithm for maximal cliques, which forms the foundation of the study of this problem. Later, there were advancements on this problem for different types of networks (cheng2012fast; eppstein2011listing; eppstein2013listing1).

Real-world networks, from biological to social, are time varying, which means that the existence of an edge between any two agents changes with time. Temporal networks (holme2012temporal) (also known as link streams or time-varying networks) are the mathematical objects used to formally represent such time-varying relationships. For these types of networks, a natural counterpart of the clique is the temporal clique, which consists of two things: a subset of the vertices and a time interval. In this direction, Viard et al. (viard2015revealing; viard2016computing) recently put forward the notion of the Δ-Clique, where a vertex subset along with a time interval is said to be a Δ-Clique if every vertex pair from that set has at least one edge in every Δ duration within the time interval. Next, we report the existing studies on clique enumeration on networks.

1.1 Relevant Studies

The problem of maximal clique enumeration is a classic computational problem in network algorithms and has been extensively studied on static networks. akkoyunlu1973enumeration was the first to propose an algorithm for this problem. Later, DBLP:journals/cacm/BronK73 introduced a recursive approach for the maximal clique enumeration problem. These two studies are the foundations of maximal clique enumeration and have triggered a huge amount of research due to many practical applications, from computational biology to spatial data analytics al2007enumeration; bhowmick2015clustering. Over the past two decades, several methodologies have been developed for enumerating maximal cliques in different computational paradigms and in different kinds of networks, such as in sparse graphs eppstein2011listing; eppstein2013listing, in large networks cheng2010finding; cheng2011finding; rossi2014fast, in the MapReduce framework hou2016efficient; xiang2013scalable, in uncertain graphs mukherjee2015mining; mukherjee2016enumeration; zou2010finding, in parallel computing frameworks chen2016parallelizing; du2006parallel; rossi2015parallel; schmidt2009scalable and many more.

Though there are many existing studies on maximal clique enumeration on static networks, the literature on temporal graphs is limited. Viard et al. (viard2015revealing) proposed an enumeration algorithm for the maximal Δ-Cliques of a temporal network. Based on their methodology, they carried out a detailed analysis of the contact relationships among a group of students and showed that it yields deeper insights into their communication patterns (viard2015revealing). Later, Himmel et al. himmel2016enumerating proposed a different approach for the maximal Δ-Clique enumeration problem, based on the Bron-Kerbosch algorithm for maximal clique enumeration in static graphs. Their methodology is better in both of the following aspects: theoretically (measured in terms of worst-case computational complexity) and practically (measured in terms of computational time when the algorithm is run on real-world datasets). Recently, Molter et al. molter2019enumerating introduced the notion of isolation in clique enumeration of a time-varying graph. They developed fixed-parameter enumeration algorithms based on different notions of isolation, employing the parameter “degree of isolation”. Recently, Banerjee and Pal DBLP:conf/comad/BanerjeeP19 proposed an enumeration algorithm for the maximal (Δ, γ)-Cliques present in a time-varying graph. As far as we know, other than the last one, there is no other work that studies (Δ, γ)-Cliques.

1.2 Contribution of the Paper

As mentioned previously, a temporal network consists of a set of agents and a time-varying relationship. The following questions are essential to understand the contact pattern among them: which subsets of agents come in contact with each other very frequently? Given a time duration, how many times do they contact each other? The frequency of communication adds another dimension of information about the strength of their relationship. Motivated by such questions, the notion of the Δ-Clique has recently been extended to the (Δ, γ)-Clique, which is a vertex subset and time interval pair in which each pair of vertices of the subset has at least γ interactions in every Δ duration within the time interval. In this paper, we give a different approach for listing all the maximal (Δ, γ)-Cliques present in a temporal network. The main contributions of this paper are as follows:

  • In this paper, we propose a different approach, namely, first stretch and then shrink and bulk, for listing the maximal (Δ, γ)-Cliques present in a temporal network.

  • By drawing sequential arguments, we prove the correctness of the proposed methodology.

  • A detailed analysis of the proposed methodology has been done to understand its computational time and space requirement.

  • The proposed methodology has been implemented with five publicly available temporal network datasets to bring out nontrivial insights about contact patterns and compare the efficiency of the proposed methodology with the existing one.

  • Also, a set of experiments has been conducted to show that the proposed methodology for maximal (Δ, γ)-Clique enumeration can also be used efficiently for enumerating maximal Δ-Cliques (by setting γ = 1).

1.3 Structure of this Article

The remaining portion of this article is arranged in the following way: Section 2 discusses some preliminary concepts regarding temporal networks and formally defines the maximal (Δ, γ)-Clique enumeration problem. Section 3 contains the proposed enumeration technique with its detailed analysis, proof of correctness and an illustrative example. Section 4 describes the experimental evaluation of the proposed methodology in detail. Finally, Section 5 concludes the study and gives future directions.

2 Background and Problem Definition

In this section, we present some preliminary concepts needed to understand the problem that we study in this paper and the proposed solution methodology. In a temporal network, the edges are marked with their occurrence timestamp(s). Formally, this is stated in Definition 1.

Definition 1 (Temporal Network)

holme2013temporal A temporal network is defined as $\mathcal{G}(V, E, \mathcal{T})$, where $V(\mathcal{G})$ is the set of vertices of the network and $E(\mathcal{G})$ is the set of edges among them. $\mathcal{T}$ is the mapping that maps each edge of the graph to its occurrence timestamp(s).

Figure 1: Links of a Temporal Network

Figure 1 shows a temporal graph with vertices and edges, where the edges are shown along the time horizon. In temporal network analysis, it is assumed that the network changes its topology in discrete time steps. So, starting at time $t$, if the network is observed at every $dt$ time difference until $T$, the time instances are $t, t + dt, t + 2dt, \ldots, T$. In rest of our study we assume, and . The difference between the beginning and ending timestamps, i.e., $T - t$, is called the Life Time of the Network. In the temporal network $\mathcal{G}$, if there is an edge between two vertices $v_i$ and $v_j$ at time $t_x$, then it is symbolized as $(v_i, v_j, t_x)$, signifying that there is a contact between $v_i$ and $v_j$ at time $t_x$. If $(v_i, v_j, t_x) \in E(\mathcal{G})$ for some $t_x$, then we say that there exists a static edge between $v_i$ and $v_j$. The frequency of an edge is defined as the number of timestamps $t_x$ such that $(v_i, v_j, t_x) \in E(\mathcal{G})$ and is denoted as $f_{(v_i v_j)}$, i.e., $f_{(v_i v_j)} = |\{t_x : (v_i, v_j, t_x) \in E(\mathcal{G})\}|$. If , then we say that . In the rest of our study, we work with undirected temporal networks, i.e., there is no difference between $(v_i, v_j, t_x)$ and $(v_j, v_i, t_x)$.
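The representation described above can be sketched in a few lines of Python. This is only an illustration: the function name `build_edge_dict` and the sample links are assumptions made here, not part of the original study. Links are grouped by their undirected static edge, and the frequency of an edge is the number of distinct timestamps stored for it.

```python
from collections import defaultdict

def build_edge_dict(links):
    """Group undirected (u, v, t) links by static edge.

    Returns a dict mapping frozenset({u, v}) to a sorted list of timestamps.
    """
    edge_times = defaultdict(set)
    for u, v, t in links:
        # frozenset makes (u, v, t) and (v, u, t) refer to the same static edge
        edge_times[frozenset((u, v))].add(t)
    return {edge: sorted(times) for edge, times in edge_times.items()}

# Hypothetical toy link stream, not the network of Figure 1
links = [("a", "b", 1), ("b", "a", 3), ("a", "c", 2), ("a", "b", 3)]
edge_dict = build_edge_dict(links)

# Frequency of a static edge = number of timestamps at which it occurs
frequency = {edge: len(times) for edge, times in edge_dict.items()}
print(frequency[frozenset(("a", "b"))])
```

Storing sorted timestamps per static edge is convenient because the later processing steps scan the occurrences of one edge in time order.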

In a static network, a subset of vertices in which every pair is adjacent is known as a clique. The size of a clique is defined as the number of vertices it contains. A clique is said to be maximal if it is not part of another clique of larger size. The general notion of a clique is extended to temporal graphs as the Δ-clique, which is a vertex subset and time sub-interval pair such that, in each Δ duration of the sub-interval, there exists at least one link between every pair of vertices in the vertex subset. Formally, this is stated in Definition 2.

Definition 2 (Δ-Clique)

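viard2016computing Given a temporal network $\mathcal{G}(V, E, \mathcal{T})$ and a time duration Δ, a Δ-Clique of $\mathcal{G}$ is a vertex set, time interval pair, i.e., $(\mathcal{X}, [t_a, t_b])$ with $\mathcal{X} \subseteq V(\mathcal{G})$, $|\mathcal{X}| \geq 2$, and $[t_a, t_b]$ lying within the lifetime of the network, such that $\forall v_i, v_j \in \mathcal{X}$ and $\forall \tau \in [t_a, \max(t_b - \Delta, t_a)]$ there is an edge $(v_i, v_j, t_{ij}) \in E(\mathcal{G})$ with $t_{ij} \in [\tau, \min(\tau + \Delta, t_b)]$.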

In one of our recent studies, we introduced the notion of the (Δ, γ)-Clique by extending the concept of the Δ-Clique and incorporating an additional parameter γ as a frequency threshold. This is stated in Definition 3.

Definition 3 ((Δ, γ)-Clique)

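DBLP:conf/comad/BanerjeeP19 Given a temporal network $\mathcal{G}(V, E, \mathcal{T})$, a time duration Δ, and a frequency threshold γ, a (Δ, γ)-Clique of $\mathcal{G}$ is a tuple consisting of a vertex subset and a time interval, i.e., $(\mathcal{X}, [t_a, t_b])$ where $\mathcal{X} \subseteq V(\mathcal{G})$, $|\mathcal{X}| \geq 2$, and $[t_a, t_b]$ lies within the lifetime of the network. Here, $\forall v_i, v_j \in \mathcal{X}$ and $\forall \tau \in [t_a, \max(t_b - \Delta, t_a)]$, there must exist at least γ edges, i.e., $(v_i, v_j, t_{ij}) \in E(\mathcal{G})$ and $f_{(v_i v_j)} \geq \gamma$ with $t_{ij} \in [\tau, \min(\tau + \Delta, t_b)]$. Here, $f_{(v_i v_j)}$ denotes the frequency of the static edge $(v_i, v_j)$.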
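To make the definition concrete, the following brute-force check tests whether a vertex set and interval form a (Δ, γ)-clique. It is a sketch under stated assumptions rather than the enumeration method of this paper: timestamps are taken to be integers with unit time step, every window start τ in the interval is tried, and the function name `is_delta_gamma_clique` and the toy links are illustrative only.

```python
from itertools import combinations

def is_delta_gamma_clique(links, vertices, t_a, t_b, delta, gamma):
    """Check the (delta, gamma)-clique condition window by window.

    links: iterable of undirected (u, v, t) triplets with integer t.
    vertices: candidate vertex set X; [t_a, t_b]: candidate interval.
    """
    vertices = list(vertices)
    if len(vertices) < 2 or t_a > t_b:
        return False
    times = {}
    for u, v, t in links:
        times.setdefault(frozenset((u, v)), set()).add(t)
    last_start = max(t_b - delta, t_a)          # last window start to test
    for u, v in combinations(vertices, 2):
        occurrences = times.get(frozenset((u, v)), set())
        for tau in range(t_a, last_start + 1):
            window_end = min(tau + delta, t_b)
            hits = sum(1 for t in occurrences if tau <= t <= window_end)
            if hits < gamma:                    # fewer than gamma links in this window
                return False
    return True

# Hypothetical example: the edge (a, b) occurs at times 2, 3 and 4
links = [("a", "b", 2), ("a", "b", 3), ("a", "b", 4)]
print(is_delta_gamma_clique(links, {"a", "b"}, 0, 6, delta=3, gamma=2))
print(is_delta_gamma_clique(links, {"a", "b"}, 0, 7, delta=3, gamma=2))
```

With γ = 1 the same function checks the Δ-Clique condition of Definition 2. The check is deliberately exhaustive and is meant for validating small examples, not for efficiency.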

In a static graph $G(V, E)$, a clique $\mathcal{S} \subseteq V$ is maximal if, for each $v \in V \setminus \mathcal{S}$, $\mathcal{S} \cup \{v\}$ is not a clique. Now, as the (Δ, γ)-Clique is defined in the setting of temporal networks, its maximality depends on two parameters: one is the cardinality and the other is the time interval. We introduce the maximality conditions for an arbitrary (Δ, γ)-Clique in Definition 4, considering both factors.

Definition 4 (Maximal (Δ, γ)-Clique)

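Given a temporal network $\mathcal{G}(V, E, \mathcal{T})$ and a (Δ, γ)-Clique $(\mathcal{X}, [t_a, t_b])$ of $\mathcal{G}$, $(\mathcal{X}, [t_a, t_b])$ will be maximal if none of the following is true.

  • $\exists v \in V(\mathcal{G}) \setminus \mathcal{X}$ such that $(\mathcal{X} \cup \{v\}, [t_a, t_b])$ is a (Δ, γ)-Clique.

  • $(\mathcal{X}, [t_a - dt, t_b])$ is a (Δ, γ)-Clique, where $dt$ is one time step. This applies only if $t_a - dt$ lies within the lifetime of the network.

  • $(\mathcal{X}, [t_a, t_b + dt])$ is a (Δ, γ)-Clique. This applies only if $t_b + dt$ lies within the lifetime of the network.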
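The three conditions can be checked directly on small inputs. The sketch below assumes integer timestamps with a unit time step and a network lifetime given by the hypothetical parameters `t_min` and `t_max`; the helper `is_clique` is a brute-force test of the (Δ, γ)-clique condition, and all names and the toy data are illustrative assumptions.

```python
from itertools import combinations

def is_clique(links, X, t_a, t_b, delta, gamma):
    """Brute-force (delta, gamma)-clique test with integer timestamps."""
    X = list(X)
    if len(X) < 2 or t_a > t_b:
        return False
    times = {}
    for u, v, t in links:
        times.setdefault(frozenset((u, v)), set()).add(t)
    for u, v in combinations(X, 2):
        occ = times.get(frozenset((u, v)), set())
        for tau in range(t_a, max(t_b - delta, t_a) + 1):
            end = min(tau + delta, t_b)
            if sum(1 for t in occ if tau <= t <= end) < gamma:
                return False
    return True

def is_maximal(links, all_vertices, X, t_a, t_b, delta, gamma, t_min, t_max):
    """Return True if (X, [t_a, t_b]) is a maximal (delta, gamma)-clique."""
    X = set(X)
    if not is_clique(links, X, t_a, t_b, delta, gamma):
        return False
    # Condition 1: no vertex can be added over the same interval
    for v in set(all_vertices) - X:
        if is_clique(links, X | {v}, t_a, t_b, delta, gamma):
            return False
    # Condition 2: the interval cannot start one step earlier
    if t_a - 1 >= t_min and is_clique(links, X, t_a - 1, t_b, delta, gamma):
        return False
    # Condition 3: the interval cannot end one step later
    if t_b + 1 <= t_max and is_clique(links, X, t_a, t_b + 1, delta, gamma):
        return False
    return True

# Hypothetical toy data: lifetime [0, 10], delta = 3, gamma = 2
links = [("a", "b", 2), ("a", "b", 3), ("a", "b", 4), ("a", "c", 3)]
V = {"a", "b", "c"}
print(is_maximal(links, V, {"a", "b"}, 0, 6, 3, 2, t_min=0, t_max=10))
print(is_maximal(links, V, {"a", "b"}, 1, 6, 3, 2, t_min=0, t_max=10))
```

In the second call the interval [1, 6] can still be stretched to [0, 6], so the clique is not maximal.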

From Definition 4, it is clear that the first condition addresses the cardinality, whereas the next two concern the time duration. In a static graph, among all of its maximal cliques, the one with the highest cardinality is called the maximum clique or largest size clique. However, in the case of a (Δ, γ)-Clique, maximum can be in terms of either cardinality or duration. Hence, the maximum (Δ, γ)-Clique of a temporal network can be defined as follows.

Definition 5 (Maximum (Δ, γ)-Clique)

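Given a temporal network $\mathcal{G}$, let $\mathcal{S}$ be the set of all maximal (Δ, γ)-Cliques of $\mathcal{G}$. Now, $(\mathcal{X}, [t_a, t_b]) \in \mathcal{S}$ will be

  • temporally maximum if $\forall (\mathcal{X}', [t_a', t_b']) \in \mathcal{S}$, $t_b - t_a \geq t_b' - t_a'$.

  • cardinally maximum if $\forall (\mathcal{X}', [t_a', t_b']) \in \mathcal{S}$, $|\mathcal{X}| \geq |\mathcal{X}'|$.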
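Once the set of maximal cliques is available, both notions of maximum reduce to a simple selection. The snippet below is a small sketch in which each clique is assumed to be stored as a (vertex set, (t_a, t_b)) pair; the sample cliques are made up for illustration and ties are broken by list order.

```python
def temporally_maximum(cliques):
    """Pick a clique with the longest duration t_b - t_a."""
    return max(cliques, key=lambda c: c[1][1] - c[1][0])

def cardinally_maximum(cliques):
    """Pick a clique with the largest vertex set."""
    return max(cliques, key=lambda c: len(c[0]))

# Hypothetical maximal cliques, not results from the paper
maximal_cliques = [
    (frozenset("ab"), (0, 6)),
    (frozenset("abc"), (2, 5)),
    (frozenset("cd"), (1, 4)),
]
print(temporally_maximum(maximal_cliques))
print(cardinally_maximum(maximal_cliques))
```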

In this paper, we study the problem of listing all the maximal (Δ, γ)-Cliques of a given temporal network, which we call the Maximal (Δ, γ)-Clique Enumeration Problem, defined next.

Definition 6 (Maximal (Δ, γ)-Clique Enumeration Problem)

Given a temporal network $\mathcal{G}$, Δ, and γ, the Maximal (Δ, γ)-Clique Enumeration Problem asks to list all the maximal (Δ, γ)-Cliques (as defined in Definition 4) present in $\mathcal{G}$.

Next, we proceed to describe the proposed enumeration methodology for maximal (Δ, γ)-Cliques.

3 Proposed Enumeration Technique

As stated earlier, the proposed methodology is broadly divided into two steps, each described in one of the following two subsections. The broad idea of the enumeration process is as follows: given all the links of the temporal network with their time durations, we first find the maximal cliques of cardinality two. We call this the Stretching Phase because all the cliques after this phase are duration-wise maximal, as if we were stretching the cliques across the time horizon. Next, taking these duration-wise maximal cliques, we add vertices to each clique without violating the definition of a (Δ, γ)-clique, as if we were putting vertices into the initial cliques to bulk them up. As vertices are added, the durations of the newly generated cliques shrink. Hence, we call the second phase the Shrink and Bulk Phase.

3.1 Stretching Phase (Initialization)

Algorithm 1 describes the initialization process of the proposed methodology. For a given temporal network, we first construct a dictionary with the static edges as keys and the corresponding occurrence timestamps as values. By the definition of a (Δ, γ)-clique, if the end vertices of an edge are part of a clique, then the edge has to occur at least γ times in the link stream. Hence, each static edge is processed further only if its frequency is at least γ. The occurrence timestamps of the edge are fed into a list. A temporary list, Temp, is created to store the timestamp currently being processed together with its previous occurrences, as long as the (Δ, γ)-clique property is maintained. The for-loop from Line 8 to 32 computes all the (Δ, γ)-cliques of maximum duration whose vertex set consists of the two end vertices of the edge. While processing the timestamp list, one of the following two cases can happen. In the first case, if the current length of Temp is less than γ, the difference between the current timestamp and the first entry of Temp is checked (Line 10). If the difference is less than or equal to Δ, the current timestamp is appended to Temp. Otherwise, Temp is refilled with all the previous timestamps that occurred within the past Δ duration of the current timestamp (Line 14). This step essentially looks back Δ timestamps from each occurrence time of the static edge. In the second case, when the current length of Temp is greater than or equal to γ, it is checked whether the current timestamp falls within the interval from (last γ-th occurrence time + 1) to (last γ-th occurrence time + 1 + Δ). If it does, the current timestamp is appended to Temp. It can be easily observed that this appending is done iff at least γ consecutive occurrences lie within each Δ duration. Otherwise, the clique is added to the initial clique set with the two end vertices as its vertex set and $[t_a, t_b]$ as its time interval (Line 22), where $t_a$ is the timestamp Δ before the first γ-th entry in Temp and $t_b$ is the timestamp Δ after the last γ-th entry in Temp.

Next, all the previous timestamps that occurred within the past Δ duration of the current timestamp are added to Temp as before (Line 24). This allows overlapping cliques to be considered. It may happen that the last occurrence in the timestamp list is processed and added to Temp; in that case, no clique can be added by the conditions of Line 9 to 26 even if the length of Temp is greater than or equal to γ. This situation is handled by Line 27 to 31. The process is iterated for each key of the dictionary. Now, we present a few lemmas that together help to argue the correctness of the proposed methodology.
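The processing of a single static edge can be sketched as follows. This is a reading of the description above, not a line-by-line transcription of Algorithm 1: it assumes distinct integer timestamps, interprets "adding the previous timestamps within the past Δ duration" as restarting Temp from those timestamps, does not clip intervals to the network lifetime, and uses the illustrative names `stretch_edge` and `clique_interval`.

```python
def clique_interval(temp, delta, gamma):
    """Interval of the clique held in temp.

    t_a is delta before the first gamma-th entry, t_b is delta after the
    last gamma-th entry.
    """
    t_a = temp[gamma - 1] - delta
    t_b = temp[-gamma] + delta
    return (t_a, t_b)

def stretch_edge(timestamps, delta, gamma):
    """Return duration-wise maximal intervals for one static edge (sketch)."""
    times = sorted(set(timestamps))
    cliques = []
    if len(times) < gamma:              # edge occurs fewer than gamma times
        return cliques
    temp = []
    for current in times:
        if len(temp) < gamma:
            if not temp or current - temp[0] <= delta:
                temp.append(current)
            else:
                # restart from the occurrences within the past delta duration
                temp = [t for t in temp if current - t <= delta] + [current]
        elif current <= temp[-gamma] + 1 + delta:
            temp.append(current)        # the clique property is maintained
        else:
            cliques.append(clique_interval(temp, delta, gamma))
            temp = [t for t in temp if current - t <= delta] + [current]
    if len(temp) >= gamma:              # clique still open after the last occurrence
        cliques.append(clique_interval(temp, delta, gamma))
    return cliques

print(stretch_edge([0, 3, 6], delta=3, gamma=2))
print(stretch_edge([1, 2, 3, 10], delta=3, gamma=2))
```

In the first call, the occurrences at 0, 3 and 6 yield two overlapping intervals because no window of length 3 starting at time 1 or 2 contains two links, which is why the restart step keeps the occurrence at time 3 for the second clique.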

Data: The temporal network .
Result: The initial clique set of
1 ;
2 ;
3 for Every  do
4       if  then
5             ;
6             ;
7             ;
8             for  to  do
9                   if  then
10                         if  then
11                               ;
12                              
13                        else
14                               ;
15                              
16                         end if
17                        
18                  else
19                         if  then
20                               ;
21                        else
                               ;
                                // first γ-th occurrence of the edge in Temp
                               ;
                                // last γ-th occurrence of the edge in Temp
22                               ;
23                               ;
24                              
25                         end if
26                        
27                   end if
28                  if  then
                         ;
                          // first γ-th occurrence of the edge in Temp
                         ;
                          // last γ-th occurrence of the edge in Temp
29                         ;
30                        
31                   end if
32                  
33             end for
34            
35       end if
36      
37 end for
Algorithm 1 Stretching Phase of the (Δ, γ)-Clique Enumeration
Lemma 1

For a link $(v_i, v_j)$, if there exist any γ consecutive occurrences within Δ duration, then they have to be in ‘Temp’ at some stage in Algorithm 1.

Proof

Follows from the description of Algorithm 1.

Lemma 2

In any arbitrary iteration of the ‘for loop’ at Line 8 in Algorithm 1, each γ consecutive occurrences in ‘Temp’ will be within Δ duration.

Proof

To prove this statement, we use the method of contradiction. Initially, Temp contains the first occurrence of a link. When the length of Temp is less than γ (Line 9), the next occurrence times are added to Temp (Line 11) if the difference between the initial and the current occurrence time lies within Δ (Line 10); otherwise, the times at which the link has occurred in the previous Δ duration from the current time are added (Lines 13, 14). This makes it clear that all the entries in Temp are within Δ duration when the length of Temp is less than γ.

When the length of is greater than or equal to , without the loss of generality, let us take any arbitrary occurrences of as , which is not within duration, i.e., . Let us also assume that from , all the previous occurrences in follow the statement of this lemma. Now, from our assumptions, we have the following conditions:

(1)
(2)
(3)

Now, let us assume the previous occurrence of the link from in is and our goal is to infer the possible positions of in the time horizon. From the definition of -clique, there will be occurrences from to . If first links have occurred in consecutive times then . This is the minimum value for . From Equation 3, the maximum value for is . Hence, . Now, from Equation 2, we have , when and replacing with in Equation 2, we get as . This violates the condition imposed in Line 17. Hence, can not be added in . So, we reach the contradiction and this completes the proof.

Figure 2: Demonstration Diagram for Lemma 3
Lemma 3

Let $t_f$ and $t_l$ be the first and last occurrence times of a link in Temp. In the interval $[t_f, t_l]$, Temp contains at least γ links in each Δ duration.

Proof

When the length of is less than , Line 9 to 15 in Algorithm 1 allows to hold the statement of the lemma by adding consecutive occurrences in duration. So, it is trivial that we need to prove the statement when length of is greater than . Let us assume that the occurrence times of first entries of are , where and .

Now, by Lemma 2, and . Without loss of generality, we want to show that there exist at least links from to . As , the maximum difference between and can be and this case will arise when all the links appear in each consecutive timestamp from towards (shown in Figure 2). Now, as , we have to show . This extreme case will intuitively prove the rest of the cases. So, we can infer the following conclusion from Lemma 2 and the assumption . Now,

Again, from the condition imposed at Line 17 in Algorithm 1, we also have . Now, as per our assumption of extreme case . So, .

Now, as , we can argue , for all . Moreover, from Lemma 2 there is links within , which concludes the existence of at least links from to . Now, for any , there will be atleast links in from to . This completes the proof of the claimed statement.

Lemma 4

In Algorithm 1, the contents of the initial clique set are (Δ, γ)-Cliques of size 2.

Proof

We process each static edge of the temporal network along its time horizon and add the (Δ, γ)-clique(s) formed by the end vertices of the edge to the initial clique set. Hence, the cliques in the initial clique set are of size 2. In Algorithm 1, the cliques are added to the initial clique set in Lines 22 and 30. In both cases, cliques are added only if the current length of Temp is greater than or equal to γ. As per Lemma 3, Temp contains at least γ links in each Δ duration. When the duration of the clique is set, $t_a$ is obtained by subtracting Δ from the first γ-th occurrence time and $t_b$ is obtained by adding Δ to the last γ-th occurrence time in Temp. This ensures the existence of at least γ occurrences of the link in each Δ duration between $t_a$ and $t_b$.

Lemma 5

All the cliques returned by Algorithm 1 and contained in the initial clique set are duration-wise maximal.

Proof

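We prove the duration-wise maximality of each clique in the initial clique set by contradiction. Let us assume that a clique $(\{v_i, v_j\}, [t_a, t_b])$ is not duration-wise maximal. Then, there exists a $t_a'$ with $t_a' < t_a$ such that $(\{v_i, v_j\}, [t_a', t_b])$ is a (Δ, γ)-clique, or a $t_b'$ with $t_b' > t_b$ such that $(\{v_i, v_j\}, [t_a, t_b'])$ is a (Δ, γ)-clique.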

Now, if is a -clique, then its first occurrences will be in at some stage as per Lemma 1. Later, this is expanded till either by Line 11 or 18 in Algorithm 1. Hence, will be added in , instead of . So, the assumption that there exists a with is false.

Now, by Lemma 4, as is a -clique, in each duration within to there will be atleast links between and . Let us assume, that and are the last -th and -th occurrence time of respectively. From the definition of -clique, , hence, . Now, to be a -clique in the interval , there must be atleast one link between and in the interval . If there exists such links, it indicates the presence of or more links in the interval . This case is handled by Algorithm 1 either in Line or and will not be added to . So, there can not exist any which is greater than .

Hence, all the cliques of the initial clique set returned by Algorithm 1 are duration-wise maximal.

Lemma 6

All the duration-wise maximal (Δ, γ)-cliques of size 2 are contained in the initial clique set.

Proof

In Lemmas 4 and 5, we have already shown that each (Δ, γ)-clique of the initial clique set is of size 2 and duration-wise maximal, respectively. Hence, in this lemma, we have to prove that no such clique is missing from the final initial clique set. As each edge is processed independently by Algorithm 1, it is sufficient to prove that all the duration-wise maximal (Δ, γ)-cliques for a particular vertex pair (corresponding to an edge) are contained in the initial clique set.

Let, is a duration wise maximal -clique and not present in . Now, as is a -clique, so there exist at least links in each duration from to , and let and are the first -th and last -th occurrence time of the link , between to . We denote the occurrence timestamps for the static edge as , and . Now, there can be one of the following cases for the values of and .

  1. and : The clique is formed at the beginning of the occurrence stream of . According to Lemma 1, all the occurrence time will be in . Now, if , it will be added in by Line 30 of Algorithm 1. Otherwise, and . Hence, it breaks the if condition at Line 17, and the clique will be added in by Line 22.

  2. and : The clique is formed at the end of the occurrence stream of . If , it follows from the above case. For the else part, we need to show that is handled by the Algorithm 1. Here, and . Along with Lemma 1 and 2, the Line 14 and 24 are responsible to have all the timestamps within must be . So, the clique will be added in by Line 30.

  3. and : The clique is formed in the middle of the occurrence stream of . Both the scenarios of and values are shown in the above two cases, so the clique will be added in by Line 22.

Lemma 7

Running time of finding all the duration wise maximal -cliques of size in Algorithm 1 is of .

Proof

Preparing the dictionary at Line 1 in Algorithm 1 will take . Assuming the frequency of each static edge is atleast , we evaluate the running time for processing a static edge. It will be identical for rest of the edges. During the processing, all the operations from Line 8 to 32 take times except, the appending at Line 14 and 24. Now, the appending of previous occurrences within past duration can leads to copying of at most previous entries in , which takes times. Now, the worst case may occur when in every iteration of the for loop at Line 8, previous occurrences are copied in (at Line 24) and this case may occur at most times. In this case, the running time of the for loop from Line 8 to 32 is for a particular static edge. Now, for all the static edges the for loop at Line 3 will run with times. Now, the total running time of Algorithm 1 is . Here, summing up all the frequencies of the static edges gives the total number of links of the temporal network, i.e., . So, the time complexity of the initialization is of .

We have provided a weak upper bound on running time of the initialization process (Algorithm 1) in Lemma 7. Now, we focus on space requirement of Algorithm 1. Storing the Dictionary in Line Number requires space. In the worst case, space requirement by the list is of . The size of can go upto the maximum number of times that any static edge has occurred consecutively more than gamma times in each delta duration, and in the worst case it may take space. As all the initial cliques are of size , hence space requirement due to is of , where is the highest frequency of the initial cliques. So, total space requirement by Algorithm 1 is of . Hence, Lemma 8 holds.

Lemma 8

The space requirement of Algorithm 1 is of .

Now for the temporal network shown in Figure 1, the initial cliques with and , in are , , , , , , , , , , .

3.2 Shrink and Bulk Phase (Enumeration)

Algorithm 2 describes the enumeration strategy of our proposed methodology. For the given temporal network, we construct a static graph whose vertex set is that of the temporal network and in which each link induces the corresponding edge without the time component, which we call a static edge. Next, a dictionary is built from the initial clique set of Algorithm 1, where the vertex set of a clique is the key and the corresponding occurrence time intervals are the values. This data structure is also updated in the intermediate steps of Algorithm 2. Two sets of cliques are maintained during the enumeration process. In any iteration of the while loop at Line 5, the first set holds the cliques that are yet to be processed for vertex addition, and the second stores the new cliques formed in that iteration. At the beginning, all the initial cliques from Algorithm 1 are copied into the first set. A clique, which is duration-wise maximal, is taken out of the first set, and the IS_MAX flag is set to true to indicate that the current clique is a maximal (Δ, γ)-clique. For vertex addition, it is easy to see that only a neighboring vertex of the clique's vertex set can extend it to a larger (Δ, γ)-clique. If the new vertex set is found in the dictionary with the time interval of the current clique as one of its values, the IS_MAX flag is set to false, signifying that the clique being processed is not maximal. Otherwise, if the new vertex set is not present in the dictionary, all the possible time intervals in which it can form a (Δ, γ)-clique are computed from Line 16 to 37. This process is iterated for all the neighboring vertices of the clique's vertex set (Line to ). Now, we describe the statements from Line 17 to 36 in detail. As mentioned earlier, to form a (Δ, γ)-clique with the new vertex set, every vertex combination of the required size drawn from it must itself be a (Δ, γ)-clique. If all of these combinations are present in the dictionary, there is a possibility of forming a new clique with the new vertex set (Line 17). All the entries of these combinations are then copied from the dictionary into a temporary data structure.
For clarity of presentation, we describe the operations from Line 19 to 35 for one vertex addition with the help of the example shown in Figure 3. Suppose the temporary data structure holds the interval entries of all these combinations. One sample from these entries is taken in Line 19 of Algorithm 2. For this sample, the resultant interval $[t_a, t_b]$ is computed. If the difference between $t_a$ and $t_b$ is greater than or equal to Δ, then the newly formed (Δ, γ)-clique is added to the set of new cliques and to the dictionary. Also, if the resultant interval matches the current interval of the clique being processed, then the flag is set to False, i.e., that clique is not maximal. This step is repeated for all the samples from Line 19 to 35, which ensures that all the intervals in which the new vertex set forms a (Δ, γ)-clique are added. If none of the neighboring vertices can be added to the clique being processed, it becomes a maximal (Δ, γ)-clique and is added to the final maximal clique set at Line 40. Vertex addition checking is performed for all the cliques yet to be processed in the while loop from Line 7 to 42. When the set of cliques to be processed is exhausted and the set of new cliques is not empty, the contents of the latter are copied back into the former for further processing, signifying that not all the maximal cliques have been found yet. This is controlled using the ALL_MAXIMAL flag in the while loop at Line 5. If no new clique is added, the flag is set to true so that in the next iteration the condition of the while loop at Line 5 will be false and Algorithm 2 terminates. At the end, the final maximal clique set contains all the maximal (Δ, γ)-cliques of the temporal network. One illustrative example of the enumeration algorithm is given in Figure 4.
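The interval computation for one vertex addition can be sketched as below. Several points are assumptions made for this illustration, since the corresponding symbols are not recoverable from the text: the combinations are taken to be all subsets of the new vertex set with one vertex removed, a sample is one interval chosen from each subset's entries, and the resultant interval is the intersection of the sampled intervals (latest start, earliest end). The names `candidate_intervals` and `interval_dict` are hypothetical.

```python
from itertools import combinations, product

def candidate_intervals(interval_dict, X, v, delta):
    """Intervals in which X + {v} may form a (delta, gamma)-clique (sketch).

    interval_dict maps frozenset vertex sets to lists of (t_a, t_b) intervals
    in which that vertex set is already known to be a clique.
    """
    new_set = frozenset(X) | {v}
    subsets = [frozenset(c) for c in combinations(new_set, len(new_set) - 1)]
    if any(s not in interval_dict for s in subsets):
        return []                       # some sub-combination is not a clique at all
    found = set()
    # One sample = one interval picked from each sub-combination's entries
    for sample in product(*(interval_dict[s] for s in subsets)):
        t_a = max(start for start, _ in sample)   # latest start
        t_b = min(end for _, end in sample)       # earliest end
        if t_b - t_a >= delta:                    # keep only intervals of length >= delta
            found.add((t_a, t_b))
    return sorted(found)

# Hypothetical dictionary of size-2 cliques
interval_dict = {
    frozenset(("a", "b")): [(0, 10)],
    frozenset(("a", "c")): [(2, 12)],
    frozenset(("b", "c")): [(1, 8), (20, 30)],
}
print(candidate_intervals(interval_dict, {"a", "b"}, "c", delta=3))
```

Intersecting intervals only shortens them, which matches the intuition behind the name of this phase: the vertex set bulks up while the duration shrinks.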

Data: A Temporal Network , Initial Clique Set .
Result: Maximal Clique Set of .
1 ;
;
  // with the index as vertex set and time intervals as entries
2 ;
3 ;
4 while  ALL_MAXIMAL do
5       ;
6       while  do
7             Take and remove a clique ;
8             ;
9             for Every  do
10                   ;
11                   if  then
12                         if  then
13                               ;
14                         end if
15                        
16                  else
17                         if  then
18                               ;
19                               foreach  permutation of entries as  do
20                                     ;
21                                     ;
22                                     for  do
23                                           ;
24                                           ;
25                                          
26                                     end for
27                                    ;
28                                     ;
29                                     if  then
30