Enumerating maximal cliques in link streams with durations

12/19/2017 ∙ by Tiphaine Viard, et al.

Link streams model interactions over time, and a clique in a link stream is defined as a set of nodes and a time interval such that all pairs of nodes in this set interact permanently during this time interval. This notion was introduced recently in the case where interactions are instantaneous. We generalize it to the case of interactions with durations and show that the instantaneous case actually is a particular case of the case with durations. We propose an algorithm to detect maximal cliques that improves our previous one for instantaneous link streams, and performs better than the state of the art algorithms in several cases of interest.

1 Introduction

A graph is a pair of sets $G = (V, E)$ where $V$ is a set of nodes and $E \subseteq V \times V$ a set of links. If there is no distinction between $(u, v)$ and $(v, u)$ for $u$ and $v$ in $V$, then $G$ is undirected. If $u \neq v$ for all $(u, v)$ in $E$, then $G$ is simple. One generally considers undirected simple graphs. A clique of $G$ is a set $C \subseteq V$ of nodes such that all nodes of $C$ are linked to each other, i.e. for all $u, v \in C$ with $u \neq v$, $(u, v) \in E$. A clique $C$ is maximal if there is no other clique $C'$ such that $C \subset C'$. Enumerating maximal cliques of a graph is one of the most fundamental problems in computer science [1, 2, 10, 4].

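An instantaneous link stream is a triplet $L = (T, V, E)$ where $T = [\alpha, \omega]$ is a time interval, $V$ a set of nodes and $E \subseteq T \times V \times V$ a set of instantaneous links. If there is no distinction between $(t, u, v)$ and $(t, v, u)$, then $L$ is undirected. If $u \neq v$ for all $(t, u, v)$ in $E$, then $L$ is simple. For any given duration $\Delta$, we introduced in [12] the notion of $\Delta$-clique in a simple undirected link stream $L$: it is a pair $(C, [b, e])$ with $C \subseteq V$ and $[b, e] \subseteq T$ such that for all $\tau \in [b, e - \Delta]$ and $u, v \in C$ with $u \neq v$ there is a $(t, u, v)$ in $E$ such that $t \in [\tau, \tau + \Delta]$. In other words, there is an interaction between all pairs of nodes in $C$ at least once every $\Delta$ within $[b, e]$. A $\Delta$-clique $(C, [b, e])$ is maximal if there is no other $\Delta$-clique $(C', [b', e'])$ such that $C \subseteq C'$ and $[b, e] \subseteq [b', e']$. We proposed a first algorithm to enumerate all maximal $\Delta$-cliques of an instantaneous link stream [12], recently improved by adapting the Bron-Kerbosch algorithm [6, 7].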
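The $\Delta$-clique condition can be checked pair by pair. The following is a minimal sketch, not the authors' code: the function name `is_delta_clique` and the representation of links as `(t, u, v)` tuples are assumptions made for illustration. It relies on the observation that "at least one link in every window of length $\Delta$ within $[b, e]$" is equivalent to three conditions on the sorted link times of a pair: the first is at most $b + \Delta$, the last is at least $e - \Delta$, and consecutive times are at most $\Delta$ apart.

```python
from itertools import combinations

def is_delta_clique(links, nodes, b, e, delta):
    """Check whether (nodes, [b, e]) is a delta-clique.

    links: iterable of (t, u, v) undirected instantaneous links (assumed format).
    This sketch assumes e - b >= delta and at least two nodes.
    """
    if e - b < delta:
        raise ValueError("this sketch assumes e - b >= delta")
    for u, v in combinations(nodes, 2):
        # Times of the links between u and v that fall inside [b, e].
        times = sorted(t for (t, x, y) in links
                       if {x, y} == {u, v} and b <= t <= e)
        if not times:
            return False
        # The first and last windows of length delta must contain a link.
        if times[0] > b + delta or times[-1] < e - delta:
            return False
        # No gap between consecutive links may exceed delta.
        if any(t2 - t1 > delta for t1, t2 in zip(times, times[1:])):
            return False
    return True

links = [(5, "a", "b"), (7, "a", "b")]
print(is_delta_clique(links, {"a", "b"}, 2, 10, 3))
print(is_delta_clique(links, {"a", "b"}, 1, 10, 3))
```

With links at times 5 and 7 and $\Delta = 3$, the interval $[2, 10]$ satisfies the condition, while starting at 1 leaves the window $[1, 4]$ without any link.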

A link stream with durations, or simply link stream, is a triplet $L = (T, V, E)$ where $T = [\alpha, \omega]$ is a time interval, $V$ a set of nodes and $E \subseteq T \times T \times V \times V$ a set of links such that for all links $(b, e, u, v)$ in $E$ we have $e \geq b$ (though in theory we can consider that time is either discrete or continuous, in all practical cases the datasets have an intrinsic time resolution and time is therefore discrete in practice). We call $e - b$ the duration of the link. If there is no distinction between $(b, e, u, v)$ and $(b, e, v, u)$, then $L$ is undirected. If $u \neq v$ for all $(b, e, u, v)$ in $E$, and $[b, e] \cap [b', e'] = \emptyset$ for all distinct $(b, e, u, v)$ and $(b', e', u, v)$ in $E$, then $L$ is simple. In the remainder of this paper, unless explicitly specified, we will consider simple undirected link streams only.

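A clique in a link stream with durations $L = (T, V, E)$ is a pair $(C, [b, e])$ with $C \subseteq V$ and $[b, e] \subseteq T$ such that for all $u, v \in C$ with $u \neq v$ there is a link $(b', e', u, v)$ in $E$ such that $[b, e] \subseteq [b', e']$. In other words, all pairs of nodes in $C$ are continuously linked together from $b$ to $e$. A clique $(C, [b, e])$ is maximal if there is no other clique $(C', [b', e'])$ such that $C \subseteq C'$ and $[b, e] \subseteq [b', e']$. See Figure 1 for an illustration.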
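The definition above translates directly into a membership test. The sketch below is illustrative only: the name `is_clique` and the `(b, e, u, v)` tuple format for links are assumptions, and the linear scan over all links is chosen for brevity rather than speed.

```python
from itertools import combinations

def is_clique(links, nodes, b, e):
    """Check whether (nodes, [b, e]) is a clique of a link stream with durations.

    links: iterable of (lb, le, x, y) undirected links lasting from lb to le
    (assumed format). Every pair of nodes must have one link covering [b, e].
    """
    return all(
        any({x, y} == {u, v} and lb <= b and e <= le
            for (lb, le, x, y) in links)
        for u, v in combinations(nodes, 2)
    )

links = [(0, 10, "a", "b"), (2, 8, "b", "c"), (1, 9, "a", "c")]
print(is_clique(links, {"a", "b", "c"}, 2, 8))
print(is_clique(links, {"a", "b", "c"}, 1, 8))
```

In this toy stream the three nodes are all linked over $[2, 8]$, but not over $[1, 8]$ because the link between `b` and `c` only starts at time 2.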

Figure 1: Maximal cliques in the link stream , with , and . Two maximal cliques of are highlighted: in red, is a maximal clique since is the largest interval over which nodes , and all interact together and they don’t interact with over this time interval. Similarly, is a maximal clique, since does not interact with over , and there is no larger interval such that all interact together. There are no other maximal cliques in involving nodes, but there are many other maximal cliques in , such as or for instance.

Cliques in link streams with durations (and $\Delta$-cliques in instantaneous link streams) bring valuable information in the study of different kinds of datasets; for instance they indicate malicious computers coordinating their actions [11]. Likewise, co-presence relations between animals are a key source of insight in ethology [3], and cliques in the link streams with durations modeling such data may indicate significant meetings. Many other fields may benefit from clique computations in link streams with durations in a similar way.

In this paper, we first extend our algorithm for maximal $\Delta$-cliques in instantaneous link streams [12] to enumerate all maximal cliques in link streams with durations. The obtained algorithm is significantly simpler than the previous version, and has a slightly lower complexity; we show that it is possible to use it to enumerate maximal cliques in instantaneous link streams too, making it both more general and more efficient than our previous algorithm. Experiments show that its running time is better than that of our previous algorithm, as expected, but also that it outperforms the more recent algorithm of Himmel et al. [6, 7] in several cases of practical interest.

2 Algorithm

As in [12], our algorithm (Algorithm 1) relies on a set $S$ of previously computed cliques that we call candidates, and a set $M$ of already seen cliques. We initially populate both sets with the trivial clique $(\{u, v\}, [b, b])$, for all links $(b, e, u, v) \in E$ (Line 2); finding cliques involving only one node is trivial and makes little sense, so we ignore them. Then, our algorithm iteratively picks and processes an element $(C, [b, e])$ from $S$ (Line 4), until $S$ is empty (while loop from Line 3 to Line 17). Processing $(C, [b, e])$ consists in searching for nodes $v \notin C$ such that $(C \cup \{v\}, [b, e])$ is a clique (Lines 6 to 10), and for times $e' > e$ such that $(C, [b, e'])$ is a clique (Lines 11 to 15).

For each node $v$ not in $C$, Line 7 checks that for all $u$ in $C$, there exists a link $(b', e', u, v)$ in the stream, with $[b, e] \subseteq [b', e']$. If $v$ satisfies this property, then $(C, [b, e])$ is not maximal (Line 8) and if $(C \cup \{v\}, [b, e])$ has not already been seen (Line 9) then we add it to both $S$ and $M$ (Line 10).

The value of $e'$ computed at Line 11 is the largest time such that $(C, [b, e'])$ is a clique. Line 12 checks that this clique is different from $(C, [b, e])$, i.e. $e' \neq e$. In this case, $(C, [b, e])$ is not maximal (Line 13), and if the new clique has not already been seen (Line 14) we add it to $S$ and $M$ (Line 15).

If no node $v$ or time $e'$ satisfies the conditions above, then the clique $(C, [b, e])$ is maximal and isMax is true when we reach Line 16; we add the maximal clique to the output set $R$ (Line 17).

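input: a simple undirected link stream with durations $L = (T, V, E)$
output: the set of all maximal cliques in $L$ involving at least two nodes

1: $S \leftarrow \emptyset$, $M \leftarrow \emptyset$, $R \leftarrow \emptyset$
2: for $(b, e, u, v) \in E$: add $(\{u, v\}, [b, b])$ to $S$ and to $M$
3: while $S \neq \emptyset$ do
4:     take and remove $(C, [b, e])$ from $S$
5:     set isMax to True
6:     for $v$ in $V \setminus C$ do
7:         if $(C \cup \{v\}, [b, e])$ is a clique then
8:              set isMax to False
9:              if $(C \cup \{v\}, [b, e])$ not in $M$ then
10:                  add $(C \cup \{v\}, [b, e])$ to $S$ and $M$
11:     $e' \leftarrow \max \{ t : (C, [b, t]) \text{ is a clique} \}$
12:     if $e' \neq e$ then
13:         set isMax to False
14:         if $(C, [b, e'])$ not in $M$ then
15:              add $(C, [b, e'])$ to $S$ and $M$
16:     if isMax then
17:         add $(C, [b, e])$ to $R$
18: return $R$
Algorithm 1 Maximal cliques of a simple undirected link stream with durations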
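The pseudocode can be sketched in Python as follows. This is an illustrative rendering under stated assumptions, not the implementation referenced in [13]: the names `maximal_cliques` and `covering_end` are hypothetical, links are assumed to be `(b, e, u, v)` tuples of a simple undirected stream, and links are found by a linear scan rather than in sorted structures, so the sketch does not reach the complexity discussed in Theorem 2.

```python
from itertools import combinations

def covering_end(links, u, v, b, e):
    """End time of the link between u and v that covers [b, e], or None."""
    for (lb, le, x, y) in links:
        if {x, y} == {u, v} and lb <= b and e <= le:
            return le
    return None

def maximal_cliques(links):
    """Enumerate maximal cliques (at least two nodes) as (frozenset, b, e)."""
    nodes = {x for (_, _, u, v) in links for x in (u, v)}
    candidates, seen, result = [], set(), set()   # candidates, seen, output
    for (b, e, u, v) in links:                    # Line 2: trivial cliques
        trivial = (frozenset((u, v)), b, b)
        if trivial not in seen:
            seen.add(trivial)
            candidates.append(trivial)
    while candidates:                             # Lines 3 to 17
        members, b, e = candidates.pop()
        is_max = True
        for v in nodes - members:                 # Lines 6 to 10: add a node
            if all(covering_end(links, u, v, b, e) is not None
                   for u in members):
                is_max = False
                bigger = (members | {v}, b, e)
                if bigger not in seen:
                    seen.add(bigger)
                    candidates.append(bigger)
        # Line 11: the earliest end among the links covering [b, e] is the
        # largest time up to which all pairs remain linked.
        new_e = min(covering_end(links, u, v, b, e)
                    for u, v in combinations(members, 2))
        if new_e != e:                            # Lines 12 to 15: extend time
            is_max = False
            longer = (members, b, new_e)
            if longer not in seen:
                seen.add(longer)
                candidates.append(longer)
        if is_max:                                # Lines 16 and 17
            result.add((members, b, e))
    return result

links = [(0, 10, "a", "b"), (2, 8, "b", "c"), (1, 9, "a", "c")]
for members, b, e in sorted(maximal_cliques(links), key=lambda c: c[1]):
    print(sorted(members), b, e)
```

On this toy stream the sketch reports three maximal cliques: `a, b` over $[0, 10]$, `a, c` over $[1, 9]$, and `a, b, c` over $[2, 8]$. The pair `b, c` over $[2, 8]$ is not reported because adding `a` keeps it a clique.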
Theorem 1 (Correctness).

Given a simple undirected link stream with durations, Algorithm 1 computes the set of all its maximal cliques involving at least two nodes.

We first show that all the elements in the output of Algorithm 1 are cliques, then that they are maximal, and finally that all maximal cliques are in this output.

Lemma 1.

In Algorithm 1, all elements of the candidate set $S$ are cliques of $L$.

Proof.

$S$ initially contains cliques (Line 2) and Line 10 clearly preserves this property. Consider a clique $(C, [b, e])$ picked from $S$. Since the stream is simple, each pair of nodes $u, v \in C$ has exactly one link $(b', e'', u, v)$ with $[b, e] \subseteq [b', e'']$, and the value $e'$ computed at Line 11 is the smallest end time $e''$ among these links. Since $b' \leq b \leq e \leq e' \leq e''$ for each of them, this means that $(C, [b, e'])$ is a clique, and so the elements added at Line 15 are also cliques. ∎

Lemma 2.

All the elements of the set $R$ returned by Algorithm 1 are maximal cliques of $L$.

Proof.

A clique $(C, [b, e])$ from $S$ is added to $R$ only if isMax is True (Lines 16 and 17). If $(C, [b, e])$ is not maximal, then at least one of three conditions is true: (1) there exists a node $v \notin C$ such that $(C \cup \{v\}, [b, e])$ is a clique, and then isMax is set to False at Lines 7 and 8, or (2) there exists a time $b' < b$ such that $(C, [b', e])$ is a clique, or (3) there exists a time $e' > e$ such that $(C, [b, e'])$ is a clique. The second case cannot occur as all elements of $S$ involve (from its initialization and by recurrence) a link starting at $b$. If we are in the third case, then Line 11 computes a value of $e'$ satisfying $e' > e$, which implies that the condition of Line 12 is satisfied, and so Line 13 sets isMax to False. Finally, if a clique of $S$ is not maximal it cannot be added to $R$. ∎

In order to prove the final result, we need the following lemma.

Lemma 3.

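Let $L = (T, V, E)$ be a simple link stream, and let $(C, [b, e])$ be a maximal clique of $L$. Then there exists a link $(b, e', u, v)$ in $E$ such that $u$ and $v$ are in $C$ and $e' \geq e$.

Proof.

Assume this is false. Then, for all $u, v \in C$, there is a link $(b', e', u, v)$ such that $b' < b$ and $e' \geq e$. Then clearly $(C, [b, e])$ is not maximal, since its interval can be extended to start before $b$. ∎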

Lemma 4.

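The set $R$ returned by Algorithm 1 contains all maximal cliques of $L$.

Proof.

Let us consider a maximal clique $(C, [b, e])$ of $L$. If $(C, [b, e])$ is in the candidate set $S$ at some stage then it is easy to check that the algorithm adds it to $R$. We therefore show that $(C, [b, e])$ is in $S$ at some stage.

Let us denote by $k$ the size of $C$, i.e. $k = |C|$. Let $u, v \in C$ such that there is a link $(b, e', u, v)$ in $E$; such nodes exist according to Lemma 3. Let $c_1 = u$, $c_2 = v$, and let $c_3, \dots, c_k$ be an arbitrary ordering of all nodes in $C \setminus \{u, v\}$. For all $i$, $2 \leq i \leq k$, we consider the clique $X_i = (\{c_1, \dots, c_i\}, [b, b])$.

We will show that, if $X_i$ is in $S$ at some stage, then the algorithm adds $X_{i+1}$ to $S$ no later than when $X_i$ is the clique picked in $S$ at Line 4. Indeed, when this happens, Lines 6 to 10 build $X_{i+1}$ from it and thus add $X_{i+1}$ to $S$ and to the set $M$ of seen cliques at Line 10, unless it has already been seen, in which case it was added to $S$ earlier.

$X_2$ is added to $S$ by Line 2, and if $X_i$ is in $S$ at some stage, then the algorithm adds $X_{i+1}$ to $S$. Therefore, $X_k = (C, [b, b])$ is added to $S$ at some stage.

Consider now the iteration at which $(C, [b, b])$ is picked from $S$ at Line 4. Then, the value $e'$ computed at Line 11 is equal to $e$. Otherwise, either $e' > e$ and then $(C, [b, e])$ would not be maximal, or $e' < e$ and then $(C, [b, e])$ would not be a clique. Therefore, the clique added to $S$ at Line 15 is $(C, [b, e])$. ∎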

Theorem 2 (Complexity).

Given a link stream $L = (T, V, E)$ with $n = |V|$ and $m = |E|$, Algorithm 1 enumerates all maximal cliques of $L$ in $O(2^n n^2 m^2 \log m + 2^n n^3 m^2)$ time.

Proof.

The complexity of Algorithm 1 is dominated by the complexity of its main loop. The number of iterations of this loop is bounded by the number of elements added to the candidate set $S$, which is bounded by the number of subsets of $V$ times the number of sub-intervals of $T$ of the form $[b, e]$, where $b$ is the beginning of a link in $E$ and $e$ is the end of a (different) link in $E$. Therefore, the number of iterations of the loop is in $O(2^n m^2)$.

In the following, we assume that sets are stored as binary search trees, so that it is possible to search for and add an element in a set of size $s$ in $O(\log s)$ comparisons. We also assume that all links are stored in a list, or binary tree, and are sorted by node pair then time, so that it is possible to search for a link in $O(\log m)$.

Now, let us consider a clique $(C, [b, e])$ picked from $S$ at Line 4. Lines 6 to 10 search for cliques of the form $(C \cup \{v\}, [b, e])$, $v \notin C$, and Lines 11 to 15 search for a clique $(C, [b, e'])$, $e' > e$. We analyze the two blocks separately.

For any $v \notin C$, Line 7 checks if for all $u \in C$ there is a link $(b', e', u, v)$ such that $[b, e] \subseteq [b', e']$. This requires at most $n$ link searches, and so it is in $O(n \log m)$. Line 9 searches for the newly found clique in the set $M$ of seen cliques, which contains $O(2^n m^2)$ elements. Since two cliques can be compared in $O(n)$, this search is in $O(n \log(2^n m^2)) = O(n^2 + n \log m)$. The algorithm repeats these operations for all $v \notin C$, so less than $n$ times. Finally, Lines 6 to 10 have a cost in $O(n^3 + n^2 \log m)$.

Computing $e'$ at Line 11 can be done in at most $n^2$ link searches, and therefore is in $O(n^2 \log m)$. Lines 14 and 15 can be done in $O(n^2 + n \log m)$, as above. The complexity of Lines 11 to 15 is then $O(n^2 \log m)$.

We conclude that each iteration of the main loop costs no more than $O(n^3 + n^2 \log m)$. We bound the overall complexity by multiplying this cost by the bound for the number of iterations of the loop. This leads to the $O(2^n n^2 m^2 \log m + 2^n n^3 m^2)$ complexity. ∎

3 Equivalence with $\Delta$-cliques

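Let us consider an instantaneous link stream $L = (T, V, E)$ with $T = [\alpha, \omega]$ and a value $\Delta$. We first define $T'$ as $[\alpha + \Delta, \omega]$, and, for all $u$ and $v$ in $V$, the set $T_{uv} = \{ x \in T' : \exists (t, u, v) \in E, t \in [x - \Delta, x] \}$. We then define $E'$ as the set of all tuples $(b, e, u, v)$ such that $[b, e]$ is a maximal nonempty interval included in $T_{uv}$, for any $u$ and $v$ in $V$. We finally obtain the simple link stream with durations $L' = (T', V, E')$.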
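This construction can be sketched in a few lines. The sketch rests on an assumption about its reading: each instantaneous link at time $t$ is taken to cover the times $x$ with $t \in [x - \Delta, x]$, that is the interval $[t, t + \Delta]$ clipped to $[\alpha + \Delta, \omega]$, which is the reading consistent with the proof of Theorem 3. The function name `to_duration_stream` and the tuple formats are hypothetical, and node labels are assumed to be sortable.

```python
from collections import defaultdict

def to_duration_stream(inst_links, alpha, omega, delta):
    """Turn instantaneous links (t, u, v) into links with durations (b, e, u, v).

    Assumed reading: a link at time t covers [t, t + delta], clipped to
    [alpha + delta, omega]; overlapping or touching intervals of the same
    node pair are merged into maximal intervals.
    """
    low, high = alpha + delta, omega
    times_by_pair = defaultdict(list)
    for t, u, v in inst_links:
        times_by_pair[frozenset((u, v))].append(t)

    result = []
    for pair, times in times_by_pair.items():
        u, v = sorted(pair)
        cur_b = cur_e = None
        for t in sorted(times):
            b, e = max(t, low), min(t + delta, high)
            if b > e:
                continue                      # entirely outside the new time span
            if cur_e is not None and b <= cur_e:
                cur_e = max(cur_e, e)         # extend the current maximal interval
            else:
                if cur_e is not None:
                    result.append((cur_b, cur_e, u, v))
                cur_b, cur_e = b, e           # start a new maximal interval
        if cur_e is not None:
            result.append((cur_b, cur_e, u, v))
    return sorted(result)

inst = [(5, "a", "b"), (7, "a", "b"), (15, "a", "b")]
print(to_duration_stream(inst, alpha=0, omega=20, delta=3))
# [(5, 10, 'a', 'b'), (15, 18, 'a', 'b')]
```

The links at times 5 and 7 are less than $\Delta = 3$ apart, so their intervals merge into $[5, 10]$, while the link at time 15 yields a separate interval.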

Theorem 3.

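$(C, [b, e])$ is a maximal $\Delta$-clique of $L$ if and only if $(C, [b + \Delta, e])$ is a maximal clique of $L'$.

Proof.

Assume $(C, [b, e])$ is a $\Delta$-clique of $L$ and $(C, [b + \Delta, e])$ is not a clique of $L'$. It means that there is an $x$ in $[b + \Delta, e]$ and $u$ and $v$ in $C$ such that $u$ and $v$ are not linked at time $x$ in $L'$. This means that there is no link $(t, u, v)$ in $E$ with $t \in [x - \Delta, x]$, which contradicts the assumption that $(C, [b, e])$ is a $\Delta$-clique of $L$.

Conversely, assume $(C, [b + \Delta, e])$ is a clique of $L'$ and $(C, [b, e])$ is not a $\Delta$-clique of $L$. It means that there are nodes $u$ and $v$ in $C$ and an interval $I$ of duration $\Delta$ such that $I \subseteq [b, e]$ and for all $t$ in $I$, $(t, u, v) \notin E$. If we denote by $x$ the largest element of $I$, it is at least equal to $b + \Delta$ and at most equal to $e$, and so there is no link between $u$ and $v$ containing $x$ in $L'$, which contradicts the assumption that $(C, [b + \Delta, e])$ is a clique of $L'$.

Therefore, $(C, [b, e])$ is a $\Delta$-clique of $L$ if and only if $(C, [b + \Delta, e])$ is a clique of $L'$. This is a bijection between the $\Delta$-cliques of $L$ and the cliques of $L'$ that preserves maximality. ∎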

As a consequence, Algorithm 1 may be used to compute the maximal $\Delta$-cliques in instantaneous link streams (with complexity lower than the initial algorithm published in [12] by an improvement factor of ).

4 Experiments

We implemented Algorithm 1 and compared its running time with those of our previous implementation for computing $\Delta$-cliques [12], as well as the implementation provided by Himmel et al. for their own algorithm [6, 7]. All implementations are in Python [13].

We used three datasets of different sizes coming from different contexts:

  • Thiers-Highschool [5] is a trace of physical proximity between individuals, captured by sensors. It was collected at a French high school in 2012 over a period of approximately 8 days. It contains 180 nodes and 45,047 links.

  • DNC-email is the 2016 Democratic National Committee email leak [9], representing emails sent over a period of approximately one year and four months (this duration covers the majority of emails; notice that a single email was sent approximately one year and a half before any other). It contains 1,866 nodes and 37,421 links.

  • Infectious is a trace of physical proximity between visitors of an exhibition [8], collected over a period of approximately 80 days. It contains 10,972 nodes and 415,912 links.

Figure 2: Running times (in seconds) for computing $\Delta$-cliques, for different values of $\Delta$ (in seconds). Left: Thiers-Highschool; middle: DNC-email; right: Infectious.

Results are presented in Figure 2.

We clearly observe two things. First, our implementation significantly outperforms the code for $\Delta$-cliques of [12] for all values of $\Delta$. Second, our implementation is the fastest for many relevant values of $\Delta$: in the cases of physical proximity our implementation is the fastest for all values of $\Delta$ lower than 3 hours, and in the case of emails, it is the fastest for all values of $\Delta$ lower than 11 hours. Notice that these values are of practical interest: exchanging emails at least once every hour within a group of people for a given time period, for instance, is significant.

In conclusion, although practical evaluation of algorithms is difficult, these experiments show that the most efficient solution depends on the target application, and that our algorithm is appealing for small values of $\Delta$, at least in some cases of practical interest.


Acknowledgments. This work is funded in part by the European Commission H2020 FETPROACT 2016-2017 program under grant 732942 (ODYCCEUS), by the ANR (French National Agency of Research) under grants ANR-15-CE38-0001 (AlgoDiv) and ANR-13-CORD-0017-01 (CODDDE), by the French program “PIA - Usages, services et contenus innovants” under grant O18062-44430 (REQUEST), and by the Ile-de-France program FUI21 under grant 16010629 (iTRAC).

References

  • [1] J. Gary Augustson and Jack Minker. An analysis of some graph theoretical cluster techniques. Journal of the ACM (JACM), 17(4):571–588, 1970.
  • [2] Coen Bron and Joep Kerbosch. Algorithm 457: finding all cliques of an undirected graph. Communications of the ACM, 16(9), 1973.
  • [3] Pádraig Mac Carron and R.I.M. Dunbar. Identifying natural grouping structure in gelada baboons: a network approach. Animal Behaviour, 114(Supplement C):119 – 128, 2016.
  • [4] David Eppstein and Darren Strash. Listing all maximal cliques in large sparse real-world graphs. Experimental Algorithms, pages 364–375, 2011.
  • [5] Julie Fournet and Alain Barrat. Contact patterns among high school students. PLoS ONE, 9:e107878, 2014.
  • [6] Anne-Sophie Himmel, Hendrik Molter, Rolf Niedermeier, and Manuel Sorge. Enumerating maximal cliques in temporal graphs. In Advances in Social Networks Analysis and Mining (ASONAM), 2016 IEEE/ACM International Conference on, pages 337–344. IEEE, 2016.
  • [7] Anne-Sophie Himmel, Hendrik Molter, Rolf Niedermeier, and Manuel Sorge. Adapting the Bron-Kerbosch algorithm for enumerating maximal cliques in temporal graphs. Social Network Analysis and Mining, 7:35:1–35:16, 2017.
  • [8] L. Isella, J. Stehlé, A. Barrat, C. Cattuto, J.-F. Pinton, and W. Van den Broeck. What’s in a crowd? Analysis of face-to-face behavioral networks. Journal of Theoretical Biology, 271(1):166–180, 2011.
  • [9] KONECT. DNC emails network dataset, 2017. http://konect.uni-koblenz.de/networks/dnc-temporalGraph.
  • [10] Etsuji Tomita, Akira Tanaka, and Haruhisa Takahashi. The worst-case time complexity for generating all maximal cliques and computational experiments. Theoretical Computer Science, 363:28–42, 2006.
  • [11] Tiphaine Viard, Raphaël Fournier-S’niehotta, Clémence Magnien, and Matthieu Latapy. Discovering patterns of interest in IP traffic using cliques in bipartite link streams. In Proceedings of the International Conference on Complex Networks (CompleNet), 2018. To appear.
  • [12] Tiphaine Viard, Matthieu Latapy, and Clémence Magnien. Computing maximal cliques in link streams. Theor. Comput. Sci., 609:245–252, 2016.
  • [13] Tiphaine Viard and Clémence Magnien. Source code in python for our algorithm, 2017. https://bitbucket.org/tiph_viard/cliques.