A graph is a pair of sets $G = (V, E)$ where $V$ is a set of nodes and $E \subseteq V \times V$ a set of links. If there is no distinction between $(u, v)$ and $(v, u)$ for $u$ and $v$ in $V$, then $G$ is undirected. If $(u, u) \notin E$ for all $u$ in $V$, then $G$ is simple. One generally considers undirected simple graphs. A clique $C$ of $G$ is a set of nodes such that all nodes of $C$ are linked to each other, i.e. for all $u, v \in C$ with $u \neq v$, $(u, v) \in E$. A clique $C$ is maximal if there is no other clique $C'$ such that $C \subset C'$. Enumerating maximal cliques of a graph is one of the most fundamental problems in computer science [1, 2, 10, 4].
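An instantaneous link stream is a triplet $L = (T, V, E)$ where $T = [\alpha, \omega]$ is a time interval, $V$ a set of nodes and $E \subseteq T \times V \times V$ a set of instantaneous links. If there is no distinction between $(t, u, v)$ and $(t, v, u)$, then $L$ is undirected. If $u \neq v$ for all $(t, u, v)$ in $E$, then $L$ is simple. For any given duration $\Delta$, we introduced in [12] the notion of $\Delta$-clique in a simple undirected link stream $L$: it is a pair $(C, [t_0, t_1])$ with $C \subseteq V$ and $[t_0, t_1] \subseteq T$ such that for all distinct $u, v \in C$ and all $\tau$ with $[\tau, \tau + \Delta] \subseteq [t_0, t_1]$ there is a $(t, u, v)$ in $E$ such that $t \in [\tau, \tau + \Delta]$. In other words, there is an interaction between all pairs of nodes in $C$ at least once every $\Delta$ within $[t_0, t_1]$. A $\Delta$-clique $(C, [t_0, t_1])$ is maximal if there is no other $\Delta$-clique $(C', [t_0', t_1'])$ such that $C \subseteq C'$ and $[t_0, t_1] \subseteq [t_0', t_1']$. We proposed a first algorithm to enumerate all maximal $\Delta$-cliques of an instantaneous link stream $L$, recently improved by adapting the Bron-Kerbosch algorithm [6, 7].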
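The definition of a $\Delta$-clique can be checked directly on small examples. The following is a minimal sketch, not the implementation of [12]: the function name `is_delta_clique` and the `(t, u, v)` tuple layout are assumptions made for illustration. In continuous time with closed windows, requiring a link in every window of length $\Delta$ amounts to requiring that no gap between the interval bounds and consecutive link times exceeds $\Delta$. The sketch also assumes that an interval shorter than $\Delta$ satisfies the condition vacuously.

```python
from itertools import combinations


def is_delta_clique(links, nodes, t0, t1, delta):
    """Check whether (nodes, [t0, t1]) is a delta-clique.

    `links` is a list of (t, u, v) instantaneous undirected links.
    """
    if t1 - t0 < delta:
        # Assumption: no window of length delta fits in [t0, t1],
        # so the condition holds vacuously.
        return True
    for u, v in combinations(nodes, 2):
        times = sorted(t for t, a, b in links
                       if {a, b} == {u, v} and t0 <= t <= t1)
        if not times:
            return False
        # Gaps between the interval bounds and consecutive link times.
        points = [t0] + times + [t1]
        if any(later - earlier > delta
               for earlier, later in zip(points, points[1:])):
            return False
    return True


links = [(1, "a", "b"), (3, "a", "b"), (5, "a", "b")]
print(is_delta_clique(links, {"a", "b"}, 0, 6, 2))
print(is_delta_clique(links, {"a", "b"}, 0, 6, 1))
```

With links at times 1, 3 and 5, the pair interacts at least once every 2 time units over $[0, 6]$ but not once every time unit.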
A link stream with durations, or simply link stream, is a triplet $L = (T, V, E)$ where $T = [\alpha, \omega]$ is a time interval, $V$ a set of nodes and $E \subseteq T \times T \times V \times V$ a set of links such that for all links $(b, e, u, v)$ in $E$ we have $e \geq b$. (Though in theory we can consider that time is either discrete or continuous, in all practical cases the datasets have an intrinsic time resolution and time is therefore discrete in practice.) We call $e - b$ the duration of the link. If there is no distinction between $(b, e, u, v)$ and $(b, e, v, u)$, then $L$ is undirected. If for all $(b, e, u, v)$ in $E$, $u \neq v$, and for all $(b, e, u, v)$ and $(b', e', u, v)$ in $E$, $[b, e] \cap [b', e'] = \emptyset$, then $L$ is simple. In the remainder of this paper, unless explicitly specified, we will consider simple undirected link streams only.
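A clique in a link stream with durations $L = (T, V, E)$ is a pair $(C, [t_0, t_1])$ with $C \subseteq V$ and $[t_0, t_1] \subseteq T$ such that for all distinct $u, v \in C$ there is a link $(b, e, u, v)$ in $E$ such that $[t_0, t_1] \subseteq [b, e]$. In other words, all pairs of nodes in $C$ are continuously linked together from $t_0$ to $t_1$. A clique $(C, [t_0, t_1])$ is maximal if there is no other clique $(C', [t_0', t_1'])$ such that $C \subseteq C'$ and $[t_0, t_1] \subseteq [t_0', t_1']$. See Figure 1 for an illustration.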
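As an illustration of this definition, here is a minimal sketch of a clique test for link streams with durations. The function name `is_clique` and the representation of links as `(b, e, u, v)` tuples are assumptions made for this example, not part of the original implementation.

```python
from itertools import combinations


def is_clique(links, nodes, t0, t1):
    """Return True if every pair of `nodes` is linked throughout [t0, t1].

    `links` is an iterable of (b, e, u, v) tuples of an undirected stream.
    """
    # Index the link intervals by unordered node pair.
    intervals = {}
    for b, e, u, v in links:
        intervals.setdefault(frozenset((u, v)), []).append((b, e))
    # Each pair needs one link whose interval contains [t0, t1].
    return all(
        any(b <= t0 and t1 <= e
            for b, e in intervals.get(frozenset(pair), []))
        for pair in combinations(nodes, 2)
    )


links = [(0, 10, "a", "b"), (2, 8, "a", "c"), (3, 9, "b", "c")]
print(is_clique(links, {"a", "b", "c"}, 3, 8))  # True
print(is_clique(links, {"a", "b", "c"}, 2, 8))  # False
```

In this toy stream the three nodes are pairwise linked over $[3, 8]$ only, since the link between `b` and `c` starts at time 3.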
Cliques in link streams with durations (and $\Delta$-cliques in instantaneous link streams) bring valuable information in the study of different kinds of datasets; for instance, they indicate malicious computers coordinating their actions [11]. Likewise, co-presence relations between animals are a key source of insight in ethology [3], and cliques in the link streams with durations modeling such data may indicate significant meetings. Many other fields may benefit from clique computations in link streams with durations in a similar way.
In this paper, we first extend our algorithm for maximal $\Delta$-cliques in instantaneous link streams [12] to enumerate all maximal cliques in link streams with durations. The obtained algorithm is significantly simpler than the previous version, and has a slightly lower complexity; we show that it is possible to use it to enumerate maximal $\Delta$-cliques in instantaneous link streams too, making it both more general and more efficient than our previous algorithm. Experiments show that its running time is better than that of our previous algorithm, as expected, but also that it outperforms the more recent algorithm of Himmel et al. [6, 7] in several cases of practical interest.
As in [12], our algorithm (Algorithm 1) relies on a set $S$ of previously computed cliques that we call candidates, and a set $M$ of already seen cliques. We initially populate both sets with the trivial clique $(\{u, v\}, [b, b])$, for all links $(b, e, u, v)$ in $E$ (Line 2) (finding cliques involving only one node is trivial and makes little sense, so we ignore them). Then, our algorithm iteratively picks and processes an element $(C, [t_0, t_1])$ from $S$ (Line 4), until $S$ is empty (while loop from Line 3 to Line 17). Processing $(C, [t_0, t_1])$ consists in searching for nodes $v \notin C$ such that $(C \cup \{v\}, [t_0, t_1])$ is a clique (Lines 6 to 10), and for times $t_1' > t_1$ such that $(C, [t_0, t_1'])$ is a clique (Lines 11 to 15).
For each node $v$ not in $C$, Line 7 checks that for all $u$ in $C$, there exists a link $(b, e, u, v)$ in the stream, with $[t_0, t_1] \subseteq [b, e]$. If $v$ satisfies this property, then $(C, [t_0, t_1])$ is not maximal (Line 8) and if $(C \cup \{v\}, [t_0, t_1])$ has not already been seen (Line 9) then we add it to both $S$ and $M$ (Line 10).
The value of $t_1'$ computed at Line 11 is the largest time such that $(C, [t_0, t_1'])$ is a clique. Line 12 checks that this clique is different from $(C, [t_0, t_1])$, i.e. $t_1' \neq t_1$. In this case, $(C, [t_0, t_1])$ is not maximal (Line 13), and if the new clique has not already been seen (Line 14) we add it to $S$ and $M$ (Line 15).
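The procedure described above can be sketched in Python as follows. This is a hedged illustration, not the authors' code: the names `maximal_cliques` and `covering_end` are hypothetical, candidates are processed in FIFO order, the initial candidates are assumed to be zero-length cliques at the start of each link, and Python hash sets replace the search trees assumed in the complexity analysis. The sketch relies on the stream being simple, so that at most one link between two given nodes contains a given time.

```python
from collections import deque
from itertools import combinations


def maximal_cliques(links):
    """Enumerate maximal cliques (at least two nodes) of a simple
    undirected link stream given as a list of (b, e, u, v) tuples."""
    intervals = {}
    nodes = set()
    for b, e, u, v in links:
        intervals.setdefault(frozenset((u, v)), []).append((b, e))
        nodes.update((u, v))

    def covering_end(u, v, t0, t1):
        """End of the link between u and v covering [t0, t1], or None."""
        for b, e in intervals.get(frozenset((u, v)), []):
            if b <= t0 and t1 <= e:
                return e
        return None

    # Assumed initialization: one zero-length clique per link start.
    seen = {(frozenset((u, v)), b, b) for b, e, u, v in links}
    candidates = deque(seen)
    result = set()

    while candidates:
        members, t0, t1 = candidates.popleft()
        is_max = True

        # Node extension: v must be linked to all members over [t0, t1].
        for v in nodes - members:
            if all(covering_end(u, v, t0, t1) is not None for u in members):
                is_max = False
                new = (members | {v}, t0, t1)
                if new not in seen:
                    seen.add(new)
                    candidates.append(new)

        # Time extension: the earliest end among the covering links is
        # the largest t1' such that (members, [t0, t1']) is a clique.
        new_t1 = min(covering_end(u, v, t0, t1)
                     for u, v in combinations(members, 2))
        if new_t1 != t1:
            is_max = False
            new = (members, t0, new_t1)
            if new not in seen:
                seen.add(new)
                candidates.append(new)

        if is_max:
            result.add((members, t0, t1))
    return result


links = [(0, 10, "a", "b"), (2, 8, "a", "c"), (3, 9, "b", "c")]
for members, t0, t1 in maximal_cliques(links):
    print(sorted(members), t0, t1)
```

On this toy stream the sketch reports four maximal cliques: the three links themselves and the triangle over $[3, 8]$.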
Theorem 1 (Correctness).
Given a simple undirected link stream with durations $L$, Algorithm 1 computes the set of all its maximal cliques involving at least two nodes.
We first show that all the elements in the output of Algorithm 1 are cliques, then that they are maximal, and finally that all maximal cliques are in this output.
Lemma 1. In Algorithm 1, all elements of the candidate set $S$ are cliques of $L$.
Lemma 2. All the elements of the set $R$ returned by Algorithm 1 are maximal cliques of $L$.
A clique $(C, [t_0, t_1])$ from $S$ is added to the output set $R$ only if isMax is True (Lines 16 and 17). If $(C, [t_0, t_1])$ is not maximal, then at least one of three conditions is true: (1) there exists a node $v \notin C$ such that $(C \cup \{v\}, [t_0, t_1])$ is a clique, and then isMax is set to False at Lines 7 and 8, or (2) there exists a time $t_0' < t_0$ such that $(C, [t_0', t_1])$ is a clique, or (3) there exists a time $t_1' > t_1$ such that $(C, [t_0, t_1'])$ is a clique. The second case cannot occur as all elements of $S$ involve (from its initialization and by recurrence) a link starting at $t_0$. If we are in the third case, then Line 11 computes a value of $t_1'$ satisfying $t_1' > t_1$, which implies that the condition of Line 12 is satisfied, and so Line 13 sets isMax to False. Finally, if a clique of $S$ is not maximal it cannot be added to $R$. ∎
In order to prove the final result, we need the following lemma.
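Lemma 3. Let $L = (T, V, E)$ be a simple link stream, and let $(C, [t_0, t_1])$ be a maximal clique of $L$. Then there exists a link $(b, e, u, v)$ in $E$ such that $u$ and $v$ are in $C$ and $b = t_0$.
Assume this is false. Then, for all distinct $u, v \in C$, there is a link $(b, e, u, v)$ such that $b < t_0$ and $e \geq t_1$. Then clearly $(C, [t_0, t_1])$ is not maximal, as it can be extended to start at the largest such $b$. ∎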
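Lemma 4. The set $R$ returned by Algorithm 1 contains all maximal cliques of $L$.
Let us consider a maximal clique $(C, [t_0, t_1])$ of $L$. If $(C, [t_0, t_1])$ is in the candidate set $S$ at some stage then it is easy to check that the algorithm adds it to $R$. We therefore show that $(C, [t_0, t_1])$ is in $S$ at some stage.
Let us denote by $k$ the size of $C$, i.e. $k = |C|$. Let $u, v \in C$ such that there is a link $(t_0, e, u, v)$ in $E$; such nodes exist according to Lemma 3. Let $c_1 = u$, $c_2 = v$, and let $c_3, \dots, c_k$ be an arbitrary ordering of all nodes in $C \setminus \{u, v\}$. For all $i$, $2 \leq i \leq k$, we consider the clique $C_i = (\{c_1, \dots, c_i\}, [t_0, t_0])$.
$C_2$ is added to $S$ by Line 2, and if $C_i$ with $i < k$ is in $S$ at some stage, then the algorithm adds $C_{i+1}$ to $S$. Therefore, $C_k = (C, [t_0, t_0])$ is added to $S$ at some stage. When $C_k$ is processed, Line 11 computes the largest $t_1'$ such that $(C, [t_0, t_1'])$ is a clique, which is $t_1$ since $(C, [t_0, t_1])$ is maximal; hence $(C, [t_0, t_1])$ is in $S$ at some stage. ∎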
Theorem 2 (Complexity).
Given a link stream $L = (T, V, E)$ with $n = |V|$ and $m = |E|$, Algorithm 1 enumerates all maximal cliques of $L$ in $O(2^n n^2 m^2 \log m + 2^n n^3 m^2)$ time.
The complexity of Algorithm 1 is dominated by the complexity of its main loop. The number of iterations of this loop is bounded by the number of elements added to the candidate set $S$, which is bounded by the number of subsets of $V$ times the number of sub-intervals of $T$ of the form $[b, e]$, where $b$ is the beginning of a link in $E$ and $e$ is the end of a (different) link in $E$. Therefore, the number of iterations of the loop is in $O(2^n m^2)$.
In the following, we assume that sets are stored as balanced binary search trees, so that it is possible to search for and add an element in a set of $N$ elements in $O(\log N)$ comparisons. We also assume that all links are stored in a list, or binary tree, and are sorted by node pair then time, so that it is possible to search for a link in $O(\log m)$.
For any $v$, Line 7 checks if for all $u \in C$ there is a link $(b, e, u, v)$ such that $[t_0, t_1] \subseteq [b, e]$. This requires at most $n \log m$ steps, and so it is in $O(n \log m)$. Line 9 searches for the newly found clique in the set $M$ of already seen cliques, which contains $O(2^n m^2)$ elements. Since two cliques can be compared in $O(n)$, this search is in $O(n \log(2^n m^2)) = O(n^2 + n \log m)$. The algorithm repeats these operations for all $v$, so less than $n$ times. Finally, Lines 6 to 10 have a cost in $O(n^3 + n^2 \log m)$. Lines 11 to 15 require at most one link search per pair of nodes of $C$ and one search in $M$, which is within the same bound.
We conclude that each iteration of the main loop costs no more than $O(n^3 + n^2 \log m)$. We bound the overall complexity by multiplying this cost by the bound for the number of iterations of the loop. This leads to the $O(2^n n^2 m^2 \log m + 2^n n^3 m^2)$ complexity. ∎
3 Equivalence with $\Delta$-cliques
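Let us consider an instantaneous link stream $L = (T, V, E)$ with $T = [\alpha, \omega]$ and a value $\Delta$. We first define $T'$ as $[\alpha + \Delta, \omega]$, and, for all $u$ and $v$ in $V$, the set $I_{uv} = T' \cap \bigcup_{(t, u, v) \in E} [t, t + \Delta]$. We then define $E'$ as the set of all tuples $(b, e, u, v)$ such that $[b, e]$ is a maximal nonempty interval included in $I_{uv}$, for any $u$ and $v$ in $V$. We finally obtain the simple link stream with durations $L' = (T', V, E')$.
$(C, [t_0, t_1])$ is a maximal $\Delta$-clique of $L$ if and only if $(C, [t_0 + \Delta, t_1])$ is a maximal clique of $L'$.
Assume $(C, [t_0, t_1])$ is a $\Delta$-clique of $L$ and $(C, [t_0 + \Delta, t_1])$ is not a clique of $L'$. It means that there is a $t$ in $[t_0 + \Delta, t_1]$ and $u$ and $v$ in $C$ such that $u$ and $v$ are not linked at time $t$ in $L'$. This means that there is no link $(t', u, v)$ in $E$ with $t' \in [t - \Delta, t]$, which contradicts the assumption that $(C, [t_0, t_1])$ is a $\Delta$-clique of $L$.
Conversely, assume $(C, [t_0 + \Delta, t_1])$ is a clique of $L'$ and $(C, [t_0, t_1])$ is not a $\Delta$-clique of $L$. It means that there is an interval $I$ of duration $\Delta$ and two nodes $u$ and $v$ in $C$ such that $I \subseteq [t_0, t_1]$ and for all $t$ in $I$, $(t, u, v) \notin E$. If we denote by $x$ the largest element of $I$, it is at least equal to $t_0 + \Delta$ and at most equal to $t_1$, and so there is no link between $u$ and $v$ containing $x$ in $E'$, which contradicts the assumption that $(C, [t_0 + \Delta, t_1])$ is a clique of $L'$.
Therefore, $(C, [t_0, t_1])$ is a $\Delta$-clique of $L$ if and only if $(C, [t_0 + \Delta, t_1])$ is a clique of $L'$. This is a bijection between the $\Delta$-cliques of $L$ and the cliques of $L'$ that preserves maximality. ∎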
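The transformation from an instantaneous link stream to a link stream with durations can be sketched as follows. This is only one reading of the construction: it assumes that each instantaneous link at time $t$ is stretched to $[t, t + \Delta]$, clipped to $[\alpha + \Delta, \omega]$, and that overlapping or touching intervals of a same node pair are merged into maximal ones. The function name `to_duration_stream` and the tuple layouts are hypothetical, and nodes are assumed to be sortable.

```python
def to_duration_stream(alpha, omega, links, delta):
    """Turn instantaneous links (t, u, v) over [alpha, omega] into
    links with durations (b, e, u, v) over [alpha + delta, omega]."""
    lo, hi = alpha + delta, omega
    by_pair = {}
    for t, u, v in links:
        # Stretch the link to [t, t + delta] and clip it to [lo, hi].
        b, e = max(t, lo), min(t + delta, hi)
        if b <= e:
            by_pair.setdefault(frozenset((u, v)), []).append((b, e))

    new_links = []
    for pair, spans in by_pair.items():
        u, v = sorted(pair)
        spans.sort()
        cur_b, cur_e = spans[0]
        for b, e in spans[1:]:
            if b <= cur_e:
                # Overlapping or touching: extend the current interval.
                cur_e = max(cur_e, e)
            else:
                new_links.append((cur_b, cur_e, u, v))
                cur_b, cur_e = b, e
        new_links.append((cur_b, cur_e, u, v))
    return (lo, hi), new_links


interval, stream = to_duration_stream(
    0, 10, [(1, "a", "b"), (3, "a", "b"), (8, "a", "b")], 2)
print(interval, stream)
```

Under these assumptions, the links at times 1 and 3 merge into a single link over $[2, 5]$, while the link at time 8 yields a separate link over $[8, 10]$.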
We implemented Algorithm 1 and compared its running time with those of our previous implementation for computing $\Delta$-cliques, as well as the implementation provided by Himmel et al. for their own algorithm [6, 7]. All implementations are in Python.
We used three datasets of different sizes coming from different contexts:
Thiers-Highschool [5] is a trace of physical proximity between individuals, captured by sensors. It was collected at a French high school in 2012 over a period of approximately 8 days. It contains 180 nodes and 45,047 links.
DNC-email is the 2016 Democratic National Committee email leak [9], representing emails sent over a period of approximately one year and four months (this duration covers the majority of emails; a single email was sent approximately one year and a half before any other). It contains 1,866 nodes and 37,421 links.
Infectious is a trace of physical proximity between visitors of an exhibition [8], collected over a period of approximately 80 days. It contains 10,972 nodes and 415,912 links.
Results are presented in Figure 2.
We clearly observe two things. First, our implementation significantly outperforms the code for $\Delta$-cliques for all values of $\Delta$. Second, our implementation is the fastest for many relevant values of $\Delta$: in the cases of physical proximity our implementation is the fastest for all values of $\Delta$ lower than 3 hours, and in the case of emails, it is the fastest for all values of $\Delta$ lower than 11 hours. Notice that these values are of practical interest: exchanging emails at least once every hour within a group of people for a given time period, for instance, is significant.
In conclusion, although practical evaluation of algorithms is difficult, these experiments show that the most efficient solution depends on the target application, and that our algorithm is appealing for small values of $\Delta$, at least in some cases of practical interest.
Acknowledgments. This work is funded in part by the European Commission H2020 FETPROACT 2016-2017 program under grant 732942 (ODYCCEUS), by the ANR (French National Agency of Research) under grants ANR-15-CE38-0001 (AlgoDiv) and ANR-13-CORD-0017-01 (CODDDE), by the French program "PIA - Usages, services et contenus innovants" under grant O18062-44430 (REQUEST), and by the Ile-de-France program FUI21 under grant 16010629 (iTRAC).
[1] J. Gary Augustson and Jack Minker. An analysis of some graph theoretical cluster techniques. Journal of the ACM (JACM), 17(4):571–588, 1970.
[2] Coen Bron and Joep Kerbosch. Algorithm 457: finding all cliques of an undirected graph. Communications of the ACM, 16(9), 1973.
[3] Pádraig Mac Carron and R.I.M. Dunbar. Identifying natural grouping structure in gelada baboons: a network approach. Animal Behaviour, 114(Supplement C):119–128, 2016.
[4] David Eppstein and Darren Strash. Listing all maximal cliques in large sparse real-world graphs. Experimental Algorithms, pages 364–375, 2011.
[5] Julie Fournet and Alain Barrat. Contact patterns among high school students. PLoS ONE, 9:e107878, 2014.
[6] Anne-Sophie Himmel, Hendrik Molter, Rolf Niedermeier, and Manuel Sorge. Enumerating maximal cliques in temporal graphs. In Advances in Social Networks Analysis and Mining (ASONAM), 2016 IEEE/ACM International Conference on, pages 337–344. IEEE, 2016.
[7] Anne-Sophie Himmel, Hendrik Molter, Rolf Niedermeier, and Manuel Sorge. Adapting the Bron-Kerbosch algorithm for enumerating maximal cliques in temporal graphs. Social Network Analysis and Mining, 7:35:1–35:16, 2017.
[8] L. Isella, J. Stehlé, A. Barrat, C. Cattuto, J.-F. Pinton, and W. Van den Broeck. What’s in a crowd? Analysis of face-to-face behavioral networks. Journal of Theoretical Biology, 271(1):166–180, 2011.
[9] KONECT. DNC emails network dataset, 2017. http://konect.uni-koblenz.de/networks/dnc-temporalGraph.
[10] Etsuji Tomita, Akira Tanaka, and Haruhisa Takahashi. The worst-case time complexity for generating all maximal cliques and computational experiments. Theoretical Computer Science, 363:28–42, 2006.
[11] Tiphaine Viard, Raphaël Fournier-S’niehotta, Clémence Magnien, and Matthieu Latapy. Discovering patterns of interest in IP traffic using cliques in bipartite link streams. In Proceedings of the International Conference on Complex Networks (CompleNet), 2018. To appear.
[12] Tiphaine Viard, Matthieu Latapy, and Clémence Magnien. Computing maximal cliques in link streams. Theor. Comput. Sci., 609:245–252, 2016.
[13] Tiphaine Viard and Clémence Magnien. Source code in Python for our algorithm, 2017. https://bitbucket.org/tiph_viard/cliques.