I Introduction
Networks have been widely used to represent entities, relationships, and behaviors in many real-world domains, including power grids [4], social networks [12], microbial interaction networks [29], corporate networks [30], the food web [10], and the modeling of adversarial activities [5]. These complex systems do not exhibit a temporal or structural continuum. Instead, they show characteristic nonlinear dynamic behavior [31]. Many salient properties of these systems can be described by network metrics measured on a global scale. Count-based metrics, such as the number of entities, the number of interactions, and the average connectivity of the entities, are important measures of the population and the interaction density of a network. However, these measures are limited in their ability to describe nonlinear, localized, and dynamic properties of the systems. Network motifs have been used extensively in recent years to uncover structural, temporal, and functional insights into complex systems. Network motifs are patterns of interactions that occur in a complex system at a higher rate than in a randomized network [21].
The increasing volume, velocity, and variety of temporal networks generated by Big Data applications pose a scalability challenge for temporal network analysis. Many of these domains generate a constant stream of heterogeneous network channels, which makes it impossible to measure and update global network properties in real time. Temporal motifs provide a tractable approximation of such networks. They can be measured and updated in Big Data applications within given memory and compute constraints.
Extensive research has been done on the appropriate definition of network motifs and on their application to various network analytical tasks. Milo et al. [21] detect network motifs by enumerating all possible n-node subgraphs and retaining only those that appear in a real-world graph with a frequency f higher than in randomized networks. Such frequency-based approaches reveal interesting properties of complex networks. The motifs shared by ecological food webs are distinct from the motifs shared by genetic networks. Similar motifs appear in networks that perform information processing, even though the entities involved differ [21]. However, this approach misses patterns that are functionally important but not statistically significant. Vazquez et al. [32] show that the large-scale topological organization of a network and its local subgraph structure mutually define and predict each other. This correlation can be used to understand the properties and evolution of a network. Cao et al. [3] use network motifs to define the network backbone, a collection of relevant nodes and edges in a large-scale network. They propose a motif-based method to extract the functional backbone of a complex network. This backbone indicates functional properties of the network that centrality-based backbones cannot explain. Similarly, Shen et al. [29] use weighted motifs to cluster microbial interaction networks. Network motifs can also identify the exchange of emotions in online communication networks such as Twitter [15] through emotion-exchange motifs. Emotion-exchange motifs containing reciprocal edges manifest anger or fear, either in isolation or in any combination with other emotions. Conversely, positive emotions are characteristic of one-way motifs.
A temporal network is a generalization of a static network that changes with time. Many system modeling approaches treat time as an attribute of the entity or the interaction, which makes temporal graphs a special case of attributed graphs. We use the terms network and graph interchangeably in this paper. Incorporating time into static graphs has given rise to a new set of important and challenging problems that cannot be modeled as static graph problems [20]. The majority of prior research does not account for the temporal evolution of motifs. Recent work [24] defines the temporal motif as an elementary unit of a temporal network and provides a general methodology for counting such motifs. It computes the frequency of overlapping temporal motifs, where one interaction can be part of more than one temporal motif instance. In such a temporal motif, all the edges of a motif instance must occur within a time window of δ time units. Aparício et al. [1] use orbit transitions to compare a set of temporal networks. Sarkar et al. [27] use temporal motifs to understand information flow in social networks.
We propose the Independent Temporal Motif (ITeM) as the elementary building block of temporal networks. ITeMs are edge-disjoint temporal motifs that provide insight into the temporal evolution of a graph, such as its rate of growth, its neighborhood, and changes in the role of a vertex over time. Independence yields mutually exclusive motif instances by restricting each edge to participate in only one temporal motif instance. We use an ensemble of temporal motifs that are simple to compute, yet representative of the temporal, structural, and functional properties of the network. We also define properties that measure the temporal evolution of the motifs and indicate the rate at which motifs form in the network. In contrast to previous work, we impose no limit on the time window of a motif, although one can optionally be set. We provide algorithms to compute the independent temporal motif distribution of a given graph. We also provide a new distributed implementation using the Apache Spark graph analytics framework.
The rest of the paper is organized as follows. Section II lays out the definitions, and Section III presents our core approach. Section IV presents experiments on synthetic and real-world temporal networks that summarize the networks and measure their similarity. Section V presents conclusions and future work.
Symbol | Description
$T$ | Temporal graph
 | Window
 | Total number of windows
$K$ | Total number of motifs
 | Atomic motif
 | Temporal motif of an atomic motif
 | Set of timesteps associated with motif edges
 | Motif instance
 | ITeM instance
 | Number of vertices in a motif
 | Number of unique vertices in ITeM instances of a motif
$Imp_{All}$ | Set of importance values for each window
$Imp_i$ | Importance of a window
$F$ | Temporal motif distribution for a given temporal graph
$d$ | Order of a motif
 | Orbit of a motif
II Definitions
We present the ITeM-based approach to characterizing a temporal network. In the following sections, we present the definitions and algorithms that ITeM uses to model a temporal network. We also review the Maximum Independent Set (MIS) problem, which is a subproblem of the proposed algorithm. MIS has been proved to be NP-complete, so we present a heuristic-based approach that finds a lower bound on the ITeM frequency [18]. We also outline a sampling method to estimate the true frequency of a temporal motif in the network. This sampling approach is based on the importance of the sampled network [17].
A temporal graph extends a static graph by associating each edge with the time at which it appears, in a unit such as seconds, days, or years. Researchers have proposed various representations of temporal graphs, each useful in different scenarios [19]. We use a window-based representation, where each window corresponds to a temporal subgraph between two timestamps.
Definition 1.
Temporal Graph: A temporal graph $T$ is an ordered sequence of graphs $T = (G_1, G_2, \ldots, G_N)$, indexed by a window id $i$. We define $G_i = (V_i, E_i)$, where $V_i$ and $E_i$ denote the vertex and edge sets, respectively, in window $i$, arriving since window $i-1$. We say the temporal graph $T$ is on vertex set $V = \bigcup_{i=1}^{N} V_i$ and edge set $E = \bigcup_{i=1}^{N} E_i$.
This definition also allows an entire graph to be represented as a single window. Analyzing a graph as a single window is useful for datasets that are small and cover a short period of time.
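The window-based representation can be made concrete with a short sketch. It assumes that edges arrive as (u, v, t) tuples with numeric timestamps and that all windows have the same fixed width. The helper `split_into_windows` is hypothetical and is not the authors' implementation. Only non-empty windows appear in the result.

```python
from collections import defaultdict


def split_into_windows(edges, width):
    """Partition timestamped edges (u, v, t) into fixed-width windows.

    Returns a dict mapping window id i -> (V_i, E_i). E_i holds the edges
    whose timestamp falls in [t0 + i*width, t0 + (i+1)*width), where t0 is
    the earliest timestamp. V_i holds the vertices touched by those edges.
    """
    if not edges:
        return {}
    t0 = min(t for _, _, t in edges)
    windows = defaultdict(lambda: (set(), []))
    for u, v, t in sorted(edges, key=lambda e: e[2]):
        i = (t - t0) // width  # window id of this edge
        vertices, window_edges = windows[i]
        vertices.update((u, v))
        window_edges.append((u, v, t))
    return dict(windows)
```

If `width` exceeds the total time span, every edge falls into window 0. This reproduces the single-window special case.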
II-A Atomic Motif
Atomic motifs are small subgraphs that serve as useful indicators for complex networks, because they can reveal patterns of association among entities in the network. Figure 1 shows the library of atomic motifs used in the current work. Lower-order motifs such as the isolated vertex (order d=1), self-loop (d=1), and isolated edge (d=2) are examples of fringe motifs, because they have little (sometimes no) connectivity to the rest of the network. In contrast, higher-order motifs such as the wedge (d=3), triangle (d=3), and square (d=4) are examples of core motifs, which have been found to constitute a major fraction of real-world graphs. Our experiments show that the relative frequencies of fringe and core motifs in a temporal network can be used to compute graph similarity.
Atomic motifs can be defined with any number of vertices and edges. However, larger motifs are more difficult to search for in a network, and they do not substantially increase the actionable information they provide about it. The search for large atomic motifs suffers from the intractability of the subgraph isomorphism problem, and its runtime grows exponentially with motif size. Conversely, smaller atomic motifs are easier to find and yield better returns in modeling the temporal and structural characteristics of the graph.
We limit our motif library to motifs of order at most 4. The selection of d-order motifs for the search library was influenced by three factors: previous research in this area, the functional interpretation of the motifs in real-world domains, and computational pragmatism. In addition to the higher-order motifs (d > 2), we also use a few fringe motifs that capture aspects of a complex network that higher-order motifs miss. Two of these correspond to isolated vertices and isolated edges that are not part of any higher-order motif. An abundance of such motifs clearly indicates a sparse, disconnected network, and these motifs are important for modeling some domains, such as power grids [6]. Similarly, two others correspond to self-loops and to multi-edges between the same pair of entities. The frequencies of these motifs reveal important functional properties of the network. They can also be used to convert the network into a smaller weighted network, where self-loops and multi-edges become vertex and edge weights, respectively. At the same time, these motifs contribute to the combinatorial explosion of higher-order motifs.
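The conversion into a smaller weighted network can be illustrated with a short sketch. It assumes a directed edge list of (u, v) pairs from which timestamps have already been dropped. The helper `to_weighted_graph` is hypothetical. The paper does not specify how weights are aggregated, so simple counts are used here.

```python
from collections import Counter


def to_weighted_graph(edges):
    """Collapse self-loops and multi-edges of a multigraph into weights.

    Each self-loop (u, u) adds 1 to the weight of vertex u. Every repeated
    directed (u, v) pair is merged into a single edge whose weight is its
    multiplicity. Direction is kept, which is an assumption for illustration.
    """
    vertex_weight = Counter()
    edge_weight = Counter()
    for u, v in edges:
        if u == v:
            vertex_weight[u] += 1
        else:
            edge_weight[(u, v)] += 1
    return dict(vertex_weight), dict(edge_weight)
```

For example, two self-loops on `a` and two parallel `a -> b` edges become a vertex weight of 2 on `a` and an edge weight of 2 on `(a, b)`.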
II-B Temporal Motif
Definition 2.
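Temporal Motif: A temporal motif is a graph $M = (V_M, E_M)$ where:

$V_M$ is a set of vertices of the motif.

$E_M$ is a set of edges $e \in E_M$, $e: (u, v, t)$ with $u, v \in V_M$ and $t \in \mathcal{T}_M$, where $\mathcal{T}_M$ is a set of time steps associated with motif edges.

Edges have a temporal ordering such that for edges $e_i: (u_i, v_i, t_i)$ and $e_j: (u_j, v_j, t_j)$, if $t_i < t_j$ then $e_i$ arrives before $e_j$.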
A temporal motif is a specialization of the atomic motif in which every interaction between two vertices occurs at a specific timestep. The timestep of an edge defines its temporal ordering within the temporal motif, but it does not correspond to the actual time of the interaction in the temporal graph. Using this definition, we extend the atomic motif to model its temporal evolution in terms of size and structure. Characterizing a temporal network with a set of static motifs can be misleading and inaccurate. Static motifs fail to capture temporal properties of the network, such as the scale at which transactions occur [2], the burstiness of transactions, and temporal dependencies among transactions. Additionally, many temporal systems are dense multigraphs, in which a pair of entities shares many temporal transactions as the network evolves. This adds combinatorial complexity beyond that of discovering structural motifs in the network. Figure 3 shows the set of temporal motifs used in this work.
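Because motif timesteps encode only relative order, an instance can be reduced to a signature that is independent of the actual interaction times. The sketch below shows one possible encoding. It assumes that an instance is a list of directed (u, v, t) edges and that vertices are relabelled in order of first appearance. The function `temporal_signature` and its string format are hypothetical. Edges with equal timestamps keep their input order, which is also an assumption, because the definition does not specify how ties are ordered.

```python
def temporal_signature(instance):
    """Encode a temporal motif instance by ordinal timesteps 1, 2, ...

    `instance` is a list of directed (u, v, t) edges. Edges are sorted by
    timestamp (a stable sort, so ties keep their input order), and vertices
    are relabelled 0, 1, ... in order of first appearance. The result depends
    only on structure and temporal order, not on the actual times.
    """
    ordered = sorted(instance, key=lambda e: e[2])
    labels = {}
    parts = []
    for step, (u, v, _) in enumerate(ordered, start=1):
        for x in (u, v):
            if x not in labels:
                labels[x] = len(labels)
        parts.append(f"{labels[u]}-{labels[v]}@{step}")
    return "|".join(parts)
```

Two wedges `a->b` at t=100 then `b->c` at t=250, and `x->y` at t=5 then `y->z` at t=9, both map to `0-1@1|1-2@2`. The same two edges arriving in the opposite order produce a different signature.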
II-C Independent Temporal Motif (ITeM)
Schreiber and Schwöbbermeyer [28] describe three ways to measure the frequency of a pattern in a graph, which they call the $F_1$, $F_2$, and $F_3$ frequency concepts. In the context of motif computation, $F_1$ counts every occurrence of a motif instance without restriction, so a vertex or an edge can be reused across motif instances. Paranjape et al. [24] use this definition to compute overlapping motif frequencies. The $F_2$ and $F_3$ concepts restrict the reuse of vertices and edges. $F_2$ is an edge-disjoint concept and does not allow an edge to be reused in more than one instance of the motif. $F_3$ is more restrictive. It is a vertex- and edge-disjoint concept and does not allow any vertex or edge to be reused in more than one instance of the motif.
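The difference between the three frequency concepts is easiest to see on a small example. The sketch below assumes that each motif instance is given as a list of undirected (u, v) edges. It computes $F_2$ and $F_3$ with a simple greedy scan in input order. This greedy scan is a stand-in chosen for illustration, not the method of [28]. Because it yields a maximal rather than maximum disjoint set, its $F_2$ and $F_3$ values are lower bounds. The function name `frequency_concepts` is hypothetical.

```python
def frequency_concepts(instances):
    """Count motif instances under the F1, F2, and F3 frequency concepts.

    `instances` is a list of motif instances, each a list of edges (u, v).
    F1 counts every instance. F2 keeps an instance only if it shares no edge
    with the instances kept so far. F3 keeps an instance only if it shares no
    vertex (and therefore no edge) with the instances kept so far. The greedy
    scan gives maximal, not necessarily maximum, sets, so F2 and F3 are
    lower bounds.
    """
    f1 = len(instances)
    used_edges, f2 = set(), 0
    used_vertices, f3 = set(), 0
    for inst in instances:
        edges = set(inst)
        vertices = {x for e in inst for x in e}
        if not edges & used_edges:
            used_edges |= edges
            f2 += 1
        if not vertices & used_vertices:
            used_vertices |= vertices
            f3 += 1
    return f1, f2, f3
```

On the path a-b-c-d-e, the three overlapping wedges give F1 = 3. Only the first and last wedges are edge-disjoint, so F2 = 2. Every pair of wedges shares a vertex, so F3 = 1.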
A major contribution of our work is the ITeM, an edge-disjoint temporal motif in which no two motif instances share an edge. This differs from the temporal network modeling approaches mentioned in the related work, which use overlapping motif instances that can share any number of edges. The restriction increases computational complexity, because finding temporal motifs has been proved to be NP-complete [17]. In the following subsections, we define some key concepts that ITeM uses to model a temporal network.
II-C1 Vertex Birth-Time
We define the birth-time of a vertex in the temporal network as the time of the first transaction involving the vertex. The birth of a vertex increases the network size by one vertex. For the rest of the network's life, that entity is treated as reused and never increases the network population again.
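Vertex birth-times follow directly from the edge stream. The sketch below assumes that edges are given as (u, v, t) tuples in any order. The helper `birth_times` is hypothetical.

```python
def birth_times(edges):
    """Return {vertex: time of the first transaction involving it}.

    `edges` is an iterable of (u, v, t) tuples in any order. A vertex is
    born at the smallest timestamp among the edges that touch it.
    """
    birth = {}
    for u, v, t in edges:
        for x in (u, v):
            if x not in birth or t < birth[x]:
                birth[x] = t
    return birth
```

Counting the vertices whose birth-time falls in each interval gives the network population growth described above.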
II-C2 Structural Contribution
The structural contribution of an ITeM instance measures the growth in graph size that results from adding the instance. In terms of edges, the structural contribution of an independent temporal motif always equals the number of temporal edges in the motif. Figure 3 shows a set of temporal motifs and their structural contributions. For example, every instance of a three-edge temporal motif adds three new temporal edges to an existing network. The structural contribution in terms of vertices cannot be measured using static atomic motifs. An atomic motif instance does not distinguish between introducing a new vertex to the network and reusing an existing one. Temporal motifs are needed to encode this information, which models the size and structure of the graph as it evolves. As shown in Figure 3, every instance of one temporal motif adds only one new vertex to an existing network, whereas every instance of another adds three new vertices.
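Structural contribution can be computed incrementally if ITeM instances are processed in temporal order and the set of already-born vertices is tracked. The sketch below assumes that each instance is a list of (u, v, t) edges and that the caller supplies and reuses the `seen_vertices` set. The function name is hypothetical.

```python
def structural_contribution(instance, seen_vertices):
    """Structural contribution of one ITeM instance.

    `instance` is a list of temporal edges (u, v, t). `seen_vertices` is the
    set of vertices already born in the network and is updated in place.
    Returns (new_edges, new_vertices). Every temporal edge of an ITeM is new,
    while a vertex counts only on its first appearance (its birth).
    """
    new_vertices = 0
    for u, v, _ in instance:
        for x in (u, v):
            if x not in seen_vertices:
                seen_vertices.add(x)
                new_vertices += 1
    return len(instance), new_vertices
```

With `a` and `b` already born, a triangle on `a`, `b`, `c` contributes (3, 1). A later triangle on three unseen vertices contributes (3, 3).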
II-C3 Motif Orbit
An orbit of a motif is a distinct position in which a vertex can appear within the motif. A motif with n vertices has at most n distinct positions. The orbit of a vertex in a motif encapsulates its functional role in the motif.
As shown in Figure 1, some motifs have just one orbit, while others have three different orbits. Similarly, the star motifs have two orbits each. By combining structural contribution with changes in the orbits of vertices, we can model the evolution of a network without measuring the frequency of every automorphic instance. Graph automorphisms capture the symmetry of a structure. An automorphism is a bijection from the vertex set of a graph onto itself that preserves adjacency.
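For motifs with only a few vertices, orbits can be found by brute force: enumerate every vertex permutation, keep those that map the edge set onto itself, and group vertices that some automorphism maps onto each other. The sketch below treats edges as undirected and ignores timestamps. Under that assumption, orbit counts for directed or temporal motifs in Figure 1 may differ. The function name `orbits` is hypothetical.

```python
from itertools import permutations


def orbits(vertices, edges):
    """Partition a small graph's vertices into orbits by brute force.

    An automorphism is a permutation of the vertices that maps the edge set
    onto itself. The orbit of v is the set of images of v under all
    automorphisms. Enumerating all permutations costs O(n!), which is
    acceptable only for motifs with a handful of vertices.
    """
    vertices = list(vertices)
    edge_set = {frozenset(e) for e in edges}  # undirected edges
    orbit_of = {v: {v} for v in vertices}
    for perm in permutations(vertices):
        mapping = dict(zip(vertices, perm))
        if {frozenset(mapping[x] for x in e) for e in edge_set} == edge_set:
            for v in vertices:
                orbit_of[v].add(mapping[v])
    return sorted({frozenset(o) for o in orbit_of.values()}, key=sorted)
```

A wedge a-b-c has two orbits, {a, c} and {b}. A triangle has one orbit. A star with center c and leaves x, y, z has two orbits, {c} and {x, y, z}.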
III Approach
III-A Exact algorithm to count ITeM frequency
In this section, we present an exact algorithm to count ITeM frequency. We also present an approximate algorithm based on importance sampling.
Finding matches to temporal motifs has been proved to be NP-complete [17]. We use Luby’s algorithm [18] to discover ITeMs. Because it finds a maximal (not necessarily maximum) independent set, it provides a lower bound on the ITeM frequency.
Algorithms 1 and 2 present the pseudocode to find independent temporal motif instances in a given temporal graph. Algorithm 1 takes a set of overlapping temporal motif instances as input and returns ITeM instances. We use GraphFrame [7] to discover the overlapping temporal motif instances. Overlapping motif discovery is a runtime bottleneck, and GraphFrame provides optimized motif discovery using graph-aware dynamic programming algorithms. It also provides a simple Domain-Specific Language (DSL) to express all the temporal motifs. We use the temporal ordering of the edges to define a lexical representation of each motif instance. This representation serves as a vertex label for constructing a motif overlap graph. The motif overlap graph is an abstract graph that represents clusters of motif instances sharing at least one edge in the input graph, as defined in Definition 1. Lines 2-6 map each edge to its associated set of motif instances. Lines 8-10 create the set of vertices of the abstract graph. Lines 12-16 build an edge list from all the motifs that share a temporal edge in the input graph, creating one edge in the abstract graph for every shared edge in the input graph. Line 18 uses the vertex set and the edge list to construct the abstract graph. Line 20 computes the final result with Algorithm 2, which uses a distributed MIS implementation to compute the ITeM instances.
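The construction of the motif overlap graph can be sketched in plain Python rather than with GraphFrame on Spark. The sketch assumes that overlapping motif instances are already available as a mapping from an instance label to the input-graph edges the instance uses. The function name `motif_overlap_graph` is hypothetical. Overlap edges are stored in a set, so two instances that share several input edges are joined once. This does not change which instance sets are independent.

```python
from collections import defaultdict
from itertools import combinations


def motif_overlap_graph(instances):
    """Build the motif overlap graph from overlapping motif instances.

    `instances` maps an instance label (e.g. a lexical signature plus an id)
    to the list of input-graph edges it uses. Each instance becomes a vertex,
    and two instances are joined whenever they share an input edge.
    """
    # Map each input edge to the set of instances that use it.
    users = defaultdict(set)
    for label, edges in instances.items():
        for e in edges:
            users[e].add(label)
    vertices = set(instances)
    # Every pair of instances sharing an input edge yields an overlap edge.
    overlap_edges = set()
    for labels in users.values():
        for a, b in combinations(sorted(labels), 2):
            overlap_edges.add((a, b))
    return vertices, overlap_edges
```

An independent set of this abstract graph is a set of edge-disjoint motif instances, that is, a set of ITeM instances.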
Algorithm 2 presents the pseudocode of a distributed implementation of the MIS algorithm. We use the Pregel API available in Apache Spark to implement Luby’s algorithm [18]. We initialize every vertex in its own independent set, as shown in lines 2-4. In lines 5-9 of Algorithm 2, each vertex exchanges messages with its neighbors and updates its independent set value based on the minimum value received from all neighbors. The process stops when no vertex in the graph changes its independent set.
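The round structure of Luby’s algorithm can be shown without Spark. The sketch below simulates synchronous rounds sequentially in plain Python. It does not use the Pregel API, and it follows the common random-priority formulation of Luby’s algorithm rather than Algorithm 2 line by line. It assumes that vertex ids are mutually comparable, which is needed for tie-breaking and for a reproducible draw order. The function name `luby_mis` is hypothetical.

```python
import random


def luby_mis(vertices, edges, seed=None):
    """Maximal independent set via Luby's randomized algorithm.

    Each round, every active vertex draws a random priority. A vertex joins
    the set if its priority is lower than that of every active neighbour.
    Winners and their neighbours are then deactivated, and rounds repeat
    until no vertex is active. The result is maximal, not necessarily
    maximum.
    """
    rng = random.Random(seed)
    vertices = list(vertices)
    neighbours = {v: set() for v in vertices}
    for a, b in edges:
        if a != b:  # a self-loop would stop a vertex from ever winning
            neighbours[a].add(b)
            neighbours[b].add(a)
    active, mis = set(vertices), set()
    while active:
        # Random priority per active vertex; the vertex id breaks ties.
        priority = {v: (rng.random(), v) for v in sorted(active)}
        winners = {
            v for v in active
            if all(priority[v] < priority[u] for u in neighbours[v] if u in active)
        }
        mis |= winners
        removed = set(winners)
        for v in winners:
            removed |= neighbours[v]
        active -= removed
    return mis
```

Each round makes progress, because the active vertex with the globally smallest priority always wins. No two winners can be adjacent, since each would need a lower priority than the other.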
III-B Approximate algorithm to count ITeM frequency
Our approach includes three major algorithmic components: searching for overlapping temporal atomic motifs, finding independent temporal motifs, and computing the information content and temporal evolution of such motifs. Of the three components, finding independent temporal motifs is NP-complete, so we use a heuristic to find a lower bound on the actual count. As explained in the previous section, we construct a motif overlap graph in which every vertex is a motif instance. Two vertices are connected if the corresponding motif instances share an edge in the original temporal graph. This abstract formulation may produce a highly cliqued abstract graph, which is characteristic of various real-world domains such as social networks. A highly cliqued abstract graph causes excessive message passing in a distributed computing environment. To address this, we use an importance-based sampling approach to approximate the motif frequency computation.
Liu et al. [17] present importance sampling for motifs. The basic approach [22] is to split the time-series dataset into multiple temporal windows and to perform exact computation on each window. Each window is also assigned an importance, which is used to normalize the computed metric across all randomly selected windows. We create window graphs of equal temporal size, each with a different number of edges. We compute the distribution of all temporal motifs present in each window graph. After processing all the windows, we compute the weighted average of the distributions, which gives an approximate distribution for the entire graph. Liu et al. show that the weighted average obtained with importance sampling is a lower-bound estimate of the distribution.
For a given temporal graph $T$ with $N$ windows, the importance vector $Imp_{All}$ is an ordered sequence of window importances, $Imp_{All} = (Imp_1, Imp_2, \ldots, Imp_N)$, where $Imp_i$ is defined as:
$$Imp_i = \frac{|E_i|}{|E|}$$
where $|E_i|$ is the number of edges in window $i$ and $|E|$ is the total number of edges in the temporal graph. For a given motif $m$, the expected motif frequency $F_m$ in the temporal graph can be computed from the exact frequency $f_{m,i}$ of the motif in window $i$ with importance $Imp_i$ as:
$$F_m = \sum_{i=1}^{N} Imp_i \, f_{m,i}$$
We also define a random variable $X \in \{1, \ldots, N\}$ that selects a specific window from the entire population. Let $X_1, \ldots, X_n$ denote the selected windows. The expected frequency is computed as:
$$\hat{F}_m = \frac{\sum_{j=1}^{n} Imp_{X_j} \, f_{m,X_j}}{\sum_{j=1}^{n} Imp_{X_j}}$$
where $n$ is the number of windows selected ($n \le N$) for the motif frequency computation. The independent temporal motif distribution $F$ for a given temporal graph is the distribution of all such temporal motifs over the window population:
$$F = (F_{m_1}, F_{m_2}, \ldots, F_{m_K})$$
where $K$ is the total number of motifs.
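The importance-weighted estimate can be sketched in a few lines. The sketch assumes that $Imp_i = |E_i|/|E|$ and that, when only some windows are used, the weighted average is renormalized by the importances of the selected windows. This is our reading of the description above. The sketch also assumes uniform sampling of windows without replacement. The function names `importances` and `estimate_frequency` are hypothetical.

```python
import random


def importances(window_edge_counts):
    """Imp_i = |E_i| / |E| for every window i."""
    total = sum(window_edge_counts)
    return [c / total for c in window_edge_counts]


def estimate_frequency(freqs, imp, n=None, seed=None):
    """Importance-weighted frequency of one motif.

    `freqs[i]` is the exact frequency f_{m,i} of the motif in window i, and
    `imp[i]` is Imp_i. With n=None every window is used, and the result is
    sum_i Imp_i * f_{m,i}. Otherwise, n windows are drawn uniformly without
    replacement, and the weighted average is renormalized by the sampled
    importances.
    """
    windows = list(range(len(freqs)))
    if n is not None:
        windows = random.Random(seed).sample(windows, n)
    weight = sum(imp[i] for i in windows)
    if weight == 0:  # only empty windows were selected
        return 0.0
    return sum(imp[i] * freqs[i] for i in windows) / weight
```

Applying `estimate_frequency` to the per-window frequencies of each of the K motifs yields the distribution F. When every window is selected, the sampled estimate reduces to the full weighted sum, because the importances sum to 1.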
Independence: We also define the independence of a temporal motif as a measure of its uniqueness in a given temporal graph. Independence can be measured for temporal motifs, temporal edges, or vertices of the temporal graph. The edge-disjoint concept defined in Section II-C makes temporal edges maximally independent, because each temporal edge belongs to at most one independent temporal motif instance. We define the independence of a temporal motif and of a vertex as follows:

Motif Independence:
Definition 3.
Motif Independence: For a given temporal motif $m$, the independence of the motif is defined as the ratio of the number of ITeM instances to the number of overlapping motif instances:
$$MI_m = \frac{c^{I}_m}{c_m}$$
where $c^{I}_m$ is the total number of ITeM instances of $m$, and $c_m$ is the total number of motif instances ($c^{I}_m \le c_m$).
This frequency-based metric identifies unique temporal motifs in the graph. Because of their uniqueness, highly independent motifs have a lower average cost of finding isomorphic combinatorial instances.

Vertex Independence:
Definition 4.
Vertex Independence: For a given temporal motif $m$, the independence of the involved vertices is defined as the ratio of the number of unique vertices in the ITeM instances to the maximum number of vertices possible in those instances:
$$VI_m = \frac{u_m}{c^{I}_m \, v_m}$$
where $u_m$ is the number of unique vertices in the ITeM instances of $m$, $c^{I}_m$ is the total number of ITeM instances of $m$, and $v_m$ is the number of vertices in the motif.
Temporal motifs with high vertex independence lead to a high structural contribution, as defined in Section II-C2. Low vertex independence leads to co-located independent temporal motifs that share more vertices.
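Both independence measures are simple ratios once the ITeM instances of a motif are known. The sketch below assumes that an ITeM instance is a list of (u, v, t) edges. It also assumes that the maximum possible number of vertices is the number of ITeM instances multiplied by the number of vertices per motif. The function names are hypothetical.

```python
def motif_independence(num_item_instances, num_motif_instances):
    """Ratio of ITeM instances to overlapping motif instances of one motif."""
    return num_item_instances / num_motif_instances


def vertex_independence(item_instances, vertices_per_motif):
    """Unique vertices across ITeM instances over the maximum possible.

    `item_instances` is a list of ITeM instances, each a list of (u, v, t)
    edges. The maximum is (#instances x vertices per motif), reached only
    when no two instances share a vertex.
    """
    unique = {x for inst in item_instances for u, v, _ in inst for x in (u, v)}
    return len(unique) / (len(item_instances) * vertices_per_motif)
```

Two triangle ITeMs that share one vertex use 5 unique vertices out of a possible 6, so their vertex independence is 5/6. If 2 of 8 overlapping instances survive as ITeMs, the motif independence is 0.25.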
IV Experiments
To evaluate the performance, interpretability, and scalability of our approach, we analyzed a rich set of synthetic and real-world temporal datasets. The experiments support the following core contributions:

ITeMs are a novel way of capturing distinguishing temporal properties of a temporal network that cannot be measured using static motifs.

Our approach is scalable and configurable to analyze a temporal network as one large graph or a sequence of windows using sampling.
All the experiments were run on a cluster using Apache Spark 2.3.0 and GraphFrame 0.7.0. All the algorithms are implemented in Scala 2.11.8, and the source code is available as open source at https://github.com/temporalgraphs/STM.
IV-A Results on Synthetic Networks
ITeMs can efficiently model the evolution of a temporal network using the properties defined in the previous sections. To demonstrate how accurately ITeMs model temporal changes in a network, we generate a set of synthetic temporal graphs using a stochastic generation method and measure the change in similarity as the networks evolve. We benchmark against SNAP Motif and DG and show that ITeMs are better at measuring changes in similarity as the networks evolve. For a population size of 100, we create a temporal graph spanning one day, in which every vertex creates an edge to a random target vertex with a low probability at every second. We then create variations of this base graph using a Gaussian distribution with zero mean and a standard deviation of 1/6 day. We create thirty such variations by stretching the graph one day at a time. For example, the edge arrivals of a variation are spread over 10 more days than those of the variation ten steps earlier. All the graphs in the sequence have the same structure, and only the edge timestamps vary. Figure 4 shows the rate at which temporal edges are added to the graphs. It also shows a zoomed-in view (right) of two of the graphs, which illustrates the linearity of the temporal stretch as the total time of the graph increases. We compute motif frequencies using both benchmark tools, and we compute temporal, structural, and orbital features using our ITeM approach. These feature vectors (i.e., embeddings) are used to measure the pairwise similarity of the temporal networks.
Figure 5 shows the change in normalized graph similarity as a function of the difference in the time duration of the synthetic graphs. A point (i, j) on the plot represents the average Euclidean distance j over all pairs of graphs that are i days apart. SNAP Motif allows arbitrarily large values of δ (the limit on the time window spanned by a motif). We use this feature to identify motifs without any restriction on the time difference between any two motif edges. Figure 5 (left) shows that the temporal-spatial-orbital features computed by ITeM measure graph similarity more accurately than SNAP Motif features, which are based only on motif counts. SNAP Motif does not capture the temporal variations of the discovered motif instances. In contrast, ITeM measures these variations as the graph is stretched in time and both the average time between edges and the time to form a motif increase. For the most distant pairs of graphs, we observe an unexpected sharp change in the similarity measured with SNAP Motif. Explaining this change requires a deeper analysis of the algorithm and of the output generated by the tool.
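The similarity curve described above can be computed from per-graph feature vectors with a few lines of code. The sketch assumes that `features[k]` holds the feature vector of the k-th stretched graph, so that graphs k and k+i are i days apart. The normalization applied in the figure is not specified here and is omitted. The function name `distance_by_gap` is hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import defaultdict


def distance_by_gap(features):
    """Average pairwise Euclidean distance as a function of day gap.

    `features[k]` is the feature vector (embedding) of the graph stretched
    by k days. Returns {i: mean distance over all pairs i days apart}, i.e.
    the (i, j) points of the similarity plot before normalization.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    n = len(features)
    for a in range(n):
        for b in range(a + 1, n):
            gap = b - a
            sums[gap] += math.dist(features[a], features[b])
            counts[gap] += 1
    return {gap: sums[gap] / counts[gap] for gap in sorted(sums)}
```

For thirty graphs, this produces 29 points, one for each possible day gap.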
DG also characterizes a temporal network in terms of graphlet counts for the entire network and for individual nodes. DG distinguishes graphlets from motifs. Graphlets are induced subgraphs that, unlike motifs, are not defined by the statistical significance of the substructure. DG defines orbits in a graphlet to capture automorphisms in the graphlet. DG also provides a parameter that restricts the time difference between two edges of a graphlet. However, because of out-of-memory errors, we could not run it in the unbounded setup used in the previous experiment. To benchmark against DG, we used a restricted mode of our algorithm with the motif time window set to 600 seconds.
Figure 5 (right) compares DG and ITeM. As shown in Figure 4, the base graph shifts from a stochastic base model to a temporal network based on a Gaussian distribution. This shift explains the initial sharp increase in the graph distance measured by both algorithms. Both approaches show sublinear trends afterward, but only ITeM maintains this trend as the time difference between graphs increases. DG shows sudden exponential changes in distance (or similarity) that do not correspond to the linear temporal evolution of the graphs shown in Figure 4 (right). Overall, both approaches exhibit similar trends. These trends show the importance of modeling the temporal variations and orbital information of a graph in addition to frequency counts.
Dataset | Vertices | Temporal edges | Static edges | Time span
CM | 1,899 | 59,835 | 20,296 | 193 days
BA | 3,783 | 24,186 | 24,186 | 1,901 days
EE | 986 | 332,334 | 24,929 | 803 days
TT | 34,800 | 171,403 | 155,507 | 21 hours
IA | 545,196 | 1,302,439 | 1,302,253 | 1,153 days
HT | 304,691 | 563,069 | 522,618 | 7 days
RH | 55,863 | 571,927 | 561,483 | 3 years 4 months
IV-B Results on Real-World Networks
We analyze various real-world networks and measure differences in their temporal evolution. The following list introduces the datasets used in the experiments, and Table II describes their static and temporal scale. We generate the temporal-spatial-orbital feature distribution from ITeM frequencies and use it for these measurements. We also use the change in the distribution over time to detect events in the network.

CollegeMsg (CM): CollegeMsg [23] comprises private messages sent on an online social network at the University of California, Irvine. An edge (u, v, t) means that user u sent a private message to user v at time t.

BitcoinAlpha (BA): BitcoinAlpha [14] is a who-trusts-whom network of people who trade using Bitcoin on a platform called Bitcoin Alpha. An edge (u, v, t) exists in the network if person u gives a rating to person v at time t.

Tech-As-Topology (TT): Tech-As-Topology [26] is a temporal network of Autonomous Systems (AS), in which an edge (u, v, t) represents a link between AS u and AS v at time t.

IA-Stackexch (IA): IA-Stackexch-User-Marks-Post is a bipartite Stack Overflow favorite network [26]. Nodes represent users and posts. An edge (u, v, t) denotes that user u marked post v as a favorite at time t.

Higgs Twitter (HT): The Higgs dataset [8] is an anonymized network of messages posted on Twitter between the 1st and the 7th of July 2012 about the announcement of the discovery of the Higgs boson. An edge (u, v, t) represents a Twitter interaction between users u and v at time t. An interaction can be a retweet, a mention, or a reply.

Reddit Hyperlink Network (RH): The Reddit hyperlink network represents the directed connections between two subreddits. It is extracted from the posts that create hyperlinks from one subreddit to another [13]. An edge (u, v, t) represents a hyperlink from subreddit u to subreddit v at time t.
Figure 6 shows the independent temporal motif distribution of the different datasets. Figure 7 shows their motif independence and vertex independence. These results give initial evidence that networks from similar domains, such as CM and EE, exhibit similar motif and vertex independence, whereas IA has a different distribution.
ITeM can also model the temporal evolution of a network using a sequence of temporal graphs, each covering a given time window. We use the Higgs Twitter dataset and monitor 3-hour windows from July 1st to July 7th. Our approach iteratively analyzes each window and updates the temporal summary of the network as it progresses. This lets us analyze a large graph as multiple smaller graphs. It also lets us identify anomalous events in the network and understand how the behavior of vertices changes over time. Figure 8 shows how ITeM frequencies change to reflect a burst event in the graph. The ITeM frequencies peak at the event on July 4th and then gradually return to a normal state. ITeM also provides more insight into the event than basic density-based graph measures. As shown in Figure 8, the largest increase is observed in the fringe part of the network, such as self-loops, isolated edges, and residual edges. A higher number of stars and wedges is also observed. These observations correspond to a network growth phenomenon in which a burst of new interactions occurs among newly added entities. In the case of Higgs Twitter, this is explained by a larger number of Twitter users tweeting about the Higgs boson discovery and by new hashtags being generated for a short period of time.
Figure 9 shows motif independence over time for the same windows of the Higgs Twitter dataset. Figures 8 and 9 show that core motifs, such as stars, increase in count while motif independence decreases sharply. This happens because the temporal network exhibits the emergence of a hub-like structure with a small number of extremely high-degree vertices.
IV-C Scalability Analysis
A major contribution of this paper is a distributed algorithm to analyze a large temporal graph or a sequence of temporal graph windows. All algorithms are implemented with Apache Spark 2.3.0, GraphFrames 0.7.0, and Scala 2.11.8. This allows the use of scalable distributed data structures to handle large graphs on the order of millions of edges and to iteratively update the temporal-structural and orbital properties of the graph. To analyze the scalability of the core algorithm, we use a Snakemake-based [11] automation pipeline and the SLURM [34] resource manager. We experiment with different combinations of hardware resources and distributed partitions. Figure 10 shows the results of the scalability experiment on the EmailEU dataset. ITeM shows an initial speedup as the number of cores available to the Spark application increases, up to 32 cores. Beyond this point, the application suffers from communication and data serialization overhead. A similar trend is observed when we increase the number of data partitions while keeping the maximum number of cores fixed. Runtime decreases sharply as executor memory increases from 2 GB to 6 GB, and the improvement slows down beyond that.
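Scaling curves such as those in Figure 10 are often easier to read as speedup and parallel efficiency. The helper below is a generic sketch (the name `scaling_metrics` is hypothetical) that computes speedup relative to the smallest measured core count p0, S(p) = T(p0)/T(p), and efficiency E(p) = S(p) · p0 / p, where T is wall-clock runtime. It assumes runtimes have already been collected, for example from the Snakemake/SLURM runs.

```python
from typing import Dict, Tuple


def scaling_metrics(runtimes: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Return {cores: (speedup, efficiency)} relative to the smallest core count.

    speedup(p)    = T(p0) / T(p)
    efficiency(p) = speedup(p) * p0 / p
    where p0 is the smallest measured core count.
    """
    if not runtimes:
        raise ValueError("need at least one runtime measurement")
    p0 = min(runtimes)
    t0 = runtimes[p0]
    metrics: Dict[int, Tuple[float, float]] = {}
    for p, t in sorted(runtimes.items()):
        s = t0 / t
        metrics[p] = (s, s * p0 / p)
    return metrics
```

Efficiency that drops well below 1 as cores are added is consistent with communication and serialization overhead outweighing the extra parallelism, which is the behavior reported beyond 32 cores.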
Temporal analysis of an evolving network using a window-based approach poses memory constraints and scalability challenges as the number of windows increases. We preserve minimal information across windows to maintain a global summary of the temporal network, and we save window-specific summaries and vertex features to files for use by other analytic processes. This allows our method to run in a long-running, streaming fashion. Although we do not observe a strong sub-linear trend as the windows progress (Figure 11), further analysis of the window graph structure using ITeM suggests that runtime depends on both the window size and the fringe structure of the graph. The runtimes of Windows 5 and 10 decrease even as the graph size increases, because those windows have more multi-edges than windows of similar size, which leads to aggressive subgraph reduction while discovering larger motifs. Future work will analyze in more detail how specific ITeM counts affect runtime.
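One way to keep memory bounded while streaming over windows is to carry only running totals from one window to the next and to write each window's summary out as soon as it is computed. The sketch below assumes windows arrive as lists of (u, v, t) edges and uses JSON lines as the output format; both choices, and the function names, are hypothetical. It also collapses repeated (u, v) pairs, which illustrates why a window with many multi-edges can shrink substantially before larger motifs are searched.

```python
import json
from collections import Counter
from typing import IO, Iterable, List, Tuple

Edge = Tuple[str, str, int]  # (u, v, t)


def collapse_multi_edges(edges: List[Edge]) -> Counter:
    """Group repeated directed (u, v) pairs; values are edge multiplicities."""
    return Counter((u, v) for u, v, _ in edges)


def stream_windows(windows: Iterable[List[Edge]], out: IO[str]) -> Counter:
    """Process windows in order, keeping only a small global summary in memory.

    Each window's summary is written as one JSON line to `out`; only the
    running totals survive from one window to the next.
    """
    global_summary: Counter = Counter()
    for index, edges in enumerate(windows):
        pairs = collapse_multi_edges(edges)
        summary = {
            "window": index,
            "edges": len(edges),
            "distinct_pairs": len(pairs),
            # Edges removed when parallel (u, v) edges are collapsed.
            "multi_edges": len(edges) - len(pairs),
        }
        out.write(json.dumps(summary) + "\n")
        global_summary.update({k: v for k, v in summary.items() if k != "window"})
    return global_summary
```

Because `windows` can be any iterable, a generator that reads one window at a time keeps only a single window's edges in memory.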
V Conclusion and Future Work
Complex temporal networks arise in many real-world domains, and a better understanding of them is required to handle real-world applications effectively. We present the Independent Temporal Motif (ITeM) as a building block to characterize temporal graphs. ITeM reveals many salient features of a temporal graph, such as its core structure, fringe vertices and edges, temporal evolution, and uniqueness. Graphs from different domains are found to exhibit varied structural and temporal distributions. Likewise, graphs from similar domains are found to exhibit similar structural properties, although many of them show varied temporal characteristics. We use these observations to characterize individual graphs and define a metric to quantitatively measure the similarity among them. We also present an importance-sampling-based approach to analyze a large graph as a sequence of smaller windows. We use it to show a change in the distribution that reveals a behavioral shift in the way entities interact in a transactional graph, such as a social network. Such a shift indicates either saturation in graph growth or a specific event that perturbs the usual motif distribution.
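The exact sampling estimator is not restated in this section. As a generic illustration of importance sampling over windows, the sketch below implements the textbook Hansen-Hurwitz estimator: windows are drawn with replacement with probability proportional to a weight, each draw contributes its count divided by its selection probability, and the mean of these terms is an unbiased estimate of the total count. `estimate_total`, `counts_fn`, and the choice of weights (for example, edges per window) are assumptions, not the paper's code.

```python
import random
from typing import Callable, Sequence


def estimate_total(counts_fn: Callable[[int], float], weights: Sequence[float],
                   n_samples: int, rng: random.Random) -> float:
    """Hansen-Hurwitz importance-sampling estimate of sum_i c_i over windows.

    counts_fn(i) is the (expensive) count for window i, e.g. a motif count;
    it is evaluated only for sampled windows. Window i is drawn with
    probability p_i = w_i / sum(w); the estimate is mean(c_i / p_i) over the
    draws, unbiased when p_i > 0 for every window with c_i != 0.
    """
    total_w = float(sum(weights))
    probs = [w / total_w for w in weights]
    picks = rng.choices(range(len(weights)), weights=probs, k=n_samples)
    return sum(counts_fn(i) / probs[i] for i in picks) / n_samples
```

When the weights are exactly proportional to the true counts, every term equals the total and the estimator has zero variance; in practice, a cheap proxy such as window edge count is used in the hope that it tracks the expensive count closely.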
The rate at which temporal motifs form can also be used to generate synthetic graphs that evolve similarly to a given real-world graph, as shown in [25]. These features can also be used in a diverse set of applications, such as approximate subgraph matching, graph mining, and network embedding learning. We will compare ITeM with other temporal network embeddings to measure its benefits for such applications. Future work will also address scalability challenges by estimating the number of independent temporal motifs with different approximation approaches. We will explore specialized algorithms for different motif classes and perform a sensitivity analysis of the sampling approach. We also plan to identify significant temporal motifs in multi-type temporal graphs.
Acknowledgment
We thank the DARPA Modeling Adversarial Activity (MAA) program for funding this project under contracts HR0011728117, HR001178235, and HR0011729374. The associated PNNL project number is 69986. A portion of the research was performed using PNNL Institutional Computing (PIC) at Pacific Northwest National Laboratory. We also thank Patrick Mackey and Joseph Cottam for providing feedback and help setting up the experimentation.
References
 [1] (2018) Graphlet-orbit transitions (GoT): a fingerprint for temporal network comparison. PLoS ONE 13 (10), pp. e0205497.
 [2] (2016) Higher-order organization of complex networks. Science 353 (6295), pp. 163–166.
 [3] (2019) Motif-based functional backbone extraction of complex networks. Physica A: Statistical Mechanics and its Applications, pp. 121123.
 [4] (2017) Complex networks theory for modern smart grid applications: a survey. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 7 (2), pp. 177–191.
 [5] (2018) Multi-channel large network simulation including adversarial activity. In 2018 IEEE International Conference on Big Data (Big Data), pp. 3947–3950.
 [6] (2015) A critical review of robustness in power grids using complex networks concepts. Energies 8 (9), pp. 9211–9265.
 [7] (2016) GraphFrames: an integrated API for mixing graph and relational queries. In Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems, pp. 2.
 [8] (2013) The anatomy of a scientific rumor. Scientific Reports 3, pp. 2980.
 [9] (2015) Exploring the structure and function of temporal networks with dynamic graphlets. Bioinformatics 31 (12), pp. i171–i180.
 [10] (2017) The origin of motif families in food webs. Scientific Reports 7 (1), pp. 16197.
 [11] (2012) Snakemake—a scalable bioinformatics workflow engine. Bioinformatics 28 (19), pp. 2520–2522.
 [12] (2010) Structure and evolution of online social networks. In Link Mining: Models, Algorithms, and Applications, pp. 337–357.
 [13] (2018) Community interaction and conflict on the web. In Proceedings of the 2018 World Wide Web Conference on World Wide Web, pp. 933–943.
 [14] (2016) Edge weight prediction in weighted signed networks. In 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 221–230.
 [15] (2019) An analysis of emotion-exchange motifs in multiplex networks during emergency events. Applied Network Science 4 (1), pp. 8.
 [16] (2007) Graph evolution: densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data (TKDD) 1 (1), pp. 2.
 [17] (2018) A sampling framework for counting temporal motifs. arXiv preprint arXiv:1810.00980.
 [18] (1986) A simple parallel algorithm for the maximal independent set problem. SIAM Journal on Computing 15 (4), pp. 1036–1053.
 [19] (2016) A guide to temporal networks. World Scientific.
 [20] (2016) An introduction to temporal graphs: an algorithmic perspective. Internet Mathematics 12 (4), pp. 239–280.
 [21] (2002) Network motifs: simple building blocks of complex networks. Science 298 (5594), pp. 824–827.
 [22] (2013) Monte Carlo theory, methods and examples. Art Owen.
 [23] (2009) Patterns and dynamics of users' behavior and interaction: network analysis of an online community. Journal of the Association for Information Science and Technology 60 (5), pp. 911–932.
 [24] (2017) Motifs in temporal networks. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 601–610.
 [25] (2018) Temporal graph generation based on a distribution of temporal motifs. In Proceedings of the 14th International Workshop on Mining and Learning with Graphs.
 [26] (2015) The network data repository with interactive graph analytics and visualization. In AAAI.
 [27] (2019) Understanding information flow in cascades using network motifs. arXiv preprint arXiv:1904.05161.
 [28] (2005) Frequency concepts and pattern detection for the analysis of motifs in networks. In Transactions on Computational Systems Biology III, pp. 89–104.
 [29] (2018) High-order organization of weighted microbial interaction network. In 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 206–209.
 [30] (2018) Multiplex network motifs as building blocks of corporate networks. Applied Network Science 3 (1), pp. 39.
 [31] Complex networks. Science-Based Prediction, pp. 94.
 [32] (2004) The topological relationship between the large-scale attributes and local interaction patterns of complex networks. Proceedings of the National Academy of Sciences 101 (52), pp. 17940–17945.
 [33] (2017) Local higher-order graph clustering. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 555–564.
 [34] (2003) SLURM: Simple Linux Utility for Resource Management. In Workshop on Job Scheduling Strategies for Parallel Processing, pp. 44–60.