ITeM: Independent Temporal Motifs to Summarize and Compare Temporal Networks

02/19/2020 ∙ by Sumit Purohit, et al. ∙ Washington State University ∙ PNNL

Networks are a fundamental and flexible way of representing various complex systems. Many domains such as communication, citation, procurement, biology, social media, and transportation can be modeled as a set of entities and their relationships. Temporal networks are a specialization of general networks where the temporal evolution of the system is as important to understand as the structure of the entities and relationships. We present the Independent Temporal Motif (ITeM) to characterize temporal graphs from different domains. The ITeMs are edge-disjoint temporal motifs that can be used to model the structure and the evolution of the graph. For a given temporal graph, we produce a feature vector of ITeM frequencies and apply this distribution to the task of measuring the similarity of temporal graphs. We show that ITeM has higher accuracy than other motif frequency-based approaches. We define various metrics based on ITeM that reveal salient properties of a temporal network. We also present importance sampling as a method for efficiently estimating the ITeM counts. We evaluate our approach on both synthetic and real temporal networks.

I Introduction

Networks have been widely used to represent entities, relationships, and behaviors in many real-world domains including power grids [4], social networks [12], microbial interaction networks [29], corporate networks [30], the food web [10], and modeling adversarial activities [5]. These complex systems do not show a temporal or structural continuum, but rather show a characteristic non-linear dynamic behavior [31]. Many salient properties of these systems can be described by different network metrics, measured on a global scale. Count-based metrics such as the number of entities, the number of interactions, and the average connectivity of the entities in the network are important measures that represent the population and the interaction density of the entities involved in the network. However, these measures are limited in their ability to describe non-linear, localized, and dynamic properties of the systems. To uncover the structural, temporal, and functional insight of complex systems, network motifs have been used extensively in recent years. Network motifs are patterns of interactions occurring in the complex system at a rate higher than those in a randomized network [21].

The increasing volume, velocity, and variety of temporal networks generated by Big Data applications pose a scalability challenge to the problem of temporal network analysis. Many of the domains generate a constant stream of heterogeneous network channels, and it is not possible to measure and update global network properties in real time. Temporal motifs provide a tractable approximation of the networks that can be measured and updated in Big Data applications within given memory and compute constraints.

Extensive research has been done on the appropriate definition of network motifs and their application to various network analytical tasks. Milo et al. [21] detect network motifs by finding all possible n-node subgraphs and retaining only those with a higher probability of appearing in a real-world graph with frequency f than in a randomized network. Such frequency-based approaches reveal interesting properties about the complex network. The motifs shared by ecological food webs are found to be distinct from the motifs shared by genetic networks. Similar motifs are found in networks that perform information processing even though the entities involved are different in those networks [21]. Patterns that are functionally important but not statistically significant are missed by this approach. Vazquez et al. [32] show that the large-scale topological organization of a network and its local subgraph structure mutually define and predict each other. The correlation can be used to understand the properties and evolution of a network.

Cao et al. [3] use network motifs to define the network backbone, which is a collection of relevant nodes and edges in the large-scale network. They define a motif-based extraction method to extract the functional backbone of the complex network. The functional backbone is indicative of certain functional properties of the network that cannot be explained by centrality-based backbones. Similarly, Shen et al. [29] use a weighted motif to cluster microbial interaction networks. Network motifs can also be used to identify the exchange of emotions in online communication networks, such as Twitter [15], using emotion-exchange motifs. The emotion-exchange motifs containing reciprocal edges manifest anger or fear, either in isolation or in any combination with other emotions. Conversely, positive emotions are characteristic of one-way motifs.

A temporal network is a generalization of a static network that changes with time. Many system modeling approaches model time as an attribute of the entity or the interaction, which makes temporal graphs a special case of attributed graphs. We use network and graph interchangeably in this paper. Incorporating time into static graphs has given rise to a new set of important and challenging problems that cannot be modeled as a static graph problem [20]. A majority of the prior research does not account for the temporal evolution of the motif. Recent work [24] defines the temporal motif as an elementary unit of the temporal network and provides a general methodology for counting such motifs. It computes the frequency of overlapping temporal motifs, where one interaction can be part of more than one temporal motif. In such a δ-temporal motif, all the edges in a given motif have to occur inside a time period of δ time units. Aparício et al. [1] use orbit transitions to compare a set of temporal networks. Sarkar et al. [27] use the temporal motif to understand information flow in social networks.

We propose the Independent Temporal Motif (ITeM) as the elementary building block of temporal networks. ITeMs are edge-disjoint temporal motifs that provide insight about the temporal evolution of a graph, such as its rate of growth, neighborhood, and the change in the role of a vertex over time. Independence of the temporal motif leads to mutually exclusive motif instances by restricting each edge to participate in only one temporal motif instance. We use an ensemble of the temporal motifs that are simple to compute but at the same time representative of temporal, structural, and functional properties of the network. We also define properties to measure the temporal evolution of the motifs, which informs the rate at which motifs are formed in the network. In contrast to previous work, no limit is put on the time window of the motif, but it can be restricted optionally. We provide algorithms to compute the independent temporal motif distribution of a given graph. We also provide a new distributed implementation using the Apache Spark graph analytic framework.

The rest of the paper is organized as follows. Section II lays out various definitions and section III presents our core approach. Section IV shows our experimentation with synthetic and real-world temporal networks to summarize the temporal networks and measure their similarity. Section V presents conclusions and future work.

Symbol Description
Temporal graph
window
Total number of windows
Total number of motifs
Atomic motif
Temporal motif of atomic Motif
Set of time-steps associated with motif edges
Motif instance
ITeM instance
Number of vertices in motif
Number of unique vertices in ITeM instances of motif
Set of Importance values for each window
Importance of window
Temporal motif distribution for a given temporal graph
Order of a motif
Orbit of a motif
TABLE I: Symbols and their descriptions

II Definitions

We present the ITeM-based approach to characterize a temporal network. In the following sections, we present definitions and algorithms used by ITeM to model a temporal network. We also review the Maximum Independent Set (MIS) problem, which is a subproblem of the proposed algorithm. MIS has been proved to be an NP-complete problem, and we present a heuristic-based approach to finding the lower bound on the ITeM frequency [18]. We also outline a sampling method to estimate the true frequency of a temporal motif in the network. The sampling approach is based on the importance of the sampled network [17].

A temporal graph is a specialization of a static graph, where each edge of the static graph appears at a time unit such as a second, a day, or a year. Various representations of temporal graphs, useful in different scenarios, have been proposed [19]. We use a window-based representation, where each window corresponds to a temporal sub-graph between two timestamps.

Definition 1.

Temporal Graph: A temporal graph T is an ordered sequence of graphs $T_1, T_2, \ldots, T_W$, indexed by a window id $i \in \{1, \ldots, W\}$. We define $T_i = (V_i, E_i)$, where $V_i$ and $E_i$ denote the vertex and edge sets, respectively, in the window $i$, arriving since the window $i-1$. We say the temporal graph is on vertex set $V = \bigcup_{i} V_i$ and edge set $E = \bigcup_{i} E_i$.

This definition allows for the representation of a large graph with a single window. Analyzing a single temporal graph is useful for datasets that are small in size and cover a small period of time.
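As a minimal sketch of the window-based representation in Definition 1, the following Python snippet groups timestamped edges into fixed-width windows. The function name `split_into_windows`, the `(u, v, t)` tuple layout, and the fixed window width are assumptions made for illustration; the definition itself only requires an ordered sequence of window graphs.

```python
from collections import defaultdict


def split_into_windows(edges, window_size):
    """Group (u, v, t) edges into an ordered list of window graphs.

    Each window graph is a (vertex_set, edge_list) pair. Window w holds the
    edges whose timestamp falls in [t0 + w * size, t0 + (w + 1) * size),
    where t0 is the earliest timestamp (an illustrative convention).
    """
    if not edges:
        return []
    t0 = min(t for _, _, t in edges)
    buckets = defaultdict(list)
    for u, v, t in edges:
        buckets[(t - t0) // window_size].append((u, v, t))
    windows = []
    for w in range(max(buckets) + 1):
        # Empty windows are kept so that the window id stays a time index.
        window_edges = sorted(buckets.get(w, []), key=lambda e: e[2])
        vertices = {x for u, v, _ in window_edges for x in (u, v)}
        windows.append((vertices, window_edges))
    return windows


edges = [("a", "b", 0), ("b", "c", 4), ("a", "c", 11), ("c", "d", 25)]
windows = split_into_windows(edges, window_size=10)
```

Passing a window size larger than the total time span yields a single window, which corresponds to the single-window case mentioned above.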

II-A Atomic Motif

Atomic motifs are small subgraphs that serve as interesting indicators for complex networks. They can reveal patterns of association among entities in the network. Figure 1 shows a library of atomic motifs used in the current work. Lower-order motifs such as isolated vertex (order d=1), self-loop (d=1), and isolated edge (d=2) are examples of fringe motifs, as they have little (sometimes zero) connectivity to the rest of the network. In contrast, higher-order motifs such as wedge (d=3), triangle (d=3), and square (d=4) are examples of core motifs, which have been found to constitute a major fraction of real-world graphs. Our experimentation shows that the relative frequencies of fringe and core motifs in a temporal network can be used to compute graph similarity.

We can define atomic motifs of any number of vertices and edges, but the larger motifs are more difficult to search for in a network and at the same time, do not substantially increase the quality of actionable information about the network. The search for large atomic motifs suffers from the intractability of the subgraph isomorphism problem and leads to an exponential increase in the runtime. Conversely, smaller atomic motifs are easier to find and yield better dividends in terms of modeling temporal and structural characteristics of the graph.

We limit our motif library to motifs of order at most 4. The selection of d-order motifs to include in the search library has been influenced by previous research in this area, functional interpretation of the motifs in real-world domains, and computational pragmatism. In addition to the higher-order motifs (d > 2), we also make use of a few fringe motifs that provide insight about a complex network that is not captured by such higher-order motifs. The isolated-vertex and isolated-edge motifs correspond to vertices and edges in the network that are not part of any higher-order motif. An abundance of such motifs is a clear indicator of a sparse, disconnected state of the network and is important to model some domains, such as power grids [6]. Similarly, the self-loop and multi-edge motifs correspond to self-loops and multi-edges between the same set of entities. Frequencies of such motifs show important functional properties of the network and can be used to convert it into a smaller weighted network, where the self-loops and the multi-edges are converted into vertex and edge weights, respectively. At the same time, some of these motifs also contribute to the combinatorial explosion of the higher-order motifs.

Fig. 1: Atomic Motifs

II-B Temporal Motif

Definition 2.

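Temporal Motif: A Temporal Motif is a graph $(V, E, \mathcal{T})$ where:

  • $V$ is a set of vertices of the motif.

  • $E$ is a set of edges $e \in E$, $e: (u, v, t)$ with $u, v \in V$ and $t \in \mathcal{T}$, where $\mathcal{T}$ is a set of time steps associated with motif edges.

  • Edges have a temporal ordering such that for an edge $e_1: (u_1, v_1, t_1)$ and $e_2: (u_2, v_2, t_2)$, if $t_1 < t_2$ then $e_1$ arrives before $e_2$.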

A Temporal Motif is a specialization of the atomic motif, where every interaction between two vertices occurs at a specific time-step. The time-step of an edge defines the temporal ordering of the edge within the temporal motif; it does not correspond to the actual time of the interaction in the temporal graph. Using this definition, we extend the atomic motif to model its temporal evolution in terms of size and structure. Characterization of the temporal network using a set of static motifs can be misleading and inaccurate because the static motifs fail to capture the temporal properties of the network, such as the scale at which transactions occur [2], burstiness of the transactions, and temporal dependency among the set of transactions. Additionally, many temporal systems are characterized as a dense multi-graph, where a pair of entities share many temporal transactions as the network evolves. This poses additional combinatorial complexity challenges beyond discovering structural motifs in the network. Figure 3 shows a set of temporal motifs used in this work.

Fig. 2: Example Input Graph and ITeMs

II-C Independent Temporal Motif (ITeM)

Schreiber and Schwöbbermeyer [28] describe three different ways to measure the frequency of any pattern in a graph. They categorize them as the $\mathcal{F}_1$, $\mathcal{F}_2$, and $\mathcal{F}_3$ concepts. In the context of motif computation, $\mathcal{F}_1$ includes every occurrence of a motif instance without any restriction, so a vertex or an edge may be reused while computing the frequency of motif instances. Paranjape et al. [24] use this definition to compute overlapping δ-motif frequencies. The $\mathcal{F}_2$ and $\mathcal{F}_3$ concepts put restrictions on the reuse of a vertex or edge. $\mathcal{F}_2$ is an edge-disjoint concept and does not allow the reuse of an edge in more than one instance of the motif. $\mathcal{F}_3$ is more restrictive: it is a vertex- and edge-disjoint concept and does not allow reuse of any vertex or edge in more than one instance of the motif.

A major contribution of our work is the ITeM, which is an edge-disjoint temporal motif such that no two motif instances share any edge. It differs from the temporal network modeling approaches mentioned in the related work, which use overlapping motif instances where instances of a motif can share any number of edges. This restriction adds to the complexity of the task, as finding temporal motifs is already proved to be an NP-Complete problem [17]. In the following sub-sections, we define some key concepts used by ITeM to model a temporal network.

Fig. 3: Temporal Motifs

II-C1 Vertex Birth-Time

We define the birth-time of a vertex in the temporal network as the time of the first transaction involving the vertex. The birth of a vertex increases the network size by one vertex. For the rest of the life of the network, that entity is treated as reused and it never increases the network population.

II-C2 Structural Contribution

The Structural Contribution of an ITeM instance is a measure of the growth in the graph size as a result of adding the instance. The Structural Contribution of an independent temporal motif in terms of the number of edges is always equal to the number of temporal edges in the temporal motif. Figure 3 shows a set of temporal motifs and their structural contributions. For example, every instance of a temporal motif with three edges adds three new temporal edges to an existing network. The structural contribution in terms of the number of vertices is impossible to measure using a static atomic motif because an atomic motif instance fails to distinguish between the introduction of a new vertex to the network and the reuse of an existing vertex. Temporal motifs are required to encode this information to model the size and structure of the graph as it evolves. As shown in Figure 3, every instance of some temporal motifs adds only one new vertex to an existing network, whereas every instance of others adds three new vertices to the temporal network.
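The birth-time and structural-contribution definitions can be sketched in a few lines of Python. This is an illustrative reading of the two definitions, not the authors' implementation: the function names, the `(u, v, t)` edge tuples, and the assumption that a vertex's first transaction has a unique timestamp are all hypothetical.

```python
def vertex_birth_times(edges):
    """Birth-time of a vertex = time of the first transaction involving it."""
    birth = {}
    for u, v, t in edges:
        for x in (u, v):
            if x not in birth or t < birth[x]:
                birth[x] = t
    return birth


def structural_contribution(instance, birth):
    """Return (new_vertices, new_edges) contributed by one ITeM instance.

    A vertex counts as new when one of the instance's edges is the
    transaction at which the vertex was born. Edges are always new because
    ITeM instances are edge-disjoint.
    """
    born_here = {x for u, v, t in instance for x in (u, v) if birth[x] == t}
    return len(born_here), len(instance)


# Two edge-disjoint temporal triangles; the second reuses vertices a and c.
tri1 = [("a", "b", 1), ("b", "c", 2), ("c", "a", 3)]
tri2 = [("c", "d", 5), ("d", "a", 6), ("a", "c", 7)]
birth = vertex_birth_times(tri1 + tri2)
contributions = [structural_contribution(inst, birth) for inst in (tri1, tri2)]
```

In this toy input the first triangle introduces three vertices and the second only one, while both add three temporal edges, which mirrors the distinction drawn above between introducing and reusing vertices.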

II-C3 Motif Orbit

An orbit of a motif is a distinct position in which a vertex can appear within the motif. A motif of order d has at most d distinct positions. The orbit of a vertex in a motif encapsulates its functional role in the motif.

As shown in Figure 1, some motifs have just one orbit, whereas others have three different orbits. Similarly, the star motifs have two orbits each. A combination of structural contribution and a change in the orbit of vertices allows us to model the evolution of a network without measuring the frequency of every automorphic instance. An automorphism is a mapping from the vertices of a given graph to itself that preserves adjacency; the automorphisms of a graph are a measure of the symmetry of its structure.
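To make the orbit notion concrete, the sketch below enumerates the automorphisms of a small motif by brute force and groups vertices that can be mapped onto each other. It assumes undirected motifs for simplicity (the motifs in Figure 1 may be directed), and the function name `orbits` is hypothetical. Brute force is only practical because the motif library is limited to very small orders.

```python
from itertools import permutations


def orbits(vertices, edges):
    """Partition the vertices of a small undirected motif into orbits."""
    edge_set = {frozenset(e) for e in edges}
    # An automorphism is a vertex permutation that maps the edge set onto itself.
    automorphisms = []
    for perm in permutations(vertices):
        mapping = dict(zip(vertices, perm))
        image = {frozenset(mapping[x] for x in e) for e in edge_set}
        if image == edge_set:
            automorphisms.append(mapping)
    # Two vertices share an orbit when some automorphism maps one to the other.
    found = {frozenset(m[v] for m in automorphisms) for v in vertices}
    return sorted(sorted(o) for o in found)


triangle = orbits([0, 1, 2], [(0, 1), (1, 2), (0, 2)])
wedge = orbits([0, 1, 2], [(0, 1), (1, 2)])
star = orbits([0, 1, 2, 3], [(0, 1), (0, 2), (0, 3)])
```

Under the undirected assumption, the triangle has a single orbit, while the wedge and the three-leaf star each separate the center from the end points.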

III Approach

III-A Exact algorithm to count ITeM frequency

In this section we present an exact algorithm to count ITeM frequency. We also present an approximate algorithm using Importance sampling.

Data: All motif instances I
Result: Independent motif instances
1  /* Create a mapping EM between an edge and all associated motif instances. L(i) is the string representation of a motif instance i. */
2  foreach i ∈ I do
3      foreach e ∈ i do
4          EM(e) ← EM(e) ∪ {L(i)}
5      end foreach
6  end foreach
7  /* For every motif instance label L(i), create a vertex in the overlap graph. */
8  foreach i ∈ I do
9      VO ← VO ∪ {L(i)}
10 end foreach
11 /* Create an edge in the overlap graph between every motif-instance label pair that shares an edge in the input graph. */
12 foreach e ∈ EM do
13     foreach (L(i), L(j)) ∈ EM(e) × EM(e) with i ≠ j do
14         EO ← EO ∪ {(L(i), L(j))}
15     end foreach
16 end foreach
17 /* Create the motif overlap graph */
18 GO ← (VO, EO)
19 /* Find non-overlapping temporal instances */
20 return MaxIndSet(GO)
Algorithm 1 getITeM()

Data: An undirected abstract graph G = (V, E)
Result: Maximum Independent set of vertices
1  /* Set every vertex in its own Independent Set */
2  foreach v ∈ V do
3      IS(v) = id(v)
4  end foreach
5  repeat
6      send IS(v) to every neighbor u of v
7      receive IS(u) for every neighbor u of v
8      update IS(v) by the lowest received IS(u)
9  until IS does not change;
10 /* Get Independent Set as unique values of IS */
11 return unique values of IS
Algorithm 2 MaxIndSet()

Finding matches to temporal motifs is proved to be an NP-Complete problem [17]. We use Luby’s Algorithm [18] to discover ITeMs, which provides a lower bound on the ITeM frequency.

Algorithms 1 and 2 present the pseudo-code to find independent temporal motif instances in a given temporal graph. Algorithm 1 takes a set of overlapping temporal motif instances as input and returns ITeM instances. We use GraphFrame [7] to discover the overlapping temporal motif instances. Overlapping motif discovery is a run-time bottleneck, and GraphFrame provides optimized motif discovery using graph-aware dynamic programming algorithms. It also provides a simple Domain-Specific Language (DSL) to express all the temporal motifs. We use the temporal ordering of the edges to define a lexical representation of each motif instance, which is used as a vertex label to construct a motif overlap graph. The motif overlap graph is an abstract graph that represents clusters of motif instances sharing at least one edge in the input graph as defined in Definition 1. Lines 2-6 map each edge to its associated set of motif instances. Lines 8-10 create the set of vertices in the abstract graph. Lines 12-16 construct an edge list using all the motifs that share a temporal edge in the input graph: an edge is created in the abstract graph for every shared edge in the input graph. The vertex set and the edge list are used to construct the abstract graph on Line 18. The final result is computed using Algorithm 2 on Line 20, which uses a distributed MIS implementation to compute the ITeM instances.
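The overlap-graph construction described above can be sketched on a single machine as follows. This is a plain-Python illustration under stated assumptions, not the GraphFrame/Spark implementation: the label format produced by `instance_label` and the function names are hypothetical, and motif instances are assumed to be given as lists of `(u, v, t)` edges.

```python
from collections import defaultdict
from itertools import combinations


def instance_label(instance):
    """Lexical label of a motif instance: its edges listed in temporal order."""
    ordered = sorted(instance, key=lambda e: e[2])
    return "|".join(f"{u}>{v}@{t}" for u, v, t in ordered)


def build_overlap_graph(instances):
    """Return (vertices, edges) of the motif overlap graph."""
    # Map each temporal edge to the labels of all instances that contain it.
    edge_to_labels = defaultdict(set)
    for inst in instances:
        label = instance_label(inst)
        for e in inst:
            edge_to_labels[e].add(label)
    # One overlap-graph vertex per motif instance label.
    vertices = {instance_label(inst) for inst in instances}
    # One overlap-graph edge per pair of instances sharing an input edge.
    overlap_edges = set()
    for labels in edge_to_labels.values():
        for a, b in combinations(sorted(labels), 2):
            overlap_edges.add((a, b))
    return vertices, overlap_edges


w1 = [("a", "b", 1), ("b", "c", 2)]
w2 = [("b", "c", 2), ("c", "d", 3)]
w3 = [("d", "e", 4), ("e", "f", 5)]
overlap_vertices, overlap_edges = build_overlap_graph([w1, w2, w3])
```

Here the first two wedges share the temporal edge `("b", "c", 2)` and are therefore connected in the overlap graph, while the third wedge remains an isolated vertex.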

Algorithm 2 presents the pseudo-code of a distributed implementation of the MIS algorithm. We use the Pregel API, available in Apache Spark, to implement Luby’s Algorithm [18]. We initialize all vertices in their own independent set as shown in lines 2-4. At lines 5-9 of Algorithm 2, each vertex exchanges messages with its neighbors and updates its independent set value based on the minimum value received from all neighbors. This process stops when no vertex in the graph changes its independent set.
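A single-machine sketch of the message-passing loop described above is given below. It follows the textual description literally and is not the Spark Pregel implementation; it assumes each vertex keeps the minimum of its own value and the values received from its neighbors, and the function name is hypothetical. Under that assumption, every connected cluster of overlapping motif instances converges to one representative label, so the returned vertices are pairwise non-adjacent but may be fewer than a true maximum independent set.

```python
def min_label_propagation(vertices, edges):
    """Return one representative vertex per connected cluster."""
    neighbors = {v: set() for v in vertices}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Every vertex starts in its own independent set, identified by its id.
    label = {v: v for v in vertices}
    changed = True
    while changed:
        changed = False
        # One synchronous superstep: gather messages, then update.
        incoming = {v: [label[u] for u in neighbors[v]] for v in vertices}
        for v in vertices:
            best = min(incoming[v] + [label[v]])
            if best != label[v]:
                label[v] = best
                changed = True
    return set(label.values())


# m1 - m2 - m3 form a chain of overlapping instances; m4 overlaps nothing.
selected = min_label_propagation(
    {"m1", "m2", "m3", "m4"}, {("m1", "m2"), ("m2", "m3")}
)
```

In this example the sketch keeps `m1` and `m4`, whereas `m1`, `m3`, and `m4` would also be mutually edge-disjoint. This illustrates why the resulting count is a lower bound on the ITeM frequency.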

III-B Approximate algorithm to count ITeM frequency

Our approach includes three major algorithmic components: searching for overlapping temporal atomic motifs, finding independent temporal motifs, and computing information content and temporal evolution of such motifs. Of the three components, finding independent temporal motifs is an NP-Complete problem, and we use a heuristic to find a lower bound of the actual count. As explained in the previous section, we construct a motif overlap graph where every vertex is a motif instance and an edge between two vertices exists if the corresponding motif instances share an edge in the original temporal graph. This abstract formulation may lead to a highly-cliqued abstract graph, which is a characteristic of various real-world domains, such as social networks. A highly-cliqued abstract graph leads to excessive message-passing in the distributed computing environment. To address this, we use an importance-based sampling approach to approximate the motif frequency computation.

Importance sampling for motifs is presented by Liu et al. [17]. The basic approach [22] is to split the time series dataset into multiple temporal windows and perform exact computation on each window. Each window is also assigned an importance, which is used to normalize the computed metric across all randomly-selected windows. We create window graphs with equal temporal window size, each with a different number of edges within the window. We compute the distribution of all temporal motifs present in the window graph. At the end of all the windows, we compute the weighted average of all the distributions, which gives an approximate distribution for the entire graph. Liu et al. show that the weighted average using importance sampling is a lower bound estimate of the distribution.

For a given temporal graph with $W$ windows, the importance vector ImpAll is an ordered sequence of window importances, $\mathit{ImpAll} = (\mathit{Imp}_1, \ldots, \mathit{Imp}_W)$, where the importance ImpI of window i is defined as $\mathit{Imp}_i = |E_i| / |E|$, with $|E_i|$ the number of edges in window i and $|E|$ the total number of edges in the temporal graph. For a given motif, the expected motif frequency Fm in the temporal graph is computed from the exact frequency of the motif in window i and the importance ImpI of that window.

We also define a random variable that selects a specific window in the entire population. The expected frequency is computed over the windows selected for the motif frequency computation.

The independent temporal motif distribution F for a given temporal graph is the distribution of all such temporal motifs over the window population, $F = (F_1, \ldots, F_K)$, where K is the total number of motifs.
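As a rough illustration of importance-based window sampling, the sketch below uses the textbook importance-sampling estimator: windows are drawn with probability equal to their importance, and each sampled per-window frequency is divided by that importance before averaging. This specific estimator is an assumption standing in for the paper's formula, which may differ; the function names and the toy numbers are hypothetical.

```python
import random


def window_importance(edge_counts):
    """Importance of each window = its share of all temporal edges."""
    total = sum(edge_counts)
    return [count / total for count in edge_counts]


def estimate_frequency(window_freqs, importance, n, rng):
    """Estimate a motif's total frequency from n sampled windows.

    Windows are drawn with probability importance[i]; averaging
    window_freqs[i] / importance[i] over the draws gives an estimate whose
    expectation is the sum of the per-window frequencies.
    """
    drawn = rng.choices(range(len(window_freqs)), weights=importance, k=n)
    return sum(window_freqs[i] / importance[i] for i in drawn) / n


edge_counts = [10, 30, 60]   # edges per window (toy values)
window_freqs = [1, 3, 6]     # exact motif frequency per window (toy values)
importance = window_importance(edge_counts)
estimate = estimate_frequency(window_freqs, importance, n=2, rng=random.Random(0))
```

Because the toy frequencies are exactly proportional to the edge counts, every draw contributes the same value and the estimate equals the true total of 10 regardless of which windows are sampled. Motif instances that span a window boundary are not seen by any per-window count, which is consistent with the estimate being a lower bound.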

Independence: We also define Independence of a temporal motif as a measure of its uniqueness in a given temporal graph. The independence can be measured for temporal motifs, temporal edges, or vertices of the temporal graph. The edge-disjoint concept defined in section II-C leads to maximal independent temporal edges because every edge has a bijection to the set of independent temporal motifs. We define the independence of a temporal motif and a vertex as follows:

  1. Motif Independence:

    Definition 3.

    Motif Independence: For a given temporal motif $M$, the independence of the motif is defined as the ratio of the number of ITeM instances to the number of overlapping motif instances:

    $MI(M) = N_{ITeM}(M) / N_{all}(M)$

    where $N_{ITeM}(M)$ is the total number of ITeM instances, and $N_{all}(M)$ is the total number of motif instances.

    This frequency-based metric identifies unique temporal motifs in the graph. Highly independent motifs exhibit a lower average cost of finding isomorphic combinatorial instances because of their uniqueness.

  2. Vertex Independence:

    Definition 4.

    Vertex Independence: For a given temporal motif $M$, the independence of the involved vertices is defined as the ratio of the number of unique vertices in ITeM instances to the maximum number of vertices possible in those instances:

    $VI(M) = U(M) / (N_{ITeM}(M) \cdot n(M))$

    where $U(M)$ is the number of unique vertices in the ITeM instances of the motif, $N_{ITeM}(M)$ is the total number of those instances, and $n(M)$ is the number of vertices in the motif.

    Temporal motifs with high vertex independence lead to a high structural contribution as defined in section II-C2, whereas low vertex independence leads to co-located independent temporal motifs with a higher number of shared vertices among them.
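Both independence measures reduce to simple ratios once the instance sets are available. The sketch below is one reading of Definitions 3 and 4, with hypothetical function names; it assumes instances are lists of `(u, v, t)` edges and takes the denominator of vertex independence to be the number of ITeM instances times the motif order.

```python
def motif_independence(item_instances, all_instances):
    """Ratio of edge-disjoint (ITeM) instances to all overlapping instances."""
    return len(item_instances) / len(all_instances)


def vertex_independence(item_instances, motif_order):
    """Unique vertices in ITeM instances over the maximum possible number."""
    unique = {x for inst in item_instances for u, v, _ in inst for x in (u, v)}
    return len(unique) / (len(item_instances) * motif_order)


# Two edge-disjoint temporal triangles that share the vertices a and c.
tri1 = [("a", "b", 1), ("b", "c", 2), ("c", "a", 3)]
tri2 = [("c", "d", 5), ("d", "a", 6), ("a", "c", 7)]
item_instances = [tri1, tri2]

# Suppose the overlapping search had found five triangle instances in total
# (placeholders here; only their count matters for the ratio).
all_instances = item_instances + [None, None, None]

mi = motif_independence(item_instances, all_instances)
vi = vertex_independence(item_instances, motif_order=3)
```

With two ITeM instances out of five overlapping ones, motif independence is 0.4; the two triangles cover four unique vertices out of a possible six, so vertex independence is 4/6.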

IV Experiments

To evaluate the performance, interpretability, and scalability of our approach, we analyzed a rich set of synthetic and real-world temporal datasets. The experiments support the following core contributions:

  • ITeMs are a novel way of capturing discerning temporal properties of a temporal network that cannot be measured using static motifs.

  • ITeMs outperform the Stanford SNAP temporal motif algorithm [24] (referred to as δ-Motif hereinafter) and Dynamic Graphlet (DG) [9] in measuring the similarity of temporal graphs.

  • Our approach is scalable and configurable to analyze a temporal network as one large graph or a sequence of windows using sampling.

All the experiments are run on a cluster using Apache Spark 2.3.0 and GraphFrame 0.7.0. All the algorithms are implemented in Scala 2.11.8, and the source code has been open-sourced at https://github.com/temporal-graphs/STM.

IV-A Results on Synthetic Networks

ITeMs can efficiently model the evolution of a temporal network using the properties defined in the section above. To assess how accurately ITeMs model temporal changes in the network, we generate a set of synthetic temporal graphs using a stochastic generation method and measure the change in similarity as the networks evolve. We benchmark against δ-Motif and DG and show that ITeMs are better at measuring the changes in similarity as the networks evolve. For a population of 100 vertices, we create a base temporal graph of one-day duration, where every vertex creates an edge with a random target vertex with a low probability at every second. Then, we create variations of the base graph using a Gaussian distribution with zero mean and a standard deviation of 1/6 day. We create thirty such variations by stretching the graph one day at a time, so that variations ten steps apart in the sequence differ by 10 days. All the graphs in the sequence have the same structure and only the edge timestamps vary. Figure 4 shows the rate of the addition of temporal edges to the graph. We also show a zoomed-in version (right) of two of the graphs to visualize linearity in the temporal stretch as we increase the total time of the graph. We compute motif frequencies using both baseline algorithms. Similarly, we also compute temporal, structural, and orbital features using our ITeM approach. These feature vectors (i.e., embeddings) are used to measure the pairwise similarity of the temporal networks.

Fig. 4: Synthetic Graphs
Fig. 5: Temporal Graph Similarity

Figure 5 shows the change in normalized graph similarity as a function of the difference in the time duration of the synthetic graphs. A point (i, j) on the plot represents the average Euclidean distance j over all pairs of graphs that are i days apart. The δ-Motif algorithm allows arbitrarily large δ values (the limit on the time window spanned by motifs), and we use this feature to identify motifs without any restriction on the time difference between any two motif edges. Figure 5 (left) shows that the temporal-spatial-orbital features computed by ITeM measure graph similarity more accurately than δ-Motif features, which are based only on motif counts. The SNAP δ-Motif does not capture the temporal variations of discovered motif instances, whereas ITeM measures them as the graph is stretched in time and both the average time between edges and the time to form a motif increase. For the most distant graph pairs, we observe an unexpected sharp change in the similarity using SNAP. This requires a deeper analysis of the algorithm and the output generated by the tool.
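The per-gap averaging behind each plotted point can be sketched as follows. The snippet assumes one feature vector per graph, listed in sequence order so that the index difference equals the gap in days; the function names and toy vectors are hypothetical, and the normalization applied in Figure 5 is omitted.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_distance_by_gap(features):
    """Map each gap i to the average distance over all pairs i steps apart."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            sums[j - i] += euclidean(features[i], features[j])
            counts[j - i] += 1
    return {gap: sums[gap] / counts[gap] for gap in sums}


# Toy feature vectors for three graphs in the stretched sequence.
features = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
curve = mean_distance_by_gap(features)
```

For these toy vectors the average distance is 5.0 for graphs one step apart and 10.0 for graphs two steps apart, so the curve grows linearly with the gap.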

DG also characterizes a temporal network in terms of graphlet counts for the entire network and for individual nodes. DG distinguishes graphlets from motifs: graphlets are induced subgraphs that, unlike motifs, are not defined by the statistical significance of the substructure. DG defines orbits in a graphlet to measure automorphism in the graphlet. DG also provides a parameter to restrict the time difference between two edges of the graphlet, but due to out-of-memory errors, we could not run it in the unbounded setup that was used in the previous experiment. To benchmark against DG, we used a restrictive mode of our algorithm with the time window set to 600 seconds.

Figure 5 (right) shows the result comparing DG and ITeM. As shown in Figure 4, the base graph shifts from a stochastic base model to a Gaussian-distribution-based temporal network, which explains the initial sharp increase in the graph distance measured by both algorithms. Both approaches also show sub-linear trends afterward, but only ITeM continues this trend as the time difference between graphs increases. DG shows sudden exponential changes in the distance (or similarity) that do not correspond to the linear temporal evolution of the graphs as shown in Figure 4 (right). Overall, both approaches exhibit similar trends that show the importance of modeling temporal variations and orbital information of the graph, in addition to the frequency count.

Dataset  Vertices  Temporal edges  Static edges  Time span
CM       1,899     59,835          20,296        193 days
BA       3,783     24,186          24,186        1,901 days
EE       986       332,334         24,929        803 days
TT       34,800    171,403         155,507       21 hours
IA       545,196   1,302,439       1,302,253     1,153 days
HT       304,691   563,069         522,618       7 days
RH       55,863    571,927         561,483       3 years 4 months
TABLE II: Temporal Graph Datasets

IV-B Results on Real-World Networks

We analyze various real-world networks and measure the differences in their temporal evolution. The following list introduces the datasets used for the experiments, and Table II describes their static and temporal scale. We generate a temporal-spatial-orbital feature distribution from the ITeM frequencies and use it for the measurement. We also use the change in this distribution over time to detect events in the network.

  • CollegeMsg (CM): CollegeMsg [23] is comprised of private messages sent on an online social network at the University of California, Irvine. An edge (u, v, t) means that user u sent a private message to user v at time t.

  • Bitcoin-Alpha (BA): Bitcoin-Alpha [14] is a who-trusts-whom network of people who trade using Bitcoin on a platform called Bitcoin Alpha. An edge (u, v, t) in the network exists if person u gives a rating to person v at time t.

  • Email-EU (EE): Email-EU [33] [16] is an anonymized network that has information about all incoming and outgoing email between members of a large European research institution. An edge (u, v, t) in the network exists if person u sent an email to person v at time t.

  • Tech-As-Topology (TT): Tech-As-Topology [26] is a temporal network of Autonomous Systems (AS) where an edge (u, v, t) represents a link between AS u and AS v at time t.

  • IA-Stackexch (IA): IA-Stackexch-User-Marks-Post is a bipartite Stack Overflow favorite network [26]. Nodes represent users and posts. An edge (u, v, t) denotes that a user u has marked a post v as a favorite at time t.

  • Higgs Twitter (HT): The Higgs dataset [8] is an anonymized network that has information about messages posted on Twitter between the 1st and the 7th of July 2012 about the announcement of the discovery of the Higgs boson particle. An edge (u, v, t) represents a Twitter interaction between users u and v at time t. An interaction can be a re-tweet, mention, or reply.

  • Reddit Hyperlink Network (RH): The Reddit hyperlink network represents the directed connections between two subreddits. It is extracted from the posts that create hyperlinks from one subreddit to another [13]. An edge (u, v, t) represents a hyperlink from subreddit u to subreddit v at time t.
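All of the datasets above share the (u, v, t) edge representation, so their scale can be summarized with a few set operations. The following is a minimal Python sketch under stated assumptions: the function name `summarize` is hypothetical, timestamps are taken to be numeric, and a static edge is assumed to be a distinct directed (u, v) pair, which may differ from how the published counts were produced.

```python
def summarize(edges):
    """Compute basic static and temporal scale for (u, v, t) edges.

    `edges` is an iterable of (source, target, timestamp) triples.
    """
    edges = list(edges)
    # Every endpoint of every edge is a vertex.
    vertices = {x for u, v, _ in edges for x in (u, v)}
    # Assumption: a static edge is a distinct directed (u, v) pair.
    static_edges = {(u, v) for u, v, _ in edges}
    times = [t for _, _, t in edges]
    return {
        "vertices": len(vertices),
        "temporal_edges": len(edges),
        "static_edges": len(static_edges),
        "time_span": max(times) - min(times) if times else 0,
    }
```

The gap between the temporal and static edge counts indicates how often the same pair of entities interacts repeatedly, which is large for Email-EU and nearly zero for Bitcoin-Alpha.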

Figure 6 shows the independent temporal motif distribution of the different datasets. Similarly, Figure 7 shows motif independence and vertex independence for the datasets. These results give initial clues that networks from similar domains, such as CM and EE, exhibit similar motif and vertex independence, whereas IA has a different distribution.

ITeM can also model the temporal evolution of a network using a sequence of temporal graphs, each covering a given time window. We use the Higgs Twitter dataset and monitor 3-hour windows from July 1st to July 7th. Our approach iteratively analyzes each window and updates the temporal summary of the network as it progresses. This allows us not only to analyze a large graph as multiple smaller graphs but also to identify anomalous events in the network and to understand how the behavior of vertices changes over time. Figure 8 shows a change in ITeM frequencies that reflects a burst event in the graph. The ITeM frequencies peak at the event on July 4th and then gradually return to a normal state. ITeM also provides more insight into the event than basic graph density-based measures. As shown in Figure 8, the maximum increase is observed in the fringe part of the network, such as self-loops, isolated edges, and residual edges. Similarly, a higher number of stars and wedges is also observed. These observations correspond to a network growth phenomenon in which a burst of new interactions occurs among newly added entities. In the case of Higgs Twitter, this is explained by a higher number of Twitter users tweeting about the Higgs boson particle discovery, and by new hashtags being generated for a short period of time.
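The window-based monitoring described above can be illustrated with a simplified sketch. It is not the ITeM algorithm: it counts only two fringe structures, treats an isolated edge as one whose endpoints touch no other edge in the window (an assumption), and flags bursts with a hypothetical median-based threshold. All function names are illustrative.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 3 * 60 * 60  # 3-hour windows, as in the Higgs Twitter experiment


def split_into_windows(edges, start, width=WINDOW_SECONDS):
    """Group (u, v, t) edges by the index of the window that contains t."""
    windows = defaultdict(list)
    for u, v, t in edges:
        windows[(t - start) // width].append((u, v, t))
    return dict(windows)


def fringe_counts(window_edges):
    """Count self-loops and isolated edges in one window.

    Assumption: an isolated edge is a non-loop edge whose two endpoints
    take part in no other non-loop edge of the same window.
    """
    degree = Counter()
    for u, v, _ in window_edges:
        if u != v:
            degree[u] += 1
            degree[v] += 1
    counts = Counter()
    for u, v, _ in window_edges:
        if u == v:
            counts["self_loop"] += 1
        elif degree[u] == 1 and degree[v] == 1:
            counts["isolated_edge"] += 1
    return counts


def burst_windows(series, factor=3.0):
    """Return window ids whose value exceeds `factor` times the median value."""
    values = sorted(series.values())
    median = values[len(values) // 2]
    return [w for w, x in sorted(series.items()) if x > factor * median]
```

Applying `fringe_counts` to each window and passing one count series to `burst_windows` gives a rough indication of where a burst such as the July 4th event would appear.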

Fig. 6: ITeM distribution (log10) of different datasets
Fig. 7: Motif and Vertex Independence of different datasets. x-axis represents motif-id and y-axis represents Motif Independence (left) and Vertex Independence (right)
Fig. 8: ITeM frequency changes in the Higgs Twitter (HT) temporal network
Fig. 9: ITeM Independence changes in the Higgs Twitter (HT) temporal network
Fig. 10: ITeM runtime analysis on Email-EU (EE) dataset: Single Graph
Fig. 11: ITeM runtime analysis on Reddit (RH) dataset: Sequence of Temporal Graphs

Figure 9 shows motif independence over time for the same windows of the Higgs Twitter dataset. Figures 8 and 9 show that core motifs, such as the star, increase in count while their motif independence decreases sharply. This happens because the temporal network exhibits the emergence of a hub-like structure with a small number of extremely high-degree vertices.

IV-C Scalability Analysis

A major contribution of this paper is a distributed algorithm to analyze a large temporal graph or a sequence of temporal graph windows. All the algorithms are developed using Apache Spark 2.3.0, GraphFrames 0.7.0, and Scala 2.11.8. This allows the use of scalable distributed data structures to handle large graphs on the order of millions of edges and to iteratively update the temporal-structural and orbital properties of the graph. To analyze the scalability of the core algorithm, we use a Snakemake [11] based automation pipeline and a SLURM [34] based resource manager. We experiment with different combinations of hardware resources and distributed partitions. Figure 10 shows the results of the scalability experiment using the Email-EU dataset. ITeM speeds up as more cores are made available to the Spark application, up to a maximum of 32 cores. Beyond this point, the application suffers from communication and data serialization overhead. A similar trend was observed as we increased the number of data partitions while keeping the maximum number of cores fixed. The run-time decreases sharply as we increase the executor memory from 2GB to 6GB, and the decrease slows down after that.

Temporal analysis of an evolving network using a window-based approach poses memory constraints and scalability challenges as the number of windows increases. We preserve minimal information across the windows to maintain a global summary of the temporal network, and we save window-specific summaries and vertex features to files for use by other analytic processes. This allows us to use our method in a long-running streaming fashion. Although we do not observe a strong sub-linear trend as the windows progress, as shown in Figure 11, further analysis of the window graph structure using ITeM suggests that the run times depend on both the window size and the fringe structure of the graph. The runtimes of Windows 5 and 10 decrease even as the graph size increases because those windows have a higher number of multi-edges than windows of similar size, which leads to aggressive subgraph reduction while discovering larger motifs. Future work will perform a more detailed analysis of the impact of specific ITeM counts on the runtime.
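The idea of keeping only a small global summary in memory while persisting per-window results can be sketched as follows. This is a hedged Python illustration, not the Spark implementation used in the paper: the function name `process_windows`, the JSON file layout, and the motif names are all hypothetical.

```python
import json
import os
import tempfile
from collections import Counter


def process_windows(window_summaries, out_dir):
    """Fold per-window motif counts into one global summary.

    `window_summaries` yields (window_id, {motif_name: count}) pairs.
    Each window summary is written to its own JSON file, so only the
    running total has to stay in memory between windows.
    """
    os.makedirs(out_dir, exist_ok=True)
    global_summary = Counter()
    for window_id, counts in window_summaries:
        path = os.path.join(out_dir, f"window_{window_id}.json")
        with open(path, "w") as fh:
            json.dump(counts, fh)
        global_summary.update(counts)
    return global_summary


if __name__ == "__main__":
    # Hypothetical counts for two windows, written to a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        total = process_windows([(0, {"star": 2}), (1, {"star": 1, "wedge": 4})], tmp)
        print(total["star"], total["wedge"])
```

Because `window_summaries` can be a generator, memory use in this sketch is bounded by the size of one window summary plus the global counter, regardless of how many windows are processed.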

V Conclusion and Future Work

Complex temporal networks are observed in the real world, and a better understanding of them is required to effectively handle real-world applications. We present Independent Temporal Motif (ITeM) as a building block to characterize temporal graphs. ITeM reveals many salient features of the temporal graph, such as its core structure, fringe vertices and edges, temporal evolution, and uniqueness. Graphs from different domains are found to exhibit varied structural and temporal distributions. Likewise, graphs from similar domains are found to exhibit similar structural properties, but many of them show varied temporal characteristics. We use these observations to characterize individual graphs and define a metric to quantitatively measure the similarity among them. We also present the importance sampling based approach to analyze a large graph as a sequence of smaller windows. We use this to show a change in the distribution that exhibits a behavioral shift in the way entities interact in a transactional graph, such as a social network. The behavioral shift is indicative of saturation in the graph growth, or a specific event that perturbs the usual motif distribution.

The rate at which temporal motifs are formed can also be used to generate synthetic graphs that exhibit similar evolution as a given real-world graph, as shown in [25]. Additionally, these features can also be used in a diverse set of applications, such as approximate sub-graph matching, graph mining, and network embedding learning. We will compare ITeM to other temporal network embeddings to measure the benefits of ITeM over other approaches for use in such applications. Future work will also address scalability challenges by estimating the number of independent temporal motifs using different approximation approaches. We will explore specialized algorithms for different motif classes and perform a sensitivity analysis of the sampling approach. We also plan to identify significant temporal motifs in a multi-type temporal graph.

Acknowledgment

We thank the DARPA Modeling Adversarial Activity (MAA) program for funding this project under contracts HR0011728117, HR001178235, and HR0011729374. The associated PNNL project number is 69986. A portion of the research was performed using PNNL Institutional Computing (PIC) at Pacific Northwest National Laboratory. We also thank Patrick Mackey and Joseph Cottam for providing feedback and help setting up the experimentation.

References

  • [1] D. Aparício, P. Ribeiro, and F. Silva (2018) Graphlet-orbit transitions (got): a fingerprint for temporal network comparison. PloS one 13 (10), pp. e0205497.
  • [2] A. R. Benson, D. F. Gleich, and J. Leskovec (2016) Higher-order organization of complex networks. Science 353 (6295), pp. 163–166.
  • [3] J. Cao, C. Ding, and B. Shi (2019) Motif-based functional backbone extraction of complex networks. Physica A: Statistical Mechanics and its Applications, pp. 121123.
  • [4] C. Chu and H. H. Iu (2017) Complex networks theory for modern smart grid applications: a survey. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 7 (2), pp. 177–191.
  • [5] J. A. Cottam, S. Purohit, P. Mackey, and G. Chin (2018) Multi-channel large network simulation including adversarial activity. In 2018 IEEE International Conference on Big Data (Big Data), pp. 3947–3950.
  • [6] L. Cuadra, S. Salcedo-Sanz, J. Del Ser, S. Jiménez-Fernández, and Z. Geem (2015) A critical review of robustness in power grids using complex networks concepts. Energies 8 (9), pp. 9211–9265.
  • [7] A. Dave, A. Jindal, L. E. Li, R. Xin, J. Gonzalez, and M. Zaharia (2016) Graphframes: an integrated api for mixing graph and relational queries. In Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems, pp. 2.
  • [8] M. De Domenico, A. Lima, P. Mougel, and M. Musolesi (2013) The anatomy of a scientific rumor. Scientific reports 3, pp. 2980.
  • [9] Y. Hulovatyy, H. Chen, and T. Milenković (2015) Exploring the structure and function of temporal networks with dynamic graphlets. Bioinformatics 31 (12), pp. i171–i180.
  • [10] J. Klaise and S. Johnson (2017) The origin of motif families in food webs. Scientific reports 7 (1), pp. 16197.
  • [11] J. Köster and S. Rahmann (2012) Snakemake—a scalable bioinformatics workflow engine. Bioinformatics 28 (19), pp. 2520–2522.
  • [12] R. Kumar, J. Novak, and A. Tomkins (2010) Structure and evolution of online social networks. In Link mining: models, algorithms, and applications, pp. 337–357.
  • [13] S. Kumar, W. L. Hamilton, J. Leskovec, and D. Jurafsky (2018) Community interaction and conflict on the web. In Proceedings of the 2018 World Wide Web Conference on World Wide Web, pp. 933–943.
  • [14] S. Kumar, F. Spezzano, V. Subrahmanian, and C. Faloutsos (2016) Edge weight prediction in weighted signed networks. In Data Mining (ICDM), 2016 IEEE 16th International Conference on, pp. 221–230.
  • [15] E. Kušen and M. Strembeck (2019) An analysis of emotion-exchange motifs in multiplex networks during emergency events. Applied Network Science 4 (1), pp. 8.
  • [16] J. Leskovec, J. Kleinberg, and C. Faloutsos (2007) Graph evolution: densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data (TKDD) 1 (1), pp. 2.
  • [17] P. Liu, A. Benson, and M. Charikar (2018) A sampling framework for counting temporal motifs. arXiv preprint arXiv:1810.00980.
  • [18] M. Luby (1986) A simple parallel algorithm for the maximal independent set problem. SIAM journal on computing 15 (4), pp. 1036–1053.
  • [19] N. Masuda and R. Lambiotte (2016) A guidance to temporal networks. World Scientific.
  • [20] O. Michail (2016) An introduction to temporal graphs: an algorithmic perspective. Internet Mathematics 12 (4), pp. 239–280.
  • [21] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon (2002) Network motifs: simple building blocks of complex networks. Science 298 (5594), pp. 824–827.
  • [22] A. B. Owen (2013) Monte carlo theory, methods and examples. Monte Carlo Theory, Methods and Examples. Art Owen.
  • [23] P. Panzarasa, T. Opsahl, and K. M. Carley (2009) Patterns and dynamics of users’ behavior and interaction: network analysis of an online community. Journal of the Association for Information Science and Technology 60 (5), pp. 911–932.
  • [24] A. Paranjape, A. R. Benson, and J. Leskovec (2017) Motifs in temporal networks. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 601–610.
  • [25] S. Purohit, L. Holder, and G. Chin (2018) Temporal graph generation based on a distribution of temporal motifs. In Proceedings of the 14th International Workshop on Mining and Learning with Graphs.
  • [26] R. A. Rossi and N. K. Ahmed (2015) The network data repository with interactive graph analytics and visualization. In AAAI.
  • [27] S. Sarkar, H. Alvari, and P. Shakarian (2019) Understanding information flow in cascades using network motifs. arXiv preprint arXiv:1904.05161.
  • [28] F. Schreiber and H. Schwöbbermeyer (2005) Frequency concepts and pattern detection for the analysis of motifs in networks. In Transactions on computational systems biology III, pp. 89–104.
  • [29] X. Shen, X. Gong, X. Jiang, J. Yang, T. He, and X. Hu (2018) High-order organization of weighted microbial interaction network. In 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 206–209.
  • [30] F. W. Takes, W. A. Kosters, B. Witte, and E. M. Heemskerk (2018) Multiplex network motifs as building blocks of corporate networks. Applied network science 3 (1), pp. 39.
  • [31] Z. Toroczkai Complex networks. Science-Based Prediction, pp. 94.
  • [32] A. Vazquez, R. Dobrin, D. Sergi, J. Eckmann, Z. Oltvai, and A. Barabási (2004) The topological relationship between the large-scale attributes and local interaction patterns of complex networks. Proceedings of the National Academy of Sciences 101 (52), pp. 17940–17945.
  • [33] H. Yin, A. R. Benson, J. Leskovec, and D. F. Gleich (2017) Local higher-order graph clustering. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 555–564.
  • [34] A. B. Yoo, M. A. Jette, and M. Grondona (2003) Slurm: simple linux utility for resource management. In Workshop on Job Scheduling Strategies for Parallel Processing, pp. 44–60.