1 Introduction
Networks provide an abstraction for studying complex systems in a broad set of disciplines, ranging from social and communication networks to molecular biology and neuroscience [20]. Typically, these systems are modeled as static graphs that describe relationships between objects (nodes) and links between the objects (edges). However, many systems are not static as the links between objects dynamically change over time [8]. Such temporal networks can be represented by a series of timestamped edges, or temporal edges. For example, a network of email or instant message communication can be represented as a sequence of timestamped directed edges, one for every message that is sent from one person to another. Similar representations can be used to model computer networks, phone calls, financial transactions, and biological signaling networks.
While such temporal networks are ubiquitous, there are few tools for modeling and characterizing the underlying structure of such dynamical systems. Existing methods either model the networks as strictly growing where a pair of nodes connect once and stay connected forever [2, 10, 17] or aggregate temporal information into a sequence of snapshots [1, 6, 23]. These techniques fail to fully capture the richness of the temporal information in the data.
Characterizing temporal networks also brings a number of interesting challenges that distinguish it from the analysis of static networks. For example, while the number of nodes and pairs of connected nodes can be of manageable size, the number of temporal edges may be very large, so efficient algorithms are needed when analyzing such data. Another challenge is that patterns in temporal networks can occur at different time scales. For example, in telephone call networks, reciprocation (that is, a person returning a call) can occur on very short time intervals, while more intricate patterns (e.g., person $A$ calling person $B$, who then calls $C$) may occur at larger time scales. Lastly, there are many possible temporal patterns, as both the order and the sequence of edges play an important role.
Present work: Temporal network motifs. Here, we provide a general methodology for analyzing temporal networks. We define temporal networks as a set of nodes and a collection of directed temporal edges, where each edge has a timestamp. For example, Fig. 1 illustrates a small temporal network with nine temporal edges between five ordered pairs of nodes.
Our analytical approach is based on generalizing the notion of network motifs to temporal networks. In static networks, network motifs or graphlets are defined as small induced subgraphs occurring in a bigger network structure [4, 19, 29]. We extend static motifs to temporal networks and define $\delta$-temporal motifs, where all the edges in a given motif have to occur within a time period of $\delta$ time units. These temporal motifs simultaneously account for the ordering of edges and a temporal window in which edges can occur. For example, Fig. 1 shows a motif on three nodes and three edges, where the edge label denotes the order in which the edges appear. While we focus on directed edges with a single timestamp in this work, our methodology generalizes to common variations on this model. For example, our methods can incorporate timestamps with durations (common in telephone call networks), colored edges that identify different types of connections, and temporal networks with undirected edges.
We then consider the problem of counting how many times each $\delta$-temporal motif occurs in a given temporal network. We develop a general algorithm for counting temporal network motifs defined by any number of nodes and edges that avoids enumeration over subsets of temporal edges and whose complexity depends on the structure of the static graph induced by the temporal motif. For motifs defined by a constant number of temporal edges between 2 nodes, this general algorithm is optimal up to constant factors: it runs in $O(m)$ time, where $m$ is the number of temporal edges.
Furthermore, we design fast variations of the algorithm that allow for counting certain classes of $\delta$-temporal motifs, including star and triangle patterns. These algorithms are based on a common framework for managing summary counts in specified time windows. For star motifs with 3 nodes and 3 temporal edges, we again achieve a running time linear in the input, i.e., $O(m)$ time. Given a temporal graph with $\tau$ induced triangles in its induced static graph, our fast algorithm counts temporal triangle motifs with 3 temporal edges in $O(\tau^{1/2} m)$ worst-case time. In contrast, any algorithm that processes triangles individually takes $O(\tau m)$ worst-case time. In practice, our fast temporal triangle counting algorithm is up to 56 times faster than a competitive baseline and runs in just a couple of hours on a network with over two billion temporal edges.
Our algorithmic framework enables us to study the structure of several complex systems. For example, we explore the differences in human communication patterns by analyzing motif frequencies in text message, Facebook wall post, email, and private online message network datasets. Temporal network motif counts reveal that text messaging and Facebook wall posting are dominated by “blocking” communication, where a user only engages with one other user at a time, whereas email is mostly characterized by “non-blocking” communication, as individuals send out several emails in a row. Furthermore, private online messaging contains a mixture of blocking and non-blocking behavior.
Temporal network motifs can also be used to measure the frequency of patterns at different time scales. For example, the difference in $\delta$-temporal motif counts for $\delta = 60$ minutes and $\delta = 30$ minutes counts only the motifs that take at least 30 minutes and at most 60 minutes to form. With this type of analysis, we find that certain question-and-answer patterns on Stack Overflow need at least 30 minutes to develop. We also see that in online private messaging, star patterns constructed by outgoing messages sent by one user tend to increase in frequency from time scales of 1 to 20 minutes before peaking and then declining in frequency.
All in all, our work defines a flexible notion of motifs in temporal networks and provides efficient algorithms for counting them. It enables new analyses in a variety of scientific domains and paves a new way for modeling dynamic complex systems.
2 Related work
Our work builds upon the rich literature on network motifs in static graphs, where these models have proved crucial to understanding the mechanisms driving complex systems [19] and to characterizing classes of static networks [25, 29]. Furthermore, motifs are critical for understanding the higher-order organizational patterns in networks [3, 4]. On the algorithmic side, a large amount of research has been devoted simply to counting triangles in undirected static graphs [14].
Prior definitions of temporal network motifs either do not account for edge ordering [30], only have heuristic counting algorithms [7], or assume temporal edges in a motif must be consecutive events for a node [13]. In the last case, the restrictive definition permits fast counting algorithms but misses important structures. For example, many related edges occurring in a short burst at a node would not be counted together. In contrast, $\delta$-temporal motifs capture every occasion that edges form a particular pattern within the prescribed time window.
There are several studies on pattern formation in growing networks where one only considers the addition of edges to a static graph over time. In this context, motif-like patterns have been used to create evolution rules that govern the ways that networks develop [5, 24]. The way we consider ordering of temporal edges in our definition of $\delta$-temporal motifs is similar in spirit. There are also several analyses on the formation of triangles in a variety of social networks [9, 12, 15]. In contrast, in the temporal graphs we study here, three nodes may form a triangle several times.
3 Preliminaries
We now provide formal definitions of temporal graphs and temporal motifs. In Section 4, we provide algorithms for counting the number of temporal motifs in a given temporal graph.
Temporal edges and graphs. We define a temporal edge to be a timestamped directed edge between an ordered pair of nodes. We call a collection of temporal edges a temporal graph (Fig. 1). Formally, a temporal graph $T$ on a node set $V$ is a collection of tuples $(u_i, v_i, t_i)$, $i = 1, \ldots, m$, where each $u_i$ and $v_i$ are elements of $V$ and each $t_i$ is a timestamp in $\mathbb{R}$. We refer to a specific tuple $(u, v, t)$ as a temporal edge. There can be many temporal edges directed from $u$ to $v$, and we refer to them as edges between $u$ and $v$. We assume that the timestamps $t_i$ are unique so that the tuples may be strictly ordered. This assumption makes the presentation of the definitions and algorithms clearer, but our methods can easily be adapted to the case when timestamps are not unique. When it is clear from context, we refer to a temporal edge as simply an edge. Finally, by ignoring timestamps and duplicate edges, the temporal graph induces a standard directed graph, which we call the static graph $G$ of $T$ with static edges, i.e., $(u, v)$ is an edge in $G$ if and only if there is some temporal edge $(u, v, t)$ in $T$.
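As a minimal sketch of these definitions, a temporal graph can be held as a list of (source, target, timestamp) tuples, from which the induced static graph follows by dropping timestamps and duplicates. The toy edge list and the helper names `static_graph` and `edges_by_pair` are illustrative assumptions, not part of the released implementation.

```python
from collections import defaultdict

# Hypothetical toy temporal graph: one (source, target, timestamp) tuple per edge.
temporal_edges = [
    ("a", "b", 1), ("b", "a", 4), ("a", "b", 6),
    ("a", "c", 8), ("c", "a", 9),
]

def static_graph(edges):
    """Static directed edges induced by ignoring timestamps and duplicates."""
    return {(u, v) for u, v, _ in edges}

def edges_by_pair(edges):
    """Time-sorted timestamps for every ordered node pair (u, v)."""
    grouped = defaultdict(list)
    for u, v, t in sorted(edges, key=lambda e: e[2]):
        grouped[(u, v)].append(t)
    return dict(grouped)
```

Grouping timestamps by ordered pair is one way to obtain the constant-time access to "all edges between two nodes" that the counting algorithms later assume.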
$\delta$-temporal motifs and motif instances. We formalize $\delta$-temporal motifs with the following definition.
Definition.
A $k$-node, $l$-edge, $\delta$-temporal motif is a sequence of $l$ edges, $M = (u_1, v_1, t_1), \ldots, (u_l, v_l, t_l)$, that are time-ordered within a $\delta$ duration, i.e., $t_1 < t_2 < \cdots < t_l$ and $t_l - t_1 \leq \delta$, such that the induced static graph from the edges is connected and has $k$ nodes.
Note that with this definition, many edges between the same pair of nodes may occur in the motif $M$. Also, we note that the purpose of the timestamps is to induce an ordering on the edges. Fig. 1 illustrates a particular 3-node, 3-edge $\delta$-temporal motif.
The above definition provides a template for a particular pattern, and we are interested in how many times a given pattern occurs in a dataset. Intuitively, a collection of edges in a given temporal graph is an instance of a $\delta$-temporal motif $M$ if it matches the same edge pattern and all of the edges occur in the right order within the $\delta$ time window (Fig. 1). Formally, we say that any time-ordered sequence $S = (w_1, x_1, t'_1), \ldots, (w_l, x_l, t'_l)$ of $l$ unique edges is an instance of the motif $M = (u_1, v_1, t_1), \ldots, (u_l, v_l, t_l)$ if
(1) there exists a bijection $f$ on the vertices such that $f(w_i) = u_i$ and $f(x_i) = v_i$, $i = 1, \ldots, l$, and
(2) the edges all occur within $\delta$ time, i.e., $t'_l - t'_1 \leq \delta$.
A central goal of this work is to count the number of ordered subsets of edges from a temporal graph $T$ that are instances of a particular motif. In other words, given a $k$-node, $l$-edge $\delta$-temporal motif, we seek to find how many of the $l!\binom{m}{l}$ ordered length-$l$ sequences of edges in the temporal graph $T$ are instances of the motif. A naive approach to this problem would be to simply enumerate all ordered subsets and then check whether each one is an instance of the motif. In modern datasets, the number of edges is typically quite large (we analyze a dataset in Section 5 with over two billion edges), and this approach is impractical even for $l = 3$. In the following section, we discuss several faster algorithms for counting the number of instances of $\delta$-temporal motifs in a temporal graph.
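The naive approach can be sketched directly from the definition of a motif instance. The following is an illustrative brute-force reference, assuming unique timestamps and a motif given as an ordered list of node-label pairs; the names `is_instance` and `count_naive` are hypothetical. It is useful only for checking faster algorithms on small inputs.

```python
from itertools import combinations

def is_instance(seq, motif, delta):
    """seq: time-ordered temporal edges (w, x, t); motif: ordered (u, v) pairs."""
    if seq[-1][2] - seq[0][2] > delta:
        return False  # the edges do not fit in one window of length delta
    fwd, bwd = {}, {}  # node bijection, stored in both directions
    for (w, x, _), (u, v) in zip(seq, motif):
        for data_node, motif_node in ((w, u), (x, v)):
            if fwd.setdefault(data_node, motif_node) != motif_node:
                return False
            if bwd.setdefault(motif_node, data_node) != data_node:
                return False
    return True

def count_naive(edges, motif, delta):
    """Count instances by enumerating every time-ordered subset of len(motif) edges."""
    ordered = sorted(edges, key=lambda e: e[2])
    return sum(
        is_instance(seq, motif, delta)
        for seq in combinations(ordered, len(motif))
    )
```

Because `combinations` preserves the order of its input, each candidate is already time-ordered; the cost grows as the number of size-$l$ subsets of the $m$ edges, which is what the faster algorithms avoid.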
4 Algorithms
We now present several algorithms for exactly counting the number of instances of $\delta$-temporal motifs in a temporal graph. We first present a general counting algorithm in Section 4.1, which can count instances of any $k$-node, $l$-edge temporal motif faster than simply enumerating over all size-$l$ ordered subsets of edges. This algorithm is optimal for counting 2-node temporal motifs in the sense that it is linear in the number of edges in the temporal graph. In Section 4.2, we provide faster, specialized algorithms for counting specific types of 3-node, 3-edge temporal motifs (Fig. 3).
4.1 General counting framework
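Algorithm 4.1: Counting instances of every $l$-edge $\delta$-temporal motif in a time-ordered sequence of temporal edges.
Input: Sequence $S'$ of edges $(e_1 = (u_1, v_1, t_1), \ldots, e_L = (u_L, v_L, t_L))$ with $t_1 < \cdots < t_L$, time window $\delta$.
Output: Number of instances of each $l$-edge $\delta$-temporal motif $M$ contained in the sequence.
State and subroutines: a counter counts[·] (default = 0), updated by IncrementCounts() over prefixes and by DecrementCounts() over suffixes.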
We begin with a general framework for counting the number of instances of a $k$-node, $l$-edge temporal motif $M$. To start, consider $H$ to be the static directed graph induced by the edges of $M$. A sequence of temporal edges $S$ is an instance of $M$ if and only if the static subgraph induced by edges in $S$ is isomorphic to $H$, the ordering of the edges in $S$ matches the order in $M$, and all the edges in $S$ span a time window of at most $\delta$ time units. This leads to the following general algorithm for counting instances of $M$ in a temporal graph $T$:
(1) Identify all instances $H'$ of the static motif $H$ induced by $M$ within the static graph $G$ induced by the temporal graph $T$ (e.g., there are three instances of $H$ induced by $M$ in Fig. 1).
(2) For each static motif instance $H'$, gather all temporal edges between pairs of nodes forming an edge in $H'$ into an ordered sequence $S' = (u_1, v_1, t_1), \ldots, (u_L, v_L, t_L)$.
(3) Count the number of (potentially non-contiguous) subsequences of edges in $S'$ occurring within $\delta$ time units that correspond to instances of $M$.
The first step can use known algorithms for enumerating motifs in static graphs [27], and the second step is a simple matter of fetching the appropriate temporal edges. To perform the third step efficiently, we develop a dynamic programming approach for counting the number of subsequences (instances of motif $M$) that match a particular pattern within a larger sequence ($S'$). The key idea is that, as we stream through an input sequence of edges, the count of a given length-$l$ pattern (i.e., motif) with a given final edge is computed from the current count of the length-$(l-1)$ prefix of the pattern. Inductively, we maintain auxiliary counters of all of the prefixes of the pattern (motif). Second, we also require that all edges in the motif be at most $\delta$ time apart. Thus, we use the notion of a moving time window such that any two edges in the time window are at most $\delta$ time apart. The auxiliary counters now keep track of only the subsequences occurring within the current time window. Last, it is important to note that the algorithm only counts the number of instances of motifs rather than enumerating them.
Alg. 4.1 counts all possible $l$-edge motifs that occur in a given sequence of edges. The data structure counts[·] maintains auxiliary counts of all (ordered) patterns of length at most $l$. Specifically, counts[$e_1 \cdots e_r$] is the number of times the subsequence $[e_1 \cdots e_r]$ occurs in the current time window (if $r < l$) or the number of times the subsequence has occurred within all time windows of length $\delta$ (if $r = l$). We also assume the keys of counts[·] are accessed in order of length. When the time window moves forward to admit a new edge, every edge $e$ farther than $\delta$ time from the new edge is removed from the window and the appropriate counts are decremented (the DecrementCounts() method). First, the single-edge count for $[e]$ is updated. Based on this update, the length-2 subsequences that have $e$ as their first edge are updated, and so on, up through length-$(l-1)$ subsequences. On the other hand, when an edge $e'$ is added to the window, similar updates take place, but in reverse order, from longest to shortest subsequences, in order to increment counts of subsequences where $e'$ is the last edge (the IncrementCounts() method). Importantly, length-$l$ subsequence counts are incremented in this step but never decremented. As the time window moves from the beginning to the end of the sequence of edges, the algorithm accumulates counts of all length-$l$ subsequences in all possible time windows of length $\delta$.
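The sliding-window update just described can be sketched in Python as follows. This is an illustrative reading of the prose rather than the released implementation: the function name `count_sequences`, the tuple-of-static-edges keys, and the requirement $l \geq 2$ are assumptions made for the example.

```python
from collections import defaultdict

def count_sequences(edges, l, delta):
    """edges: time-sorted (u, v, t) tuples with unique timestamps; assumes l >= 2.

    Returns {pattern: count}, where a pattern is a length-l tuple of static
    edges (u, v) and count is the number of (possibly non-contiguous)
    subsequences matching it whose timestamps span at most delta.
    """
    counts = defaultdict(int)

    def keys_of_length(r):
        # Snapshot, so that updates during the loop do not disturb iteration.
        return [k for k in list(counts) if len(k) == r]

    def decrement(e):
        # e leaves the window: drop subsequences that start with e, shortest first.
        counts[(e,)] -= 1
        for r in range(1, l - 1):            # suffix lengths 1 .. l-2
            for suffix in keys_of_length(r):
                counts[(e,) + suffix] -= counts[suffix]

    def increment(e):
        # e enters the window: extend existing prefixes by e, longest first.
        for r in range(l - 1, 0, -1):        # prefix lengths l-1 .. 1
            for prefix in keys_of_length(r):
                counts[prefix + (e,)] += counts[prefix]
        counts[(e,)] += 1

    start = 0
    for u, v, t in edges:
        while edges[start][2] + delta < t:   # expire edges older than t - delta
            decrement(edges[start][:2])
            start += 1
        increment((u, v))
    return {k: c for k, c in counts.items() if len(k) == l}
```

Length-$l$ keys are only ever written by `increment`, which is why they accumulate over all windows, while shorter keys always describe the current window. To count one motif on a fixed set of nodes, look up the key given by that motif's sequence of static edges.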
Fig. 2 shows the execution of Alg. 4.1 for a particular sequence of edges. Note that the figure only displays values of counts[·] for contiguous subsequences of the motif $M$, but the algorithm keeps counts for other subsequences as well. In general, there are $O(l^2)$ contiguous subsequences of an $l$-edge motif $M$, and there are $O(|H|^l)$ total keys in counts[·], where $|H|$ is the number of edges in the static subgraph $H$ induced by $M$, in order to count all $l$-edge motifs in the sequence (i.e., not just motif $M$).
We now analyze the complexity of the overall 3-step algorithm. We assume that the temporal graph $T$ has edges sorted by timestamps, which is reasonable if edges are logged in their order of occurrence, and we preprocess $T$ in linear time such that we can access the sorted list of all edges between $u$ and $v$ in $O(1)$ time. Constructing the time-sorted sequence $S'$ in step 2 of the algorithm then takes $O(\log(|H|)|S'|)$ time. Each edge input to Alg. 4.1 is processed exactly twice: once to increment counts when it enters the time window and once to decrement counts when it exits the time window. As presented in Alg. 4.1, each update changes $O(|H|^l)$ counters, resulting in an overall complexity of $O(|H|^l |S'|)$. However, one could modify Alg. 4.1 to only update counts for contiguous subsequences of the sequence $M$, which would change $O(l^2)$ counters and have overall complexity $O(l^2 |S'|)$. We are typically only interested in small constant values of $|H|$ and $l$ (for our experiments in Section 5, $|H| \leq 3$ and $l = 3$), in which case the running time is linear in the size of the input to the algorithm, i.e., $O(|S'|)$.
In the remainder of this section we analyze our 3-step algorithm with respect to different types of motifs (2-node, stars, and triangles) and discuss the benefits as well as the deficiencies of the proposed framework. We show that for 2-node motifs, our general counting framework takes time linear in the total number of edges $m$. Since all the input data needs to be examined for computing exact counts, this means the algorithm is optimal for 2-node motifs. However, we also show that for star and triangle motifs the algorithm is not optimal, which then motivates us to design faster algorithms in Sec. 4.2.
General algorithm for 2-node motifs. We first show how to map 2-node motifs to the framework described above. Any induced graph $H$ of a 2-node $\delta$-temporal motif is either a single or a bidirectional edge. In either case, it is straightforward to enumerate over all instances of $H$ in the static graph. This leads to the following procedure: (1) for each pair of nodes $u$ and $v$ for which there is at least one edge, gather and sort the edges in either direction between $u$ and $v$; (2) call Alg. 4.1 with these edges. To obtain the total motif count, the counts from each call to Alg. 4.1 are then summed together.
We only need to input each edge to Alg. 4.1 once, and under the assumption that we can access the sorted directed edges from one node to another in $O(1)$ time, the merging of edges into sorted order takes linear time. Therefore, the total running time is $O(2^l m)$, which is linear in the number of temporal edges $m$. We are mostly interested in small patterns, i.e., cases when $l$ is a small constant. Thus, this methodology is optimal (linear in the input, $O(m)$) for counting 2-node $\delta$-temporal motif instances.
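Step (1) of the 2-node procedure amounts to merging two already-sorted per-direction lists for each pair of nodes. The sketch below illustrates only this gathering step, using the standard-library `heapq.merge`; the generator name `two_node_sequences` is hypothetical, and the initial `sorted` call stands in for the assumption that edges arrive time-sorted. Each yielded sequence would then be passed to the counting routine of Alg. 4.1, which is not reproduced here.

```python
import heapq
from collections import defaultdict

def two_node_sequences(edges):
    """Yield, per unordered node pair, the time-sorted edges in both directions."""
    by_direction = defaultdict(list)
    for u, v, t in sorted(edges, key=lambda e: e[2]):
        by_direction[(u, v)].append((u, v, t))

    seen = set()
    for (u, v), forward in by_direction.items():
        pair = frozenset((u, v))
        if pair in seen:
            continue  # the reverse direction already produced this pair
        seen.add(pair)
        backward = by_direction.get((v, u), []) if u != v else []
        # Both lists are time-sorted, so a linear-time merge is enough.
        yield list(heapq.merge(forward, backward, key=lambda e: e[2]))
```

Every temporal edge appears in exactly one yielded sequence, which matches the observation that each edge is input to the counting routine only once.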
General algorithm for star motifs. Next, we consider $k$-node, $l$-edge star motifs $M$, whose induced static graph $H$ consists of a center node and $k - 1$ neighbors, where edges may occur in either direction between the center node and a neighbor node. For example, the motif in the top left corner of Fig. 3 is a star motif with all edges pointing toward the center node. In such motifs, the induced static graph $H$ contains at most $2(k - 1)$ static edges: one incoming and one outgoing edge between the center node and each neighbor node. We have the following method for counting the number of instances of $k$-node, $l$-edge star motifs: (1) for each node $u$ in the static graph and for each unique set of $k - 1$ neighbors, gather and sort the edges in either direction between $u$ and the neighbors; (2) count the number of instances of $M$ using Alg. 4.1. The counts from each call to Alg. 4.1 are summed over all center nodes.
The major drawback of this approach is that we have to loop over each size-$(k-1)$ neighbor set. This can be prohibitively expensive even when $k = 3$ if the center node has large degree. In Section 4.2, we design an algorithm that avoids this issue for the case when the star motif has $l = 3$ edges and $k = 3$ nodes.
General algorithm for triangle motifs. In triangle motifs, the induced graph $H$ consists of 3 nodes and at least one directed edge between any pair of nodes (see Fig. 3 for all eight of the 3-edge triangle motifs). The induced static graph $H$ of $M$ contains at least three and at most six static edges. A straightforward algorithm for counting $l$-edge triangle motifs in a temporal graph $T$ is:
(1) Use a fast static graph triangle enumeration algorithm to find all triangles in the static graph $G$ induced by $T$ [14].
(2) For each triangle $(u, v, w)$, merge all temporal edges from each pair of nodes to get a time-sorted list of edges. Use Alg. 4.1 to count the number of instances of $M$.
This approach is problematic, as the edges between a pair of nodes may participate in many triangles. Fig. 4 shows a worst-case example in which the timestamps are ordered by their index: the many temporal edges between a single pair of nodes each form an instance of the motif with every third node that closes a triangle on that pair. Thus, the overall worst-case running time of the algorithm is $O(\text{TriEnum} + m\tau)$, where TriEnum is the time to enumerate the triangles in the static graph and $\tau$ is the number of such triangles. In the following section, we devise an algorithm that significantly reduces the dependency on $\tau$ from linear to sublinear (specifically, $\tau^{1/2}$) when there are $l = 3$ edges.
4.2 Faster algorithms
The general counting algorithm from the previous subsection counts the number of instances of any $k$-node, $l$-edge $\delta$-temporal motif, and is also optimal for 2-node motifs. However, the computational cost may be high for other motifs such as stars and triangles. We now develop specialized algorithms that count certain motif classes faster. Specifically, we design faster algorithms for counting all 3-node, 3-edge star and triangle motifs (Fig. 3 illustrates these motifs). Our algorithm for stars is linear in the input size, so it is optimal up to constant factors.
Fast algorithm for 3-node, 3-edge stars. With 3-node, 3-edge star motifs, the key drawback of the previous algorithmic approach is that we would have to loop over all pairs of neighbors given a center node. Instead, we count all instances of star motifs for a given center node in just a single pass over the edges adjacent to the center node.
We use a dynamic programming approach for counting star motifs. First, note that every temporal edge in a star with center $u$ is defined by (1) a neighbor node, (2) a direction of the edge (outward from or inward to $u$), and (3) the timestamp. With this insight, we notice that there are 3 classes of star motifs on 3 nodes and 3 edges, which we call pre, post, and mid. Two of the three edges share a neighbor, and the classes differ in whether the remaining edge comes last (pre), first (post), or in the middle (mid). Each class has $2^3 = 8$ motifs, one for each of the possible directions on the three edges.
Now, suppose we process the time-ordered sequence of edges containing the center node $u$. We maintain the following counters when processing an edge with timestamp $t$:
(1) pre_sum[dir1, dir2] is the number of sequentially ordered pairs of edges in $[t - \delta, t)$ where the first edge points in direction dir1 and the second edge points in direction dir2.
(2) post_sum[dir1, dir2] is the analogous counter for the time window $(t, t + \delta]$.
(3) mid_sum[dir1, dir2] is the number of pairs of edges where the first edge is in direction dir1 and occurred at time $s < t$ and the second edge is in direction dir2 and occurred at time $s' > t$ such that $s' - s \leq \delta$.
If we are currently processing an edge, the “pre” class gets pre_sum[dir1, dir2] new motif instances for any choice of directions dir1 and dir2 (specifying the first two edge directions), and the current edge serves as the third edge in the motif (hence specifying the third edge direction). Similar updates are made with the post_sum and mid_sum counters, where the current edge serves as the first or second edge in the motif, respectively.
In order for our algorithm to be efficient, we must quickly update our counters. To aid in this, we introduce two additional counters:
(1) pre_nodes[dir, $v_i$] is the number of times node $v_i$ has appeared in an edge with $u$ with direction dir in the time window $[t - \delta, t)$.
(2) post_nodes[dir, $v_i$] is the analogous counter but for the time window $(t, t + \delta]$.
Following the ideas of Alg. 4.1, it is easy to update these counters when we process a new edge. Consequently, pre_sum, post_sum, and mid_sum can be maintained when processing an edge with just a few simple primitives:
(1) Push() and Pop() update the counts for pre_nodes, post_nodes, pre_sum, and post_sum when edges enter and leave the time windows $[t - \delta, t)$ and $(t, t + \delta]$.
(2) ProcessCurrent() updates motif counts involving the current edge and updates the counter mid_sum.
We describe the general procedure in Alg. 4.2, which will also serve as the basis for our fast triangle counting procedure, and Alg. 4.2 implements the subroutines Push(), Pop(), and ProcessCurrent() for counting instances of 3-node, 3-edge star motifs. The count_pre, count_post, and count_mid counters in Alg. 4.2 maintain the counts of the three different classes of stars described above.
Finally, we note that our counting scheme incorrectly includes instances of 2-node motifs such as $M = (u, v, t_1), (u, v, t_2), (u, v, t_3)$, but we can use the efficient 2-node motif counting algorithm to account for this. Putting everything together, we have the following procedure:
(1) For each node $u$ in the temporal graph $T$, get a time-ordered list of all edges containing $u$.
(2) Count star motif instances centered at $u$ by running Alg. 4.2 on this list.
(3) For each neighbor $v$ of a star center $u$, subtract the 2-node motif counts using Alg. 4.1.
If the edges of $T$ are time-sorted, the first step can be done in linear time. The second and third steps run in time linear in the input size. Each edge is used in steps 2 and 3 exactly twice: once for each end point as the center node. Thus, the overall complexity of the algorithm is $O(m)$, which is optimal up to constant factors.
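The single pass over a center node's edges can be sketched as follows. This is one illustrative reading of the Push(), Pop(), and ProcessCurrent() primitives, not the authors' code: the function name `count_star_classes`, the dictionary-based counters, and the encoding of directions as 0 (out of the center) and 1 (into the center) are assumptions. The returned counts still include the degenerate triples in which all three edges share one neighbor, which the 2-node subtraction step removes.

```python
from collections import defaultdict
from itertools import product

DIRS = (0, 1)  # assumed encoding: 0 = out of the center, 1 = into the center

def count_star_classes(edges, delta):
    """edges: time-sorted (nbr, dir, t) triples for one center, unique timestamps.

    Returns (count_pre, count_post, count_mid), each keyed by (dir1, dir2, dir3).
    """
    pre_nodes, post_nodes = defaultdict(int), defaultdict(int)    # (dir, nbr)
    pre_sum, post_sum, mid_sum = (defaultdict(int) for _ in range(3))  # (dir1, dir2)
    count_pre, count_post, count_mid = (defaultdict(int) for _ in range(3))

    def push(node_count, total, nbr, d):
        # New latest edge: it is the second edge of a pair with each earlier
        # edge in the window that touches the same neighbor.
        for d1 in DIRS:
            total[(d1, d)] += node_count[(d1, nbr)]
        node_count[(d, nbr)] += 1

    def pop(node_count, total, nbr, d):
        # Earliest edge leaves: remove the pairs in which it was the first edge.
        node_count[(d, nbr)] -= 1
        for d2 in DIRS:
            total[(d, d2)] -= node_count[(d2, nbr)]

    def process_current(nbr, d):
        for d1 in DIRS:   # the current edge is no longer a "future" second edge
            mid_sum[(d1, d)] -= pre_nodes[(d1, nbr)]
        for a, b in product(DIRS, DIRS):
            count_pre[(a, b, d)] += pre_sum[(a, b)]    # current edge is third
            count_post[(d, a, b)] += post_sum[(a, b)]  # current edge is first
            count_mid[(a, d, b)] += mid_sum[(a, b)]    # current edge is second
        for d3 in DIRS:   # the current edge becomes a "past" first edge
            mid_sum[(d, d3)] += post_nodes[(d3, nbr)]

    start = end = 0
    for nbr, d, t in edges:
        while edges[start][2] + delta < t:
            pop(pre_nodes, pre_sum, edges[start][0], edges[start][1])
            start += 1
        while end < len(edges) and edges[end][2] <= t + delta:
            push(post_nodes, post_sum, edges[end][0], edges[end][1])
            end += 1
        pop(post_nodes, post_sum, nbr, d)
        process_current(nbr, d)
        push(pre_nodes, pre_sum, nbr, d)
    return dict(count_pre), dict(count_post), dict(count_mid)
```

Each edge is pushed and popped a constant number of times and every update touches a constant number of counters, so the pass is linear in the number of edges at the center node.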
Algorithm 4.2 (general counting procedure). Input: Sequence of edges with time-ordered timestamps, time window $\delta$. Initialize counters pre_nodes, post_nodes, mid_sum, pre_sum, and post_sum; the procedure then applies Push(), Pop(), and ProcessCurrent() to these counters as each edge is processed.
Algorithm 4.2 (subroutines for 3-node, 3-edge star motifs). Initialize counters count_pre, count_post, and count_mid. Push() and Pop() operate on a node_count counter and a sum counter; ProcessCurrent() updates count_pre, count_post, and count_mid.
Algorithm 4.2 (subroutines for 3-edge triangle motifs). Initialize counter count. Push() and Pop() operate on a node_count counter and a sum counter; ProcessCurrent() updates count, whose keys map to the triangle motifs in Fig. 3.
Fast algorithm for 3-edge triangle motifs. While our fast star counting routine relied on counting motif instances for all edges adjacent to a given node, our fast triangle algorithm is based on counting instances for all edges adjacent to a given pair of nodes. Specifically, given a pair of nodes $u$ and $v$ and a list of common neighbors $w_1, \ldots, w_d$, we count the number of motif instances for triangles $(w_i, u, v)$. Given all of the edges between these three nodes, the counting procedures are nearly identical to the case of stars. We use the same general counting method (Alg. 4.2), but the behavior of the subroutines Push(), Pop(), and ProcessCurrent() depends on whether or not the edge is between $u$ and $v$.
These methods are implemented in Alg. 4.2. The input is a list of edges adjacent to a given pair of neighbors $u$ and $v$, where each edge consists of four pieces of information: (1) a neighbor node nbr, (2) an indicator of whether the node nbr connects to node $u$ or node $v$, (3) the direction dir of the edge, and (4) the timestamp. The node counters (pre_nodes and post_nodes) in Alg. 4.2 have an extra dimension compared to the star version to indicate whether the counts correspond to edges containing node $u$ or node $v$ (denoted by “uorv”). Similarly, the sum counters (pre_sum, mid_sum, and post_sum) have an extra dimension to denote whether the first edge is incident on node $u$ or node $v$.
Recall that the problem with counting triangle motifs by the general framework in Alg. 4.1 is that a pair of nodes with many edges might have to be counted for many triangles in the graph. However, with Alg. 4.2, we can simultaneously count all triangles adjacent to a given pair of nodes. What remains is to assign each triangle in the static graph to a pair of nodes. Here, we propose to assign each triangle to the pair of nodes in that triangle containing the largest number of edges, which is sketched in Alg. 4.2. This assignment aims to process as many triangles as possible for pairs of nodes with many edges. The following theorem says that this is faster than simply counting for each triangle (described in Section 4.1). Specifically, we reduce $O(m\tau)$ complexity to $O(m\sqrt{\tau})$.
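The assignment rule can be illustrated with a short sketch that enumerates static triangles and attaches each one to its pair with the most temporal edges. The function name `assign_triangles`, the simple neighbor-intersection enumeration, the arbitrary tie-breaking, and the requirement that node labels be sortable are all assumptions for the example; a production implementation would use a faster triangle enumeration routine.

```python
from collections import Counter, defaultdict
from itertools import combinations

def assign_triangles(temporal_edges):
    """Map each heaviest node pair to the third nodes of its assigned triangles."""
    # Number of temporal edges (in either direction) on each undirected pair.
    weight = Counter(frozenset((u, v)) for u, v, _ in temporal_edges if u != v)

    adj = defaultdict(set)
    for pair in weight:
        u, v = tuple(pair)
        adj[u].add(v)
        adj[v].add(u)

    assigned = defaultdict(list)
    for u in sorted(adj):
        larger = sorted(x for x in adj[u] if x > u)
        for v, w in combinations(larger, 2):   # u < v < w: each triangle once
            if w in adj[v]:
                pairs = [frozenset(p) for p in ((u, v), (u, w), (v, w))]
                heaviest = max(pairs, key=lambda p: weight[p])  # ties: first wins
                (third,) = {u, v, w} - heaviest
                assigned[heaviest].append(third)
    return dict(assigned)
```

For each key, the temporal edges among the pair and its listed third nodes form the input to one call of the pair-based counting routine, so the heavy pair's edges are read once for all of its assigned triangles.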
Theorem.
In the worst case, Alg. 4.2 runs in time $O(\text{TriEnum} + m\sqrt{\tau})$, where TriEnum is the time to enumerate all triangles in the static graph $G$, $m$ is the total number of temporal edges, and $\tau$ is the number of static triangles in $G$.
Proof.
Let $\sigma_i$ be the number of edges between the $i$th pair of nodes with at least one edge, and let $p_i$ be the number of times that edges on this pair are used in a call to Alg. 4.2 by Alg. 4.2. Since Alg. 4.2 runs in linear time in the number of edges in its input, the total running time is on the order of $\sum_i \sigma_i p_i$.
The $\sigma_i$ are fixed, and we wish to find the values of $p_i$ that maximize the summation. Without loss of generality, assume that the $\sigma_i$ are in decreasing order so that the largest number of edges between a pair of nodes is $\sigma_1$. Consequently, $p_i \leq i$. Note that each triangle contributes to at most a constant repeat processing of edges for a given pair of nodes. Hence, $\sum_i p_i \leq c\tau$ for some constant $c$. The summation is maximized when $p_1 = 1$, $p_2 = 2$, and so on up to some index $j$ for which $\sum_{i=1}^{j} p_i = j(j+1)/2 = c\tau$, so that $j = O(\sqrt{\tau})$. Now given that the $p_i$ are fixed and the $\sigma_i$ are ordered, the summation is maximized when $\sigma_1 = \sigma_2 = \cdots = \sigma_j = m/j$. In this case, $\sum_i \sigma_i p_i = \sum_{i=1}^{j} (m/j) \cdot i = (j+1)m/2 = O(m\sqrt{\tau})$. ∎
Algorithm 4.2 (sketch of the fast triangle counting procedure). Enumerate all triangles in the undirected static graph. Record the number of temporal edges on each static edge. For each static triangle, add its edges to the edge set of the pair of nodes it is assigned to. For each temporal edge in time-sorted order, append it to the temporal-edge list of every pair to which it is assigned. For each undirected edge, update counts using Alg. 4.2 with its temporal-edge list as input.
5 Experiments
Next, we use our algorithms to reveal patterns in a variety of temporal network datasets. We find that the numbers of instances of various $\delta$-temporal motifs reveal basic mechanisms of the networks. Datasets and implementations of our algorithms are available at http://snap.stanford.edu/temporalmotifs.
5.1 Data
dataset        # nodes  # static edges  # edges  time span (days)
EmailEu        986      2.49K           332K     803
PhonecallEu    1.05M    2.74M           8.55M    7
SMSA           44.1K    67.2K           545K     338
CollegeMsg     1.90K    20.3K           59.8K    193
StackOverflow  2.58M    34.9M           47.9M    2774
Bitcoin        24.6M    88.9M           123M     1811
FBWall         45.8K    264K            856K     1560
WikiTalk       1.09M    3.13M           6.10M    2277
PhonecallME    18.7M    360M            2.04B    364
SMSME          6.94M    51.5M           800M     89
We gathered a variety of datasets in order to study the patterns of temporal motifs in several domains. The datasets are described below and summary statistics are in Table 1. The time resolution of the edges in all datasets is one second.
EmailEu. This dataset is a collection of emails between members of a European research institution [17]. An edge $(u, v, t)$ signifies that person $u$ sent person $v$ an email at time $t$.
PhonecallEu. This dataset was constructed from telephone call records for a major European service provider. An edge $(u, v, t)$ signifies that person $u$ called person $v$ starting at time $t$.
SMSA. Short messaging service (SMS) is a texting service provided on mobile phones. In this dataset, an edge $(u, v, t)$ means that person $u$ sent an SMS message to person $v$ at time $t$ [28].
CollegeMsg. This dataset comprises private messages sent on an online social network at the University of California, Irvine [21]. Users could search the network for others and then initiate conversation based on profile information. An edge $(u, v, t)$ means that user $u$ sent a private message to user $v$ at time $t$.
StackOverflow. On Stack Exchange websites, users post questions and receive answers from other users, and users may comment on both questions and answers. We derive a temporal network by creating an edge $(u, v, t)$ if, at time $t$, user $u$: (1) posts an answer to user $v$’s question, (2) comments on user $v$’s question, or (3) comments on user $v$’s answer. We formed the temporal network from the entirety of Stack Overflow’s history up to March 6, 2016.
Bitcoin. Bitcoin is a decentralized digital currency and payment system. This dataset consists of all payments made up to October 19, 2014 [11]. Nodes in the network correspond to Bitcoin addresses, and an individual may have several addresses. An edge $(u, v, t)$ signifies that bitcoin was transferred from address $u$ to address $v$ at time $t$.
FBWall. The edges of this dataset are wall posts between users on the social network Facebook located in the New Orleans region [26]. Any friend of a given user can see all posts on that user’s wall, so communication is public among friends. An edge $(u, v, t)$ means that user $u$ posted on user $v$’s wall at time $t$.
WikiTalk. This dataset represents edits on user talk pages on Wikipedia [16]. An edge $(u, v, t)$ signifies that user $u$ edited user $v$’s talk page at time $t$.
PhonecallME and SMSME. These datasets are constructed from phone call and SMS records of a large telecommunications service provider in the Middle East. An edge $(u, v, t)$ in PhonecallME means that user $u$ initiated a call to user $v$ at time $t$. An edge $(u, v, t)$ in SMSME means that user $u$ sent an SMS message to user $v$ at time $t$. We use these networks for scalability experiments in Section 5.3.
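To make the StackOverflow construction above concrete, the following is a minimal sketch of how the three interaction types could be mapped to (source, target, time) edges. The `Event` record, its field names, and the `kind` labels are hypothetical and chosen only for illustration; they do not reflect the schema of the actual data dump.

```python
from typing import NamedTuple

class Event(NamedTuple):
    """Hypothetical record of one user action (not the real data schema)."""
    actor: str         # user performing the action
    kind: str          # "answer", "comment_on_question", or "comment_on_answer"
    target_owner: str  # owner of the question or answer acted upon
    time: int          # timestamp in seconds

# The three interaction types that create an edge in the StackOverflow network.
VALID_KINDS = {"answer", "comment_on_question", "comment_on_answer"}

def to_temporal_edges(events):
    """Map each qualifying event to a (source, target, time) edge, sorted by time."""
    edges = [(e.actor, e.target_owner, e.time) for e in events if e.kind in VALID_KINDS]
    return sorted(edges, key=lambda edge: edge[2])

events = [
    Event("bob", "answer", "alice", 120),
    Event("carol", "comment_on_question", "alice", 45),
    Event("alice", "comment_on_answer", "bob", 300),
    Event("dave", "vote", "alice", 10),  # not one of the three interaction types
]
edges = to_temporal_edges(events)
```

Sorting by timestamp is a convenience for later processing; the description above does not require any particular storage order.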
5.2 Empirical observations of motif counts
We first examine the distribution of 2- and 3-node, 3-edge motif instance counts from 8 of the datasets described in Section 5.1 with a time window of 1 hour (Fig. 5). We choose 1 hour for the time window as this is close to the median time for a node to take part in three edges in most of our datasets. We make a few empirical observations that are uniquely available due to temporal motifs and provide possible explanations for these observations.
Blocking communication. If an individual typically waits for a reply from one individual before proceeding to communicate with another individual, we consider it a blocking form of communication. A typical conversation between two individuals characterized by fast exchanges happening back and forth is blocking as it requires complete attention of both individuals. We capture this behavior in the “blocking motifs”, which contain 3 edges between two nodes with at least one edge in each direction (Fig. 6, left). However, if the reply does not arrive soon, we might expect the individual to communicate with others without waiting for a reply from the first individual. This is a non-blocking form of communication and is captured by the “non-blocking motifs”, which have edges originating from the same source but directed to different destinations (Fig. 6, right).
The fractions of counts corresponding to the blocking and non-blocking motifs out of the counts for all 36 motifs in Fig. 3 uncover several interesting characteristics in communication networks (time window of 1 hour; see Fig. 6). In FBWall and SMSA, blocking communication is vastly more common, while in EmailEu non-blocking communication is prevalent. Email is not a dynamic method of communication and replies within an hour are rare. Thus, we would expect non-blocking behavior. Interestingly, the CollegeMsg dataset shows both behaviors as we might expect individuals to engage in multiple conversations simultaneously. In complete contrast, the PhonecallEu dataset shows neither behavior. A simple explanation is that a single edge (a phone call) captures an entire conversation and hence blocking behavior does not emerge.
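The distinction between blocking and non-blocking instances can be illustrated with a brute-force sketch. It assumes that an instance is a triple of edges with strictly increasing timestamps spanning at most `delta` seconds, and it ignores self-loops. The function name and the toy data are hypothetical, and this cubic-time enumeration is for illustration only; it is not one of the counting algorithms of Section 4.

```python
from itertools import combinations

def count_blocking_nonblocking(edges, delta):
    """Brute-force count of 3-edge instances completing within `delta` seconds.

    edges: iterable of (source, target, time) triples.
    Returns (blocking, nonblocking).
    """
    # Drop self-loops (an assumption of this sketch) and order edges by time.
    edges = sorted((e for e in edges if e[0] != e[1]), key=lambda e: e[2])
    blocking = nonblocking = 0
    for e1, e2, e3 in combinations(edges, 3):
        # Assumption: timestamps strictly increase and the span is at most delta.
        if not (e1[2] < e2[2] < e3[2] and e3[2] - e1[2] <= delta):
            continue
        directed_pairs = {(s, d) for s, d, _ in (e1, e2, e3)}
        nodes = {n for pair in directed_pairs for n in pair}
        sources = {s for s, _ in directed_pairs}
        if len(nodes) == 2 and len(directed_pairs) == 2:
            blocking += 1      # two nodes, edges in both directions
        elif len(nodes) == 3 and len(sources) == 1:
            nonblocking += 1   # one source, two distinct destinations
    return blocking, nonblocking

toy_edges = [
    ("a", "b", 0), ("b", "a", 10), ("a", "b", 20),        # back-and-forth exchange
    ("c", "d", 100), ("c", "e", 110), ("c", "d", 130),    # one sender, two recipients
    ("a", "b", 5000),                                     # too late to join any triple
]
print(count_blocking_nonblocking(toy_edges, 3600))  # (1, 1)
```

Dividing each of the two counts by the total over all 36 motifs would give fractions of the kind plotted in Fig. 6.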
Cost of switching. Among the non-blocking motifs discussed above, one captures two consecutive switches between pairs of nodes, whereas the other two each have a single switch (Fig. 7, right). Prevalence of the 2-switch motif indicates a lower cost of switching targets, whereas prevalence of the other two motifs indicates a higher cost. We observe in Fig. 7 that the ratio of 2-switch to 1-switch motif counts is lowest in StackOverflow, followed by WikiTalk, CollegeMsg, and then EmailEu. On Stack Overflow and Wikipedia talk pages, there is a high cost to switch targets because of peer engagement and depth of discussion. On the other hand, in the CollegeMsg dataset there is a lower cost to switch because it lacks depth of discussion within the time frame of 1 hour. Finally, in EmailEu, there is almost no peer engagement and the cost of switching is negligible.
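As a small illustration of the switching ratio, the sketch below classifies the ordered recipients of three messages from one sender by the number of target switches. It assumes each triple involves exactly two distinct recipients, matching the non-blocking motifs; the function names and sample data are hypothetical.

```python
def count_target_switches(targets):
    """Number of times consecutive messages from one sender change recipient."""
    return sum(1 for prev, cur in zip(targets, targets[1:]) if prev != cur)

def switch_ratio(target_triples):
    """Ratio of 2-switch to 1-switch instances.

    target_triples: iterable of (t1, t2, t3) recipient sequences in time order.
    Only triples with exactly two distinct recipients are considered.
    """
    one_switch = two_switch = 0
    for triple in target_triples:
        if len(set(triple)) != 2:
            continue  # skip one-recipient and three-recipient sequences
        switches = count_target_switches(triple)
        if switches == 1:
            one_switch += 1
        elif switches == 2:
            two_switch += 1
    return two_switch / one_switch if one_switch else float("inf")

# Made-up recipient sequences for one sender.
triples = [("x", "y", "x"), ("x", "x", "y"), ("x", "y", "y"), ("x", "y", "y")]
ratio = switch_ratio(triples)  # one 2-switch instance, three 1-switch instances
```

A larger ratio corresponds to the lower switching cost described above.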
Cycles in Bitcoin. Of the eight 3-edge triangle motifs, two are cyclic, i.e., the target of each edge serves as the source of another edge. We observe in Fig. 8 that the fraction of triangles that are cyclic is much higher in Bitcoin than in any other dataset. This can be attributed to the transactional nature of Bitcoin, where the total amount of bitcoin is limited. Since remittance (outgoing edges) is typically associated with earnings (incoming edges), we should expect cyclic behavior.
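The cyclicity condition stated above (the target of each edge is the source of another edge) can be checked directly. The following is a minimal sketch with a hypothetical function name; it tests only the direction pattern of three edges and ignores their temporal order.

```python
def is_cyclic_triangle(e1, e2, e3):
    """True if three (source, target, time) edges form a directed 3-cycle."""
    pairs = [(s, d) for s, d, _ in (e1, e2, e3)]
    if any(s == d for s, d in pairs):
        return False  # self-loops cannot be part of a triangle
    nodes = {n for pair in pairs for n in pair}
    if len(nodes) != 3 or len(set(pairs)) != 3:
        return False  # need three nodes and three distinct directed pairs
    sources = {s for s, _ in pairs}
    targets = {d for _, d in pairs}
    # In a directed 3-cycle every node is the source of one edge and the target of another.
    return sources == nodes and targets == nodes

cyclic = is_cyclic_triangle(("a", "b", 1), ("b", "c", 2), ("c", "a", 3))
acyclic = is_cyclic_triangle(("a", "b", 1), ("b", "c", 2), ("a", "c", 3))
```

Here `cyclic` is true because funds return to the starting address, while `acyclic` is false because address `c` only receives.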
Datasets from the same domain have similar counts. Static graphs from similar domains tend to have similar motif count distributions [18, 25, 29]. Here, we find similar results in temporal networks. We formed two collections of datasets from similar domains. First, we took subsets of the EmailEu dataset corresponding to email communication within four different departments at the institution. Second, we constructed temporal graphs from the Stack Exchange communities Math Overflow, Super User, and Ask Ubuntu to study in conjunction with the StackOverflow dataset. We form count distributions by normalizing the counts of the 36 different motifs in Fig. 5. For datasets from a similar domain, we expect that if the count distributions are similar, then most of the variance is captured by a few principal components. To compare, we use four datasets from dissimilar domains (EmailEu, PhonecallEu, SMSA, WikiTalk). Fig. 9 shows that to explain 90% of the variance, EmailEu subnetworks need just one principal component, Stack Exchange networks need two, and the dissimilar networks need three.
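For reference, the quantities in this comparison can be written out under the assumption of standard principal component analysis. The symbols below are introduced here for illustration and are not notation from the analysis above: $c_j$ is the count of motif $j$ in a dataset, $p_j$ its normalized value, and $\sigma_1 \ge \dots \ge \sigma_r$ are the singular values of the column-centered matrix whose rows are the count distributions of the datasets in one collection.

```latex
p_j = \frac{c_j}{\sum_{m=1}^{36} c_m},
\qquad
\mathrm{EV}(k) = \frac{\sum_{i=1}^{k} \sigma_i^2}{\sum_{i=1}^{r} \sigma_i^2},
\qquad
k^\ast = \min\{\, k : \mathrm{EV}(k) \ge 0.9 \,\}
```

Under this reading, Fig. 9 reports $k^\ast$ for each collection, and a smaller $k^\ast$ indicates more similar count distributions.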
Motif counts at varying time scales. We now explore how motif counts change at different time scales. For the StackOverflow dataset we counted the number of instances of 2- and 3-node, 3-edge temporal motifs for time windows of 60, 300, 1800, and 3600 seconds (Fig. 10). These counts determine the number of motifs that completed in the intervals [0, 60], (60, 300], (300, 1800], and (1800, 3600] seconds (e.g., subtracting the 60-second counts from the 300-second counts gives the counts for the interval (60, 300]). Observations at smaller time scales reveal phenomena that start to get eclipsed at larger time scales. For instance, on short time scales, the motif in the top-left corner of Fig. 10 is quite common. We suspect this arises from multiple, quick comments on the original question, so the original poster receives many incoming edges. At larger time scales, this behavior is still frequent but relatively less so. Now let us compare counts for the four motifs in the top-right corner of Fig. 10 with counts for the four motifs in the center. The former counts likely correspond to conversations with the original poster while the latter are constructed by the same user interacting with multiple questions. Between 300 and 1800 seconds (5 to 30 minutes), the former counts are relatively more common while the latter counts only become more common after 1800 seconds. A possible explanation is that the typical length of discussions on a post is about 30 minutes, and later on, users answer other questions.
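The subtraction step described above can be sketched as follows. The function name is hypothetical and the counts in the example are made up for illustration; they are not values from Fig. 10.

```python
def interval_counts(cumulative):
    """Convert counts at increasing time windows into per-interval counts.

    cumulative: list of (window_seconds, count) pairs, where `count` is the
    number of instances completing within `window_seconds`.
    Returns a list of ((lower, upper), count); the first interval is
    [0, window_1] and later ones are (window_{i-1}, window_i].
    """
    result = []
    prev_window, prev_count = 0, 0
    for window, count in sorted(cumulative):
        result.append(((prev_window, window), count - prev_count))
        prev_window, prev_count = window, count
    return result

# Made-up counts for a single motif, for illustration only.
example = [(60, 40), (300, 100), (1800, 250), (3600, 310)]
per_interval = interval_counts(example)
```

The subtraction is valid because every instance that completes within a shorter window also completes within any longer one, so the counts are nested.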
Next, we examine messaging behavior in the CollegeMsg dataset at fine-grained time intervals. We counted the number of motifs consisting of a single node sending three outgoing messages to one or two neighbors in a sequence of short time bins (Fig. 11). We first notice that at small time scales, the motif consisting of three edges to a single neighbor occurs frequently. This pattern could emerge from a succession of quick introductory messages. Overall, motif counts increase from roughly 1 minute to 20 minutes and then decline. Interestingly, after 5 minutes, counts for the three motifs with one switch in the target grow at a faster rate than the counts for the motif with two switches. As mentioned above, this pattern could emerge from a tendency to send several messages in one conversation before switching to a conversation with another friend.
5.3 Algorithm scalability
dataset          # static triangles    time, Alg. 4.1 (seconds)    time, Alg. 4.2 (seconds)    speedup
WikiTalk         8.11M                 51.1                        26.6                        1.92x
Bitcoin          73.1M                 27.3K                       483                         56.5x
SMSME            78.0M                 2.54K                       1.11K                       2.28x
StackOverflow    114M                  783                         606                         1.29x
PhonecallME      667M                  12.2K                       8.59K                       1.42x
Finally, we ran scalability experiments on our algorithms. All algorithms were implemented in C++, and all experiments ran using a single thread of a 2.4 GHz Intel Xeon E7-4870 processor. We did not measure the time to load datasets into memory, but our timings include all preprocessing time needed by the algorithms (e.g., the triangle counting algorithms first find triangles in the static graph). We emphasize that our implementation is single-threaded, and the methods can be sped up with a parallel algorithm.
First, we used both the general counting method (Alg. 4.1) and the fast counting method (Alg. 4.2) to count the number of all eight 3-edge temporal triangle motifs in our datasets (time window of 1 hour). Table 2 reports the running times of the algorithms for all datasets with at least one million triangles in the static graph. For all of these datasets, our fast temporal triangle counting algorithm provides performance gains over the general counting method, ranging between a 1.29x and a 56.5x speedup. The gains of the fast algorithm are the largest for Bitcoin, which is due to some pairs of nodes having many edges between them and also participating in many triangles.
Second, we measured the time to count various 3-edge temporal motifs in our largest dataset, PhonecallME. Specifically, we measured the time to compute (1) 2-node motifs, (2) 3-node stars, and (3) triangles on prefixes of the dataset of increasing size, measured in millions of edges (Fig. 12). The time to compute the 2-node, 3-edge motifs and the 3-node, 3-edge stars scales linearly, as expected from our algorithm analysis. The time to count triangle motifs grows superlinearly and becomes the dominant cost when there is a large number of edges. For practical purposes, the running times are quite modest. With two billion edges, our methods take less than 3.5 hours to complete (executing sequentially).
6 Discussion
We have developed temporal network motifs as a tool for analyzing temporal networks. We introduced a general framework for counting instances of any temporal motif as well as faster algorithms for certain classes of motifs, and found that motif counts reveal key structural patterns in a variety of temporal network datasets. Our work opens a number of avenues for additional research. First, our fast algorithms are designed for 3-node, 3-edge star and triangle motifs. We expect that the same general techniques can be used to count more complex temporal motifs. Next, it is important to note that our fast algorithms only count the number of instances of motifs rather than enumerate the instances. This concept has also been used to accelerate static motif counting [22]. Temporal motif enumeration algorithms provide an additional algorithmic design challenge. There is also a host of theoretical questions in this area concerning lower bounds on temporal motif counting. Finally, motif counts can also be measured with respect to a null model [13, 19]. Such analysis may yield additional discoveries. Importantly, our algorithms will speed up such computations, which use raw counts from many random instances of a generative null model.
Acknowledgements.
We thank Moses Charikar for valuable discussion. This research has been supported in part by NSF IIS-1149837, ARO MURI, DARPA SIMPLEX and NGS2, Boeing, Bosch, Huawei, Lightspeed, SAP, Tencent, Volkswagen, Stanford Data Science Initiative, and a Stanford Graduate Fellowship.
References
 [1] M. Araujo, S. Papadimitriou, S. Günnemann, C. Faloutsos, P. Basu, A. Swami, E. E. Papalexakis, and D. Koutra. Com2: fast automatic discovery of temporal (‘comet’) communities. In PAKDD, 2014.
 [2] A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
 [3] A. R. Benson, D. F. Gleich, and J. Leskovec. Tensor spectral clustering for partitioning higher-order network structures. In SDM, 2015.
 [4] A. R. Benson, D. F. Gleich, and J. Leskovec. Higher-order organization of complex networks. Science, 353(6295):163–166, 2016.
 [5] M. Berlingerio, F. Bonchi, B. Bringmann, and A. Gionis. Mining graph evolution rules. In ECML PKDD, 2009.
 [6] D. M. Dunlavy, T. G. Kolda, and E. Acar. Temporal link prediction using matrix and tensor factorizations. TKDD, 5(2):10, 2011.
 [7] S. Gurukar, S. Ranu, and B. Ravindran. Commit: A scalable approach to mining communication motifs from dynamic networks. In SIGMOD, 2015.
 [8] P. Holme and J. Saramäki. Temporal networks. Physics Reports, 519(3):97–125, 2012.
 [9] H. Huang, J. Tang, S. Wu, L. Liu, et al. Mining triadic closure patterns in social networks. In WWW, 2014.
 [10] A. Z. Jacobs, S. F. Way, J. Ugander, and A. Clauset. Assembling thefacebook: Using heterogeneity to understand online social network assembly. In Web Science, 2015.
 [11] D. Kondor, M. Pósfai, I. Csabai, and G. Vattay. Do the rich get richer? an empirical analysis of the bitcoin transaction network. PLOS ONE, 9(2):e86197, 2014.
 [12] G. Kossinets and D. J. Watts. Empirical analysis of an evolving social network. Science, 311(5757):88–90, 2006.
 [13] L. Kovanen, M. Karsai, K. Kaski, J. Kertész, and J. Saramäki. Temporal motifs in time-dependent networks. JSTAT, 2011(11):P11005, 2011.
 [14] M. Latapy. Main-memory triangle computations for very large (sparse (power-law)) graphs. Theoretical Computer Science, 407(1):458–473, 2008.
 [15] J. Leskovec, D. Huttenlocher, and J. Kleinberg. Signed networks in social media. In CHI, 2010.
 [16] J. Leskovec, D. P. Huttenlocher, and J. M. Kleinberg. Governance in social media: A case study of the wikipedia promotion process. In ICWSM, 2010.
 [17] J. Leskovec, J. Kleinberg, and C. Faloutsos. Graph evolution: Densification and shrinking diameters. TKDD, 1(1):2, 2007.
 [18] R. Milo, S. Itzkovitz, N. Kashtan, R. Levitt, S. Shen-Orr, I. Ayzenshtat, M. Sheffer, and U. Alon. Superfamilies of evolved and designed networks. Science, 303(5663):1538–1542, 2004.
 [19] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: simple building blocks of complex networks. Science, 298(5594):824–827, 2002.
 [20] M. E. Newman. The structure and function of complex networks. SIAM Review, 45(2):167–256, 2003.
 [21] P. Panzarasa, T. Opsahl, and K. M. Carley. Patterns and dynamics of users’ behavior and interaction: Network analysis of an online community. JASIST, 2009.
 [22] A. Pinar, C. Seshadhri, and V. Vishal. ESCAPE: Efficiently Counting All 5-Vertex Subgraphs. arXiv:1610.09411, 2016.
 [23] C. Tantipathananandh, T. Berger-Wolf, and D. Kempe. A framework for community identification in dynamic social networks. In KDD, 2007.
 [24] J. Ugander, L. Backstrom, and J. Kleinberg. Subgraph frequencies: Mapping the empirical and extremal geography of large graph collections. In WWW, 2013.
 [25] A. Vazquez, R. Dobrin, D. Sergi, J.-P. Eckmann, Z. Oltvai, and A.-L. Barabási. The topological relationship between the large-scale attributes and local interaction patterns of complex networks. PNAS, 101(52):17940–17945, 2004.
 [26] B. Viswanath, A. Mislove, M. Cha, and K. P. Gummadi. On the evolution of user interaction in Facebook. In WOSN, 2009.
 [27] S. Wernicke and F. Rasche. Fanmod: a tool for fast network motif detection. Bioinformatics, 22(9):1152–1153, 2006.
 [28] Y. Wu, C. Zhou, J. Xiao, J. Kurths, and H. J. Schellnhuber. Evidence for a bimodal distribution in human communication. PNAS, 107(44):18803–18808, 2010.
 [29] Ö. N. Yaveroğlu, N. Malod-Dognin, D. Davis, Z. Levnajic, V. Janjic, R. Karapandza, A. Stojmirovic, and N. Pržulj. Revealing the hidden language of complex networks. Scientific Reports, 4, 2014.
 [30] Q. Zhao, Y. Tian, Q. He, N. Oliver, R. Jin, and W.-C. Lee. Communication motifs: a tool to characterize social communications. In CIKM, 2010.