Time-varying graphs have been extensively studied from the perspective of time-aggregated communications between nodes during a particular time interval [Leskovec, Kleinberg, and Faloutsos2005]. Using this representation, researchers have discovered structural patterns in graphs with long-lived links among the nodes (e.g., hub nodes in the Web [Barabási and Albert1999]). However, today, a wide variety of electronic and online communication tools are producing streams of graph data. These streams are usually rapid, time-varying, unbounded sequences of short-lived links among the nodes (e.g., email, SMS, tweets). Since the interactions are indicators of hidden relationship structures among the individuals, there has recently been an interest in moving beyond the perspective of time-aggregated graphs to model and mine the dynamic structure of graph streams.
The work in [Perra et al.2012] highlights the biases that result when time-aggregated representations are used to analyze dynamical processes such as information or disease spread, and discusses the importance of distinguishing between the structural evolution of a graph and the dynamical process unfolding on top of that structure. Similarly, the work in [Gautreau, Barrat, and Barthélemy2009] compared the statistical distributions of structural properties across separate daily snapshots of a US airport network graph and showed that even when these distributions are stationary, there is often intense activity within each graph snapshot, with many links appearing and disappearing.
In this work, we develop a framework for learning the higher-level latent state space of time-varying graphs. We study data collected from the email logs of university mail servers. The links correspond to the email communications among students, staff, and faculty at Purdue University from August to February. This seven-month period includes several calendar events (breaks, vacations, etc.) that take place during the academic year. These events usually affect the underlying graph process, making its characteristics non-stationary and causing them to drift from the overall mean. We show how our framework discovers subsequences that correspond to these global events without flagging local changes that are due to diurnal patterns.
Assume that we have a stream of edges continuously evolving over time. These edges correspond to the communications among a set of nodes. Here, we consider the time-varying graph as a sequence of graphs, one for each timestep.
Models of Representation
Since we cannot know the real lifetime of the relationships among the nodes in real graphs, we compare two different models for the representation of the relationship lifetime:
Discrete Model: In this representation, edges are aggregated in non-overlapping discrete windows of a fixed size, so that an edge is included in the graph for a given window if and only if its timestamp falls within that window. We use a window size of one day.
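The discrete representation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `discrete_snapshots`, the `(time, u, v)` tuple layout, the half-open window boundaries, and the treatment of edges as undirected are all choices made here for the example.

```python
from collections import defaultdict

def discrete_snapshots(edges, window):
    """Group (time, u, v) edges into non-overlapping windows of length `window`.

    Returns a dict mapping window index k to the set of undirected edges whose
    timestamp t satisfies k * window <= t < (k + 1) * window (assumed convention).
    """
    snapshots = defaultdict(set)
    for t, u, v in edges:
        k = int(t // window)                     # index of the window containing t
        snapshots[k].add(frozenset((u, v)))      # repeated edges collapse to one
    return dict(snapshots)

# Hypothetical toy stream: timestamps in hours, one-day (24-hour) windows.
stream = [
    (1.0, "a", "b"),
    (5.5, "b", "c"),
    (23.9, "a", "b"),
    (24.0, "a", "c"),
    (49.0, "c", "d"),
]
snaps = discrete_snapshots(stream, window=24)
```

Each snapshot only sees the edges inside its own window, which is why this representation reacts strongly to day-to-day fluctuations.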
Probabilistic Model: In this representation, we consider the graph at a given time as a probabilistic graph that contains the edges observed in the stream, where each edge between a pair of nodes carries a probability that depends on the mean lifetime of the edge. These probabilities change over time, and we age out older edges whose probabilities fall below a very small cut-off threshold. In this paper, the mean lifetime is set in days.
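A probabilistic representation of this kind can be sketched as follows. The decay function is an assumption: the example uses exponential decay, exp(-(t - s) / tau), where s is the most recent time the edge was observed and tau plays the role of the mean lifetime. The source does not confirm this exact form, and the names `probabilistic_graph`, `tau`, and `cutoff` are hypothetical.

```python
import math

def probabilistic_graph(edges, t, tau, cutoff=1e-3):
    """Return {edge: probability} at time t under an ASSUMED exponential decay.

    edges  : iterable of (time, u, v) tuples
    tau    : mean edge lifetime, in the same units as the timestamps
    cutoff : edges whose probability falls below this value are aged out
    """
    last_seen = {}
    for s, u, v in edges:
        if s <= t:                                    # ignore edges from the future
            e = frozenset((u, v))
            last_seen[e] = max(s, last_seen.get(e, s))
    graph = {}
    for e, s in last_seen.items():
        p = math.exp(-(t - s) / tau)                  # assumed decay form
        if p >= cutoff:
            graph[e] = p
    return graph

# Hypothetical toy stream with timestamps in days.
stream = [(0.0, "a", "b"), (8.0, "c", "d"), (10.0, "b", "c")]
graph = probabilistic_graph(stream, t=10.0, tau=5.0, cutoff=0.2)
```

Because an edge fades gradually instead of vanishing at a window boundary, short gaps in activity leave the graph largely intact while sustained inactivity empties it.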
The normal operation of any system or dynamical process can be characterized by different temporal states. To explore the state space of a time-varying graph, we represent each graph in the sequence by a set of attributes that correspond to its structural properties. Here, we use the average degree and the average clustering coefficient of each graph. We then use the k-means algorithm to cluster the graphs in the sequence on these two attributes. Each cluster contains the graphs that have similar properties, so each cluster represents a state, and the state transition diagram plots the cluster membership of each graph against time. We set the number of clusters to seven (i.e., corresponding to seven days). Figure 1 plots the average degree for each day after subtracting the linear trend, so that the series fluctuates around a zero mean (i.e., detrending the time series). The discrete model clearly shows the local daily changes that take place in the graph sequence, whereas the probabilistic model emphasizes the regions of the stream that correspond to global events (as discussed in the Results and Discussion section).
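The pipeline in this paragraph (attributes per graph, k-means over the attributes, and linear detrending of a series) can be sketched in plain Python. This is an illustration under assumptions, not the authors' code: graphs are treated as unweighted and undirected, the average degree is taken over nodes that appear in the snapshot, and the k-means routine uses a deterministic farthest-first initialization chosen here for reproducibility. All function names are hypothetical.

```python
from itertools import combinations

def graph_attributes(edge_set):
    """Return (average degree, average local clustering) for (u, v) pairs."""
    adj = {}
    for u, v in edge_set:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    if not adj:
        return (0.0, 0.0)
    avg_degree = sum(len(nb) for nb in adj.values()) / len(adj)
    total = 0.0
    for nb in adj.values():
        k = len(nb)
        if k >= 2:
            # count links among the neighbours of this node
            links = sum(1 for a, b in combinations(nb, 2) if b in adj[a])
            total += 2.0 * links / (k * (k - 1))
    return (avg_degree, total / len(adj))

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=100):
    """Lloyd's algorithm with farthest-first initialization (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:          # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers

def detrend(series):
    """Subtract the least-squares line from an evenly spaced series."""
    n = len(series)
    mean_x = (n - 1) / 2
    mean_y = sum(series) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    slope = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(series)) / sxx if sxx else 0.0
    return [y - (mean_y + slope * (i - mean_x)) for i, y in enumerate(series)]

# Hypothetical toy sequence alternating between a triangle and a path.
triangle = {("a", "b"), ("b", "c"), ("a", "c")}
path = {("a", "b"), ("b", "c")}
features = [graph_attributes(g) for g in (triangle, path, triangle, path)]
states, centers = kmeans(features, k=2)
residual = detrend([1.0, 3.0, 5.0, 7.0])
```

In practice the two attributes may live on different scales, so standardizing them before clustering is a reasonable extra step; the source does not say whether this was done.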
Results and Discussion
Figures 2(a) and 2(b) show scatter plots of the graphs in the sequence as points in the attribute space, colored by their state. Figures 3(a) and 3(b) show the state transition plots versus time for the two models. Note that we clustered the graphs into seven states, labeled A to G. Overall, the results indicate that the probabilistic model is more capable of tracking the global events taking place during the fall semester. For example, consider the subsequence between the events "Wint. Break" (corresponding to the start of the winter break) and "Spring Start" (corresponding to the start of the Spring semester). The probabilistic model detects the break as a subsequence of very low activity graphs (state A), when most of the students are off campus. The discrete model, however, shows local email communications that possibly correspond to staff members, and so misses the global break event. In addition, the probabilistic model can distinguish between different types of events and breaks based on the intensity of their effect on the graph structure. For instance, the "Thanksgiving Break" and the "Fall Break" are labeled as two different states, C and A, whereas the discrete model treats both as state A.
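One way to read a state transition plot programmatically is to collapse the per-day state labels into contiguous runs, so that a long run of a single state stands out as a candidate global event. The helper below is an illustrative post-processing step that is not described in the source; the function name `state_runs` and the example label sequence are hypothetical.

```python
from itertools import groupby

def state_runs(states):
    """Compress a state sequence into (state, start_index, length) runs."""
    runs = []
    start = 0
    for state, group in groupby(states):
        length = len(list(group))
        runs.append((state, start, length))
        start += length
    return runs

# Hypothetical per-day labels, not taken from the paper's figures.
runs = state_runs(list("BBBAAAAACCB"))
longest = max(runs, key=lambda r: r[2])
```

Under a daily representation with frequent state flips, the runs stay short; a representation that smooths over local changes should yield fewer and longer runs.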
To summarize, in this work we developed a framework for learning the higher-level latent state space of time-varying graphs. The proposed framework is based on the probabilistic representation model of graph streams. In future work, we aim to study the problems of mining and modeling probabilistic graph streams, and how to use these models to predict the global characteristics of the graph structure in subsequent timesteps.
- [Barabási and Albert1999] Barabási, A.-L., and Albert, R. 1999. Emergence of scaling in random networks. Science 286(5439):509–512.
- [Caceres, Berger-Wolf, and Grossman2011] Caceres, R. S.; Berger-Wolf, T.; and Grossman, R. 2011. Temporal scale of processes in dynamic networks. In ICDMW, 925–932.
- [Gautreau, Barrat, and Barthélemy2009] Gautreau, A.; Barrat, A.; and Barthélemy, M. 2009. Microdynamics in stationary complex networks. Proceedings of the National Academy of Sciences 106(22):8847–8852.
- [Goyal, Bonchi, and Lakshmanan2010] Goyal, A.; Bonchi, F.; and Lakshmanan, L. V. 2010. Learning influence probabilities in social networks. In WSDM, 241–250.
- [Leskovec, Kleinberg, and Faloutsos2005] Leskovec, J.; Kleinberg, J.; and Faloutsos, C. 2005. Graphs over time: densification laws, shrinking diameters and possible explanations. In SIGKDD, 177–187.
- [Perra et al.2012] Perra, N.; Gonçalves, B.; Pastor-Satorras, R.; and Vespignani, A. 2012. Activity driven modeling of time varying networks. Scientific Reports 2.