Learning the Latent State Space of Time-Varying Graphs

03/14/2014 ∙ by Nesreen K. Ahmed, et al. ∙ Purdue University

From social networks to Internet applications, a wide variety of electronic communication tools are producing streams of graph data, where the nodes represent users and the edges represent the contacts between them over time. This has led to increased interest in mechanisms to model the dynamic structure of time-varying graphs. In this work, we develop a framework for learning the latent state space of a time-varying email graph. We show how the framework can be used to find subsequences that correspond to global real-world events in the email graph (e.g., vacations and breaks). These events affect the underlying graph process and make its characteristics non-stationary. Within the framework, we compare two different representations of the temporal relationships: discrete and probabilistic. We use the two representations as inputs to a mixture model to learn the latent state transitions that correspond to important changes in the email graph structure over time.


Introduction

Time-varying graphs have been extensively studied from the perspective of time-aggregated communications between nodes during a particular time interval [Leskovec, Kleinberg, and Faloutsos 2005]. Using this representation, researchers have discovered structural patterns in graphs with long-lived links among the nodes (e.g., hub nodes in the Web [Barabási and Albert 1999]). Today, however, a wide variety of electronic and online communication tools produce streams of graph data. These streams are usually rapid, time-varying, unbounded sequences of short-lived links among the nodes (e.g., email, SMS, tweets). Since these interactions are indicators of hidden relationship structures among individuals, there has recently been interest in moving beyond time-aggregated graphs to model and mine the dynamic structure of graph streams.

The work in [Perra et al. 2012] highlights the biases that arise when time-aggregated representations are used to analyze dynamical processes such as information or disease spread, and discusses the importance of distinguishing between the structural evolution of a network and the dynamical process unfolding on top of it. Similarly, [Gautreau, Barrat, and Barthélemy 2009] compared the statistical distributions of structural properties across separate daily snapshots of a US airport network and showed that even when these distributions are stationary, there is often intense activity within each snapshot, with many links appearing and disappearing.

In this work, we develop a framework for learning the higher-level latent state space of time-varying graphs. We study data collected from the email logs of university mail servers. The links correspond to email communications among the students, staff, and faculty at Purdue University from August to February. This seven-month period includes several calendar events (e.g., breaks and vacations) that take place during the academic year. These events usually affect the underlying graph process, making its characteristics non-stationary and causing them to drift from the overall mean. We show how our framework discovers subsequences that correspond to these global events without detecting local changes due to diurnal patterns.

Framework

Assume that we have a stream of edges $E$ continuously evolving over time $t$. These edges correspond to the communications among a set of nodes $V$. Here, we consider the time-varying graph as a sequence of graphs, one at each timestep, denoted by $S = \{G_1, G_2, \ldots, G_T\}$.

Models of Representation

Since we cannot know the true lifetime of the relationships among the nodes in real graphs, we compare two different models for representing relationship lifetime:

  1. Discrete Model: In this representation, edges are aggregated in non-overlapping discrete windows of size $\Delta$, such that an edge $e_{ij}$ that occurs at time $t'$ is included in the graph $G_t$ if and only if $t - \Delta < t' \le t$. We set $\Delta$ to one day.

  2. Probabilistic Model: In this representation, we consider the graph $G_t$ at time $t$ as a probabilistic graph that contains any edge $e_{ij}$ that occurs at any time $t'$ such that $t' \le t$, with

    $$p_{ij}(t) = \exp\left(-\frac{t - t'}{\tau}\right),$$

    where $p_{ij}(t)$ is the probability of the edge between node $i$ and node $j$, and $\tau$ is the mean lifetime of the edge. These probabilities change over time, and we age out older edges whose probabilities fall below a very small cut-off threshold. In this paper, $\tau$ is fixed to a constant number of days.
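The two representations can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: the stream is a hypothetical list of `(u, v, t)` tuples with `t` in days, edges are undirected, and when the same pair of nodes communicates more than once, the probabilistic snapshot keeps the largest (most recent) probability. The default cut-off `1e-3` is likewise an assumed stand-in for the "very small" threshold in the text.

```python
import math
from collections import defaultdict


def discrete_snapshots(stream, delta=1.0):
    """Discrete model: assign each edge (u, v, t') to the window G_t with
    t - delta < t' <= t, where t = k * delta for an integer index k.

    Returns {k: set of undirected edges}; repeated contacts collapse.
    """
    windows = defaultdict(set)
    for u, v, t_edge in stream:
        k = math.ceil(t_edge / delta)  # index of the window ending at k * delta
        windows[k].add(frozenset((u, v)))
    return dict(windows)


def probabilistic_snapshot(stream, t, tau, cutoff=1e-3):
    """Probabilistic model at time t: an edge seen at t' <= t has
    probability exp(-(t - t') / tau), where tau is the mean lifetime.

    Assumption: for repeated contacts, keep the largest probability
    (the most recent contact). Edges below `cutoff` are aged out.
    """
    probs = {}
    for u, v, t_edge in stream:
        if t_edge > t:
            continue  # this contact has not happened yet at time t
        p = math.exp(-(t - t_edge) / tau)
        key = frozenset((u, v))
        if p >= cutoff and p > probs.get(key, 0.0):
            probs[key] = p
    return probs


# Toy stream of (u, v, t) contacts, t measured in days (hypothetical data).
stream = [("a", "b", 0.2), ("b", "c", 0.7), ("a", "b", 1.5), ("c", "d", 2.0)]
print(discrete_snapshots(stream))
print(probabilistic_snapshot(stream, t=2.0, tau=1.0))
```

With `delta=1.0`, the contacts at times 0.2 and 0.7 fall in window 1 and those at 1.5 and 2.0 fall in window 2. In the probabilistic snapshot at `t=2.0`, the (a, b) edge gets probability exp(-0.5) ≈ 0.61 from its most recent contact, while the older (b, c) edge has decayed to exp(-1.3) ≈ 0.27.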

Previous work has considered different variations of these models for social network analysis; see [Goyal, Bonchi, and Lakshmanan 2010; Caceres, Berger-Wolf, and Grossman 2011] for details.

State-Space Model

The normal operation of any system or dynamical process can be characterized by different temporal states. To explore the state space of a time-varying graph, we represent each graph $G_t$ in the sequence of graphs $S$ by a set of attributes $x_t$. These attributes correspond to structural properties of $G_t$; here, we use the average degree and the average clustering coefficient of the graph. We then use the k-means algorithm to cluster the graphs in $S$ by these attributes. Each cluster contains graphs with similar properties, so each cluster represents a state, and the state transition diagram corresponds to the cluster memberships of the graphs in $S$ over time. We set the number of clusters to seven (i.e., one per day of the week). Figure 1 shows the average degree for each day after subtracting the linear trend, so that the data fluctuate around a zero mean (i.e., the time series is detrended). The discrete model clearly shows the local daily changes that take place in $S$, whereas the probabilistic model emphasizes the regions in the stream that correspond to global events (as shown in the Results section).
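The state-space step can be sketched with only the Python standard library. This is an illustrative sketch, not the authors' code, and several choices are our own assumptions: each snapshot is an unweighted set of undirected edges (for the probabilistic model one might instead threshold or weight edges by their probabilities, which the text does not specify), nodes with fewer than two neighbors contribute a clustering coefficient of 0, the two attributes are z-score standardized before clustering so that average degree does not dominate the distance, and k-means is plain Lloyd's iteration with random initialization. The helper names `graph_attributes`, `detrend`, `standardize`, `kmeans`, and `infer_states` are hypothetical.

```python
import random
from itertools import combinations


def graph_attributes(edges):
    """Return (average degree, average clustering) of an undirected edge set.

    Each edge is a 2-element iterable (u, v) with u != v.
    """
    adj = {}
    for e in edges:
        u, v = tuple(e)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    if not adj:
        return (0.0, 0.0)
    n = len(adj)
    avg_degree = sum(len(nb) for nb in adj.values()) / n
    total_cc = 0.0
    for nb in adj.values():
        k = len(nb)
        if k < 2:
            continue  # assumed convention: clustering is 0 when degree < 2
        links = sum(1 for a, b in combinations(nb, 2) if b in adj[a])
        total_cc += 2.0 * links / (k * (k - 1))
    return (avg_degree, total_cc / n)


def detrend(series):
    """Subtract the least-squares linear trend so values vary around zero."""
    n = len(series)
    mx, my = (n - 1) / 2, sum(series) / n
    sxx = sum((x - mx) ** 2 for x in range(n))
    sxy = sum((x - mx) * (y - my) for x, y in enumerate(series))
    slope = sxy / sxx if sxx else 0.0
    return [y - (my + slope * (x - mx)) for x, y in enumerate(series)]


def standardize(points):
    """Z-score each attribute column (assumption, not stated in the text)."""
    cols = []
    for col in zip(*points):
        m = sum(col) / len(col)
        sd = (sum((x - m) ** 2 for x in col) / len(col)) ** 0.5
        cols.append([(x - m) / sd if sd else 0.0 for x in col])
    return [tuple(row) for row in zip(*cols)]


def _sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means; returns (labels, centers). Requires len(points) >= k."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = []
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: _sq_dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(d) / len(members) for d in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return labels, centers


def infer_states(snapshots, k=7, seed=0):
    """Map a time-ordered list of edge sets to state letters 'A', 'B', ..."""
    feats = standardize([graph_attributes(g) for g in snapshots])
    labels, _ = kmeans(feats, k, seed=seed)
    return [chr(ord("A") + lab) for lab in labels]
```

For the email data, `snapshots` would hold one edge set per day and `k=7`; applying `detrend` to the sequence of daily average degrees gives a zero-mean series of the kind plotted in Figure 1. The letters are arbitrary cluster indices, so they need not line up with the state labels A to G used in the figures.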

Figure 1: Average degree versus time ((a) discrete model, (b) probabilistic model)
Figure 2: Time-varying graphs plotted as points in the (average degree, average clustering) space ((a) discrete model, (b) probabilistic model)
Figure 3: State transition diagram ((a) discrete model, (b) probabilistic model)

Results and Discussion

Figures 2(a) and 2(b) show scatter plots of the graphs in the sequence as points in the attribute space, colored by their state. Figures 3(a) and 3(b) show the state transitions versus time for both models. Note that we clustered the graphs into seven states, labeled A to G. Overall, the results indicate that the probabilistic model is better at tracking the global events that take place during the academic year. For example, consider the subsequence between “Wint. Break” (the start of the winter break) and “Spring Start” (the start of the spring semester): the probabilistic model detects the break as a subsequence of very low-activity graphs (state A), when most students are off campus. The discrete model, however, reflects local email communications that possibly correspond to staff members, and misses the global break event. In addition, the probabilistic model can distinguish between different types of events and breaks based on the intensity of their effect on the graph structure. For instance, the “Thanksgiving Break” and the “Fall Break” are labeled as two different states (C and A, respectively), whereas the discrete model assigns both to state A.
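Reading a state transition plot such as Figure 3 amounts to finding contiguous runs of the same state. A minimal sketch of that bookkeeping follows, assuming the states are given as a list of letters, one per day. The function names and the `min_len` threshold are hypothetical: the text identifies events such as the winter break by inspecting the plot, not by a stated rule.

```python
from collections import Counter
from itertools import groupby


def state_runs(states):
    """Collapse a per-timestep state sequence into (state, start, end) runs.

    `end` is inclusive, so a run covers timesteps start..end.
    """
    runs, start = [], 0
    for state, group in groupby(states):
        length = sum(1 for _ in group)
        runs.append((state, start, start + length - 1))
        start += length
    return runs


def long_runs(states, target, min_len):
    """Runs of `target` lasting at least `min_len` timesteps (e.g., a break)."""
    return [r for r in state_runs(states)
            if r[0] == target and r[2] - r[1] + 1 >= min_len]


def transition_counts(states):
    """Count changes between consecutive, different states."""
    return Counter((a, b) for a, b in zip(states, states[1:]) if a != b)


states = list("ABBAAAAC")  # toy sequence, one state per day
print(state_runs(states))
print(long_runs(states, "A", 3))
print(transition_counts(states))
```

On this toy sequence, the only run of state `A` lasting at least three days spans timesteps 3 to 6, which is the shape of the low-activity subsequence that the probabilistic model assigns to the winter break.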

To summarize, in this work we developed a framework for learning the higher-level latent state space of time-varying graphs, based on the probabilistic representation model of graph streams. In future work, we aim to study the problems of mining and modeling probabilistic graph streams, and how to use these models to predict the global characteristics of the graph structure in future timesteps.

References

  • [Barabási and Albert 1999] Barabási, A.-L., and Albert, R. 1999. Emergence of scaling in random networks. Science 286(5439):509–512.
  • [Caceres, Berger-Wolf, and Grossman 2011] Caceres, R. S.; Berger-Wolf, T.; and Grossman, R. 2011. Temporal scale of processes in dynamic networks. In ICDMW, 925–932.
  • [Gautreau, Barrat, and Barthélemy 2009] Gautreau, A.; Barrat, A.; and Barthélemy, M. 2009. Microdynamics in stationary complex networks. Proceedings of the National Academy of Sciences 106(22):8847–8852.
  • [Goyal, Bonchi, and Lakshmanan 2010] Goyal, A.; Bonchi, F.; and Lakshmanan, L. V. 2010. Learning influence probabilities in social networks. In WSDM, 241–250.
  • [Leskovec, Kleinberg, and Faloutsos 2005] Leskovec, J.; Kleinberg, J.; and Faloutsos, C. 2005. Graphs over time: densification laws, shrinking diameters and possible explanations. In SIGKDD, 177–187.
  • [Perra et al. 2012] Perra, N.; Gonçalves, B.; Pastor-Satorras, R.; and Vespignani, A. 2012. Activity driven modeling of time varying networks. Scientific Reports 2.