I. Introduction
Signals often have underlying network structures, e.g., in sensor, traffic, brain, and social networks. Graphs, consisting of sets of nodes and edges, are a fundamental tool for describing relationships among entities. Graph edges and the corresponding edge weights capture the similarity between nodes, where a higher positive weight indicates greater similarity. A graph representation enables us to efficiently analyze signals on networks in many practical applications, such as epidemics [1, 2], transportation networks [3], and social networks [4].
Even when data are not associated with an actual network, graphs are efficient tools to represent latent structures in the data. For example, principal component analysis can be improved by imposing a graph-based prior [5, 6, 7]. Graph-based image processing improves performance in several image processing tasks [8, 9, 10, 11, 12, 13]. However, in many cases graphs are not provided a priori. Graph learning methods aim at identifying graphs from observed data [14, 15, 16]. Each observation is a vector whose entries correspond to the observations at individual nodes. The goal is to obtain the weights of all the edges connecting those nodes. Most graph learning methods identify a single static graph from all the observations
[17, 18, 19, 20, 21, 22, 23, 24, 25]. These static graph learning methods assume that the node relationships obtained from the observations do not change during the measurement process. However, in many applications where the observations are obtained over a period of time, a time-varying graph provides a better model. Examples of such applications include estimation of time-varying brain functional connectivity from EEG or fMRI data [26], identification of temporal transitions of biological networks such as protein, RNA, and DNA networks [27], inference of relationships among companies from historical stock price data [28], dynamic point cloud processing, and analysis of physical measurement data such as temperature. A straightforward approach to estimating a time-varying graph would be to aggregate temporal observations into non-overlapping windows and then use an existing static graph learning method to estimate a graph for each time window. However, such an approach has some drawbacks. First, it estimates a graph independently for each temporal interval, thus ignoring temporal relations that may exist in time-varying graphs. Second, time-varying graph learning may require estimating graphs from time windows containing only a small fraction of observations due to the trade-off between window length and temporal resolution. For example, if we choose a short window to adapt to fast temporal changes in the graph, we may not have enough data to learn a graph within each window, and existing static graph learning methods cannot successfully infer graphs from a small number of observations. On the other hand, if we use a longer window, we may not be able to keep up with the temporal changes.
This paper presents a time-varying graph learning method based on time-varying graph factor analysis (TGFA), which is an extension of its static counterpart, static graph factor analysis (SGFA) [18]. We propose TGFA-based methods to estimate time-varying graphs from a collection of spatiotemporal measurements. SGFA formulates a signal generation model from a graph signal processing (GSP) perspective, where it is assumed that the observed signals have specific spectral properties with respect to the graph Fourier transform (GFT) of the graph to be learned. For example, if a multivariate Gaussian model is chosen, the observed signals are generated from a Gaussian distribution whose inverse covariance matrix (i.e., precision matrix) is given by the graph Laplacian of the underlying graph [18, 14]. Unlike SGFA, TGFA considers the graph evolution as illustrated in Fig. 1. The graph evolution can be represented by a sequence of graph Laplacians and their corresponding temporal variations. This study focuses on two time-varying graph models, with the following two properties:
(P1) Temporal homogeneity: Most edges and their weights in the time-varying graph should remain unchanged over a short-term time horizon. In other words, at any given time only a small number of edges in the time-varying graph change. Time-varying graphs in many applications satisfy this property. For example, consider a sensor network where the nodes and the edges represent sensor locations and correlations among sensor measurements, respectively. If the sensors record the temperature in a building, various factors, such as air conditioning, sunlight, and the number of people in the room, locally affect the correlations among the sensor measurements, but these factors vary smoothly over time. As a result, this sensor network is a time-varying graph in which most edges remain constant while the weights change only slightly over time, i.e., it follows (P1). Beyond this example, time-varying graphs in fMRI and various biological networks appear to have this property [29, 27].
(P2) Switching behavior: Edges and weights remain almost unchanged over time; however, they may change suddenly within a few time slots. This type of time-varying graph appears in situations where certain factors cause sudden changes in graph topology. Prominent examples include brain networks, whose topology changes suddenly during epileptic seizures [30].
In this paper, we design an algorithm to estimate these two types of time-varying graphs, namely, graphs with temporal homogeneity and graphs with switching behavior. For this purpose, we formulate the graph learning problem as a convex optimization with a regularization of temporal variation derived from TGFA. To solve the convex optimization problem, we utilize a primal-dual splitting algorithm [31], which enables us to estimate time-varying graphs more accurately than static graph learning methods.
In experiments with synthetic datasets, our proposed method outperforms existing methods, especially when only small amounts of data are available. We also evaluate the performance of our methods on tasks involving real data, such as dynamic point cloud denoising and learning time-varying graphs from spatiotemporal meteorological and temperature data. Our results for dynamic point cloud denoising show that estimating graph topology with our method improves the denoising performance. In the meteorological and temperature data application, we show that our method can learn reasonable time-varying graphs that capture geographical characteristics without using geographical information. Our recent work [32] proposed a framework for learning time-varying graphs that follow (P1). This paper substantially extends the work in [32] by also allowing learning of graphs with property (P2), which is more suitable for change point detection problems. We also evaluate the robustness of our approach under several conditions of temporal resolution, and we compare its computation time with those of related approaches.
The remainder of this paper is organized as follows. Section II presents preliminaries concerning graphs and proximal operators. Section III describes the problem formulation of time-varying graph learning. Section IV defines the regularization for temporal graph variation, formulates the optimization problem to learn graphs, and proposes an algorithm to find a solution. Sections V and VI provide experimental results with synthetic data and real data, respectively. Finally, Section VII concludes the paper and discusses future work.
I-A. Related Work
Several approaches address the graph learning problem, and two overview papers on graph learning have been published recently [14, 15]. Among the techniques for learning time-varying graphs, the method of Kalofolias et al. [33], where constraints are introduced so that the edge weights change smoothly over time, is closest to ours. This approach uses a smoothness criterion and Tikhonov regularization of the temporal variation in graphs to learn a time-varying graph. However, it does not learn time-varying graphs following (P1) exactly, because Tikhonov regularization promotes smooth variation of edge weights over time, i.e., it allows changes of both edges and edge weights over short-term time horizons. While our approach has cost functions similar to those employed in [33], we use different regularization terms that favor learning time-varying graphs with the (P1) property. However, we cannot solve the resulting optimization problem in the same manner as [33] because the regularization terms we use are not differentiable. Therefore, we reformulate the optimization problem to make it solvable with a primal-dual splitting algorithm, which leads to efficient learning of a time-varying graph.
Hallac et al. address the problem of learning a time-varying graph using the time-varying graphical Lasso (TVGL) [28], which combines the graphical Lasso with a temporal regularization and finds the solution using the alternating direction method of multipliers (ADMM). Note that graphs estimated with this approach often have negative edge weights. In contrast, our proposed method constrains the estimated graph to have nonnegative edge weights, because such graphs are desired in many applications [14, 15]. Furthermore, our method has a lower computational complexity: TVGL requires eigendecompositions of the target matrices to calculate the proximal operator of a log-determinant term, whereas our approach is an eigendecomposition-free algorithm.
Baingana et al. consider the problem of identifying a time-varying graph that captures causal relationships in network contagion evolution [34]. This model focuses on spreading processes on networks (e.g., infectious diseases and fake news) and aims to identify the links propagating information over nodes at each time. However, only directed graphs can be estimated by this approach. In contrast, our method estimates undirected graphs whose connections change over time, which is often desired for data without causal relationships, e.g., data acquired by physical sensors, such as point cloud coordinates and temperatures.
I-B. Notation
The notation used in this paper is summarized in Table I. Throughout this paper, vectors and matrices are written in bold lowercase and bold uppercase letters, respectively. Calligraphic capital letters, e.g., $\mathcal{V}$ and $\mathcal{E}$, denote sets.

Table I: Notation.

$N$ — Number of nodes
$K$ — Number of data chunks
$T$ — Number of time frames
$[\mathbf{x}]_i$ — $i$th entry of a vector
$[\mathbf{X}]_{ij}$ — $(i,j)$ entry of a matrix
$\mathbf{x}_i$ — $i$th column of $\mathbf{X}$
$\mathbf{X}^{\dagger}$ — Moore-Penrose pseudoinverse of $\mathbf{X}$
$\mathbb{R}_{+}$ — Set of the nonnegative real numbers
$\circ$ — Hadamard product
$\|\cdot\|_F^2$ — Sum of squared values of all elements
$\|\cdot\|_1$ — Sum of absolute values of all elements
$\mathrm{tr}(\mathbf{X})$ — Trace of a matrix
$\mathrm{diag}(\mathbf{X})$ — Vector formed by the diagonal elements of $\mathbf{X}$
II. Preliminaries
II-A. Basic Definitions for GSP
An undirected weighted graph is represented as $\mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathbf{W})$, where $\mathcal{V}$ is a set of nodes, $\mathcal{E}$ is a set of edges, and $\mathbf{W} \in \mathbb{R}_{+}^{N \times N}$ is a weighted adjacency matrix. The number of nodes is given by $N = |\mathcal{V}|$. Each element of the weighted adjacency matrix is defined by

$$[\mathbf{W}]_{mn} = \begin{cases} w_{mn} & (m, n) \in \mathcal{E} \\ 0 & \text{otherwise,} \end{cases} \quad (1)$$

where $w_{mn} \geq 0$ is the edge weight between nodes $m$ and $n$. The degree matrix $\mathbf{D}$ is a diagonal matrix whose $m$th diagonal element is $[\mathbf{D}]_{mm} = \sum_{n} [\mathbf{W}]_{mn}$. The graph Laplacian is given by $\mathbf{L} = \mathbf{D} - \mathbf{W}$. Let $\mathbf{x} \in \mathbb{R}^{N}$ be a signal on the graph; then the Laplacian quadratic form is given by

$$\mathbf{x}^{\top} \mathbf{L} \mathbf{x} = \frac{1}{2} \sum_{m,n} w_{mn} (x_m - x_n)^2, \quad (2)$$

where $x_m$ and $x_n$ denote the signal values at nodes $m$ and $n$, respectively. Note that (2) provides a smoothness measure of the signal on the graph [35].
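The identity in (2) can be checked numerically on a toy three-node graph; the adjacency values and signal below are arbitrary illustrative numbers, not from the paper.

```python
import numpy as np

# Small undirected weighted graph on 3 nodes (toy example).
W = np.array([[0.0, 1.0, 0.5],
              [1.0, 0.0, 0.0],
              [0.5, 0.0, 0.0]])
D = np.diag(W.sum(axis=1))          # degree matrix
L = D - W                           # combinatorial graph Laplacian

x = np.array([1.0, 2.0, -1.0])      # a signal on the graph

# Quadratic form x^T L x ...
quad = x @ L @ x
# ... equals (1/2) * sum_{m,n} w_mn (x_m - x_n)^2, i.e., Eq. (2).
pairwise = 0.5 * sum(W[m, n] * (x[m] - x[n]) ** 2
                     for m in range(3) for n in range(3))
assert np.isclose(quad, pairwise)
```

A small value of the quadratic form indicates that strongly connected nodes carry similar signal values, which is the smoothness notion used throughout the paper.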
Since $\mathbf{L}$ is a real symmetric matrix, it has orthogonal eigenvectors and can be decomposed into $\mathbf{L} = \mathbf{U} \mathbf{\Lambda} \mathbf{U}^{\top}$, where $\mathbf{U}$ is a matrix whose $i$th column is the eigenvector $\mathbf{u}_i$ and $\mathbf{\Lambda}$ is a diagonal eigenvalue matrix. The graph Fourier transform (GFT) and the inverse GFT are defined as

$$\hat{\mathbf{x}} = \mathbf{U}^{\top} \mathbf{x}, \quad (3)$$

$$\mathbf{x} = \mathbf{U} \hat{\mathbf{x}}. \quad (4)$$

Eigenvalues of the graph Laplacian correspond to frequencies in the classical Fourier transform and are thus often called graph frequencies.
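In practice, the GFT of (3) and (4) amounts to an eigendecomposition of the Laplacian. A minimal sketch on a four-node path graph (the graph and signal are illustrative choices of ours):

```python
import numpy as np

# Path graph on 4 nodes with unit weights (illustrative only).
W = np.zeros((4, 4))
for m in range(3):
    W[m, m + 1] = W[m + 1, m] = 1.0
L = np.diag(W.sum(axis=1)) - W

# L is real symmetric, so eigh returns L = U diag(lam) U^T with
# orthonormal U; lam are the graph frequencies (ascending).
lam, U = np.linalg.eigh(L)

x = np.array([0.3, -1.2, 0.7, 2.0])
x_hat = U.T @ x                     # GFT, Eq. (3)
x_rec = U @ x_hat                   # inverse GFT, Eq. (4)
assert np.allclose(x, x_rec)        # perfect reconstruction
assert lam[0] >= -1e-10             # Laplacian eigenvalues are nonnegative
```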
II-B. Proximal Operator and Its Properties
Let $f: \mathbb{R}^{N} \to \mathbb{R} \cup \{\infty\}$ be a proper lower semicontinuous convex function. The proximal operator of $f$ with a parameter $\gamma > 0$ is defined by

$$\mathrm{prox}_{\gamma f}(\mathbf{x}) = \operatorname*{arg\,min}_{\mathbf{y}} \; f(\mathbf{y}) + \frac{1}{2\gamma} \|\mathbf{x} - \mathbf{y}\|_2^2. \quad (5)$$

If the proximal operator of a function can be computed efficiently, the function is called proximable.
Proximal operators have the following properties [36]:

If a function is separable across variables, i.e., $f(\mathbf{x}) = \sum_{i} f_i(\mathbf{x}_i)$ with $\mathbf{x} = [\mathbf{x}_1^{\top}, \ldots, \mathbf{x}_M^{\top}]^{\top}$, then

$$[\mathrm{prox}_{\gamma f}(\mathbf{x})]_i = \mathrm{prox}_{\gamma f_i}(\mathbf{x}_i), \quad (6)$$

where $[\cdot]_i$ denotes the $i$th block. Thus, the proximal operator of a separable function reduces to the computation of the proximal operator for each separable part.

If a function is fully separable, i.e., $f(\mathbf{x}) = \sum_{i} f_i(x_i)$ with scalar functions $f_i$, then

$$[\mathrm{prox}_{\gamma f}(\mathbf{x})]_i = \mathrm{prox}_{\gamma f_i}(x_i). \quad (7)$$

Therefore, in this case, the computation of the proximal operator reduces to element-wise computation.
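The fully separable case (7) is, for example, how the proximal operator of the $\ell_1$ norm is evaluated: the vector prox is just the scalar prox applied entry by entry. A small sketch (function names and test values are ours):

```python
import numpy as np

def prox_l1(x, gamma):
    """Prox of gamma * ||x||_1: element-wise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def prox_abs_scalar(v, gamma):
    """Prox of gamma * |v| for a single scalar v."""
    return np.sign(v) * max(abs(v) - gamma, 0.0)

x = np.array([1.5, -0.2, 0.7, -2.0])
gamma = 0.5

# Property (7): the vector prox equals the scalar prox applied entry-wise.
vec = prox_l1(x, gamma)
per_entry = np.array([prox_abs_scalar(v, gamma) for v in x])
assert np.allclose(vec, per_entry)
```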
III. Problem Formulation for Time-Varying Graph Learning
We first describe static graph factor analysis (SGFA) [18], which introduces a model for the generation of graph signals. Subsequently, we introduce time-varying graph factor analysis (TGFA), which extends SGFA to formulate the time-varying graph learning problems studied in this paper.
III-A. Static Graph Factor Analysis
The observation model of SGFA is defined as follows [18]:

$$\mathbf{x} = \mathbf{U} \mathbf{h} + \boldsymbol{\epsilon}, \quad (8)$$

where $\mathbf{x} \in \mathbb{R}^{N}$ is an observed signal, $\mathbf{h} \in \mathbb{R}^{N}$ is a latent variable represented in the graph frequency domain, $\mathbf{U}$ is the GFT matrix, and $\boldsymbol{\epsilon}$ is additive white Gaussian noise. The observation model in (8) means that signals possessing graph structures can be represented by the inverse GFT of latent variables in the spectral domain of the underlying graph. SGFA assumes that the latent variable follows the multivariate Gaussian distribution

$$p(\mathbf{h}) = \mathcal{N}(\mathbf{0}, \mathbf{\Lambda}^{\dagger}). \quad (9)$$

This corresponds to the assumption that the signal energy tends to be concentrated in the low frequencies, which encourages graph signal smoothness. Equations (8) and (9) lead to the conditional probability of $\mathbf{x}$ given $\mathbf{h}$:

$$p(\mathbf{x} \mid \mathbf{h}) = \mathcal{N}(\mathbf{U}\mathbf{h}, \sigma_{\epsilon}^2 \mathbf{I}). \quad (10)$$

From (9) and (10), the marginal distribution of $\mathbf{x}$ is given by:

$$p(\mathbf{x}) = \mathcal{N}(\mathbf{0}, \mathbf{L}^{\dagger} + \sigma_{\epsilon}^2 \mathbf{I}). \quad (11)$$

Note that the fact $\mathbf{U} \mathbf{\Lambda}^{\dagger} \mathbf{U}^{\top} = \mathbf{L}^{\dagger}$ is used in the derivation of (11). The marginal distribution in (11) indicates that a graph signal $\mathbf{x}$ can be generated by the degenerate multivariate Gaussian distribution whose precision matrix is the graph Laplacian of the underlying graph. The maximum a posteriori estimation of $\mathbf{x}$ leads to the following problem:

$$\min_{\mathbf{L}, \mathbf{Y}} \; \|\mathbf{X} - \mathbf{Y}\|_F^2 + \alpha \, \mathrm{tr}(\mathbf{Y}^{\top} \mathbf{L} \mathbf{Y}), \quad (12)$$

where $\mathbf{X}$ is the matrix of observations and $\mathbf{Y}$ is regarded as a noiseless signal. If the observed signal is noise-free, this optimization problem finds the graph Laplacian that minimizes the smoothness measure in (2).
III-B. Time-Varying Graph Factor Analysis
Suppose that we have multivariate time series data divided into non-overlapping time windows $\{\mathbf{X}_t\}_{t=1}^{T}$, where $\mathbf{X}_t \in \mathbb{R}^{N \times K}$ contains the $K$ observations in time window $t$, and the graph corresponding to observations in the same time window is invariant. The choice of $K$ depends on the sampling frequency of the measurements. Our goal is to learn a sequence of graph Laplacians $\{\mathbf{L}_t\}_{t=1}^{T}$ from the data sequence. We first introduce time-varying graph factor analysis (TGFA) to formulate time-varying graph learning. By incorporating a graph evolution process into SGFA, we define TGFA as:

$$\mathbf{x}_t = \mathbf{U}_t \mathbf{h}_t + \boldsymbol{\epsilon}, \quad (13)$$

$$\mathbf{L}_t = \mathbf{L}_{t-1} + \Delta\mathbf{L}_t, \quad (14)$$

where $\mathbf{x}_t$, $\mathbf{L}_t$, and $\Delta\mathbf{L}_t$ are a signal, the graph Laplacian of the underlying graph, and a temporal graph variation at a given time $t$, respectively. As discussed in Section III-A, (13) indicates that time-varying graph learning requires the time-varying observations to be smooth on the learned graphs at the corresponding time slots. Besides the smoothness criterion, our approach allows us to incorporate prior knowledge about the temporal graph variation, which leads to more robust graph learning. Thus, we generalize the time-varying graph learning problem as that of solving the following optimization:

$$\min_{\{\mathbf{L}_t\} \subset \mathcal{L}} \; \sum_{t=1}^{T} \mathrm{tr}(\mathbf{X}_t^{\top} \mathbf{L}_t \mathbf{X}_t) + h(\mathbf{L}_t) + \eta \sum_{t=2}^{T} R(\mathbf{L}_t - \mathbf{L}_{t-1}), \quad (15)$$

where $\mathcal{L}$ is the valid set of graph Laplacians and is given by

$$\mathcal{L} = \left\{ \mathbf{L} \mid [\mathbf{L}]_{mn} = [\mathbf{L}]_{nm} \leq 0 \ (m \neq n), \ \mathbf{L}\mathbf{1} = \mathbf{0} \right\}, \quad (16)$$

and $R(\cdot)$ is a regularization term that characterizes the temporal change in graph edges, that is, the off-diagonal elements of the temporal graph variation $\Delta\mathbf{L}_t$. The first term in (15) corresponds to the smoothness term in the static case (12) and quantifies the smoothness of the signals on the time-varying graphs. The function $h(\cdot)$ in the second term is a regularization that avoids trivial solutions; this function, of which examples will be given later, depends on the assumptions made about the graph model. The parameter $\eta$ controls the regularization strength.
We reformulate problem (15) with constraint (16) by using weighted adjacency matrices. As shown in the following, this leads to a tractable formulation because using weighted adjacency matrices allows us to simplify the condition for the valid set. A problem equivalent to (15) is then given by:

$$\min_{\{\mathbf{W}_t\} \subset \mathcal{W}} \; \sum_{t=1}^{T} \frac{1}{2}\|\mathbf{W}_t \circ \mathbf{Z}_t\|_1 + h(\mathbf{W}_t) + \eta \sum_{t=2}^{T} R(\mathbf{W}_t - \mathbf{W}_{t-1}), \quad (17)$$

where the valid set of weighted adjacency matrices is defined by

$$\mathcal{W} = \left\{ \mathbf{W} \in \mathbb{R}_{+}^{N \times N} \mid \mathbf{W} = \mathbf{W}^{\top}, \ \mathrm{diag}(\mathbf{W}) = \mathbf{0} \right\}. \quad (18)$$

The first term of (17) is equivalent to the first term of (15), given that [17]:

$$\mathrm{tr}(\mathbf{X}^{\top} \mathbf{L} \mathbf{X}) = \frac{1}{2} \|\mathbf{W} \circ \mathbf{Z}\|_1, \quad (19)$$

where $\mathbf{Z} \in \mathbb{R}_{+}^{N \times N}$ is the pairwise distance matrix defined by

$$[\mathbf{Z}]_{mn} = \|\mathbf{x}_{m,:} - \mathbf{x}_{n,:}\|_2^2, \quad (20)$$

in which $\mathbf{x}_{m,:}$ denotes the $m$th row of $\mathbf{X}$, i.e., the observations at node $m$.
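The identity (19) with the distance matrix (20) can be verified numerically; the random data and adjacency below are toy values of ours, and the check holds for any nonnegative symmetric W with zero diagonal.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 5, 8                          # nodes, observations per window
X = rng.standard_normal((N, K))      # each row is one node's time series

# Random symmetric nonnegative adjacency with zero diagonal.
A = rng.random((N, N))
W = np.triu(A, 1) + np.triu(A, 1).T
L = np.diag(W.sum(axis=1)) - W

# Pairwise squared-distance matrix Z, Eq. (20):
# Z[m, n] = || X[m, :] - X[n, :] ||^2.
Z = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)

# Identity (19): tr(X^T L X) = (1/2) || W o Z ||_1 for nonnegative W.
lhs = np.trace(X.T @ L @ X)
rhs = 0.5 * np.abs(W * Z).sum()
assert np.isclose(lhs, rhs)
```

This is why the smoothness term can be rewritten as a weighted ℓ1 norm over the adjacency matrix, which is linear in W.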
In this paper, we solve the optimization problem in (17) to learn a time-varying graph. Throughout this paper, we set the regularization for the weighted adjacency matrix to be:

$$h(\mathbf{W}) = -\alpha \mathbf{1}^{\top} \log(\mathbf{W}\mathbf{1}) + \beta \|\mathbf{W}\|_F^2, \quad (21)$$

where $\alpha$ and $\beta$ are parameters. The first term in (21) forces the degree of each node to be positive without preventing individual edge weights from becoming zero. The second term controls the sparsity of the resulting graphs: a lower $\beta$ leads to sparser graphs, and a higher $\beta$ leads to denser graphs. Next, we describe how to select the regularization for the graph temporal variation and how to solve the optimization problem.
IV. Learning Time-Varying Graphs with Temporal Variation Regularization
IV-A. Regularization of Graph Temporal Variation
The choice of regularization for the graph temporal variation should be based on a priori knowledge of the temporal graph evolution. In this paper, we consider two types of graph evolution, which appear in many important applications, and define the associated regularizers in each case.
IV-A1. Formulation with Fused Lasso
In the first model, we assume that most edges and their weights are likely to remain unchanged over a short-term time horizon (P1). Thus, we select $R(\cdot) = \|\cdot\|_1$ as the regularizer in (17) and formulate the optimization problem as:

$$\min_{\{\mathbf{W}_t\} \subset \mathcal{W}} \; \sum_{t=1}^{T} \frac{1}{2}\|\mathbf{W}_t \circ \mathbf{Z}_t\|_1 + h(\mathbf{W}_t) + \eta \sum_{t=2}^{T} \|\mathbf{W}_t - \mathbf{W}_{t-1}\|_1. \quad (22)$$

The $\ell_1$-norm penalty on the difference between neighboring time windows leads to a fused Lasso problem [37], which encourages sparse differences, leading to local temporal constancy of the weighted adjacency matrices and, thus, solution graphs that tend to satisfy the (P1) property.
To make the optimization problem tractable, we rewrite (22) and (21) in vector form, with $\mathbf{W}_t$ and $\mathbf{Z}_t$ replaced by $\mathbf{w}_t$ and $\mathbf{z}_t$, respectively. Only the upper-triangular parts of $\mathbf{W}_t$ and $\mathbf{Z}_t$ are considered, given that the graph is undirected. Let us define $\mathbf{w} = [\mathbf{w}_1^{\top}, \ldots, \mathbf{w}_T^{\top}]^{\top}$ and $\mathbf{z} = [\mathbf{z}_1^{\top}, \ldots, \mathbf{z}_T^{\top}]^{\top}$, where $\mathbf{w}_t, \mathbf{z}_t \in \mathbb{R}_{+}^{N(N-1)/2}$. We also introduce the linear operators $\mathbf{S}$ and $\mathbf{D}$ that satisfy $\mathbf{S}\mathbf{w}_t = \mathbf{W}_t\mathbf{1}$ (the vector of node degrees) and $(\mathbf{D}\mathbf{w})_t = \mathbf{w}_{t+1} - \mathbf{w}_t$ (the temporal differences), respectively. With this notation, (22) can be rewritten as

$$\min_{\mathbf{w}} \; \mathbf{z}^{\top}\mathbf{w} - \alpha \mathbf{1}^{\top}\log(\mathbf{S}\mathbf{w}) + \beta \|\mathbf{w}\|_2^2 + \eta \|\mathbf{D}\mathbf{w}\|_1 \quad (23)$$

subject to the nonnegativity constraint

$$\mathbf{w} \geq \mathbf{0}, \quad (24)$$

which replaces the valid set (18).
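The vectorization and the temporal difference operator can be sketched as follows; the helper names and toy adjacency values are ours, and the difference operator is applied matrix-free rather than as an explicit matrix.

```python
import numpy as np

def upper_tri_vec(W):
    """Vectorize the strictly upper-triangular part of a symmetric matrix."""
    return W[np.triu_indices_from(W, k=1)]

def temporal_diff(ws):
    """Apply the difference operator: (Dw)_t = w_{t+1} - w_t, stacked."""
    return np.concatenate([ws[t + 1] - ws[t] for t in range(len(ws) - 1)])

# Three toy 3-node adjacency matrices over time (hypothetical values).
T, N = 3, 3
rng = np.random.default_rng(1)
Ws = []
for _ in range(T):
    A = np.triu(rng.random((N, N)), 1)
    Ws.append(A + A.T)

ws = [upper_tri_vec(W) for W in Ws]   # each has N(N-1)/2 = 3 entries
d = temporal_diff(ws)                 # length (T-1) * 3
assert d.shape == (6,)
assert np.allclose(d[:3], ws[1] - ws[0])
```

Penalizing the ℓ1 norm of this stacked difference vector is exactly the fused Lasso term over consecutive time windows.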
IV-A2. Formulation with Group Lasso
The second model aims at learning graphs having the (P2) property. This is achieved by choosing $R(\cdot) = \|\cdot\|_F$ as the regularizer in (17), leading to a group Lasso penalty [38] as follows:

$$\min_{\{\mathbf{W}_t\} \subset \mathcal{W}} \; \sum_{t=1}^{T} \frac{1}{2}\|\mathbf{W}_t \circ \mathbf{Z}_t\|_1 + h(\mathbf{W}_t) + \eta \sum_{t=2}^{T} \|\mathbf{W}_t - \mathbf{W}_{t-1}\|_F, \quad (25)$$

where only the last term differs from (22). The group Lasso penalizes a sum of $\ell_2$ norms, which encourages sparsity at the group level. In this formulation, the temporal variation of the weighted adjacency matrix at a given time slot is regarded as one group. Thus, the group Lasso yields graphs whose edge weights can vary significantly at a few time slots while remaining almost constant at all other time slots, thus satisfying the (P2) property. To make the optimization problem tractable, we rewrite (25) in the same manner as for the previous model:

$$\min_{\mathbf{w} \geq \mathbf{0}} \; \mathbf{z}^{\top}\mathbf{w} - \alpha \mathbf{1}^{\top}\log(\mathbf{S}\mathbf{w}) + \beta \|\mathbf{w}\|_2^2 + \eta \sum_{t=1}^{T-1} \|(\mathbf{D}\mathbf{w})_t\|_2, \quad (26)$$

where $(\mathbf{D}\mathbf{w})_t = \mathbf{w}_{t+1} - \mathbf{w}_t$.
IV-B. Optimization
We solve (22) and (25) with a primal-dual splitting (PDS) method [31], which solves problems of the form

$$\min_{\mathbf{x}} \; f(\mathbf{x}) + g(\mathbf{x}) + h(\mathbf{A}\mathbf{x}), \quad (27)$$

where $f$ is a differentiable convex function with a Lipschitz-continuous gradient $\nabla f$; $g$ and $h$ are proper lower semicontinuous convex functions that are proximable; and $\mathbf{A}$ is a linear operator.
Many algorithms to solve (27) have been proposed [31, 39, 40]. In our method, we use a forward-backward-forward (FBF) approach [41], which makes it possible to compute the proximal operators of the functions $g$ and $h$ in parallel.
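An FBF-style primal-dual iteration can be illustrated on a toy instance of (27); this is a sketch of the general iteration pattern (forward gradient step, two parallel proximal steps, correction), not the paper's graph learning objective, and all problem data below are our own illustrative choices.

```python
import numpy as np

# Toy instance of min_x f(x) + g(x) + h(Ax):
#   f(x) = 0.5 * ||x - a||^2   (differentiable, gradient x - a)
#   g    = indicator of x >= 0 (prox = projection onto the nonneg. orthant)
#   h    = lam * ||.||_1, A = I (prox of h* = projection onto [-lam, lam])
a = np.array([2.0, -1.0, 0.5])
lam = 0.3

x = np.zeros(3)                      # primal variable
v = np.zeros(3)                      # dual variable
gamma = 0.4                          # step size < 1 / (Lipschitz + ||A||)
for _ in range(2000):
    y1 = x - gamma * ((x - a) + v)   # forward step (A = A^T = I here)
    y2 = v + gamma * x
    p1 = np.maximum(y1, 0.0)         # prox of gamma * g  } computable
    p2 = np.clip(y2, -lam, lam)      # prox of gamma * h* } in parallel
    q1 = p1 - gamma * ((p1 - a) + p2)
    q2 = p2 + gamma * p1
    x = x - y1 + q1                  # forward-backward-forward correction
    v = v - y2 + q2

# Closed form for this separable toy problem: x* = max(a - lam, 0).
assert np.allclose(x, np.maximum(a - lam, 0.0), atol=1e-2)
```

The key point is that g and h never need to be differentiated; only their proximal operators (and the conjugate prox via Moreau's identity) are evaluated, which is what makes the nonsmooth temporal regularizers tractable.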
IV-B1. Fused Lasso
By introducing the indicator function of the constraint (24), we rewrite (23) as follows:

$$\min_{\mathbf{w}} \; \mathbf{z}^{\top}\mathbf{w} + \beta\|\mathbf{w}\|_2^2 + \iota_{+}(\mathbf{w}) - \alpha \mathbf{1}^{\top}\log(\mathbf{S}\mathbf{w}) + \eta\|\mathbf{D}\mathbf{w}\|_1, \quad (28)$$

where the indicator function is defined by

$$\iota_{+}(\mathbf{w}) = \begin{cases} 0 & \mathbf{w} \geq \mathbf{0} \\ \infty & \text{otherwise.} \end{cases} \quad (29)$$

The objective function (28) further reduces to the applicable form of the PDS with

$$f(\mathbf{w}) = \mathbf{z}^{\top}\mathbf{w} + \beta\|\mathbf{w}\|_2^2, \quad g(\mathbf{w}) = \iota_{+}(\mathbf{w}), \quad h(\mathbf{v}) = -\alpha\mathbf{1}^{\top}\log(\mathbf{v}_1) + \eta\|\mathbf{v}_2\|_1, \quad (30)$$

where the dual variable is $\mathbf{v} = [\mathbf{v}_1^{\top}, \mathbf{v}_2^{\top}]^{\top} = [(\mathbf{S}\mathbf{w})^{\top}, (\mathbf{D}\mathbf{w})^{\top}]^{\top}$.

The proximal operator of $g$ is the projection onto the nonnegative orthant:

$$[\mathrm{prox}_{\gamma g}(\mathbf{w})]_i = \max([\mathbf{w}]_i, 0). \quad (31)$$

Because $h$ is separable across the variables $\mathbf{v}_1$ and $\mathbf{v}_2$, its proximal operator can be computed separately for each term (see (6)). The proximal operator of the log barrier function [36] in the first term of $h$ is given by

$$[\mathrm{prox}_{-\gamma\alpha\log}(\mathbf{v}_1)]_i = \frac{[\mathbf{v}_1]_i + \sqrt{[\mathbf{v}_1]_i^2 + 4\alpha\gamma}}{2}. \quad (32)$$

The proximal operator of the $\ell_1$ norm, i.e., the second term of $h$, is the well-known element-wise soft-thresholding operation [36]:

$$[\mathrm{prox}_{\gamma\eta\|\cdot\|_1}(\mathbf{v}_2)]_i = \mathrm{sign}([\mathbf{v}_2]_i)\max(|[\mathbf{v}_2]_i| - \gamma\eta, 0). \quad (33)$$
See Algorithm 1 for the complete algorithm.
IV-B2. Group Lasso
Similarly, the second objective function (26) reduces to the applicable form of the PDS by replacing the $\ell_1$ term in $h$ with

$$\eta \sum_{t=1}^{T-1} \|[\mathbf{v}_2]_t\|_2, \quad (34)$$

where $[\mathbf{v}_2]_t$ is the block of $\mathbf{v}_2$ corresponding to time slot $t$. The proximal operator of the group Lasso term in $h$ is the well-known block-wise soft-thresholding operation [36]:

$$\mathrm{prox}_{\gamma\eta\|\cdot\|_2}([\mathbf{v}_2]_t) = \max\left(1 - \frac{\gamma\eta}{\|[\mathbf{v}_2]_t\|_2}, 0\right)[\mathbf{v}_2]_t. \quad (35)$$
Because the proximal operators of the functions $g$ and $h$ can be computed efficiently, the second model can also be solved with the PDS algorithm. See Algorithm 1.
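The block-wise soft-thresholding in (35) shrinks an entire group toward zero together, in contrast to the element-wise operation in (33); a small sketch with toy group values of ours:

```python
import numpy as np

def block_soft_threshold(x, gamma):
    """Prox of gamma * ||x||_2: shrink the whole block toward zero,
    setting it exactly to zero when its norm is below gamma."""
    nrm = np.linalg.norm(x)
    if nrm <= gamma:
        return np.zeros_like(x)
    return (1.0 - gamma / nrm) * x

# One "group" = the temporal variation of all edges at one time slot.
g1 = np.array([3.0, 4.0])    # ||g1|| = 5 -> shrunk, direction preserved
g2 = np.array([0.1, -0.2])   # small group -> zeroed out entirely

out1 = block_soft_threshold(g1, 1.0)   # (1 - 1/5) * g1 = [2.4, 3.2]
out2 = block_soft_threshold(g2, 1.0)
assert np.allclose(out1, [2.4, 3.2])
assert np.allclose(out2, [0.0, 0.0])
```

Zeroing whole groups is what forces the learned graph to stay constant at most time slots while allowing large, simultaneous edge changes at a few slots, matching (P2).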
Both of our learning algorithms have the same per-iteration computational complexity. Because the optimization problem in the proposed method is convex and lower semicontinuous, convergence of the algorithms is guaranteed.
V. Experimental Results with Synthetic Datasets
Table II: Summary of the compared graph learning methods.

Method                             Approach                              Temporal Regularization
SGL-GLasso [19] (static)           Graphical-Lasso-based model           —
SGL-Smooth [17] (static)           Signal smoothness model               —
TGL-Tikhonov [33] (time-varying)   Signal smoothness + regularization    Tikhonov regularization
TGL-FLasso (proposed)              Signal smoothness + regularization    Fused Lasso
TGL-GLTV (proposed)                Signal smoothness + regularization    Group Lasso
In this section, we present experimental results with synthetic data and compare the performance of our proposed methods with that of related graph learning methods (see Table II for a summary of all the compared methods).
V-A. Synthetic Datasets
We create three types of synthetic graph signals generated from time-varying graphs: a time-varying graph based on the random waypoint model (TV-RW graph) [42], a time-varying Erdős–Rényi graph (TV-ER graph), and an ER graph with large fluctuations at a few time slots (LF-ER graph).
The dataset construction consists of two steps:

1) constructing a time-varying graph; and

2) generating time-varying graph signals from probability distributions based on the graph Laplacians of the constructed time-varying graph.

We describe below the details of the three datasets by referring to the desired properties (P1) and (P2) described in Section I.
V-A1. Time-Varying Graph Construction
The TV-RW graph is constructed in two steps. First, we simulate the RW model to obtain time-varying coordinates: the model simulates sensors moving around a square space with random speeds and directions, and we obtain time-varying data corresponding to the position of each sensor at any given time. The number of sensors and the range of motion speeds (in m/s) are fixed in the simulation. We sample the sensor positions every 0.1 s for a total of 300 observations. In the second step, we construct a nearest-neighbor graph at each time from the position data. The edge weights are given by

$$w_{mn} = \exp\left(-\frac{d(m,n)^2}{\sigma^2}\right), \quad (36)$$

where $d(m,n)$ is the Euclidean distance between nodes $m$ and $n$, and $\sigma$ is a parameter. The edge weights in this graph generally vary smoothly according to (36); however, some edges (dis)appear between consecutive time slots due to the nature of the neighborhood graph. Note that our method estimates time-varying graphs robustly for various $\sigma$. This graph partially satisfies (P1) because the topology mostly remains unchanged over a short time horizon while the edge weights change smoothly over time.
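A nearest-neighbor graph with the Gaussian kernel weights of (36) can be sketched as follows; the function name, the symmetrization rule (keep an edge if either endpoint selects it), and the toy coordinates are our own illustrative choices.

```python
import numpy as np

def knn_gaussian_weights(coords, k, sigma):
    """k-nearest-neighbor graph with Gaussian kernel weights,
    w_mn = exp(-d(m, n)^2 / sigma^2), symmetrized by union of k-NN sets."""
    N = coords.shape[0]
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    W = np.zeros((N, N))
    for m in range(N):
        # indices of the k nearest neighbors of node m (excluding itself)
        nbrs = np.argsort(d2[m])[1:k + 1]
        W[m, nbrs] = np.exp(-d2[m, nbrs] / sigma ** 2)
    return np.maximum(W, W.T)    # keep an edge if either endpoint picks it

coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
W = knn_gaussian_weights(coords, k=2, sigma=1.0)
assert np.allclose(W, W.T) and np.all(np.diag(W) == 0)
assert W[0, 1] > W[0, 3]         # nearer nodes get larger weights
```

Rebuilding such a graph at every sampled time instant is what makes edges (dis)appear abruptly even though the kernel weights themselves vary smoothly with the positions.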
The TV-ER graph is constructed by varying the edges of an ER graph over time. We first construct an initial static ER graph with $N$ nodes and an edge connection probability $p$; the obtained graph satisfies (P1). The $t$th graph is obtained by resampling 5% of the edges of the previous graph. In this way, we construct a set of graphs that varies over 300 time slots. The edge weights of this graph are drawn uniformly at random from a fixed interval; once selected, they remain fixed until the corresponding edges are resampled. In this graph, only a few edges switch at any given time while most of the edges remain unchanged, i.e., this graph also follows (P1).

The LF-ER graph is a simulated example of a time-varying graph with (P2), as often observed in temporal brain connectivity dynamics [43, 44]. First, we generate six connectivity states, i.e., six static graphs, each of which is an ER graph with $N$ nodes and an edge connection probability $p$.¹ Second, the initial state is selected randomly from the six states. The state remains unchanged with 98% probability and changes to another connectivity state with 2% probability at each time. This graph follows the (P2) property. An LF-ER graph is constructed in this way, and we have 300 observations in time.

¹In our preliminary experiments, $p$ hardly affects the graph learning performance.
V-A2. Generating Data Samples
We create data samples from the constructed time-varying graphs in the same manner for all the graphs. Let $\mathbf{L}_t$ be the graph Laplacian corresponding to time slot $t$. A data sample is generated from a Gaussian Markov random field model defined by

$$p(\mathbf{x}_t \mid \mathbf{L}_t) = \mathcal{N}(\mathbf{0}, \mathbf{L}_t^{\dagger} + \sigma^2\mathbf{I}), \quad (37)$$

where $\sigma^2\mathbf{I}$ is the covariance of the Gaussian noise; its value is fixed in these experiments. Pairs of variables generated from this distribution have values closer to each other when the corresponding nodes are connected with a larger edge weight.
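Sampling from the model (37) can be sketched as follows; the path-graph Laplacian, the noise level, and the sample count are toy values of ours.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy graph Laplacian of a 4-node path graph.
W = np.zeros((4, 4))
for m in range(3):
    W[m, m + 1] = W[m + 1, m] = 1.0
L = np.diag(W.sum(axis=1)) - W

sigma = 0.5
# Covariance of the GMRF in Eq. (37): pinv(L) + sigma^2 * I, i.e.,
# signals smooth on the graph plus white observation noise.
cov = np.linalg.pinv(L) + sigma ** 2 * np.eye(4)
K = 1000
X = rng.multivariate_normal(np.zeros(4), cov, size=K).T   # N x K samples
assert X.shape == (4, K)
```

Signals drawn this way tend to take similar values on strongly connected nodes, which is exactly the smoothness prior the learning methods exploit.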
V-B. Performance Comparison
V-B1. Experimental Conditions
We evaluate the performance in terms of relative error and F-measure, each averaged over all time slots. The relative error is given by

$$\mathrm{RE} = \frac{\|\hat{\mathbf{W}} - \mathbf{W}^{*}\|_F}{\|\mathbf{W}^{*}\|_F}, \quad (38)$$

where $\hat{\mathbf{W}}$ is the estimated weighted adjacency matrix and $\mathbf{W}^{*}$ is the ground truth. It reflects the accuracy of the edge weights of the estimated graph. The F-measure is given by

$$\mathrm{F} = \frac{2\,\mathrm{tp}}{2\,\mathrm{tp} + \mathrm{fn} + \mathrm{fp}}, \quad (39)$$

where the true positives (tp) are the edges included in both $\hat{\mathbf{W}}$ and $\mathbf{W}^{*}$, the false negatives (fn) are the edges included in $\mathbf{W}^{*}$ but not in $\hat{\mathbf{W}}$, and the false positives (fp) are the edges included in $\hat{\mathbf{W}}$ but not in $\mathbf{W}^{*}$. The F-measure, the harmonic mean of precision and recall, represents the accuracy of the estimated graph topology. It takes values between 0 and 1; the higher the F-measure, the better the estimated topology captures the ground truth.
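Both metrics are straightforward to compute from a pair of adjacency matrices; a sketch with toy matrices of ours (the edge-detection tolerance `tol` is our own assumption):

```python
import numpy as np

def relative_error(W_hat, W_true):
    """Eq. (38): relative Frobenius-norm error of the edge weights."""
    return np.linalg.norm(W_hat - W_true, "fro") / np.linalg.norm(W_true, "fro")

def f_measure(W_hat, W_true, tol=1e-8):
    """Eq. (39): F-measure on the estimated edge support (topology only)."""
    est = np.triu(np.abs(W_hat) > tol, 1)    # count each undirected edge once
    tru = np.triu(np.abs(W_true) > tol, 1)
    tp = np.sum(est & tru)
    fp = np.sum(est & ~tru)
    fn = np.sum(~est & tru)
    return 2 * tp / (2 * tp + fp + fn)

W_true = np.array([[0, 1, 0], [1, 0, 2], [0, 2, 0]], dtype=float)
W_hat = np.array([[0, 0.9, 0.1], [0.9, 0, 0], [0.1, 0, 0]], dtype=float)
# Edges: truth {(0,1), (1,2)}; estimate {(0,1), (0,2)} -> tp=1, fp=1, fn=1.
assert np.isclose(f_measure(W_hat, W_true), 0.5)
```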
In this experiment, our goal is to compare all the methods based on their best achievable performance. We perform Monte Carlo simulations for each method to find the parameters minimizing the relative error. Note that, as shown in the next section, fixed parameters still work well for graph learning from real datasets. For SGL-Smooth, TGL-Tikhonov, TGL-FLasso, and TGL-GLTV, $\alpha$ is selected by fine-tuning, and $\beta$ is selected from the following set:

(40)

The parameter $\eta$ for TGL-FLasso and TGL-GLTV is selected from 0.1 to 2 in steps of 0.1. The parameter for SGL-GLasso is selected by the method described in [19]. The stopping tolerance in Algorithm 1 is fixed across all experiments. We evaluate each method with different numbers of data samples $K$ and measure the average relative error and F-measure over all time frames. Note that SGL-GLasso is not evaluated when $K$ is too small because the sample covariance it requires cannot be calculated.
V-B2. Results
Fig. 2 shows the performance comparisons as a function of $K$. Figs. 2(a) and (d) show the average F-measure and the relative error for TV-RW graphs; Figs. 2(b) and (e) show those for TV-ER graphs; and Figs. 2(c) and (f) show those for LF-ER graphs.

As shown in Fig. 2, the existing static graph learning methods, i.e., SGL-GLasso and SGL-Smooth, perform worse than the time-varying methods. This is because SGL-Smooth learns a graph for each time slot independently and does not consider the temporal relations among graphs. Note that SGL-GLasso sometimes achieves performance comparable to the time-varying methods when $K$ is large.

Figs. 2(a) and (d) show that TGL-FLasso outperforms TGL-Tikhonov despite the fact that the edges in a TV-RW graph vary smoothly in this case. This is likely because TGL-Tikhonov yields undesirable edges (discussed later in this section). TGL-FLasso significantly outperforms the other methods for TV-ER graphs, as shown in Figs. 2(b) and (e). This indicates that the fused Lasso constraint on the difference between graphs in neighboring time slots works well.

It is also worth noting that, thanks to the regularization term, TGL-FLasso can learn accurate time-varying graphs from a smaller number of measurements. As can be seen in Fig. 2, the performance gain for TV-ER graphs is much larger than that for TV-RW graphs. This is because the TV-ER graphs have the (P1) property, while the TV-RW graphs satisfy (P1) only partially, as previously mentioned.

Focusing on Figs. 2(c) and (f), TGL-GLTV also outperforms the static graph learning methods for small $K$. Fig. 3 shows the norm of the temporal difference of the learned adjacency matrices at each time slot, i.e., $\|\mathbf{W}_t - \mathbf{W}_{t-1}\|$. Clearly, TGL-GLTV detects the large fluctuations of the graph in the time slots where the connectivity state changes, while SGL-GLasso and SGL-Smooth exhibit several unclear jumps. As a result, TGL-GLTV is a graph learning method suitable for change detection, and the group Lasso penalty on the temporal variation is effective in graph learning.

Fig. 4 visualizes the temporal variation of a subset of edge weights of the graphs learned from the TV-RW and TV-ER datasets. The vertical and horizontal axes of these figures represent the edge indices and time slot indices of the learned graph, and the color represents the magnitude of the edge weights. For ease of viewing, only the first 50 edge indices are shown. As can be seen in Fig. 4, SGL-GLasso and SGL-Smooth cannot capture the original graph structure, and they output graphs that ignore temporal relations. TGL-Tikhonov is slightly superior to the static graph learning methods; however, it yields many undesirable edges, and the estimated edge weights are relatively small. In contrast, TGL-FLasso estimates the original structure better than the other methods and outputs graphs that account for temporal relationships.

Fig. 5 likewise shows the temporal variation for the LF-ER dataset. It can be observed that SGL-GLasso and SGL-Smooth cannot capture the large fluctuations in the graph, whereas TGL-GLTV detects the large edge changes in the original time-varying graph.
V-C. Effect of Temporal Resolution
To verify the robustness of the proposed method to the temporal graph transition, i.e., the temporal resolution, we evaluate the performance on the TV-RW graph dataset with different sampling periods. Table III shows the performance comparisons on the TV-RW dataset as a function of the sampling period. TGL-FLasso outperforms the other methods when the sampling period is small. TGL-FLasso and TGL-Tikhonov show comparable average relative error at intermediate sampling periods, and TGL-FLasso performs worse than TGL-Tikhonov when the sampling period becomes large. This is because many edge weights may vary over time when the sampling period is long, i.e., the dataset no longer satisfies (P1).
Table III: Performance on the TV-RW dataset for different sampling periods (s).

Average F-measure
Sampling period   0.1    0.5    1.0    1.5    2.0
SGL-GLasso        0.613  0.574  0.574  0.580  0.575
SGL-Smooth        0.639  0.562  0.562  0.563  0.534
TGL-Tikhonov      0.681  0.606  0.613  0.611  0.596
TGL-FLasso        0.718  0.636  0.656  0.664  0.627

Average Relative Error
Sampling period   0.1    0.5    1.0    1.5    2.0
SGL-GLasso        0.602  0.690  0.699  0.697  0.696
SGL-Smooth        0.612  0.683  0.692  0.685  0.731
TGL-Tikhonov      0.529  0.606  0.607  0.605  0.633
TGL-FLasso        0.431  0.574  0.609  0.656  0.701
V-D. Computation Time
We compare the computation time of the proposed methods with some related works. All methods are implemented in MATLAB R2018b and run on a 2.3GHz Intel Xeon W processor with 128GB RAM. The experiments are tested on the TVER graph dataset for different number of nodes . The tolerance value in each methods is set to .
Fig. 6 shows the computation time of each method, demonstrating that the proposed methods (TGL-FLasso and TGL-GLTV) converge faster than SGL-GLasso. Because SGL-GLasso needs to solve nonnegative quadratic programs as a subproblem in each iteration, it requires significant computation time. The computational complexity of one iteration of SGL-GLasso is , where is the worst-case complexity of solving the subproblem, and . On the other hand, the complexity of one iteration of TGL-FLasso and TGL-GLTV is , as described in Section IV.
TGL-FLasso and TGL-GLTV require longer computation times than SGL-Smooth and TGL-Tikhonov, even though all of them estimate a graph using almost the same algorithm, i.e., with the same computational complexity . However, TGL-FLasso and TGL-GLTV introduce dual variables to solve the optimization problem with the temporal variation regularization, i.e., they have more variables to update than SGL-Smooth and TGL-Tikhonov. Although TGL-Tikhonov also has a temporal variation regularization, it can be computed without introducing dual variables because its temporal variation regularization is differentiable.
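To illustrate why a nondifferentiable temporal regularizer forces the extra dual updates, the following toy sketch applies a primal-dual splitting iteration of the form used by Condat's algorithm [31] to a one-dimensional fused-lasso problem. The objective, function name, and step sizes are illustrative and are not those of TGL-FLasso itself:

```python
import numpy as np

# Minimize 0.5*||x - v||^2 + lam*||D x||_1, where D is the first-difference
# operator -- the same structure (nonsmooth penalty composed with a linear
# operator) that requires a dual variable y in TGL-FLasso / TGL-GLTV.
def primal_dual_tv(v, lam=1.0, tau=0.25, sigma=0.5, iters=2000):
    n = len(v)
    D = np.diff(np.eye(n), axis=0)             # (n-1) x n difference operator
    x, y = v.copy(), np.zeros(n - 1)
    for _ in range(iters):
        # primal update: gradient step on the smooth data-fidelity term
        x_new = x - tau * ((x - v) + D.T @ y)
        # dual update: ascent step + prox of the conjugate of lam*||.||_1,
        # which is clipping to the interval [-lam, lam]
        y = np.clip(y + sigma * (D @ (2 * x_new - x)), -lam, lam)
        x = x_new
    return x

v = np.array([0.0, 0.1, -0.1, 5.0, 5.2, 4.9])  # two nearly flat segments
x = primal_dual_tv(v, lam=1.0)
print(np.round(x, 3))  # each segment is flattened to a constant level
```

The step sizes satisfy the standard convergence condition 1/tau - sigma*||D||^2 >= L/2 for this problem; the extra per-iteration work over a purely differentiable formulation is exactly the dual ascent and clipping step.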
VI Real Data Applications
VI-A Denoising of Dynamic Point Clouds
TABLE IV
               dog    handstand  skirt  wheel
 noisy         12.12  13.97      13.73  15.45
 NN            12.35  14.29      14.01  15.76
 SGL-Smooth    13.13  15.37      15.03  16.74
 TGL-Tikhonov  13.34  15.69      15.34  17.06
 TGL-FLasso    20.05  21.32      21.58  22.97
Our proposed method is applied to denoising of dynamic point cloud data, which contain the 3-D coordinates of dynamically evolving points. When point cloud data are acquired, measurement errors lead to displacements of the geometric coordinates of the points. Here, we consider graph-based denoising approaches, whose noise removal performance depends on the underlying graph. In this experiment, denoising is performed by graph heat kernel filtering [45], and the time-varying graph used in the denoising is estimated from the noisy point cloud data.
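A minimal sketch of graph heat kernel filtering in the spirit of [45]: each coordinate signal is filtered by the low-pass operator exp(-tau*L) built from the learned graph. The function name and the parameter tau are illustrative, not taken from the paper:

```python
import numpy as np

def heat_kernel_denoise(W, X, tau=1.0):
    """Graph heat-kernel filtering: smooth each column of X (e.g., the
    x/y/z coordinates of the points) with the low-pass filter
    exp(-tau*L), where L is the combinatorial Laplacian of graph W."""
    L = np.diag(W.sum(axis=1)) - W
    lam, U = np.linalg.eigh(L)       # graph Fourier basis
    H = np.exp(-tau * lam)           # heat-kernel frequency response
    return U @ (H[:, None] * (U.T @ X))

# Toy check: on a 5-node path graph, noise on a constant signal
# is attenuated while the constant (DC) component is preserved.
W = np.diag(np.ones(4), 1)
W = W + W.T
rng = np.random.default_rng(1)
X = np.ones((5, 3)) + 0.3 * rng.standard_normal((5, 3))
X_hat = heat_kernel_denoise(W, X, tau=2.0)
print(np.linalg.norm(X_hat - 1) < np.linalg.norm(X - 1))  # True
```

Because exp(-tau*lam) equals 1 at the zero eigenvalue and shrinks all higher graph frequencies, the filter suppresses noise while leaving the smooth geometry largely intact; this is why the quality of the learned graph directly affects the denoising result.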
We use the dynamic point cloud dataset in [46], which contains five dynamic point clouds: dance, dog, handstand, skirt, and wheel. As this dataset is clean, with positions ranging from to , we added Gaussian noise with , which is a significantly high noise level. The time-varying graph is learned from the dataset downsampled to 357 points and evolving over 240 time slots. In this experiment, we use fixed parameters for each method, determined by a grid search on the dance data, and evaluate the denoising performance on the other four datasets.
Table IV summarizes the dynamic point cloud denoising quality. The time-varying graph learning methods, TGL-Tikhonov and TGL-FLasso, show better results than SGL-Smooth and the nearest neighbor approach, which indicates that nearest neighbor cannot construct a meaningful graph from noisy data. Additionally, TGL-FLasso outperforms TGL-Tikhonov by up to 6 dB.
Fig. 7 visualizes the denoising results of wheel at a certain time. Consistent with the numerical results, the nearest neighbor approach cannot capture the structure of the human body. SGL-Smooth and TGL-Tikhonov yield slightly better outputs than nearest neighbor; however, the arms and legs remain problematic. In contrast, TGL-FLasso captures the structure of the human body more successfully than the other methods.
Fig. 8 visualizes a graph at a certain time in the time-varying graph estimated from the noisy wheel data. In this figure, the nodes of the graphs are plotted at the correct positions for visualization. As shown in Fig. 8, nearest neighbor yields a sparse graph, but its nodes are connected without a temporal relationship. SGL-Smooth and TGL-Tikhonov yield very dense edge sets that connect different parts of the body. In contrast, TGL-FLasso yields a graph whose nodes are connected accurately and which preserves the structure of the human body.
VI-B Application to Temperature Data
Finally, we apply the proposed method to estimate a time-varying graph from temperature data on the island of Hokkaido, the northernmost region of Japan. The goal of this experiment is to learn a time-varying graph that reveals the relationship between geographical regions over time. We use the average daily temperature data collected from 172 locations in Hokkaido in 2015 (provided by the Japan Meteorological Agency, https://www.jma.go.jp/jma/index.html), i.e., overall the dataset has 365 time slots with and . From these data, we estimate a time-varying graph using TGL-FLasso.
Fig. 9 shows the graph learned from the temperature data, overlaid on a map of Hokkaido with altitude (provided by the Geospatial Information Authority of Japan, http://www.gsi.go.jp/). From the figure, the characteristics of the learned graph can be summarized as follows:

- Nodes close to each other are basically connected. This is reasonable because temperatures are similar within the same geographical area. Note that, if recording points are separated by mountain ranges, the corresponding nodes may not be connected even if the distance between them is short.

- Nodes along the coast are often connected to each other, and inland nodes are also connected to each other. In particular, coastal nodes may be connected to each other despite being far apart.

- The locally connected structures in the learned graph mostly remain unchanged over time, as shown in Fig. 9.
Fig. 9 also shows the season-specific graphs. We found that a coastal node often has a connection to another coastal node even when they are not close to each other. This can be explained by the warming or cooling effect of sea currents that occur seasonally in this area and affect coastal areas in similar ways. Fig. 10 shows the daily sea surface temperatures (SST) on January 8 and August 9, 2015 (provided by the Japan Meteorological Agency, https://www.jma.go.jp/jma/index.html). As can be seen in Figs. 9 and 10, nodes in areas with similar SST are connected to each other in the learned graph. It is important to emphasize that no prior information about the SST areas was used in the graph learning process.
VII Conclusion
In this paper, we have presented a framework for learning time-varying graphs that is suitable for the analysis of spatio-temporal data and for change detection. The framework formulates time-varying graph learning as convex optimization problems with constraints on the temporal variation of the graph. Specifically, it introduces fused Lasso and group Lasso regularization of the temporal variation to learn graphs with temporal homogeneity and graphs that reflect sudden changes in networks, respectively. We also propose algorithms that solve the optimization problems based on the primal-dual splitting method. In experiments on synthetic and real data, we demonstrated that the proposed models can successfully learn time-varying graphs, especially when the number of observations is small. Our future work includes implementing an automatic parameter tuning method and extending the algorithm to online graph learning.
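The two temporal regularizers above act on the temporal differences of the edge weights through their proximal operators: element-wise soft-thresholding for the fused Lasso and block soft-thresholding for the group Lasso. A minimal sketch, with illustrative function names:

```python
import numpy as np

def prox_l1(z, gamma):
    """Soft-thresholding: prox of gamma*||.||_1. Used when the
    fused-Lasso penalty on w_t - w_{t-1} should keep most edges
    unchanged and let only a few edges vary between time slots."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def prox_group(z, gamma):
    """Block soft-thresholding: prox of gamma*||.||_2. Used when the
    group-Lasso penalty should make the whole set of edges either
    stay fixed or change at once (a sudden network change)."""
    n = np.linalg.norm(z)
    return np.zeros_like(z) if n <= gamma else (1.0 - gamma / n) * z

d = np.array([0.05, -0.02, 3.0])   # a temporal difference w_t - w_{t-1}
print(prox_l1(d, 0.1))             # small changes zeroed, big change kept
print(prox_group(d, 0.1))          # entire block shrunk uniformly
```

This difference in behavior is what lets the fused-Lasso model track slowly varying graphs edge by edge, while the group-Lasso model detects time slots where the graph changes abruptly.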
References
 [1] C. Nowzari, V. M. Preciado, and G. J. Pappas, “Analysis and control of epidemics: A survey of spreading processes on complex networks,” IEEE Control Syst. Mag., vol. 36, no. 1, pp. 26–46, 2016.
 [2] A. Ganesh, L. Massoulie, and D. Towsley, “The effect of network topology on the spread of epidemics,” in Proc. IEEE 24th Annu. Joint Conf. IEEE Comput. Commun. Societies, vol. 2, 2005, pp. 1455–1466.
 [3] V. Colizza, A. Barrat, M. Barthélemy, and A. Vespignani, “The role of the airline transportation network in the prediction and predictability of global epidemics,” PNAS, vol. 103, no. 7, pp. 2015–2020, 2006.
 [4] J. Zhang and J. M. F. Moura, “Diffusion in social networks as SIS epidemics: Beyond full mixing and complete graphs,” IEEE J. Sel. Top. Signal Process., vol. 8, no. 4, pp. 537–551, 2014.
 [5] N. Shahid, N. Perraudin, V. Kalofolias, G. Puy, and P. Vandergheynst, “Fast robust PCA on graphs,” IEEE J. Sel. Top. Signal Process., vol. 10, no. 4, pp. 740–756, 2016.
 [6] F. Shang, L. C. Jiao, and F. Wang, “Graph dual regularization non-negative matrix factorization for co-clustering,” Pattern Recognition, vol. 45, no. 6, pp. 2237–2250, 2012.
 [7] Z. Zhang and K. Zhao, “Low-rank matrix approximation with manifold regularization,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 7, pp. 1717–1729, 2013.
 [8] A. Gadde, S. K. Narang, and A. Ortega, “Bilateral filter: Graph spectral interpretation and extensions,” in Proc. IEEE Int. Conf. Image Process., 2013, pp. 1222–1226.
 [9] Y. Wang, A. Ortega, D. Tian, and A. Vetro, “A graphbased joint bilateral approach for depth enhancement,” in Proc. IEEE Int. Conf. Acoust. Speech. Signal Process., 2014, pp. 885–889.
 [10] M. Onuki, S. Ono, M. Yamagishi, and Y. Tanaka, “Graph signal denoising via trilateral filter on graph spectral domain,” IEEE Trans. Signal Inf. Process. Netw., vol. 2, no. 2, pp. 137–148, 2016.
 [11] A. Elmoataz, O. Lezoray, and S. Bougleux, “Nonlocal discrete regularization on weighted graphs: A framework for image and manifold processing,” IEEE Trans. Image Process., vol. 17, no. 7, pp. 1047–1060, 2008.
 [12] C. Couprie, L. Grady, L. Najman, J. Pesquet, and H. Talbot, “Dual constrained TV-based regularization on graphs,” SIAM J. Imaging Sci., vol. 6, no. 3, pp. 1246–1273, 2013.
 [13] S. Ono, I. Yamada, and I. Kumazawa, “Total generalized variation for graph signals,” in Proc. IEEE Int. Conf. Acoust. Speech. Signal Process., 2015, pp. 5456–5460.
 [14] X. Dong, D. Thanou, M. Rabbat, and P. Frossard, “Learning graphs from data: A signal representation perspective,” IEEE Signal Process. Mag., vol. 36, no. 3, pp. 44–63, 2019.
 [15] G. Mateos, S. Segarra, A. G. Marques, and A. Ribeiro, “Connecting the dots: Identifying network structure via graph signal processing,” IEEE Signal Process. Mag., vol. 36, no. 3, pp. 16–43, 2019.
 [16] G. B. Giannakis, Y. Shen, and G. V. Karanikolas, “Topology identification and learning over graphs: Accounting for nonlinearities and dynamics,” Proc. IEEE, vol. 106, no. 5, pp. 787–807, 2018.
 [17] V. Kalofolias, “How to learn a graph from smooth signals,” in Proc. Int. Conf. Artif. Intell. Statist., 2016, pp. 920–929.
 [18] X. Dong, D. Thanou, P. Frossard, and P. Vandergheynst, “Learning Laplacian matrix in smooth graph signal representations,” IEEE Trans. Signal Process., vol. 64, no. 23, pp. 6160–6173, 2016.
 [19] H. E. Egilmez, E. Pavez, and A. Ortega, “Graph learning from data under Laplacian and structural constraints,” IEEE J. Sel. Top. Signal Process., vol. 11, no. 6, pp. 825–841, 2017.
 [20] B. Pasdeloup, V. Gripon, G. Mercier, D. Pastor, and M. G. Rabbat, “Characterization and inference of graph diffusion processes from observations of stationary signals,” IEEE Trans. Signal Inf. Process. Netw., vol. 4, no. 3, pp. 481–496, 2018.
 [21] D. Thanou, X. Dong, D. Kressner, and P. Frossard, “Learning heat diffusion graphs,” IEEE Trans. Signal Inf. Process. Netw., vol. 3, no. 3, pp. 484–499, 2017.
 [22] J. Mei and J. M. F. Moura, “Signal processing on graphs: Causal modeling of unstructured data,” IEEE Trans. Signal Process., vol. 65, no. 8, pp. 2077–2092, 2017.
 [23] S. P. Chepuri, S. Liu, G. Leus, and A. O. Hero, “Learning sparse graphs under smoothness prior,” in Proc. IEEE Int. Conf. Acoust. Speech. Signal Process., 2017, pp. 6508–6512.
 [24] E. Pavez, H. E. Egilmez, and A. Ortega, “Learning graphs with monotone topology properties and multiple connected components,” IEEE Trans. Signal Process., vol. 66, no. 9, pp. 2399–2413, 2018.
 [25] H. E. Egilmez, E. Pavez, and A. Ortega, “Graph learning from filtered signals: Graph system and diffusion kernel identification,” IEEE Trans. Signal Inf. Process. Netw., pp. 1–1, 2018.
 [26] M. G. Preti, T. A. Bolton, and D. Van De Ville, “The dynamic functional connectome: State-of-the-art and perspectives,” NeuroImage, vol. 160, pp. 41–54, 2017.
 [27] Y. Kim, S. Han, S. Choi, and D. Hwang, “Inference of dynamic networks using time-course data,” Brief. Bioinformatics, vol. 15, no. 2, pp. 212–228, 2014.
 [28] D. Hallac, Y. Park, S. Boyd, and J. Leskovec, “Network inference via the time-varying graphical Lasso,” in Proc. 23rd ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., 2017, pp. 205–213.
 [29] R. P. Monti, P. Hellyer, D. Sharp, R. Leech, C. Anagnostopoulos, and G. Montana, “Estimating time-varying brain connectivity networks from functional MRI time series,” NeuroImage, vol. 103, pp. 427–443, 2014.
 [30] J. Wang, S. Qiu, Y. Xu, Z. Liu, X. Wen, X. Hu, R. Zhang, M. Li, W. Wang, and R. Huang, “Graph theoretical analysis reveals disrupted topological properties of whole brain functional networks in temporal lobe epilepsy,” Clinical Neurophysiology, vol. 125, no. 9, pp. 1744–1756, 2014.
 [31] L. Condat, “A primal–dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms,” J Optim Theory Appl, vol. 158, no. 2, pp. 460–479, 2013.
 [32] K. Yamada, Y. Tanaka, and A. Ortega, “Time-varying graph learning based on sparseness of temporal variation,” in Proc. IEEE Int. Conf. Acoust. Speech. Signal Process., 2019, pp. 5411–5415.
 [33] V. Kalofolias, A. Loukas, D. Thanou, and P. Frossard, “Learning time varying graphs,” in Proc. IEEE Int. Conf. Acoust. Speech. Signal Process., 2017, pp. 2826–2830.
 [34] B. Baingana and G. B. Giannakis, “Tracking switched dynamic network topologies from information cascades,” IEEE Trans. Signal Process., vol. 65, no. 4, pp. 985–997, 2017.

 [35] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, “The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains,” IEEE Signal Process. Mag., vol. 30, no. 3, pp. 83–98, 2013.
 [36] N. Parikh and S. Boyd, “Proximal algorithms,” Found. Trends Optim., vol. 1, no. 3, pp. 127–239, 2014.
 [37] R. Tibshirani, M. Saunders, S. Rosset, J. Zhu, and K. Knight, “Sparsity and smoothness via the fused Lasso,” J. R. Stat. Soc. Ser. B Stat. Methodol., vol. 67, no. 1, pp. 91–108, 2005.

 [38] L. Meier, S. V. D. Geer, and P. Bühlmann, “The group Lasso for logistic regression,” J. R. Stat. Soc. Ser. B Stat. Methodol., vol. 70, no. 1, pp. 53–71, 2008.
 [39] N. Komodakis and J. Pesquet, “Playing with duality: An overview of recent primal-dual approaches for solving large-scale optimization problems,” IEEE Signal Process. Mag., vol. 32, no. 6, pp. 31–54, 2015.
 [40] I. Daubechies, M. Defrise, and C. De Mol, “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint,” Commun. Pure Appl. Math., vol. 57, no. 11, pp. 1413–1457, 2004.
 [41] P. L. Combettes and J.-C. Pesquet, “Primal-dual splitting algorithm for solving inclusions with mixtures of composite, Lipschitzian, and parallel-sum type monotone operators,” Set-Valued Anal., vol. 20, no. 2, pp. 307–330, 2012.
 [42] D. B. Johnson and D. A. Maltz, “Dynamic source routing in ad hoc wireless networks,” in Mobile Computing, 1996, pp. 153–181.
 [43] E. A. Allen, E. Damaraju, S. M. Plis, E. B. Erhardt, T. Eichele, and V. D. Calhoun, “Tracking whole-brain connectivity dynamics in the resting state,” Cereb. Cortex, vol. 24, no. 3, pp. 663–676, 2014.
 [44] J. Taghia, S. Ryali, T. Chen, K. Supekar, W. Cai, and V. Menon, “Bayesian switching factor analysis for estimating timevarying functional connectivity in fMRI,” Neuroimage, vol. 155, pp. 271–290, 2017.
 [45] F. Zhang and E. R. Hancock, “Graph spectral image smoothing using the heat kernel,” Pattern Recogn, vol. 41, no. 11, pp. 3328–3342, 2008.
 [46] J. Gall, C. Stoll, E. de Aguiar, C. Theobalt, B. Rosenhahn, and H. Seidel, “Motion capture using joint skeleton tracking and surface estimation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2009, pp. 1746–1753.