Graph learning has become a hot research topic, since the prior graph is usually unavailable for graph-based models in many applications, e.g., graph neural networks. Besides statistical models [7, 22], graph signal processing (GSP) [15, 18] also attempts to learn graphs from the perspective of signal processing. The most notable GSP graph learning models are based on the assumption of smoothness, under which the signal values of two vertices connected by a large edge weight tend to be similar. On the other hand, a feature common to most models is that the environment is assumed to be static, so that only a single graph is learned from all observed data. However, relationships between entities are usually time-varying in the real world. Hence, learning a series of time-varying graphs with time stamps is a reasonable choice.
Current time-varying graph learning methods attempt to jointly learn the graphs of all time slots by exploiting prior assumptions about the evolutionary patterns of dynamic graphs. The most widely used assumption is temporal homogeneity, under which only a small number of edges are allowed to change between two consecutive graphs. The essence of such prior assumptions is to establish temporal relations between the graphs of different time slots using prior knowledge, which is paramount to learning time-varying graphs since it brings structural information, in addition to data, to the learning process.
Albeit interesting, the temporal homogeneity assumption only cares about relations between graphs in neighboring time slots and treats them equally. Therefore, the temporal structure assumed by this prior can be depicted by a chain. As shown in Fig. 1, the graph of each time slot is a node of the chain, and the edges indicate connections between nodes. A connection is interpreted as a constraint on the variation of the corresponding two graphs, i.e., $\|W_t - W_{t-1}\|_1$ for the temporal homogeneity assumption, where $W_t$ is the adjacency matrix of the graph at time slot $t$. In the chain structure, edges only occur between two neighboring nodes and all edge weights equal 1. Obviously, this structure is simple, but it may be inconsistent with the real temporal relations in some applications. Take the crowd flow networks of an urban area as an example. The variations of crowd flow networks over different periods of a day are not uniform, due to differences in travel behavior. For example, the patterns of the networks in the early morning (0 a.m.–6 a.m.) are apparently different from those in rush hours (7 a.m.–9 a.m.). Thus, it is not reasonable to treat all connections equally. Furthermore, common knowledge tells us that networks in the same time period of two different working days, e.g., 10 a.m. on Monday and on Tuesday, are also similar. Capturing this periodic pattern is beyond the ability of the chain structure.
To this end, a more general time-varying graph learning method using smooth signals should be proposed by generalizing the assumption of temporal homogeneity. In this paper, a flexible structure named the temporal graph is leveraged to describe structured temporal relations of time-varying graphs. Different from the chain structure, relations between the graphs of any pair of time slots can be established in the temporal graph, and we use weights to measure the "closeness" of these relations. Therefore, the temporal graph is more powerful than the chain structure at representing temporal structures in the real world. Clearly, the chain structure is a special case of the temporal graph. A distributed algorithm based on the Alternating Direction Method of Multipliers (ADMM) is developed to solve the induced optimization problem, which saves considerable time when the number of time periods is large. Numerical tests illustrate that our method outperforms the state-of-the-art methods in the face of intricate temporal structures.
We will learn undirected graphs with non-negative weights. Given observed signals $X$ generated from a graph $G$ with $m$ vertices, graph learning aims to infer the adjacency matrix $W$ of $G$. Under smoothness priors, this is equivalent to solving the following problem
$\min_{W \in \mathcal{W}_m} \; \|W \circ Z\|_{1,1} - \alpha \mathbf{1}^{\top} \log(W \mathbf{1}) + \beta \|W\|_F^2$,   (1)
where $\circ$ is the Hadamard product and
$\mathbf{1}$ is a vector with all entries equal to 1. Furthermore, $\mathcal{W}_m$ is the adjacency matrix set defined as
$\mathcal{W}_m = \{ W \in \mathbb{R}_{+}^{m \times m} : W = W^{\top}, \; \mathrm{diag}(W) = \mathbf{0} \}$,   (2)
where $\mathbb{R}_{+}$ is the set of nonnegative real numbers and $\mathbf{0}$ is a vector with all entries equal to 0. For a data matrix $X$ whose rows correspond to vertices, the pairwise distance matrix $Z$ in (1) is defined as
$Z_{ij} = \|x_{i,:} - x_{j,:}\|_2^2$,   (3)
where $Z_{ij}$ is the $(i,j)$-th entry of $Z$ and $x_{i,:}$ is the $i$-th row of $X$. The first term of (1) measures the smoothness of the observed signals over $G$. Besides, the second term controls the degree of each node while the third term controls the sparsity of the edges, where both $\alpha$ and $\beta$ are predefined parameters. Note that $W$ is a symmetric matrix with diagonal entries equal to 0, and hence the number of free variables of $W$ is $m(m-1)/2$. We define a vector $w \in \mathbb{R}_{+}^{m(m-1)/2}$ whose entries are the upper-triangular variables of $W$. Therefore, problem (1) can be rewritten as
$\min_{w \geq 0} \; 2 w^{\top} z - \alpha \mathbf{1}^{\top} \log(S w) + 2 \beta \|w\|_2^2$,   (4)
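As a concrete illustration, the pairwise distance matrix $Z$ of (3) can be computed as follows (a minimal NumPy sketch; variable names and the row-per-vertex layout are our own assumptions):

```python
import numpy as np

def pairwise_distance_matrix(X):
    """Z[i, j] = squared Euclidean distance between the signal
    values observed at vertices i and j (rows of X)."""
    sq = np.sum(X ** 2, axis=1)                    # ||x_i||^2 for each vertex
    Z = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # expand ||x_i - x_j||^2
    return np.maximum(Z, 0.0)                      # clip tiny negative round-off

X = np.random.default_rng(0).normal(size=(20, 100))  # 20 vertices, 100 signals
Z = pairwise_distance_matrix(X)
```

The expansion $\|x_i - x_j\|^2 = \|x_i\|^2 + \|x_j\|^2 - 2 x_i^\top x_j$ avoids an explicit double loop over vertex pairs.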
where the linear operator $S$ satisfies $S w = W \mathbf{1}$ and $z$ is the vector form of the upper-triangular variables of $Z$.
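The degree operator $S$ can be materialized as a 0/1 matrix with one column per edge; the sketch below (our own construction) checks the identity $S w = W \mathbf{1}$ on a random symmetric $W$:

```python
import numpy as np

def degree_operator(m):
    """Build S such that S @ w = W @ 1, where w stacks the strictly
    upper-triangular entries of the symmetric matrix W."""
    iu = np.triu_indices(m, k=1)
    S = np.zeros((m, len(iu[0])))
    for k, (i, j) in enumerate(zip(*iu)):
        S[i, k] = 1.0   # edge k contributes to the degree of vertex i
        S[j, k] = 1.0   # ... and of vertex j
    return S

m = 5
rng = np.random.default_rng(1)
A = rng.random((m, m))
W = np.triu(A, 1) + np.triu(A, 1).T       # random symmetric, zero diagonal
w = W[np.triu_indices(m, k=1)]            # half-vectorization
S = degree_operator(m)
```

In practice $S$ would be stored sparsely, since each column has exactly two nonzeros.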
Under these notations, time-varying graph learning produces a series of graphs $\{G_t\}_{t=1}^{T}$ using signals $\{X_t\}_{t=1}^{T}$ collected over $T$ time periods, where $X_t$ is the data matrix of time slot $t$. Specifically, models based on the temporal homogeneity assumption can be formulated as
$\min_{\{w_t \geq 0\}} \; \sum_{t=1}^{T} f_t(w_t) + \eta \sum_{t=2}^{T} \|w_t - w_{t-1}\|_1$,   (5)
with $f_t(w_t) = 2 w_t^{\top} z_t - \alpha \mathbf{1}^{\top} \log(S w_t) + 2 \beta \|w_t\|_2^2$ the single-slot objective of (4),
where $\eta$ is a global parameter that controls the weight of the temporal priors and $z_t$ is calculated using $X_t$. The last term of (5) encodes chain-structured temporal relations.
3 Proposed Framework
Instead of the chain structure in (5), we suggest a structure named the temporal graph to describe the temporal relations of time-varying graphs. The temporal graph is a graph whose nodes represent the graphs of the time slots and whose edges describe relationships between these graphs. As shown in Fig. 2, the temporal graph is undirected with non-negative edge weights. Any two nodes can be connected in the temporal graph, instead of allowing only connections between two consecutive graphs, e.g., the non-consecutive connection in Fig. 2. Furthermore, we use edge weights to measure the "closeness" of these connections, and hence the weights in Fig. 2 are not forced to be equal, as the chain structure requires. Obviously, the temporal graph is more general and is able to describe the temporal structures arising in the real world.
Formally, suppose the temporal graph is $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ is the node set containing all graphs $\{G_t\}_{t=1}^{T}$ and $\mathcal{E}$ is the edge set containing the connections between these graphs; suppose there are $K$ edges in $\mathcal{E}$, i.e., $|\mathcal{E}| = K$. Time-varying graph learning using the temporal graph is formulated as
$\min_{\{w_t \geq 0\}} \; \sum_{t=1}^{T} f_t(w_t) + \eta \sum_{(i,j) \in \mathcal{E}} \omega_{ij} \|w_i - w_j\|_1$,   (6)
where $\omega_{ij}$ is the relative weight between the $i$-th and $j$-th time slots. The parameter $\eta$ is used to scale the weight of the edge objectives relative to the node objectives in (6).
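The edge part of (6) is straightforward to evaluate; the sketch below (our own helper, with the graphs' half-vectorized weights stored as columns of a matrix) also shows how the chain structure of (5) arises as the special case of edges between consecutive slots with unit weights:

```python
import numpy as np

def temporal_penalty(Wmat, edges, weights, eta):
    """Edge part of (6): eta * sum over temporal-graph edges (i, j)
    of omega_ij * ||w_i - w_j||_1.  Columns of Wmat are the w_t."""
    return eta * sum(om * np.abs(Wmat[:, i] - Wmat[:, j]).sum()
                     for (i, j), om in zip(edges, weights))

# chain structure is the special case: edges (t, t+1), all omegas equal to 1
T, p = 4, 6
rng = np.random.default_rng(2)
Wmat = rng.random((p, T))
chain = [(t, t + 1) for t in range(T - 1)]
pen = temporal_penalty(Wmat, chain, [1.0] * (T - 1), eta=0.5)
```

Periodic patterns are encoded the same way, e.g., by adding an edge between the slots for 10 a.m. on two working days.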
4 ADMM-Based Algorithm
To ease the computation, we adopt the ADMM framework [14, 2, 3] to solve (6). For an edge $(i,j) \in \mathcal{E}$, we first introduce a consensus variable of $w_i$, denoted $u_{ij}$. In fact, $u_{ij}$ represents the connection starting from $G_i$ to $G_j$. For the same edge, $u_{ji}$ is the consensus variable of $w_j$. With the consensus variables, (6) is equivalent to the following problem
$\min_{\{w_t \geq 0\}, \{u_{ij}\}} \; \sum_{t=1}^{T} f_t(w_t) + \eta \sum_{(i,j) \in \mathcal{E}} \omega_{ij} \|u_{ij} - u_{ji}\|_1 \quad \text{s.t.} \; w_i = u_{ij}, \; w_j = u_{ji}, \; \forall (i,j) \in \mathcal{E}$,   (7)
where $\mathcal{N}(t)$ denotes the set of all nodes connected with node $t$ in $\mathcal{G}$. We define a matrix $\mathbf{W}$ whose columns contain all primal variables $w_t$. In addition, a matrix $U$ of consensus variables and a matrix $\Lambda$ of scaled dual variables are also defined: for each edge $(i,j) \in \mathcal{E}$, the corresponding consensus variables $u_{ij}$ and $u_{ji}$ are stored as columns of $U$. This also holds true for $\Lambda$ with the dual variables $\lambda_{ij}$ and $\lambda_{ji}$. The scaled form of the augmented Lagrangian of (7) can then be written as
$L_{\rho}(\mathbf{W}, U, \Lambda) = \sum_{t=1}^{T} f_t(w_t) + \eta \sum_{(i,j) \in \mathcal{E}} \omega_{ij} \|u_{ij} - u_{ji}\|_1 + \frac{\rho}{2} \sum_{(i,j) \in \mathcal{E}} \big( \|w_i - u_{ij} + \lambda_{ij}\|_2^2 + \|w_j - u_{ji} + \lambda_{ji}\|_2^2 - \|\lambda_{ij}\|_2^2 - \|\lambda_{ji}\|_2^2 \big)$,   (8)
where $\rho > 0$ is the ADMM penalty parameter. Following the ADMM framework, we alternately update $\mathbf{W}$, $U$, and $\Lambda$.
1) Update $\mathbf{W}$: At iteration $k+1$, the update of $\mathbf{W}$ is as follows
$\mathbf{W}^{k+1} = \arg\min_{\mathbf{W}} L_{\rho}(\mathbf{W}, U^{k}, \Lambda^{k})$.   (9)
Obviously, we can update each $w_t$ separately,
$w_t^{k+1} = \arg\min_{w_t \geq 0} \; f_t(w_t) + \frac{\rho}{2} \sum_{j \in \mathcal{N}(t)} \|w_t - u_{tj}^{k} + \lambda_{tj}^{k}\|_2^2$.   (10)
If we let $\bar{u}_t = \frac{1}{|\mathcal{N}(t)|} \sum_{j \in \mathcal{N}(t)} (u_{tj}^{k} - \lambda_{tj}^{k})$, where $|\mathcal{N}(t)|$ is the number of neighbors of node $t$ in $\mathcal{G}$, (10) can be reformulated (up to an additive constant) as
$w_t^{k+1} = \arg\min_{w_t \geq 0} \; f_t(w_t) + \frac{\rho |\mathcal{N}(t)|}{2} \|w_t - \bar{u}_t\|_2^2$.   (11)
Problem (11) could be solved with existing primal-dual splitting algorithms. In this paper, we instead use the projected gradient descent (PGD) algorithm to solve problem (11). The gradient of the objective function of (11) is as follows
$\nabla g(w_t) = 2 z_t - \alpha S^{\top} \big( \mathbf{1} \oslash (S w_t) \big) + 4 \beta w_t + \rho |\mathcal{N}(t)| (w_t - \bar{u}_t)$,   (12)
where $\oslash$ denotes element-wise division, i.e., $\mathbf{1} \oslash (S w_t)$ is the element-wise reciprocal of $S w_t$. We use a dummy variable $v$, set $v^{(0)} = w_t^{k}$, and then iteratively update $v$ until it converges to the solution with a certain precision,
$v^{(l+1)} = \big[ v^{(l)} - \gamma \nabla g(v^{(l)}) \big]_{+}$,   (13)
where $[\cdot]_{+}$ denotes projection onto the nonnegative orthant, $l$ is the iteration index of the PGD algorithm, and $\gamma$ is the step size. After obtaining the solution of (11), we set $w_t^{k+1}$ to the converged $v$. Note that all $w_t$ can be updated in parallel.
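The PGD inner loop for the node subproblem can be sketched as follows (our own implementation under the objective stated in (11); the fixed step size, iteration count, and starting point are illustrative choices, not the paper's settings):

```python
import numpy as np

def pgd_node_update(z, S, u_bar, alpha, beta, rho, n_t,
                    step=1e-3, iters=500, eps=1e-10):
    """Projected gradient descent for the node subproblem:
    minimize 2 v'z - alpha*1'log(Sv) + 2*beta*||v||^2
             + (rho*n_t/2)*||v - u_bar||^2   subject to v >= 0."""
    v = np.full(z.shape, 0.1)          # strictly positive start keeps S @ v > 0
    for _ in range(iters):
        grad = (2.0 * z
                - alpha * S.T @ (1.0 / np.maximum(S @ v, eps))
                + 4.0 * beta * v
                + rho * n_t * (v - u_bar))
        v = np.maximum(v - step * grad, 0.0)   # projection onto v >= 0
    return v

# toy instance: 4 vertices, hence 6 candidate edges
m = 4
iu = np.triu_indices(m, k=1)
S = np.zeros((m, len(iu[0])))
for k, (i, j) in enumerate(zip(*iu)):
    S[i, k] = S[j, k] = 1.0
rng = np.random.default_rng(3)
z = np.abs(rng.normal(size=len(iu[0])))
w_new = pgd_node_update(z, S, np.zeros_like(z), alpha=1.0, beta=0.5,
                        rho=0.5, n_t=1)
```

The `eps` clamp guards the reciprocal when a degree approaches zero; in practice the log-barrier term keeps the degrees bounded away from zero.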
2) Update $U$: For each edge $(i,j) \in \mathcal{E}$, we can update the corresponding column vectors of $U$ as follows
$(u_{ij}^{k+1}, u_{ji}^{k+1}) = \arg\min_{u_{ij}, u_{ji}} \; \eta \omega_{ij} \|u_{ij} - u_{ji}\|_1 + \frac{\rho}{2} \big( \|w_i^{k+1} - u_{ij} + \lambda_{ij}^{k}\|_2^2 + \|w_j^{k+1} - u_{ji} + \lambda_{ji}^{k}\|_2^2 \big)$,   (14)
with which (14) can be solved by
$(u_{ij}^{k+1}, u_{ji}^{k+1}) = \mathrm{prox}_{\frac{\eta \omega_{ij}}{\rho} h}\big( w_i^{k+1} + \lambda_{ij}^{k}, \; w_j^{k+1} + \lambda_{ji}^{k} \big)$,   (15)
where $\mathrm{prox}_{h}$ is the proximal operator of the function $h(x, y) = \|x - y\|_1$. However, we have no direct knowledge of the closed form of this operator. Hence another property of proximal operators from the literature is introduced here.
Property 1: If a function $f(v) = g(A v + b)$ and $A A^{\top} = \frac{1}{\alpha} I$, where $I$ is an identity matrix, then
$\mathrm{prox}_{\lambda f}(v) = (I - \alpha A^{\top} A) v + \alpha A^{\top} \big( \mathrm{prox}_{(\lambda/\alpha) g}(A v + b) - b \big)$.   (16)
In our problem, $g$ is the scaled $\ell_1$ norm $\eta \omega_{ij} \|\cdot\|_1$, $A = [I, -I]$, $b$ is a zero vector, and $\alpha = 1/2$. According to Property 1, the following update can be easily reached from (16),
$u_{ij}^{k+1} = \frac{p + q}{2} + \frac{d}{2}, \quad u_{ji}^{k+1} = \frac{p + q}{2} - \frac{d}{2}$,   (17)
where $p = w_i^{k+1} + \lambda_{ij}^{k}$, $q = w_j^{k+1} + \lambda_{ji}^{k}$, and $d = \mathcal{S}_{2\eta\omega_{ij}/\rho}(p - q)$, with $\mathcal{S}_{\tau}$ the element-wise soft-thresholding operator.
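The edge-wise consensus update splits into a mean component and a soft-thresholded difference component; a minimal sketch (our own variable names and signature):

```python
import numpy as np

def soft(x, tau):
    """Element-wise soft-thresholding, the proximal operator of tau*||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def consensus_update(w_i, w_j, lam_ij, lam_ji, eta, omega, rho):
    """Closed-form update of the pair (u_ij, u_ji) for one temporal-graph
    edge: keep the midpoint, shrink the difference."""
    p, q = w_i + lam_ij, w_j + lam_ji
    mean = 0.5 * (p + q)                       # optimal midpoint
    d = soft(p - q, 2.0 * eta * omega / rho)   # shrunken difference
    return mean + 0.5 * d, mean - 0.5 * d

w_i = np.array([1.0, 0.0, 2.0])
w_j = np.array([0.5, 0.5, 2.0])
lam = np.zeros(3)
u1, u2 = consensus_update(w_i, w_j, lam, lam, eta=0.0, omega=1.0, rho=0.5)
u3, u4 = consensus_update(w_i, w_j, lam, lam, eta=100.0, omega=1.0, rho=0.5)
```

With no coupling (`eta=0`) each copy reverts to its own node variable, while a very large `eta` drives both copies to their common midpoint.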
Now each column of $U$ can be updated in parallel.
3) Update $\Lambda$: For each edge $(i,j) \in \mathcal{E}$, the corresponding columns of $\Lambda$ can be updated by
$\lambda_{ij}^{k+1} = \lambda_{ij}^{k} + w_i^{k+1} - u_{ij}^{k+1}, \quad \lambda_{ji}^{k+1} = \lambda_{ji}^{k} + w_j^{k+1} - u_{ji}^{k+1}$.   (18)
This update can also be performed in parallel.
In summary, our algorithm can be implemented in a distributed fashion, since $\mathbf{W}$, $U$, and $\Lambda$ can all be updated in parallel. Global convergence is also guaranteed by the ADMM framework since (6) is a convex problem. Furthermore, the stopping criterion is that the primal and dual residuals are both below a tolerance, following standard ADMM practice.
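The stopping test can be sketched as follows (our own helper; the convention of storing the two consensus copies of edge $k$ in columns $2k$ and $2k+1$ of $U$ is an assumption for illustration):

```python
import numpy as np

def stopping_residuals(Wmat, U, U_old, edges, rho):
    """Norms of the primal residual (node/consensus mismatch) and the
    dual residual (rho times the change in U between rounds)."""
    r = []
    for k, (i, j) in enumerate(edges):
        r.append(Wmat[:, i] - U[:, 2 * k])       # copy for endpoint i
        r.append(Wmat[:, j] - U[:, 2 * k + 1])   # copy for endpoint j
    r = np.concatenate(r)
    s = rho * (U - U_old).ravel()
    return np.linalg.norm(r), np.linalg.norm(s)

# at exact consensus both residuals vanish
p, T = 6, 3
rng = np.random.default_rng(4)
Wmat = rng.random((p, T))
edges = [(0, 1), (1, 2)]
U = np.column_stack([Wmat[:, i] for e in edges for i in e])
r_norm, s_norm = stopping_residuals(Wmat, U, U.copy(), edges, rho=0.5)
```

The iteration stops once both norms fall below the chosen absolute/relative tolerances.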
5 Numerical Experiments
5.1 Experimental Setup
We test our framework on the temporal structure shown in Fig. 3. It is a non-chain structure in which one node is connected with a non-adjacent node rather than its immediate predecessor. To obtain time-varying graphs, an initial RBF graph with 20 vertices is generated in the standard way from the literature. After that, each subsequent graph is obtained by randomly changing edges of the graph it is connected to in the temporal structure, and the number of changed edges is inversely proportional to the corresponding edge weight in Fig. 3. Following this procedure, we generate the other graphs sequentially. We should emphasize that each graph is generated based on its neighbor in the temporal structure rather than its predecessor in time. Smooth graph signals on each graph are generated in the same way as in the literature. The evaluation metrics adopted are the Matthews correlation coefficient (MCC) and the relative error, each averaged over all time slots.
MCC is a metric representing the accuracy of the estimated graph topology; its value lies between -1 and 1 (-1 represents completely wrong detection, while +1 means completely correct detection). The relative error is defined as $\|W_{\mathrm{est}} - W_{\mathrm{gt}}\|_F / \|W_{\mathrm{gt}}\|_F$, where $W_{\mathrm{est}}$ is the learned adjacency matrix and $W_{\mathrm{gt}}$ is the ground truth. In our experiments, we fix one of the regularization parameters and find the best value of the other by grid search. Furthermore, we choose the $\eta$ that maximizes MCC, which is 2.5. In the ADMM framework, $\rho$ is set to 0.5 and the tolerance (both relative and absolute) is set to a small fixed value. The baselines are SGL (which learns the graph of each time period independently), TVGL-Tikhonov, and TVGL-Homogeneity. The last two are chain-based models. All algorithms are implemented in Python and run on an Intel(R) Xeon(R) CPU with 2.10GHz clock speed and 256GB of RAM.
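Both metrics are easy to compute from the adjacency matrices; a sketch of our reading of them (edge detection thresholded on the upper-triangular entries, the free variables):

```python
import numpy as np

def mcc(W_true, W_est, thresh=1e-4):
    """Matthews correlation coefficient of the detected edge pattern,
    computed on the strictly upper-triangular entries only."""
    iu = np.triu_indices(W_true.shape[0], k=1)
    y, yhat = W_true[iu] > thresh, W_est[iu] > thresh
    tp = np.sum(y & yhat); tn = np.sum(~y & ~yhat)
    fp = np.sum(~y & yhat); fn = np.sum(y & ~yhat)
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0

def relative_error(W_true, W_est):
    """||W_est - W_true||_F / ||W_true||_F."""
    return np.linalg.norm(W_est - W_true) / np.linalg.norm(W_true)

W_gt = np.zeros((4, 4))
W_gt[0, 1] = W_gt[1, 0] = 1.0
W_gt[2, 3] = W_gt[3, 2] = 0.5
perfect_mcc = mcc(W_gt, W_gt)
err = relative_error(W_gt, W_gt)
```

The zero-denominator branch handles degenerate predictions (e.g., an empty estimated graph) by returning 0, the conventional choice.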
5.2 Experimental Results
Figure 4 shows the performance under different data sizes per time slot. We can observe that SGL performs worst, since no temporal priors are exploited. The performance of TVGL-Tikhonov and TVGL-Homogeneity is inferior to ours because their chain structures fail to characterize the real temporal structure depicted in Fig. 3. On the contrary, our framework easily describes the non-chain structure thanks to the strong representation ability of the temporal graph. Therefore, our method is superior to the other models when faced with intricate temporal structures.
The impact of the scaling weight parameter $\eta$ is displayed in Fig. 6. We can regard $\eta$ as a trade-off between information from the data and from the temporal priors when optimizing (6). As $\eta$ increases, more prior information about the temporal structure is injected into our model, improving performance. There exists an $\eta$ at which our algorithm obtains the best performance. After that, the performance starts to decline and finally reaches a plateau. This is caused by the fact that too much weight is put on the second term of (6) when $\eta$ is larger than a certain value, which forces the graphs of all time slots to be the same. This consensus graph no longer changes with the continued growth of $\eta$. As such, the performance of our algorithm stays unchanged when $\eta$ is too large.
Figure 6 also depicts the scalability of our algorithm. We fix the data size and apply our method to chain temporal structure problems as defined in the baseline work. This is feasible since the chain structure is a special case of our framework. We compare our algorithm with TVGL-Homogeneity and implement our algorithm in a distributed way; our code runs on different cores of a single machine, and 25 cores are used. As Fig. 6 shows, the running time of TVGL-Homogeneity is significantly greater than ours. There is a noticeable jump in our algorithm's running time when $T$ exceeds 25. This is because the number of cores used is 25, so additional waiting time is required when $T > 25$.
In this paper, we propose a general time-varying graph learning framework, under which the temporal graph is employed to describe temporal structures. A distributed algorithm based on the ADMM framework is developed to solve the induced optimization problem. Experimental results show that our framework outperforms the state-of-the-art methods when facing complicated temporal structures.
-  (2020) Efficient graph learning from noisy and incomplete data. IEEE Transactions on Signal and Information Processing over Networks 6, pp. 105–119. Cited by: §1.
-  (2004) Convex optimization. Cambridge university press. Cited by: §4.
-  (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Now Publishers Inc. Cited by: §4, §4, §5.1.
-  (1987) Projected gradient methods for linearly constrained problems. Mathematical programming 39 (1), pp. 93–116. Cited by: §4.
-  (2016) Learning laplacian matrix in smooth graph signal representations. IEEE Transactions on Signal Processing 64 (23), pp. 6160–6173. Cited by: §1, §5.1.
-  (2019) Learning graphs from data: a signal representation perspective. IEEE Signal Processing Magazine 36 (3), pp. 44–63. Cited by: §1.
-  (2008) Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9 (3), pp. 432–441. Cited by: §1.
-  (2015) Network lasso: clustering and optimization in large graphs. In Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, pp. 387–396. Cited by: §5.2.
-  (2017) Network inference via the time-varying graphical lasso. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 205–213. Cited by: §4.
-  (2017) Learning time varying graphs. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2826–2830. Cited by: §1, §5.1.
-  (2016) How to learn a graph from smooth signals. In Artificial Intelligence and Statistics, pp. 920–929. Cited by: §1, §2, §5.1.
-  (2015) Playing with duality: an overview of recent primal dual approaches for solving large-scale optimization problems. IEEE Signal Processing Magazine 32 (6), pp. 31–54. Cited by: §4.
-  (2019) Connecting the dots: identifying network structure via graph signal processing. IEEE Signal Processing Magazine 36 (3), pp. 16–43. Cited by: §1.
-  (2006) Numerical optimization. Springer Science & Business Media. Cited by: §4.
-  (2018) Graph signal processing: overview, challenges, and applications. Proceedings of the IEEE 106 (5), pp. 808–828. Cited by: §1.
-  (2014) Proximal algorithms. Foundations and Trends in optimization 1 (3), pp. 127–239. Cited by: §4.
-  (2020) Evaluation: from precision, recall and f-measure to roc, informedness, markedness and correlation. arXiv preprint arXiv:2010.16061. Cited by: §5.1.
-  (2019) Introduction to graph signal processing. In Vertex-Frequency Analysis of Graph Signals, pp. 3–108. Cited by: §1.
-  (2017) Learning heat diffusion graphs. IEEE Transactions on Signal and Information Processing over Networks 3 (3), pp. 484–499. Cited by: §1.
-  (2020) A comprehensive survey on graph neural networks. IEEE transactions on neural networks and learning systems 32 (1), pp. 4–24. Cited by: §1.
-  (2019) Time-varying graph learning based on sparseness of temporal variation. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5411–5415. Cited by: §1, §5.1, §5.2.
-  (2007) Model selection and estimation in the gaussian graphical model. Biometrika 94 (1), pp. 19–35. Cited by: §1.