A number of applications involve data that admit a natural representation as node attributes over social, economic, sensor, communication, and biological networks, to name a few [24, 11]. An inference task that emerges in this context is to predict or extrapolate the attributes of all nodes in the network given the attributes of a subset of them. In a finance network, where nodes correspond to stocks and edges capture dependencies among them, one may be interested in predicting the prices of all stocks in the network knowing the prices of some. This is of paramount importance in applications where collecting the attributes of all nodes is prohibitive, as is the case when sampling large-scale graphs, or when the attribute of interest is of a sensitive nature, such as the transmission of HIV in a social network. This task was first formulated as reconstructing a time-invariant function on a graph [24, 25].
Follow-up reconstruction approaches leverage the notions of graph bandlimitedness; sparsity and overcomplete dictionaries; and smoothness over the graph [25, 12], all of which can be unified as approximations of nonparametric graph functions drawn from a reproducing kernel Hilbert space (RKHS); semi-parametric alternatives have also been reported.
In various applications, however, the network connectivity and node attributes change over time. Such is the case in, e.g., a finance network, where not only the stock prices change over time, but also their inter-dependencies. Hence, maximizing reconstruction performance for these time-varying signals necessitates judicious modeling of the space-time dynamics, especially when samples are scarce.
Inference of time-varying graph functions has so far been pursued mainly for slow variations [28, 14, 9]. Temporal dynamics have also been modeled by assuming that the covariance of the function to be reconstructed is available. On the other hand, spatio-temporal reconstruction over generally dynamic graphs has been approached using an extended graph kernel matrix model with a block tridiagonal structure that lends itself to a computationally tractable iterative solver. However, the latter approach neither relies on a dynamic model of the function variability, nor does it provide a tractable method to learn the “best” kernel that fits the data. Furthermore, none of these approaches adapts to changes in the spatio-temporal dynamics of the graph function.
The present paper fills this gap by introducing online estimators for time-varying functions on generally dynamic graphs. Specifically, the contribution is threefold.
A deterministic model for time-varying graph functions is proposed, where spatial dynamics are captured by the network connectivity while temporal dynamics are described through a graph-aware state-space model.
Based on this model, an algorithm termed kernel kriged Kalman filter (KeKriKF) is developed to obtain function estimates by minimizing a kernel ridge regression (KRR) criterion in an online fashion. The proposed solver generalizes the traditional network kriged Kalman filter (KriKF) [17, 16, 29], which relies on a probabilistic model. The novel estimator forgoes assumptions on data distributions and stationarity by promoting space-time smoothness through dynamic kernels on graphs.
To select the most appropriate kernel, a multi-kernel KriKF (MKriKF) is developed based on the multi-kernel learning (MKL) framework. This algorithm adaptively selects the kernel that “best” fits the data dynamics within the linear span of a prespecified kernel dictionary, as sketched below. The structure of Laplacian kernels is exploited to reduce the complexity to the order of that of KeKriKF. This complexity is linear in the number of time samples, which renders KeKriKF and MKriKF appealing for online operation.
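As a rough illustration of the dictionary idea only (the adaptive selection rule itself is developed in Sec. IV; the function names, kernel choices, and weights below are hypothetical), a candidate kernel lies in the linear span of prespecified Laplacian kernels:

```python
import numpy as np

def diffusion_dictionary(L, sigma2s=(0.5, 1.0, 2.0)):
    """Hypothetical dictionary of diffusion kernels K_m = exp(-sigma2 * L / 2),
    computed from the eigendecomposition L = U diag(lam) U'."""
    lam, U = np.linalg.eigh(L)
    return [U @ np.diag(np.exp(-s * lam / 2)) @ U.T for s in sigma2s]

def combined_kernel(dictionary, theta):
    """A kernel in the linear span of the dictionary; MKL seeks nonnegative
    weights theta that 'best' fit the data (fixed by hand in this sketch)."""
    theta = np.asarray(theta, dtype=float)
    assert np.all(theta >= 0), "nonnegative weights keep the combination PSD"
    return sum(t * K for t, K in zip(theta, dictionary))
```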
The rest of the paper is structured as follows. Sec. II contains preliminaries and states the problem. Sec. III introduces the spatio-temporal model and develops the KeKriKF. Sec. IV endows the KeKriKF with an MKL module to obtain the MKriKF. Finally, numerical experiments and conclusions are presented in Secs. V and VI, respectively.
Scalars are denoted by lowercase, column vectors by bold lowercase, and matrices by bold uppercase letters. Superscripts $^\top$ and $^\dagger$ respectively denote transpose and pseudo-inverse; $\mathbf{1}_N$ stands for the $N \times 1$ all-one vector; $\mathrm{diag}\{\mathbf{x}\}$ corresponds to a diagonal matrix with the entries of $\mathbf{x}$ on its diagonal, while $\mathrm{diag}\{\mathbf{X}\}$ is a vector holding the diagonal entries of $\mathbf{X}$; and $\mathcal{N}(\mu, \sigma^2)$ denotes a Gaussian distribution with mean $\mu$ and variance $\sigma^2$. Finally, if $\mathbf{A}$ is a matrix and $\mathbf{v}$ a vector, then $\|\mathbf{v}\|_{\mathbf{A}}^2 := \mathbf{v}^\top \mathbf{A} \mathbf{v}$ and $\|\mathbf{v}\|_2^2 := \mathbf{v}^\top \mathbf{v}$.
II. Problem statement and preliminaries
Consider a time-varying graph $\mathcal{G}_t := (\mathcal{V}, \mathbf{W}_t)$, where $\mathcal{V} := \{v_1, \ldots, v_N\}$ denotes the vertex set, and $\mathbf{W}_t \in \mathbb{R}_+^{N \times N}$ the adjacency matrix, whose $(n, n')$-th entry $w_{n,n'}^{(t)}$ is the nonnegative weight of the edge connecting vertices $v_n$ and $v_{n'}$ at time $t$. The edge set is $\mathcal{E}_t := \{(v_n, v_{n'}) \in \mathcal{V} \times \mathcal{V} : w_{n,n'}^{(t)} \neq 0\}$, and two vertices $v_n$ and $v_{n'}$ are connected at time $t$ if $w_{n,n'}^{(t)} \neq 0$. The graphs in this paper are undirected and have no self-loops, which means that $\mathbf{W}_t = \mathbf{W}_t^\top$ and $w_{n,n}^{(t)} = 0$, $\forall n, t$. The Laplacian matrix is $\mathbf{L}_t := \mathrm{diag}\{\mathbf{W}_t \mathbf{1}_N\} - \mathbf{W}_t$, and is positive semidefinite provided that $w_{n,n'}^{(t)} \geq 0$, $\forall n, n'$; see Sec. II-A.
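For concreteness, a minimal NumPy sketch of these definitions (variable and function names are ours):

```python
import numpy as np

def graph_laplacian(W_t):
    """Laplacian L_t = diag(W_t 1) - W_t of an undirected graph with
    nonnegative weights and no self-loops."""
    assert np.allclose(W_t, W_t.T), "undirected: W_t must be symmetric"
    assert np.all(np.diag(W_t) == 0), "no self-loops"
    assert np.all(W_t >= 0), "nonnegative weights ensure L_t is PSD"
    return np.diag(W_t.sum(axis=1)) - W_t
```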
A time-varying graph function is a map $f : \mathcal{V} \times \mathbb{N} \to \mathbb{R}$, where $\mathbb{N}$ is the set of time indices. Specifically, $f(v_n, t)$ represents the value of the attribute of interest at node $v_n$ and time $t$, e.g. the closing price of the $n$-th stock on the $t$-th day. Vector $\mathbf{f}_t := [f(v_1, t), \ldots, f(v_N, t)]^\top$ collects the function values at time $t$.
Suppose that noisy observations $y_s^{(t)} = f(v_{n_s^{(t)}}, t) + e_s^{(t)}$, $s = 1, \ldots, S_t$, are available at time $t$, where $\mathcal{S}_t := \{n_1^{(t)}, \ldots, n_{S_t}^{(t)}\}$ contains the indices of the sampled vertices, and $e_s^{(t)}$ captures the observation error. With $\mathbf{y}_t := [y_1^{(t)}, \ldots, y_{S_t}^{(t)}]^\top$ and $\mathbf{e}_t := [e_1^{(t)}, \ldots, e_{S_t}^{(t)}]^\top$, the observation model in vector-matrix form is
$$\mathbf{y}_t = \mathbf{S}_t \mathbf{f}_t + \mathbf{e}_t,$$
where the binary selection matrix $\mathbf{S}_t \in \{0, 1\}^{S_t \times N}$, with $(s, n_s^{(t)})$-th entries equal to one, selects the sampled entries of $\mathbf{f}_t$.
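In code, $\mathbf{S}_t$ simply keeps the rows of the $N \times N$ identity indexed by $\mathcal{S}_t$; a minimal sketch assuming Gaussian noise purely for illustration (names are ours):

```python
import numpy as np

def sample_graph_function(f_t, sample_idx, noise_std=0.1, rng=None):
    """Observation model y_t = S_t f_t + e_t: S_t keeps the rows of the
    N x N identity indexed by the sampled vertices."""
    rng = rng if rng is not None else np.random.default_rng(0)
    S_t = np.eye(len(f_t))[sample_idx]            # S_t in {0,1}^{S_t x N}
    e_t = noise_std * rng.standard_normal(len(sample_idx))
    return S_t @ f_t + e_t, S_t
```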
Given $\mathbf{W}_\tau$, $\mathcal{S}_\tau$, and $\mathbf{y}_\tau$ for $\tau = 1, \ldots, t$, the goal of this paper is to reconstruct $\mathbf{f}_t$ at each $t$. The estimators should operate in an online fashion, which means that the computational complexity per time slot must not grow with $t$. Observe that no statistical information is assumed available in our formulation.
II-A. Kernel-based reconstruction
Aiming ultimately at the time-varying $\mathbf{f}_t$, it is instructive to outline the kernel-based reconstruction of a time-invariant function $f$ given $\mathcal{S}$ and $\mathbf{W}$, and using samples $\{y_s\}_{s=1}^S$, where $y_s = f(v_{n_s}) + e_s$ and $n_s \in \mathcal{S}$.
Relying on regularized least-squares (LS), we obtain
$$\hat{\mathbf{f}} := \arg\min_{\mathbf{f} \in \mathbb{R}^N} \frac{1}{S} \|\mathbf{y} - \mathbf{S}\mathbf{f}\|_2^2 + \mu\, \mathbf{f}^\top \mathbf{R}\, \mathbf{f},$$
where $\mu > 0$ and the regularizer $\mathbf{f}^\top \mathbf{R} \mathbf{f}$ promotes estimates with a certain structure. For example, the so-called Laplacian regularizer $\mathbf{R} = \mathbf{L}$ promotes smooth function estimates with similar values at vertices connected by strong links (large $w_{n,n'}$), since $\mathbf{f}^\top \mathbf{L} \mathbf{f}$ is small when $\mathbf{f}$ is smooth. It turns out that $\mathbf{f}^\top \mathbf{L} \mathbf{f} = \frac{1}{2} \sum_{n,n'} w_{n,n'} (f_n - f_{n'})^2$; see e.g. [11, Ch. 2].
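For concreteness, with a kernel regularizer $\mathbf{R} = \mathbf{K}^\dagger$ and $\mathbf{K}$ invertible, the estimate above admits the well-known closed form $\hat{\mathbf{f}} = \mathbf{K}\mathbf{S}^\top (\mathbf{S}\mathbf{K}\mathbf{S}^\top + \mu S \mathbf{I})^{-1} \mathbf{y}$; a minimal sketch (function and variable names are ours):

```python
import numpy as np

def krr_estimate(y, S, K, mu=1e-2):
    """Closed-form minimizer of (1/S)||y - S f||^2 + mu * f' K^{-1} f:
    f_hat = K S' (S K S' + mu * num_samples * I)^{-1} y."""
    num_samples = len(y)
    KSt = K @ S.T                                  # N x S_t
    alpha = np.linalg.solve(S @ KSt + mu * num_samples * np.eye(num_samples), y)
    return KSt @ alpha                             # estimate over all N vertices
```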
For a scalar function $r : \mathbb{R} \to \mathbb{R}_+$, a general graph kernel family of regularizers is obtained as $\mathbf{R} = r(\mathbf{L}) := \sum_{n=1}^N r(\lambda_n) \mathbf{u}_n \mathbf{u}_n^\top$, where $\{(\lambda_n, \mathbf{u}_n)\}_{n=1}^N$ are the eigenvalue-eigenvector pairs of $\mathbf{L}$, and $r(\lambda)$ can be chosen as

| Kernel | $r(\lambda)$ |
|---|---|
| Diffusion kernel | $\exp(\sigma^2 \lambda / 2)$ |
| $p$-step random walk | $(a - \lambda)^{-p}$, $a \geq 2$ |
| Regularized Laplacian [25, 30, 24] | $1 + \sigma^2 \lambda$ |
and $\mathbf{K} := \mathbf{R}^\dagger = r^\dagger(\mathbf{L})$ is termed a Laplacian kernel. Clearly, $r(\mathbf{L})$ subsumes $\mathbf{L}$ for $r(\lambda) = \lambda$.
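A minimal sketch of how such a Laplacian kernel can be formed from the eigendecomposition of $\mathbf{L}$, using the spectral maps of the table above (parameter values $\sigma^2$, $a$, $p$ are illustrative; $a$ must upper-bound the largest eigenvalue):

```python
import numpy as np

def laplacian_kernel(L, r):
    """K = r^dagger(L): invert the spectral map r on the eigenvalues of L,
    mapping (pseudo-inverse) zeros of r to zero."""
    lam, U = np.linalg.eigh(L)
    r_lam = r(lam)
    inv = np.where(np.abs(r_lam) > 1e-12, 1.0 / r_lam, 0.0)
    return U @ np.diag(inv) @ U.T

# Spectral maps from the table (illustrative parameter values):
diffusion     = lambda lam, sigma2=1.0: np.exp(sigma2 * lam / 2)
random_walk   = lambda lam, a=3.0, p=2: (a - lam) ** (-p)
reg_laplacian = lambda lam, sigma2=1.0: 1 + sigma2 * lam

# Example: diffusion kernel of a two-node graph with unit edge weight
L_toy = np.array([[1.0, -1.0], [-1.0, 1.0]])
K_diffusion = laplacian_kernel(L_toy, diffusion)
```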