1 Introduction
Forecasting time processes is crucial in biology, finance, and position estimation, where historical information is exploited to predict future trends and subsequently take optimal decisions [1, 2, 3]. When the process is multidimensional, forecasting methods should also account for the structure hidden in the data to improve accuracy. The canonical tool to exploit these relations is the vector autoregressive (VAR) recursion [4]. Conventional VAR models are nevertheless overparameterized and hence require a large amount of training data. One way to reduce the number of parameters in VAR models is to express the process relations through a graph [5, 6]. In that case, the VAR parameter matrices are substituted with graph filters, leading to a number of trainable parameters that is independent of the process dimension as well as to a lower implementation complexity. As such, the so-called graph VAR model in [6] has led to a superior prediction accuracy compared with the classical VAR model.

The graph VAR models each time series as a scalar signal evolving over the nodes of a graph. However, it does not account for situations where multiple time series are related to a single node. In a position estimation problem, where each node is a point in 3D space, we have three-dimensional processes evolving over each node. A similar problem is also encountered when predicting related quantities in weather monitoring sensor networks, e.g., temperature, humidity, air quality, and chemical concentrations. While the graph VAR model from [6] can be adopted here to predict the evolution of each time series separately, it will not account for the additional hidden relations in the multidimensional process at each node.
To account for this additional information and improve the prediction accuracy, we devise a product graph VAR model that on the one hand captures the inter-relations between the processes of different nodes, while on the other hand captures the intra-relations in the multidimensional process at a single node. Product graphs enjoy a wide popularity in modeling structured information in large data sets [7]. They have been used to aid sampling strategies for graph data [8, 9, 10], build graph wavelets on circulant graphs [11], represent a graph process as a time-invariant graph signal on a larger graph [12, 13], and build graph neural networks [17]. In this work, we use product graphs to aid forecasting of multidimensional processes on graphs. One of the components in the product graph models the inter-relations between nodes, while the other captures the intra-relations between the features of a node. We subsequently build on the concept of product graph filters [19] to put forth a general parametric VAR model that can forecast multidimensional processes with a number of parameters that is independent of the graph and feature dimensions. The latter is also extended to a novel approach that parameterizes the type of product graph as well and allows learning an ad-hoc structure for forecasting purposes. Numerical results for position prediction of a 3D point cloud corroborate the proposed method and its potential to improve accuracy compared with the single-feature graph models [6].

2 Graph VAR model
Consider an $N$-dimensional time series $\mathbf{x}_t$, in which each entry $[\mathbf{x}_t]_i$ is a time-varying signal for a quantity of interest; $[\mathbf{x}_t]_i$ can be a temperature recording by sensor $i$ at time $t$, while $\mathbf{x}_t$ collects the recordings of $N$ such sensors. We can model the temporal evolution of $\mathbf{x}_t$ through the vector autoregressive model:
$$\mathbf{x}_t = \sum_{p=1}^{P} \mathbf{A}_p \mathbf{x}_{t-p} + \boldsymbol{\varepsilon}_t \qquad (1)$$
which expresses the current value $\mathbf{x}_t$ as a linear combination of the past realizations $\mathbf{x}_{t-1}, \dots, \mathbf{x}_{t-P}$. The $N \times N$ matrices $\mathbf{A}_1, \dots, \mathbf{A}_P$ contain the parameters of this model and express the influence of the different entries of $\mathbf{x}_{t-p}$ on $\mathbf{x}_t$; vector $\boldsymbol{\varepsilon}_t$ collects the model error, also labeled the innovation term [4]. Estimating the parameters in (1) is challenging, especially when $N$ is large. As such, parametric models for the matrices $\mathbf{A}_1, \dots, \mathbf{A}_P$ are necessary, and popular choices include factor models [14, 15] or low-rank data representations [4].

When the relations between the different time series in $\mathbf{x}_t$ can be captured by a network, we can exploit this structure to reduce the parameters of (1) in an efficient way that does not hurt prediction accuracy. To be more precise, let graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ model the network, where $\mathcal{V}$ is the set of $N$ nodes (or vertices), $\mathcal{E}$ is the set of edges, and $\mathbf{S}$ is an $N \times N$ matrix representing this graph structure. We refer to $\mathbf{S}$ as the graph shift operator matrix [16], e.g., the weighted adjacency matrix (directed graphs) or the graph Laplacian (undirected graphs), with nonzero entries $[\mathbf{S}]_{ij}$ only if $(j, i) \in \mathcal{E}$ or $i = j$. For a fixed $t$, $\mathbf{x}_t$ is a graph signal in which entry $[\mathbf{x}_t]_i$ resides on node $i$ [16].
We can instantly process the graph signal $\mathbf{x}_t$ over the graph by using so-called graph filters [18]; by setting $\mathbf{x}_t$ as the filter input, we get the filtered signal:
$$\mathbf{y}_t = \mathbf{H}(\mathbf{S})\,\mathbf{x}_t = \sum_{k=0}^{K} h_k \mathbf{S}^k \mathbf{x}_t \qquad (2)$$
where the polynomial matrix $\mathbf{H}(\mathbf{S}) = \sum_{k=0}^{K} h_k \mathbf{S}^k$ is the graph filter with scalar coefficients $h_0, \dots, h_K$. The expression in (2) builds the instantaneous output entry $[\mathbf{y}_t]_i$ at node $i$ as a linear combination of $K+1$ terms: the first term $h_0 [\mathbf{x}_t]_i$ is the signal value of node $i$; the other $K$ terms are contributions of signal values from up to $K$-hop neighbors of node $i$. Since $\mathbf{S}$ respects the sparsity of the graph, the operation $\mathbf{S}\mathbf{x}_t$ aggregates at node $i$ the signal values of its one-hop neighbors with a complexity of $\mathcal{O}(M)$, where $M$ is the number of edges. Likewise, $\mathbf{S}^k \mathbf{x}_t = \mathbf{S}(\mathbf{S}^{k-1}\mathbf{x}_t)$ aggregates at node $i$ the signal values of its $k$-hop neighbors with a complexity of $\mathcal{O}(kM)$; hence, the overall complexity of the operation in (2) is $\mathcal{O}(KM)$.
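To make the shift-and-sum structure of (2) concrete, here is a minimal Python sketch of graph filtering via repeated sparse shifts; the 4-node path graph and the coefficient values are toy assumptions, not from the text:

```python
import numpy as np
from scipy.sparse import csr_matrix

def graph_filter(S, x, h):
    """Apply H(S) x = sum_k h[k] S^k x via repeated sparse shifts.

    Each shift S @ z costs O(M) for a graph with M edges, so the
    whole filter costs O(K M) rather than forming S^k explicitly.
    """
    z = x.copy()          # z = S^0 x
    y = h[0] * z
    for hk in h[1:]:
        z = S @ z         # z = S^k x, one more hop of aggregation
        y += hk * z
    return y

# Toy example: 4-node path graph, adjacency matrix as shift operator.
rows = [0, 1, 1, 2, 2, 3]
cols = [1, 0, 2, 1, 3, 2]
S = csr_matrix((np.ones(6), (rows, cols)), shape=(4, 4))
x = np.array([1.0, 0.0, 0.0, 0.0])   # impulse on node 0
y = graph_filter(S, x, h=[0.5, 0.25, 0.125])
```

Note how the impulse on node 0 spreads only to nodes within two hops, reflecting the locality of the filter.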
The graph filtering operation in (2) represents a powerful way to instantly process $\mathbf{x}_t$ by accounting for the underlying structure. Graph filters have been used in [6] to substitute the matrices $\mathbf{A}_p$ in (1) with $P$ different graph filters $\mathbf{H}_p(\mathbf{S})$, modeling $\mathbf{x}_t$ as:
$$\mathbf{x}_t = \sum_{p=1}^{P} \mathbf{H}_p(\mathbf{S})\,\mathbf{x}_{t-p} + \boldsymbol{\varepsilon}_t = \sum_{p=1}^{P} \sum_{k=0}^{K} h_{pk}\, \mathbf{S}^k \mathbf{x}_{t-p} + \boldsymbol{\varepsilon}_t \qquad (3)$$
In other words, each of the $P$ past realizations $\mathbf{x}_{t-p}$ is treated as a different graph signal, which is further processed with a different graph filter $\mathbf{H}_p(\mathbf{S})$. As such, the process value at node $i$ now depends on the past process values within a temporal window of $P$ and a graph window of $K$, i.e., $[\mathbf{x}_t]_i$ depends on the past values of the nodes that are within $K$ hops from node $i$. The graph VAR (GVAR) model in (3) inherits from the graph filters a number of parameters that is $\mathcal{O}(PK)$ and an implementation complexity of $\mathcal{O}(PKM)$. Both these quantities are orders of magnitude smaller than the respective $\mathcal{O}(PN^2)$ quantities of the classical VAR in (1), and independent of the graph process dimension; hence, less training data is required to estimate the model parameters.
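A one-step GVAR prediction following (3) can be sketched as below; the two-node toy graph and the coefficient values are our assumptions for illustration:

```python
import numpy as np

def gvar_predict(S, X_past, H):
    """One-step GVAR prediction as in (3):
    x_t = sum_p sum_k H[p, k] S^k x_{t-p}  (innovation term omitted).

    X_past: (P, N) array with X_past[p-1] = x_{t-p}.
    H:      (P, K+1) array of scalar coefficients h_{p,k}.
    """
    P, K1 = H.shape
    x_hat = np.zeros(S.shape[0])
    for p in range(P):
        z = X_past[p].astype(float)       # S^0 x_{t-p}
        for k in range(K1):
            x_hat += H[p, k] * z
            z = S @ z                      # next power of the shift
    return x_hat

# Toy example: two nodes exchanging values through the shift S.
S = np.array([[0., 1.], [1., 0.]])
X_past = np.array([[2.0, 4.0]])           # P = 1 past realization
H = np.array([[1.0, 0.5]])                # h_{1,0} = 1, h_{1,1} = 0.5
x_hat = gvar_predict(S, X_past, H)
```

Only the $P(K+1)$ scalars in `H` are trainable, regardless of the number of nodes, which is the parameter saving the text describes.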
Despite the above benefits, the GVAR model cannot handle cases where on each node we have a vector of feature values rather than a scalar. Treating each feature separately and predicting its evolution with (3) is certainly an option, but it will not account for the dependencies between features. To account efficiently for these dependencies and improve the prediction accuracy, we next introduce the notion of multidimensional graph processes and leverage product graphs to develop generalized GVAR models for forecasting the temporal evolution of such processes.
3 Product Graph VAR Model
When on top of each node we have a vector of feature values rather than a scalar, we talk about a multidimensional graph signal. We store the $F$ feature values of node $i$ in the vector $\mathbf{x}_{i,t} \in \mathbb{R}^F$ and refer to it as the node signal of node $i$. The overall $NF$-dimensional graph signal at time $t$ is the vector $\mathbf{x}_t = [\mathbf{x}_{1,t}^\top, \dots, \mathbf{x}_{N,t}^\top]^\top$, i.e., the vector that concatenates all node signals.¹ Likewise, we can also consider the $f$th feature of all nodes and store them in the vector $\mathbf{x}_t^f \in \mathbb{R}^N$, which we refer to as the feature signal. With the terminology of Section 2, each feature signal is a graph signal, and the $NF$-dimensional graph signal is a collection of $F$ (one-dimensional) graph signals. Fig. 1(a) illustrates a three-dimensional graph signal.

¹Note that the $NF$-dimensional graph signal is denoted with the same symbol $\mathbf{x}_t$ as the one-dimensional graph signal in Section 2. We chose this slight abuse of notation to construct a generalized GVAR model that follows a similar recursion as in (3).
Our goal now is to forecast the temporal evolution of the $NF$-dimensional graph process, i.e., to develop a graph VAR model as in (3) that forecasts the evolution of the $NF \times 1$ vector $\mathbf{x}_t$. A naive way of doing this is to ignore the underlying graph structure, build an alternative graph of $NF$ nodes (using topology identification based on a certain metric [20]), and use the GVAR model (3) to forecast the process. This strategy has two main disadvantages: first, it discards the known underlying structure present between nodes (e.g., friendships in a social network) and replaces it with a feature-based similarity graph; second, it leads to a shift operator of larger dimensions without any further structure, hence with a larger storage and computational complexity. To tackle both these issues, we use the concept of product graphs to propose a product graph VAR model for forecasting $NF$-dimensional graph processes.
3.1 Product Graph Representation
We assume that the $F$ features of each node signal have a hidden relation, which we represent with a feature graph $\mathcal{G}_F = (\mathcal{V}_F, \mathcal{E}_F)$, where each vertex in $\mathcal{V}_F$ is one feature, $\mathcal{E}_F$ is the edge set connecting different features, and $\mathbf{S}_F$ is the $F \times F$ graph shift operator matrix of $\mathcal{G}_F$. Observe that all nodes have the same feature graph $\mathcal{G}_F$. A simple way to build $\mathcal{G}_F$ is through a feature similarity metric, e.g., Pearson correlation or Gaussian weighting kernels [20]. In other words, we consider a copy of the feature graph residing on each node of the underlying graph $\mathcal{G}$. Fig. 1(b) illustrates possible feature graphs between the features of each node.
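The correlation-based construction mentioned above can be sketched as follows; pooling the samples across time and nodes and the sparsifying threshold are our assumptions, not a prescription from the text:

```python
import numpy as np

def correlation_feature_graph(X, thresh=0.2):
    """Build a feature shift operator S_F from historical data.

    X: (T, N, F) array of the multidimensional process.
    Edge weights are the absolute Pearson correlations between
    feature pairs (computed across time and nodes), thresholded
    to sparsify the resulting feature graph.
    """
    T, N, F = X.shape
    feats = X.reshape(T * N, F)            # one column per feature
    C = np.abs(np.corrcoef(feats, rowvar=False))
    np.fill_diagonal(C, 0.0)               # no self-loops in S_F
    C[C < thresh] = 0.0
    return C
```

Any other similarity metric (e.g., a Gaussian kernel on feature distances) can be dropped in place of the correlation without changing the rest of the pipeline.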
While the graph $\mathcal{G}_F$ captures the intra-connections between the features of a node $i$, the graph $\mathcal{G}$ captures the inter-connections between different nodes. In particular, if nodes $i$ and $j$ share an edge in $\mathcal{G}$, this edge should be reflected in the inter-connections between the features of the vectors $\mathbf{x}_{i,t}$ and $\mathbf{x}_{j,t}$; vice versa, if $i$ and $j$ are not connected in $\mathcal{G}$, there should be no inter-connectivities between the features of the vectors $\mathbf{x}_{i,t}$ and $\mathbf{x}_{j,t}$. We can formally capture both the intra- and inter-connectivities between features with product graphs. The product graph of the underlying graph $\mathcal{G}$ and the feature graph $\mathcal{G}_F$ is a new graph:
$$\mathcal{G}_\diamond = \mathcal{G}_F \diamond \mathcal{G} = (\mathcal{V}_\diamond, \mathcal{E}_\diamond) \qquad (4)$$
with node set $\mathcal{V}_\diamond = \mathcal{V}_F \times \mathcal{V}$ of cardinality $NF$, and where the edge set $\mathcal{E}_\diamond$ and the graph shift operator $\mathbf{S}_\diamond$ are dictated by the type of product graph. Popular product graphs are the Kronecker, Cartesian, and strong product [7]. For the Cartesian product, for instance, the graph shift operator is $\mathbf{S}_\diamond = \mathbf{S}_F \otimes \mathbf{I}_N + \mathbf{I}_F \otimes \mathbf{S}$, with $\mathbf{I}_N$ ($\mathbf{I}_F$) the $N \times N$ ($F \times F$) identity matrix. The edge set cardinality for this graph is $|\mathcal{E}_\diamond| = N|\mathcal{E}_F| + F|\mathcal{E}|$. Fig. 1(c) illustrates the Cartesian product graph for the three-dimensional graph signal example.

The $NF$-dimensional graph signal on $\mathcal{G}$ is now a (one-dimensional) graph signal on $\mathcal{G}_\diamond$. Therefore, we can develop a GVAR model as in (3) w.r.t. the product graph shift operator $\mathbf{S}_\diamond$. We will refer to the latter as a product graph VAR model, and we will also develop a more generalized version that goes beyond the direct application of (3) to a product graph. The benefits of this product graph strategy are that the storage complexity of $\mathbf{S}_\diamond$ is that of storing $\mathbf{S}$ and $\mathbf{S}_F$ separately, and that the computational complexity is of the combined sparsity orders of $\mathbf{S}$ and $\mathbf{S}_F$; both are in general lower than building a graph from scratch for the $NF \times 1$ signal $\mathbf{x}_t$.
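The Cartesian product shift can be assembled directly from the two factor shifts; in the sketch below, the small path-graph factors are toy assumptions:

```python
import numpy as np
from scipy.sparse import csr_matrix, identity, kron

def cartesian_product_shift(S_F, S):
    """Cartesian product graph shift: S_F kron I_N + I_F kron S.

    Only the two (small) factor shifts need to be stored explicitly;
    the product shift inherits their combined sparsity.
    """
    F, N = S_F.shape[0], S.shape[0]
    return kron(S_F, identity(N)) + kron(identity(F), S)

# Toy factors: a 2-feature path graph and a 3-node path graph.
S_F = csr_matrix(np.array([[0., 1.], [1., 0.]]))
S   = csr_matrix(np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]]))
S_prod = cartesian_product_shift(S_F, S)   # shape (6, 6)
```

Plugging `S_prod` into the GVAR recursion (3) is exactly the product graph VAR model described next; note that the number of stored nonzeros matches the edge-count formula above.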
Product Graph VAR. As discussed above, we model the temporal evolution of the $NF$-dimensional graph process with the product graph VAR (PGVAR) model:
$$\mathbf{x}_t = \sum_{p=1}^{P} \sum_{k=0}^{K} h_{pk}\, \mathbf{S}_\diamond^k\, \mathbf{x}_{t-p} + \boldsymbol{\varepsilon}_t \qquad (5)$$
Model (5) expresses, through $\mathbf{S}_\diamond$, the influence that the past $NF$-dimensional graph process realizations $\mathbf{x}_{t-1}, \dots, \mathbf{x}_{t-P}$ have on each node signal $\mathbf{x}_{i,t}$ at time $t$. The orders $P$ and $K$ now imply that the $f$th feature of node $i$ depends on all other feature realizations within a temporal window of $P$ and a $K$-hop neighborhood in the product graph $\mathcal{G}_\diamond$. As such, the PGVAR model forecasts future values of $\mathbf{x}_t$ by accounting for the intra- and inter-connectivities between the different node features. Besides storage and computational benefits, model (5) also has a number of parameters that is $\mathcal{O}(PK)$, independent of $N$ and $F$; hence, little training data is required to estimate them. For $\mathcal{G}_\diamond$ being the Cartesian product graph, the implementation complexity of (5) is comparable with the cost of running $F$ independent GVAR models (3), since the respective sparsity orders are often of the same magnitude. But the PGVAR model now captures additional intra-relations; hence, it has the potential to improve the prediction accuracy.
Generalized Product Graph VAR. The performance of the PGVAR heavily depends on the considered type of product graph. However, it is not clear a priori which product graph is best suited for the task at hand. Here, we bypass this issue by jointly estimating the model parameters and the type of product graph. For an underlying graph $\mathcal{G}$ and a feature graph $\mathcal{G}_F$, the shift operator of any product graph can be expressed as:
$$\mathbf{S}_\diamond = \sum_{i=0}^{1} \sum_{j=0}^{1} s_{ij}\, \big(\mathbf{S}_F^i \otimes \mathbf{S}^j\big) \qquad (6)$$
where "$\otimes$" denotes the Kronecker product and $s_{ij} \in \{0, 1\}$. A graph filter of order $K$ over the product graph has thus the form:
$$\mathbf{H}(\mathbf{S}_\diamond) = \sum_{k=0}^{K} h_k \Big( \sum_{i=0}^{1} \sum_{j=0}^{1} s_{ij}\, \mathbf{S}_F^i \otimes \mathbf{S}^j \Big)^{k} \qquad (7)$$
where we see that the coefficients $s_{ij}$ that define the type of product graph play a role in the filtering behavior. The filter in (7) can be seen as a special case of the more general product graph filter:
$$\mathbf{H}(\mathbf{S}_F, \mathbf{S}) = \sum_{i=0}^{K} \sum_{j=0}^{K} h_{ij}\, \big(\mathbf{S}_F^i \otimes \mathbf{S}^j\big) \qquad (8)$$
which is now defined by the general parameters $\{h_{ij}\}$. The novel aspect of the general filter in (8) is the way the shift operators $\mathbf{S}_F$ and $\mathbf{S}$ are combined within the filter. We can therefore use (8) to extend (5) towards a general PGVAR model for the $NF$-dimensional graph process as:
$$\mathbf{x}_t = \sum_{p=1}^{P} \sum_{i=0}^{K} \sum_{j=0}^{K} h_{pij}\, \big(\mathbf{S}_F^i \otimes \mathbf{S}^j\big)\, \mathbf{x}_{t-p} + \boldsymbol{\varepsilon}_t \qquad (9)$$
At a slight increase in the number of parameters to estimate, i.e., $\mathcal{O}(PK^2)$ compared with the $\mathcal{O}(PK)$ of model (5), the generalized PGVAR model now also parameterizes the type of product graph. As such, it exploits ad-hoc the intra- and inter-connectivities in the historical data of the $NF$-dimensional process $\mathbf{x}_t$.
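A general product graph filter of the form (8) can be applied without ever forming the $NF \times NF$ Kronecker matrices, by reshaping the stacked signal into an $F \times N$ matrix; this sketch assumes the feature index is the slow (outer) index of the stacking, so that $(\mathbf{A} \otimes \mathbf{B})\mathbf{x}$ corresponds to $\mathbf{A}\mathbf{X}\mathbf{B}^\top$ for the row-major reshape $\mathbf{X}$:

```python
import numpy as np

def general_product_filter(S_F, S, H, x):
    """General product graph filter as in (8):
    y = sum_{i,j} H[i, j] (S_F^i kron S^j) x.

    Implemented without Kronecker products: with x reshaped
    (row-major) into an (F, N) matrix X, each term becomes
    S_F^i @ X @ (S^j).T.
    """
    F, N = S_F.shape[0], S.shape[0]
    X = x.reshape(F, N)
    Y = np.zeros((F, N))
    SFi_X = X.copy()                       # S_F^i X
    for i in range(H.shape[0]):
        Z = SFi_X.copy()                   # S_F^i X (S^j)^T
        for j in range(H.shape[1]):
            Y += H[i, j] * Z
            Z = Z @ S.T                    # raise the node-shift power
        SFi_X = S_F @ SFi_X                # raise the feature-shift power
    return Y.reshape(F * N)
```

Setting `H` so that only a few coefficients are nonzero recovers filters over the Kronecker, Cartesian, or strong product graphs as special cases.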
3.2 Parameter estimation
One of the main benefits of models (5) and (9) is that their respective parameters can be readily estimated following the procedure developed for the GVAR in [6]. Let $\mathbf{H}_p$ for $p = 1, \dots, P$ be a collection of graph filters that can represent either the form (5) or (9). Given the historical data of the $NF$-dimensional graph process, the optimal predictor for $\mathbf{x}_t$ is given by the conditional expectation:
$$\hat{\mathbf{x}}_t = \mathbb{E}\big[\mathbf{x}_t \,\big|\, \mathbf{x}_{t-1}, \dots, \mathbf{x}_{t-P}\big] = \sum_{p=1}^{P} \mathbf{H}_p\, \mathbf{x}_{t-p} \qquad (10)$$
since the innovation term $\boldsymbol{\varepsilon}_t$ has zero mean. The mean squared error (MSE) of the optimal predictor (10) is:
$$\mathrm{MSE} = \mathbb{E}\big[\|\mathbf{x}_t - \hat{\mathbf{x}}_t\|_2^2\big] = \operatorname{tr}\big(\mathbf{R}_0\big) - 2 \sum_{p=1}^{P} \operatorname{tr}\big(\mathbf{H}_p^\top \mathbf{R}_p\big) + \sum_{p=1}^{P} \sum_{q=1}^{P} \operatorname{tr}\big(\mathbf{H}_p^\top \mathbf{H}_q \mathbf{R}_{p-q}\big) \qquad (11)$$
where $\mathbf{R}_\tau$ is the autocorrelation matrix of process $\mathbf{x}_t$ at temporal lag $\tau$, i.e., $\mathbf{R}_\tau = \mathbb{E}\big[\mathbf{x}_t \mathbf{x}_{t-\tau}^\top\big]$. We then find the parameters of model (10) as those that minimize the MSE. For the PGVAR model, this implies solving the optimization problem:
$$\min_{\{h_{pk}\}} \ \mathrm{MSE}\big(\mathbf{H}_1, \dots, \mathbf{H}_P\big) \quad \text{subject to} \quad \mathbf{H}_p = \sum_{k=0}^{K} h_{pk}\, \mathbf{S}_\diamond^k, \quad p = 1, \dots, P \qquad (12)$$
where the MSE expression is given in (11). Likewise, for the general PGVAR model, the framework is similar, with the only differences being the filter expression (8) and the optimization variables $\{h_{pij}\}$.
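In practice, the ensemble correlations in (11) are replaced by sample averages over the in-sample data, which makes (12) linear in the filter coefficients. The sketch below solves the resulting least-squares problem; the function name and the small ridge term `reg` (added for numerical stability) are our assumptions:

```python
import numpy as np

def fit_pgvar(S_prod, X_hist, P, K, reg=1e-6):
    """Least-squares estimate of the PGVAR coefficients h_{p,k}.

    Replaces the ensemble MSE of (11)-(12) with its sample average
    over historical data; since the model is linear in h, the fit
    reduces to a small regularized least-squares system.

    X_hist: (T, NF) array of historical signals, one row per time.
    Returns a (P, K+1) coefficient array.
    """
    T, NF = X_hist.shape
    rows, targets = [], []
    for t in range(P, T):
        feats = []
        for p in range(1, P + 1):
            z = X_hist[t - p].astype(float)
            for k in range(K + 1):
                feats.append(z.copy())     # S^k x_{t-p}
                z = S_prod @ z
        rows.append(np.stack(feats, axis=1))   # (NF, P*(K+1))
        targets.append(X_hist[t])
    A = np.vstack(rows)                        # stack over time steps
    b = np.concatenate(targets)
    h = np.linalg.solve(A.T @ A + reg * np.eye(A.shape[1]), A.T @ b)
    return h.reshape(P, K + 1)
```

On noiseless data generated by a PGVAR recursion, the fit recovers the generating coefficients up to the regularization bias.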
4 Numerical Results
We consider the task of predicting the position of a walking dog mesh [21]. The mesh consists of spatial points tracked over a sequence of time instants; for each spatial point, we have the three coordinates that change over time. We treated the points as the nodes of a ten-nearest-neighbor (NN) graph built from the coordinates in the first time instant. The node coordinates represent the features, and our goal is to predict their one-step-ahead values. As in [6], we preprocessed the data to render them with zero mean and unitary maximum value. We compared the PGVAR model in (5) with the GVAR model in (3); for this dataset, the latter has yielded a superior performance compared with the standard VAR in (1) and other graph-based techniques [6]. We did not analyze the generalized PGVAR in (9), since the number of time instants is relatively low for its increased number of parameters; we will analyze this model on bigger data sets and present these observations in the extended version of this work.
Experimental setup. We considered the Cartesian product graph to model the relations between the underlying NN graph and a fully connected feature graph over the three coordinate features; the resulting product graph connects the different coordinates across neighboring mesh points. We followed [6] and split the data across the temporal dimension into in-sample and out-of-sample data, analyzing different percentages for this split. The in-sample data were used to estimate the model parameters and the out-of-sample data to test the performance. The in-sample data were further divided into training and validation sets. We found the filter coefficients by fitting the model with different orders $P$ and $K$ on the training set and assessing their performance on the validation set [22]. Then, we refitted the model with the best performing tuple on the whole in-sample set. We evaluated the performance of the different algorithms with the root normalized MSE (rNMSE), defined as:
$$\mathrm{rNMSE} = \sqrt{\frac{\sum_{t} \|\hat{\mathbf{x}}_t - \mathbf{x}_t\|_2^2}{\sum_{t} \|\mathbf{x}_t\|_2^2}} \qquad (13)$$
where $\hat{\mathbf{x}}_t$ is the predicted value at time $t$ and $\mathbf{x}_t$ is the true one. For the GVAR model in (3), we predicted each coordinate separately and report the average prediction error over all coordinates.
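The error metric above amounts to a one-line computation over the stacked test-set predictions:

```python
import numpy as np

def rnmse(X_pred, X_true):
    """Root normalized MSE over the test set, as in (13):
    the prediction error energy normalized by the signal energy."""
    num = np.sum((X_pred - X_true) ** 2)
    den = np.sum(X_true ** 2)
    return np.sqrt(num / den)
```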
Results. Fig. 2(a) shows the rNMSE of the two methods for different percentages of in-sample data when predicting an entire multidimensional graph signal in the test set. Although the considered scenario has a low number of training data and does not show high correlation between the different features, our method yields at least a comparable accuracy w.r.t. (3). When more training samples are available, the PGVAR yields an improved performance. In addition, Fig. 2(b) depicts the true mesh position and the two estimates at a random (test) time instant, where we can observe how the proposed model matches the real dog position well.
5 Conclusion
We proposed a product graph-based model to forecast future values of multidimensional graph processes, i.e., processes which at each node have multiple time-varying features. The proposed approach first builds a feature graph between the different node features and then exploits the underlying structure between nodes to link the features of different nodes with product graphs. Subsequently, we incorporated the product graph structure into VAR models to forecast the multidimensional process with a number of parameters that is independent of the dimensions of the graphs, and therefore with a low implementation complexity. Further, we also devised a general forecasting model with product graphs, which learns the adequate type of product graph for the task at hand directly from the data. As future work, we will corroborate our methods on larger data sets, execute numerical tests on the general PGVAR model, and work on its possible extensions.
References
 [1] I. Ibáñez, J. A. Silander, J. M. Allen, S. A. Treanor, and A. Wilson, "Identifying hotspots for plant invasions and forecasting focal points of further spread," J. Appl. Ecol., vol. 46, pp. 1219–1228, Dec. 2009.
 [2] P. A. Klein and M. Niemira, Forecasting Financial and Economic Cycles. New York: Wiley, 1994.
 [3] G. E. Box, G. M. Jenkins, G. C. Reinsel, and G. M. Ljung, Time series analysis: forecasting and control. John Wiley & Sons, 2015.
 [4] H. Lütkepohl, New introduction to multiple time series analysis. Springer Science & Business Media, 2005.
 [5] J. Mei and J. M. Moura, “Signal processing on graphs: Causal modeling of unstructured data,” IEEE Transactions on Signal Processing, vol. 65, no. 8, pp. 2077–2092, 2017.
 [6] E. Isufi, A. Loukas, N. Perraudin, and G. Leus, "Forecasting time series with VARMA recursions on graphs," to appear in IEEE Transactions on Signal Processing, 2019.
 [7] R. Hammack, W. Imrich, and S. Klavžar, Handbook of Product Graphs, 2nd ed. Boca Raton, FL: CRC Press, 2011.
 [8] M. S. Kotzagiannidis and P. L. Dragotti, "Sampling and reconstruction of sparse signals on circulant graphs: an introduction to GraphFRI," Appl. Comput. Harmon. Anal., 2017, in press, http://dx.doi.org/10.1016/j.acha.2017.10.003 (arXiv:1606.08085).

 [9] G. Ortiz-Jiménez, M. Coutino, S. P. Chepuri, and G. Leus, "Sparse sampling for inverse problems with tensors," IEEE Transactions on Signal Processing, vol. 67, Jun. 2019.
 [10] R. Varma and J. Kovačević, "Sampling theory for graph signals on product graphs," arXiv:1809.10049, 2018.
 [11] M. S. Kotzagiannidis and P. L. Dragotti, "Splines and wavelets on circulant graphs," Appl. Comput. Harmon. Anal., vol. 47, pp. 481–515, Sep. 2019.
 [12] F. Grassi, A. Loukas, N. Perraudin, and B. Ricaud, "A time-vertex signal processing framework: Scalable processing and meaningful representations for time-series on graphs," IEEE Transactions on Signal Processing, vol. 66, no. 3, pp. 817–829, 2018.
 [13] D. Romero, V. N. Ioannidis, and G. B. Giannakis, "Kernel-based reconstruction of space-time functions on dynamic graphs," IEEE Journal of Selected Topics in Signal Processing, vol. 11, no. 6, pp. 856–869, 2017.
 [14] G. Connor, “The three types of factor models: A comparison of their explanatory power,” Financial Analysts Journal, vol. 51, no. 3, pp. 42– 46, 1995.
 [15] C. Lam, Q. Yao, and N. Bathia, "Factor modeling for high dimensional time series," in Recent Advances in Functional Data Analysis and Related Topics. Springer, 2011, pp. 203–207.

 [16] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, "The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains," IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 83–98, 2013.
 [17] F. Gama, A. G. Marques, A. Ribeiro, and G. Leus, "MIMO graph filters for convolutional neural networks," arXiv:1803.02247, 2018.
 [18] A. Sandryhaila and J. M. F. Moura, "Discrete signal processing on graphs," IEEE Trans. Signal Process., vol. 61, no. 7, pp. 1644–1656, Apr. 2013.
 [19] A. Sandryhaila and J. M. F. Moura, "Big data analysis with signal processing on graphs," IEEE Signal Process. Mag., vol. 31, no. 5, pp. 80–90, 2014.
 [20] G. Mateos, S. Segarra, A. G. Marques, and A. Ribeiro, "Connecting the dots: Identifying network structure via graph signal processing," IEEE Signal Processing Magazine, vol. 36, no. 3, pp. 16–43, May 2019. doi: 10.1109/MSP.2018.2890143
 [21] J. Gall, C. Stoll, E. De Aguiar, C. Theobalt, B. Rosenhahn, and H.-P. Seidel, "Motion capture using joint skeleton tracking and surface estimation," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 1746–1753.
 [22] J. Friedman, T. Hastie, and R. Tibshirani, The Elements of Statistical Learning. New York: Springer Series in Statistics, 2001, vol. 1.