1 Introduction
Machine learning has been applied in diverse areas of the physical sciences, ranging from condensed matter to high-energy physics to complex systems. In complex systems, neural-network-based machine learning techniques have been used for predicting amplitude death [1], anticipating synchronization [2], detecting phase transitions in complex networks [3], time series prediction [4], etc. In particular, forecasting time series data has attracted interest from the scientific community due to its diverse applications in real-world dynamical systems such as stock markets and the brain. However, predicting the time series of a dynamical system has many limitations [5, 6]. Since every data point in a time series is a function of the previous time steps, the error in predicting future data points compounds over time. To avoid this compounding of prediction error, the correlation matrix of the time series is preferred over direct prediction of the future data points. The correlation matrix of a given multivariate time series data set is useful in several practical real-world scenarios [7, 8]. For instance, by considering fMRI or MEG signals from several brain regions as time series data, one can construct the corresponding correlation matrix, which can then be used to extract an adjacency matrix by setting a threshold value [9, 10, 11]. In most cases, one calculates average correlation matrices of a given time series data set, since a correlation matrix may vary depending on the length of the observations and their temporal position. Two well-known methods of estimating a true correlation matrix are (i) maximum likelihood estimation (MLE) and (ii) the graphical least absolute shrinkage and selection operator (GLASSO). The MLE method starts from a sample correlation matrix drawn from a Gaussian distribution, which is iteratively corrected to estimate the actual correlation matrix by maximizing the likelihood of observing the given time series data. The GLASSO method [12] is an extension of the MLE method to cases where MLE cannot be applied, for example, when the dimensionality of the Gaussian distribution is higher than the available number of observation samples. Furthermore, a prerequisite of these techniques is to have time series data for all the network nodes, whereas in real-world systems time series information is often available only for a limited set of nodes. These methods therefore stall in modelling cases where the number of available time series is much smaller than the number of nodes forming the system.
Here we develop a machine learning framework to reconstruct a full correlation matrix from the time series data of a few nodes. By considering two different dynamical systems, we demonstrate that a supervised learning method can predict the correlation matrix from the limited time series data of a few nodes. Furthermore, by analyzing the mean square error (MSE) between the true and the predicted correlation matrices, we confirm that limited time series data for a subset of nodes is enough to accomplish good predictions. The correlation matrices are predicted by considering either the higher-degree or the lower-degree nodes of a given network. The prediction accuracy for both cases is observed to be the same, indicating that the degree of the available nodes does not impact the prediction of a correlation matrix. Furthermore, we use an unsupervised learning algorithm (UMAP) to provide insights into our predictions. Finally, we use real-world EEG data sets to validate our model.
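As a concrete illustration of the pipeline described above, the following numpy sketch builds a Pearson correlation matrix from toy multivariate signals and thresholds it into an adjacency matrix. The signal model and the threshold value are illustrative assumptions, not values from this study:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy multivariate "recording": 6 correlated signals of length 500, standing in
# for, e.g., fMRI/MEG channels (the signal model is an illustrative assumption).
base = rng.standard_normal(500)
S = np.vstack([base + 0.5 * rng.standard_normal(500) for _ in range(6)])

# Pearson correlation matrix of the time series (rows = nodes).
R = np.corrcoef(S)

# Thresholding R yields a binary adjacency matrix, as described above.
threshold = 0.5  # illustrative value
A = (np.abs(R) > threshold).astype(int)
np.fill_diagonal(A, 0)  # no self-loops
```

Because `R` is symmetric with unit diagonal, the resulting `A` is a symmetric adjacency matrix without self-loops.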
The article is organized as follows: Section 2 discusses the graph, dynamical, and machine learning models; it also contains the notations and definitions used in the paper. Section 3 illustrates the time series data generation, results, and analysis. Finally, Section 4 summarizes our study.
2 Preliminaries
Consider an undirected graph or network $G = (V, E)$, where $V$ is the set of vertices (nodes) and $E$ is the set of edges (connections), which contains the unordered pairs of vertices. We denote the adjacency matrix corresponding to $G$ by $A$, defined by $A_{ij} = 1$ if nodes $i$ and $j$ are connected and $A_{ij} = 0$ otherwise. $N$ and $M$ represent the number of nodes and the number of edges in $G$, respectively. The number of edges linked to a particular node $i$ is referred to as its degree and denoted by $k_i$. The average degree of the network is denoted by $\langle k \rangle = 2M/N$. We use two random graph models, the Erdős–Rényi (ER) and scale-free (SF) networks, to model the coupled dynamical systems [13]. The ER random model network, or the graph-valued random variable with parameters $(N, p)$, is denoted by $G^{ER}(N, p)$, where $N$ is the number of nodes and $p$ is the edge probability [14]. The existence of each edge is statistically independent of all other edges. When we refer to 'the graph $G^{ER}$', we mean one realization of this random variable with mean degree $\langle k \rangle = p(N-1)$, generated as follows: starting with $N$ nodes, connect each pair of them with probability $p$. The ER random network realization thus generated has a binomial degree distribution. The SF networks ($G^{SF}$) generated using the Barabási–Albert model follow a power-law degree distribution [13].
2.1 Dynamical Models
We consider chaotic Rössler and FitzHugh–Nagumo neuronal oscillators to model the dynamical evolution of the nodes. The coupled dynamics of the nodes on a given graph generate time series data for the entire system. The dynamical evolution of each node in the network modelled by the Rössler oscillator [15] can be given by:

$$\dot{x}_i = -\omega_i y_i - z_i + \epsilon_R \sum_{j=1}^{N} A_{ij}(x_j - x_i), \quad \dot{y}_i = \omega_i x_i + a y_i, \quad \dot{z}_i = b + z_i (x_i - c), \qquad (1)$$

where $x_i$, $y_i$, $z_i$ for $i = 1, \dots, N$ are the dynamical state variables and $\omega_i$ is the natural frequency of node $i$, drawn from a normal distribution with mean $\bar{\omega}$ and variance $\sigma^2$. Here, $A_{ij}$ represents a connection between nodes $i$ and $j$ of $G$, and $\epsilon_R$ denotes the overall coupling strength between the connected nodes. Next, we consider the FitzHugh–Nagumo (FHN) model, which is derived from the work of Hodgkin and Huxley [16] and is famous for its richness in neuronal dynamics. Many variations of the original FitzHugh–Nagumo model have been developed since it was first introduced. In the present study, we consider the FHN model given by the following equations [17]:
$$\dot{x}_i = x_i (x_i - a)(1 - x_i) - y_i + \epsilon_F \sum_{j=1}^{N} A_{ij}(x_j - x_i), \quad \dot{y}_i = x_i - b y_i + A_0 \cos(\Omega_i t), \qquad (2)$$

where $x_i$ indicates the membrane potential and $y_i$ stands for the recovery variable of the $i$-th node. The frequency ($\Omega_i$) of the driving signal, $A_0 \cos(\Omega_i t)$, is drawn from a normal distribution having mean $\bar{\Omega}$ and variance $\sigma_{\Omega}^2$. Here, $\epsilon_F$ denotes the overall coupling strength, and the remaining parameters of the oscillators are held at fixed values. The important parameters for our purpose are the coupling strengths ($\epsilon_R$ and $\epsilon_F$) and the network realizations encoded in $A$. We vary $\epsilon_R$, $\epsilon_F$, and $A$ to generate time series data having different correlation matrices.
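To make the data-generation step concrete, here is a minimal numpy sketch that integrates a small network of coupled Rössler oscillators of the form of Eq. (1), on one ER realization, with a fixed-step RK4 scheme. The network size, edge probability, parameter values ($a$, $b$, $c$), coupling strength, and integration settings are all illustrative assumptions, not the values used in the study:

```python
import numpy as np

rng = np.random.default_rng(1)

# One ER realization: each edge appears independently with probability p
# (upper triangle sampled, then symmetrized; no self-loops).
N, p = 10, 0.4
upper = np.triu(rng.random((N, N)) < p, k=1)
A = (upper | upper.T).astype(float)

omega = rng.normal(1.0, 0.03, size=N)   # natural frequencies of the nodes
a, b, c, eps = 0.15, 0.2, 10.0, 0.05    # illustrative Roessler parameters, coupling

def rhs(state):
    """Coupled Roessler equations; diffusive coupling acts on the x variable."""
    x, y, z = state
    coupling = eps * (A @ x - A.sum(axis=1) * x)   # eps * sum_j A_ij (x_j - x_i)
    return np.array([-omega * y - z + coupling,
                     omega * x + a * y,
                     b + z * (x - c)])

# Fixed-step RK4 integration.
dt, steps = 0.05, 2000
state = 0.1 * rng.standard_normal((3, N))
xs = np.empty((steps, N))
for t in range(steps):
    k1 = rhs(state)
    k2 = rhs(state + 0.5 * dt * k1)
    k3 = rhs(state + 0.5 * dt * k2)
    k4 = rhs(state + dt * k3)
    state = state + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    xs[t] = state[0]

# Keep only the x time series and drop the transient part.
S = xs[500:].T    # rows = nodes, columns = time steps
```

The matrix `S` is exactly the kind of time series matrix from which the true correlation matrices are computed later in the paper.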
2.2 Machine Learning Algorithms
We use a supervised learning algorithm to predict the correlation matrices and an unsupervised learning algorithm to analyze the results. The machine learning model used to predict the correlation matrix is a feedforward neural network, also referred to as a multilayer perceptron [18]. The architecture of this model contains one input layer, two hidden layers, and one output layer (Fig. 1). A layer comprises several neurons, and neurons in adjacent layers are connected. We adopt SELU (scaled exponential linear unit) as the basic activation function for all layers except the output layer [19]. We use a sigmoid as the output activation function when the desired output lies between $0$ and $1$, and $\tanh$ when it lies between $-1$ and $1$. The relationship between the input and the output of consecutive layers can be given by

$$v^{(l+1)}_j = f\Big(\sum_{i=1}^{n_l} w^{(l)}_{ji} v^{(l)}_i\Big), \qquad (3)$$

where $w^{(l)}_{ji}$ is the weighted connection between the $i$-th neuron of the $l$-th layer and the $j$-th neuron of the $(l+1)$-th layer, $n_l$ denotes the number of neurons in the $l$-th layer, and $v^{(l)}_i$ is the output of the $i$-th neuron in the $l$-th layer. The neural network receives an input ($u$) and generates an output ($v$) through the above propagation rule. In our case, $u$ is the flattened input time series window ($W$), and $v$ is the upper-triangular part of the predicted correlation matrix $\hat{R}$ (Fig. 1). Training the neural network means finding weights $w$ that give the desired output (the upper-triangular part of $R$) for a given input window ($W$) by reducing the difference between the neural network output and the desired output. We define this difference in the following manner and call it the loss function:

$$L = \frac{2}{N(N-1)} \sum_{i < j} \big(\hat{R}_{ij} - R_{ij}\big)^2. \qquad (4)$$
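The propagation rule of Eq. (3), with SELU hidden activations and a $\tanh$ output so that the predictions lie in $(-1, 1)$ like correlation values, can be sketched in plain numpy. The layer widths and the random (untrained) weights below are illustrative assumptions; in the study the weights are trained, and the training is done with Adam:

```python
import numpy as np

# SELU activation with its standard constants (Klambauer et al.).
ALPHA, SCALE = 1.6732632423543772, 1.0507009873554805
def selu(x):
    return SCALE * np.where(x > 0, x, ALPHA * (np.exp(x) - 1.0))

rng = np.random.default_rng(0)

# Illustrative layer widths: input = m * omega (flattened window),
# output = N(N-1)/2 upper-triangular correlation entries.
sizes = [40, 64, 64, 45]
weights = [rng.standard_normal((n_out, n_in)) * np.sqrt(1.0 / n_in)
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(u):
    h = u
    for W in weights[:-1]:
        h = selu(W @ h)              # Eq. (3): v_j = f(sum_i w_ji u_i)
    return np.tanh(weights[-1] @ h)  # tanh output for targets in (-1, 1)

u = rng.standard_normal(40)          # a flattened time series window
v = forward(u)                       # predicted upper-triangular entries
```

Training would then adjust `weights` to minimize the mean square loss of Eq. (4) between `v` and the true upper-triangular entries.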
We use the Adam (adaptive moment estimation) algorithm to minimize the loss function [20]. Furthermore, we use an unsupervised ML approach (UMAP) to gain more insight into the predicted correlation matrices. UMAP (Uniform Manifold Approximation and Projection) is a manifold learning technique for dimension reduction [21]. The method preserves the similarity between the vectors in the higher dimension and performs a nonlinear projection to a lower-dimensional space. UMAP preserves both the local and the global structure of the data set: data having similar structures or features are clustered together in low dimensions. The use of UMAP here is twofold: (a) it provides a visual comparison of the predicted correlation matrices with the true correlation matrices, and (b) it helps us to select an appropriate training data set to get meaningful predictions (SI Sec. 4).
3 Methods and Results
We predict correlation matrices for cases that closely match real-world scenarios. A dynamical system with a fixed number of nodes can undergo two types of changes: (1) the coupling strength between nodes can increase or decrease; (2) a small structural change can rearrange the links between nodes. We prepare time series data sets to match these cases.
3.1 Time series data generation and representation
To prepare the data sets, we use two different dynamical models (Rössler and FitzHugh–Nagumo oscillators) with different coupling strengths ($\epsilon_R$, $\epsilon_F$) on two different model networks (ER and SF), each with different realizations of $A$. Hence, we have several different time series data sets ($S$) as well as true correlation matrices ($R$) for each of the network and dynamical models. To generate the time series data sets, we numerically solve the coupled dynamical systems while varying the coupling strength and the realization of the model networks. Although each node of the network carries three dynamical state variables ($x$, $y$, and $z$) for the Rössler oscillators (Eq. 1) and two dynamical state variables ($x$ and $y$) for the FitzHugh–Nagumo oscillators (Eq. 2), we consider only the time evolution of the $x$ variables of the oscillators for the time series data sets. After removing the transient part of the time evolution of the state variables, we consider the time series data sets and construct the corresponding true correlation matrices. For a network of size $N$, we have $N$ time series, each of length $T$. The time evolution of the nodes can be represented as a time series matrix $S \in \mathbb{R}^{N \times T}$, where each row of the matrix represents an individual node and the time evolution of each node is recorded in the columns, so that $S_{it}$ represents the time series value of the $i$-th node at the $t$-th time step. The influence of one node on another in a complex system can be represented as the correlation between their time series. We measure the correlation between each pair of nodes' time series using the Pearson correlation coefficient [22] and store it in a matrix ($R$) referred to as the true correlation matrix, where $R_{ij}$ represents the correlation between the time series of the $i$-th and $j$-th nodes.
Since our goal is to predict an entire correlation matrix from partial information, we roll a window of size $m \times \omega$ over $S$ and create time series windows ($W^{(k)}$), $k = 1, \dots, q$, where $m$ is the number of nodes included in a window, $\omega$ is the window length, $q$ is the number of windows, and the skip is the gap between two consecutive windows in the time series data (Fig. 2(a) and (c)). We use these windows as inputs to the ML model, that is, as the training and testing data sets (Fig. 2(b) and (d)). Hence, by varying $m$ and $\omega$, we can create different training and testing data sets. Importantly, during the test phase, the ML model predicts a correlation matrix ($\hat{R}$) from a given window ($W$). The time series window matrix can be represented as $W \in \mathbb{R}^{m \times \omega}$ with elements $W_{it}$, where $i = 1, \dots, m$ and $t = 1, \dots, \omega$, and the predicted correlation matrix as $\hat{R} \in \mathbb{R}^{N \times N}$. To make the notation simpler, we drop the data set and window indices from $W$ and $\hat{R}$. For our experiments, we consider a few fixed choices of $m$, $\omega$, and the skip.
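The windowing step can be sketched as follows; the array sizes, $m$, $\omega$, and the skip below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)

N, T = 10, 1000              # nodes and time series length (illustrative)
S = rng.standard_normal((N, T))

def make_windows(S, m, omega, skip):
    """Roll an m x omega window over the first m rows of S, with a fixed
    gap (`skip`) between the starting points of consecutive windows."""
    windows = []
    for start in range(0, S.shape[1] - omega + 1, skip):
        windows.append(S[:m, start:start + omega])
    return np.array(windows)

m, omega, skip = 4, 50, 25
W = make_windows(S, m, omega, skip)
q = W.shape[0]               # number of windows
```

Each `W[k]` is one $m \times \omega$ window; flattened, it becomes one input vector for the multilayer perceptron.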
Note that the coupling strengths chosen to create the time series data sets lie in the semi-synchronous region, i.e., the time series are neither too correlated nor uncorrelated [23]. If the coupling strength is such that the time series are highly correlated, then the prediction becomes trivial, and if the time series are uncorrelated, then the predictions are of no value. Also, we choose the $\epsilon_R$ and $\epsilon_F$ values from the semi-synchronous region such that the correlation values of the time series data remain stable (Fig. S2). More details on the choice of the coupling strength values ($\epsilon_R$ and $\epsilon_F$) are given in the SI. Further, note that a linear correlation coefficient is used to measure the correlation between the time series of the nodes, which are nonlinear. Since there are many different correlation measures, choosing an appropriate measure is essential. Previous research found the Pearson correlation coefficient to be the most relevant measure, providing the most information about the system [24]. However, we also use nonlinear correlation measures (e.g., the Spearman correlation [25]) for the real-world data sets, and the prediction results are the same.
3.2 Prediction of correlation matrix for varying coupling strengths and network realizations
Table 1: Coupling strengths ($\epsilon_R$, $\epsilon_F$) used for the Rössler and FitzHugh–Nagumo models.
For our experiment, we choose several realizations of the ER random networks and several coupling strengths $\epsilon_R$ for the Rössler oscillators (Table 1). Hence, we have a collection of time series data sets, split into training and test sets, with the corresponding true correlation matrices, for the Rössler oscillators on the ER network realizations. We also choose several coupling strengths $\epsilon_F$ for the FitzHugh–Nagumo oscillators on the ER network realizations, leading to another collection of time series data sets. Similarly, we choose several realizations of the SF networks with several coupling strengths for the Rössler and for the FitzHugh–Nagumo oscillators, giving two further collections of time series data sets on the SF network realizations. Finally, we create time series windows for each of the data sets: from the training data sets we generate the training windows ($W$), and from the test data sets the test windows.
In the training phase of the supervised ML algorithm, the input to the model is the set of time series windows, and the target is the upper-triangular part of the true correlation matrices created from the training data sets (Fig. 2(a, b)).
The input time window matrix ($W$) and the true correlation matrix ($R$) of the whole system are reshaped into column vectors, with the information of each node stacked on top of the other (Fig. 2(b)). Thus, we assign $m\omega$ neurons to the input layer and $N(N-1)/2$ neurons to the output layer. Using the loss function (Eq. 4), the ML model updates the weights $w$ (Eq. 3) so as to minimize the loss $L$.
During the testing phase, the ML model takes only a time series window and predicts the correlation matrix (Fig. 2(c, d)). Hence, the supervised learning method takes as input the time series of a few nodes as a window ($W$) and predicts the correlation matrix ($\hat{R}$). The entire correlation matrix is constructed using the symmetry of the matrix, with the diagonal elements set to $1$ since they represent self-correlations. Further, to evaluate the performance at test time, we compare the predicted correlation matrices ($\hat{R}$) from the ML algorithm with the true correlation matrices ($R$) using the mean square error (MSE). Figure 3 shows the MSE between the true ($R$) and predicted ($\hat{R}$) correlation matrices associated with the test data sets. The observations show that the prediction accuracy saturates once the number of available nodes increases beyond a certain point. This saturation shows that only a limited subset of the time series is enough to make good correlation matrix predictions. The supervised algorithm predicts the correlation matrices using both the higher- and the lower-degree nodes of the network, and the prediction accuracy for both cases is observed to be the same (Fig. 3). Thus, the degree of the nodes used does not impact the prediction of the correlation matrices. A possible reason is that higher- and lower-degree nodes carry similar information due to the small-world effect: since the networks we consider are connected, a path exists between every pair of nodes, so, given sufficient time, information from the entire network reaches even the node of minimum degree. We removed the transient region from the time series; as a result, correlated information of the entire network is accumulated in both high- and low-degree nodes. To check the robustness of our predictions, we vary $m$ and $\omega$ to create different training and testing data sets and observe that the results remain the same.
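The test-time reconstruction and evaluation described above can be sketched as follows: rebuild the full symmetric correlation matrix from a predicted upper-triangular part (unit diagonal for self-correlation) and score it against the true matrix with the MSE. The toy "true" and "predicted" values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)
N = 10
iu = np.triu_indices(N, k=1)

def to_full_matrix(upper, N):
    """Rebuild a full correlation matrix from its upper-triangular part,
    using symmetry and a unit diagonal (self-correlation)."""
    R = np.eye(N)
    R[np.triu_indices(N, k=1)] = upper
    return R + np.triu(R, k=1).T

# Toy "true" and "predicted" upper-triangular parts (illustrative numbers).
true_upper = rng.uniform(-1, 1, size=iu[0].size)
pred_upper = true_upper + 0.05 * rng.standard_normal(iu[0].size)

R_true = to_full_matrix(true_upper, N)
R_pred = to_full_matrix(pred_upper, N)

# Mean square error between the predicted and true correlation matrices.
mse = np.mean((R_pred - R_true) ** 2)
```

The same `to_full_matrix` step turns any network output of length $N(N-1)/2$ back into an $N \times N$ correlation matrix.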
Furthermore, we use an unsupervised learning algorithm to gain deeper insight into our predictions.
3.3 Unsupervised learning method to understand correlation matrix prediction
Here, we use UMAP to understand the similarities between the predicted ($\hat{R}$) and true ($R$) correlation matrices by considering whole matrices as points in a high-dimensional space and embedding them in a 2D space. The input to UMAP is the flattened upper-triangular part of all the correlation matrices (predicted and true) corresponding to the test data sets. For instance, consider a test set containing time series data sets ($S$) and the corresponding true correlation matrices ($R$) obtained from different ER network realizations with Rössler oscillators. Each $S$ yields $q$ windows and hence $q$ predicted correlation matrices, so the total number of predicted correlation matrices equals $q$ times the number of test data sets. For the UMAP algorithm, we take the upper-triangular part of each correlation matrix and form a high-dimensional vector of length $N(N-1)/2$. The UMAP algorithm takes a high-dimensional input vector and projects it to a point in the two-dimensional plane. The process is repeated for all the predicted and true correlation matrices associated with the test data sets.
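Preparing the UMAP input described above amounts to flattening the upper-triangular part of each correlation matrix into a vector of length $N(N-1)/2$ and stacking the vectors (sizes below are illustrative); the resulting matrix `X` is what one would then pass to, e.g., umap-learn's `UMAP(n_components=2).fit_transform`:

```python
import numpy as np

rng = np.random.default_rng(11)

N = 20
iu = np.triu_indices(N, k=1)

def make_corr(T=300):
    """A toy correlation matrix from random time series (illustrative)."""
    return np.corrcoef(rng.standard_normal((N, T)))

# Stack the flattened upper triangles of all (true and predicted) matrices;
# each row is one point in the N(N-1)/2-dimensional space that UMAP embeds.
matrices = [make_corr() for _ in range(5)]
X = np.vstack([R[iu] for R in matrices])   # shape: (5, N*(N-1)/2)
```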
From Fig. 4(a), one can observe different clusters corresponding to the different test time series data sets, each in a different color. Further, in the 2D space, the true correlation matrices are marked with white circles carrying the cluster color, and the predicted correlation matrices form a cloud around each true correlation matrix. Importantly, the predicted and true correlation matrices are close in the 2D space. The predicted correlation matrices for a specific data set are distributed only near the corresponding true correlation matrix, implying that the predictions made are meaningful. The SF network realizations with Rössler oscillators show behavior similar to the ER random network realizations (Fig. 4(a) and (b)).
Moreover, one can notice that the clusters of different true correlation matrices overlap significantly for the FHN oscillators on the ER and SF network realizations. However, we can still observe that the predicted correlation matrices are located near the true correlation matrices. The fact that the true correlation matrices constitute a cluster means that they share similar characteristics, and the correlation matrices predicted by the model also have these features. Therefore, we can assert that the model makes meaningful predictions (Fig. 4(c) and (d)).
3.4 Experiment on EEG Data
To validate our model, we use a brain–computer interface data set [26]. The database comprises multichannel electroencephalogram (EEG) data of subjects performing a 40-target cued-spelling task. The EEG data are stored as a 4-way tensor with dimensions channel × time point × block × condition. Our experiment considers the time point vs. channel data of the subjects for the first block and condition one. Hence, we have one time series data set ($S$) per subject, each consisting of the channel time series of length $T$. From the time series data sets, we create the corresponding true correlation matrices ($R$). Among the EEG data sets, we use one part as training data and the remaining data sets as test data; the training and test set division is the same as for predicting unknown realizations in the Rössler and FHN experiments. Further, we create windows of size $m \times \omega$ for the training and test data sets: for any $S$, we have windows ($W^{(k)}$), $k = 1, \dots, q$. We train the model in the training phase by using the training windows and true correlation matrices. During the test phase, we use a window from the test data sets to predict the correlation matrix ($\hat{R}$). We also vary $m$ and $\omega$ to create other training and test data sets and repeat the experiment. As shown in Fig. 5(a) and (b), the MSE decreases as $\omega$ increases, whereas increasing $m$ beyond a certain point does not affect the performance much, and the MSE eventually converges. If we look at the UMAP embedding of $\hat{R}$ and $R$, we can see that each $\hat{R}$ is distributed near its true $R$ (Fig. 5(c)). In other words, the ML model predicts $\hat{R}$ in a way that reflects each subject's unique characteristics; if it did not, there would be a large overlap in the distributions of the $R$. Although each $\hat{R}$ is distributed near its $R$ and distinct from the others, it is spread over a rather wide area. This suggests that the model predicts the coarse correlation structure but does not capture the detailed information, owing to the limited training data size and the noise in the EEG data set. Given the experimental results obtained from the model and the EEG data sets, we expect that the framework can be applied to other real-world data sets as long as enough data exist.
4 Conclusion
We have developed an ML framework that predicts the correlation matrix of an entire system from the limited time series data available for a subset of nodes. The motivation behind this framework is twofold. The growing interest in exploiting machine learning techniques for predicting the time series of networks has led to various outcomes; however, the problem of using only the limited time series data from a few nodes has remained challenging. Our framework provides a way to address and resolve this problem.
From a methodological perspective, we present a framework that combines supervised and unsupervised learning with appropriately constructed training and testing data sets. The study focuses on making good predictions of the correlation matrices using the limited time series data of the few available nodes. The framework is undemanding to apply, provided that the data are generated by the respective oscillators in a semi-synchronous region, which is ensured by choosing appropriate coupling strengths.
Supervised learning has been applied to make the predictions, and its accuracy has been quantified by the mean square error between the true and the predicted correlation matrices. The number of nodes and the length of the limited time series required to make good predictions have also been discussed. Unsupervised learning (UMAP) brought further insight into the prediction results by visualizing them. Finally, we used real-world EEG data sets to validate our model. Our work opens a new window onto machine learning in complex networks, where predictions can be made using only a finite time series length and a finite number of nodes of the network, and may be relevant to several real-world applications.
Acknowledgment
NE and PL are thankful to Chittaranjan Hens (IIIT Hyderabad) for the useful discussion on the FHN model. PP is indebted to Kritiprassna Das (IIT Indore) for a detailed discussion on the EEG datasets. SJ acknowledges DST grant SPF/2021/000136.
References
 (1) R. Xiao, L. W. Kong, Z. K. Sun, Y. C. Lai, Predicting amplitude death with machine learning, Phys. Rev. E 104 (2021) 014205.
 (2) H. Fan, L. W. Kong, Y. C. Lai, X. Wang, Anticipating synchronization with machine learning, Phys. Rev. Research 3 (2021) 023237.
 (3) Q. Ni, M. Tang, Y. Liu, Y.-C. Lai, Machine learning dynamical phase transitions in complex networks, Phys. Rev. E 100 (2019) 052312.
 (4) S. Ghosh, A. Senapati, A. Mishra, J. Chattopadhyay, S. K. Dana, C. Hens, D. Ghosh, Reservoir computing on epidemic spreading: A case study on COVID-19 cases, Phys. Rev. E 104 (1) (2021) 014308.
 (5) J. Pathak, B. Hunt, M. Girvan, Z. Lu, E. Ott, Model-free prediction of large spatiotemporally chaotic systems from data: A reservoir computing approach, Phys. Rev. Lett. 120 (2018) 024102.
 (6) H. Fan, J. Jiang, C. Zhang, X. Wang, Y.-C. Lai, Long-term prediction of chaotic systems with machine learning, Phys. Rev. Research 2 (2020) 012080.
 (7) K. Schindler, H. Leung, C. E. Elger, K. Lehnertz, Assessing seizure dynamics by analysing the correlation structure of multichannel intracranial EEG, Brain 130 (1) (2007) 65–77.
 (8) V. Plerou, P. Gopikrishnan, B. Rosenow, L. A. Nunes Amaral, H. E. Stanley, Universal and nonuniversal properties of cross correlations in financial time series, Phys. Rev. Lett. 83 (1999) 1471–1474.
 (9) S. Bialonski, M. T. Horstmann, K. Lehnertz, From brain to earth and climate systems: Small-world interaction networks or not?, Chaos 20 (1) (2010) 013134.
 (10) V. M. Eguiluz, D. R. Chialvo, G. A. Cecchi, M. Baliki, A. V. Apkarian, Scale-free brain functional networks, Phys. Rev. Lett. 94 (1) (2005) 018102.
 (11) J. Schiefer, A. Niederbuhl, V. Pernice, C. Lennartz, J. Hennig, P. LeVan, S. Rotter, From correlation to causation: Estimating effective connectivity from zero-lag covariances of brain signals, PLoS Comput. Biol. 14 (3) (2018) e1006056.
 (12) J. Friedman, T. Hastie, R. Tibshirani, Sparse inverse covariance estimation with the graphical lasso, Biostatistics 9 (3) (2008) 432–441.
 (13) A.-L. Barabási, R. Albert, Emergence of scaling in random networks, Science 286 (5439) (1999) 509–512.
 (14) A. Blum, J. Hopcroft, R. Kannan, Foundations of Data Science, Cambridge University Press, 2020.
 (15) M. G. Rosenblum, A. S. Pikovsky, J. Kurths, Phase synchronization of chaotic oscillators, Phys. Rev. Lett. 76 (11) (1996) 1804.
 (16) A. L. Hodgkin, A. F. Huxley, A quantitative description of membrane current and its application to conduction and excitation in nerve, J. Physiol. 117 (4) (1952) 500.
 (17) R.-Q. Su, Y.-C. Lai, X. Wang, Identifying chaotic FitzHugh–Nagumo neurons using compressive sensing, Entropy 16 (7) (2014) 3889–3902.
 (18) I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016.
 (19) G. Klambauer, T. Unterthiner, A. Mayr, S. Hochreiter, Self-normalizing neural networks, in: Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017, pp. 972–981.
 (20) D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014).
 (21) T. Fujiwara, N. Sakamoto, J. Nonaka, K. Yamamoto, K.-L. Ma, A visual analytics framework for reviewing multivariate time-series data with dimensionality reduction, IEEE Trans. Vis. Comput. Graph. 27 (2) (2020) 1601–1611.
 (22) D. Freedman, R. Pisani, R. Purves, Statistics, 4th edn., W. W. Norton and Company, New York, 2007.
 (23) O. I. Moskalenko, A. A. Koronovskii, A. E. Hramov, S. Boccaletti, Generalized synchronization in mutually coupled oscillators and complex networks, Phys. Rev. E 86 (3) (2012) 036216.
 (24) G. Tirabassi, R. Sevilla-Escoboza, J. M. Buldu, C. Masoller, Inferring the connectivity of coupled oscillators from time-series statistical similarity analysis, Sci. Rep. 5 (1) (2015) 1–14.
 (25) C. Wissler, The Spearman correlation formula, Science 22 (558) (1905) 309–311.
 (26) B. Liu, X. Huang, Y. Wang, X. Chen, X. Gao, BETA: A large benchmark database toward SSVEP-BCI application, Front. Neurosci. 14 (2020) 627.