
Estimation of Correlation Matrices from Limited Time Series Data Using Machine Learning

by   Nikhil Easaw, et al.
Indian Institute of Technology, Indore

Prediction of correlation matrices from given time series data has several applications for a range of problems, such as inferring neuronal connections from spiking data, deducing causal dependencies between genes from expression data, and discovering long spatial range influences in climate variations. Traditional methods of predicting correlation matrices utilize time series data of all the nodes of the underlying networks. Here, we use a supervised machine learning technique to predict the correlation matrix of entire systems from finite time series information of a few randomly selected nodes. The accuracy of the prediction from the model confirms that only a limited time series of a subset of the entire system is enough to make good correlation matrix predictions. Furthermore, using an unsupervised learning algorithm, we provide insights into the success of the predictions from our model. Finally, we apply the machine learning model developed here to real-world data sets.





1 Introduction

Machine learning has been applied in diverse areas of the physical sciences, ranging from condensed matter to high energy physics to complex systems. In complex systems, neural network-based machine learning techniques have been used to predict amplitude death amplitude_death , anticipate synchronization Anticipating_synchronization , detect phase transitions in complex networks ml_phase_transition , and forecast time series ghosh2021reservoir . In particular, forecasting time series data has attracted interest from the scientific community due to its diverse applications in real-world dynamical systems such as stock markets and the brain. However, predicting the time series of a dynamical system has many limitations ModelFreePrediction ; LongTermPrediction . Since every data point in a time series is a function of the previous time steps, the error in predicting future data points compounds over time. To avoid this compounding error, predicting the correlation matrix of the time series is preferred over directly predicting future data points. A correlation matrix of a given multivariate time series data set is useful in several practical real-world scenarios schindler2007assessing ; PropertiesOfCrossCorrelations . For instance, by treating fMRI or MEG signals from several brain regions as time series data, one can construct the corresponding correlation matrix, which can then be used to extract the adjacency matrix by setting a threshold value 4 ; 6 ; 7 .
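As an illustration of the correlation-then-threshold construction described above, the following sketch computes a Pearson correlation matrix from multivariate time series and thresholds it into an adjacency matrix; the random data, the threshold value 0.5, and all variable names are illustrative assumptions, not values from this study.

```python
import numpy as np

# Hypothetical multivariate time series: N nodes (rows) x T time steps (columns).
rng = np.random.default_rng(0)
N, T = 5, 1000
X = rng.standard_normal((N, T))

# Pearson correlation matrix of the N time series.
R = np.corrcoef(X)

# Threshold the correlation matrix to obtain a binary adjacency matrix;
# 0.5 is an arbitrary illustrative threshold.
threshold = 0.5
A = (np.abs(R) > threshold).astype(int)
np.fill_diagonal(A, 0)  # remove self-loops
```

In practice the threshold is chosen from the data (for instance, to match a desired edge density), which is why the resulting network can be sensitive to this choice.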

In most cases, one calculates an average correlation matrix of given time series data, since a correlation matrix may vary depending on the length of observations and their temporal position. Two well-known methods of estimating a true correlation matrix are: (i) maximum likelihood estimation (MLE) and (ii) the graphical least absolute shrinkage and selection operator (GLASSO). The MLE method starts from a sample correlation matrix drawn from a Gaussian distribution and iteratively corrects it to estimate the actual correlation matrix by maximizing the likelihood of observing the given time series data. The GLASSO method 8 is an extension of the MLE method for cases where MLE cannot be applied, for example, when the dimensionality of the Gaussian distribution is higher than the available number of observation samples. Furthermore, a prerequisite of these techniques is to have time series data for all the network nodes, whereas in real-world systems, often the time series of only a limited number of nodes is available. Therefore, these methods fail to model cases where the number of available time series is much smaller than the number of nodes forming the corresponding system.
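For concreteness, both estimators can be sketched in a few lines using scikit-learn (assumed available); the data and the regularization strength `alpha` are illustrative choices, not values from this study.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(1)
# Illustrative data: T = 500 observations of N = 4 variables
# (scikit-learn expects samples as rows).
X = rng.standard_normal((500, 4))

# MLE: for Gaussian data the maximum-likelihood covariance estimate
# is the (biased) sample covariance of the observations.
cov_mle = np.cov(X, rowvar=False, bias=True)

# GLASSO: sparse inverse-covariance estimation with an L1 penalty,
# usable even when the dimension exceeds the number of samples.
glasso = GraphicalLasso(alpha=0.05).fit(X)
cov_glasso = glasso.covariance_    # regularized covariance estimate
prec_glasso = glasso.precision_    # sparse precision (inverse covariance)
```

Either covariance estimate can then be normalized by its diagonal to obtain a correlation matrix.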

Here we develop a machine learning framework to reconstruct a full correlation matrix from the time series data of a few nodes. By considering two different dynamical systems, we demonstrate that the supervised learning method can predict the correlation matrix from the limited time series data of a few nodes. Furthermore, by analyzing the mean square error (MSE) between the true and predicted correlation matrices, we confirm that limited time series data from a subset of nodes is enough to achieve good predictions. The correlation matrices are predicted using either the higher-degree or the lower-degree nodes of a given network. The prediction accuracy for both cases is observed to be the same, indicating that the degree of the available nodes does not impact the prediction of a correlation matrix. Furthermore, we use an unsupervised learning algorithm (UMAP) to provide insights into our predictions. Finally, we use real-world EEG data sets to validate our model.

The article is organized as follows: Section 2 discusses the graph models, the dynamical models, and the machine learning models, along with the notations and definitions used in the paper. Section 3 illustrates the time series data generation, results, and analysis. Finally, Section 4 summarizes our study.

2 Preliminaries

Consider an undirected graph or network G = (V, E), where V is the set of vertices (nodes) and E is the set of edges (connections), which contains the unordered pairs of vertices. We denote the adjacency matrix corresponding to G by A, defined as A_ij = 1 if nodes i and j are connected, and A_ij = 0 otherwise. N and M represent the number of nodes and the number of edges in G, respectively. The number of edges linked to a particular node i is referred to as its degree and denoted by k_i. The average degree of the network is denoted by <k> = 2M/N. We use two random graph models, the Erdős-Rényi (ER) and the scale-free (SF) networks, to model the coupled dynamical systems barabasi1999emergence . The ER random network, or the graph-valued random variable with parameters (N, p), is denoted by G(N, p), where N is the number of nodes and p is the edge probability blum2020foundations . The existence of each edge is statistically independent of all other edges. When we refer to 'the graph G(N, p)', we mean one realization of the random variable, generated as follows: starting with N nodes, each pair is connected independently with probability p. The ER random network realization thus generated has a binomial degree distribution. The SF networks, generated using the Barabási-Albert model, follow a power-law degree distribution barabasi1999emergence .
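One realization of each model can be generated with the following numpy-only sketch (networkx would also work); the sizes N = 100, p = 0.05, and m = 2 are illustrative choices, not the parameters used in the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

def erdos_renyi(N, p):
    """Adjacency matrix of one G(N, p) realization: each of the
    N*(N-1)/2 possible edges exists independently with probability p."""
    A = np.zeros((N, N), dtype=int)
    iu = np.triu_indices(N, k=1)
    A[iu] = rng.random(len(iu[0])) < p
    return A + A.T

def barabasi_albert(N, m):
    """Adjacency matrix of a Barabási-Albert scale-free network:
    each new node attaches to m existing nodes chosen with probability
    proportional to their current degree (preferential attachment)."""
    A = np.zeros((N, N), dtype=int)
    # start from a small fully connected core of m+1 nodes
    A[:m + 1, :m + 1] = 1 - np.eye(m + 1, dtype=int)
    for new in range(m + 1, N):
        deg = A[:new, :new].sum(axis=1)
        targets = rng.choice(new, size=m, replace=False, p=deg / deg.sum())
        A[new, targets] = A[targets, new] = 1
    return A

A_er = erdos_renyi(100, 0.05)
A_sf = barabasi_albert(100, 2)
```

The ER realization has a binomial degree distribution with mean close to (N-1)p, while repeated preferential attachment in the BA construction yields the heavy-tailed (power-law) degree distribution of SF networks.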

Figure 1: Schematic representation of the ML model structure. The input to the model is a partial time series from a few nodes, in the shape of a time series window. Each time window is flattened to a column vector and fed into the input layer. The entire correlation matrix is constructed from the output column vector of the model using the symmetry of the upper triangular part of a correlation matrix.

2.1 Dynamical Models

We consider chaotic Rössler and FitzHugh-Nagumo neuronal oscillators to model the dynamical evolution of nodes. The coupled dynamics of the nodes on a given graph generate time series data for the entire system. The dynamical evolution of each node in the network modelled by the Rössler oscillator rosenblum1996phase can be given by:

dx_i/dt = -ω_i y_i - z_i + ε Σ_j A_ij (x_j - x_i),
dy_i/dt = ω_i x_i + a y_i,
dz_i/dt = b + z_i (x_i - c),

where x_i, y_i, and z_i for i = 1, ..., N are the dynamical state variables and ω_i is the natural frequency of node i, drawn from a normal distribution with mean μ and variance σ². Here, A_ij represents a connection between nodes i and j of G, and ε denotes the overall coupling strength between the connected nodes.

Next, we consider the FitzHugh-Nagumo (FHN) model, which is derived from the work of Hodgkin and Huxley and is well known for its richness in neuronal dynamics hodgkin1952quantitative . Many variations of the original FitzHugh-Nagumo model have been developed since it was first introduced. In the present study, we consider the FHN model given by the following equations su2014identifying :

dv_i/dt = v_i (v_i - a)(1 - v_i) - w_i + I cos(Ω_i t) + ε Σ_j A_ij (v_j - v_i),
dw_i/dt = b v_i - c w_i,

where v_i indicates the membrane potential and w_i stands for the recovery variable of node i. The frequency Ω_i of the driving signal, I cos(Ω_i t), is drawn from a normal distribution having mean μ and variance σ². Here, ε denotes the overall coupling strength, and the remaining parameters of the oscillators are fixed at constant values. The important parameters for our purpose are the coupling strengths and the network realizations encoded in A. We vary ε and A to generate time series data having different correlation matrices.
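A minimal numerical sketch of this data-generation step for the coupled Rössler system is shown below; it assumes the standard Rössler form with diffusive coupling in x, the common parameter values a = 0.15, b = 0.2, c = 10, and an illustrative random coupling matrix, none of which are the paper's exact choices.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 10
# illustrative ER-like symmetric coupling matrix
A = np.triu((rng.random((N, N)) < 0.2).astype(float), 1)
A = A + A.T

a, b, c = 0.15, 0.2, 10.0          # typical Rössler parameters (assumed)
eps = 0.05                          # overall coupling strength
omega = rng.normal(1.0, 0.03, N)    # natural frequencies of the nodes

def rossler_rhs(state):
    x, y, z = state
    dx = -omega * y - z + eps * (A @ x - A.sum(axis=1) * x)  # diffusive coupling in x
    dy = omega * x + a * y
    dz = b + z * (x - c)
    return np.array([dx, dy, dz])

# fourth-order Runge-Kutta integration
dt, steps = 0.05, 4000
state = 0.1 * rng.standard_normal((3, N))
traj = np.empty((steps, N))
for t in range(steps):
    k1 = rossler_rhs(state)
    k2 = rossler_rhs(state + 0.5 * dt * k1)
    k3 = rossler_rhs(state + 0.5 * dt * k2)
    k4 = rossler_rhs(state + dt * k3)
    state = state + (dt / 6) * (k1 + 2 * k2 + 2 * k3 + k4)
    traj[t] = state[0]              # record the x variable only

# discard the transient part before computing correlations
X = traj[1000:].T                   # shape (N, T)
R_true = np.corrcoef(X)
```

Varying `eps` and regenerating `A` yields time series with different correlation structure, which is exactly the variation exploited when building training data.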

2.2 Machine Learning Algorithms

We use a supervised learning algorithm to predict the correlation matrices and an unsupervised learning algorithm to analyze the results. The machine learning model used to predict the correlation matrix is a feed-forward neural network referred to as a multi-layer perceptron goodfellow2016deep . The architecture of this model contains one input layer, two hidden layers, and one output layer (Fig. 1). A layer comprises several neurons, and neurons in adjacent layers are connected. We adopt SELU (scaled exponential linear unit) as the activation function for all layers except the output layer klambauer2017self . We use sigmoid as the output activation function when the desired output lies between 0 and 1, and tanh when it lies between -1 and 1. The relationship between the input and the output of a layer can be given by,

x_j^(l+1) = f( Σ_i W_ji^(l) x_i^(l) + b_j^(l) ),

where W_ji^(l) is the weighted connection between the j-th neuron of the (l+1)-th layer and the i-th neuron of the l-th layer, and x_i^(l) is the output of the i-th neuron in the l-th layer. The neural network receives an input and generates an output through the above propagation rule. In our case, the input is the flattened time series window, and the output is the upper triangular part of the predicted correlation matrix (Fig. 1). Training a neural network means finding weights W that produce the desired output (the upper triangular part of R) for a given input window by reducing the difference between the neural network output and the desired output. We define this difference as the loss function,

L(W) = (1/d) Σ_k ( y_k - ŷ_k )²,

where y_k is the k-th element of the flattened upper triangular part of R, ŷ_k is the corresponding network output, and d is the number of output neurons.
We use the Adam (adaptive moment estimation) algorithm to minimize the loss function kingma2014adam . Furthermore, we use an unsupervised ML approach, UMAP (Uniform Manifold Approximation and Projection), to gain more insight into the predicted correlation matrices. UMAP is a manifold learning technique for dimension reduction fujiwara2020visual . The method preserves the similarity between vectors in the higher-dimensional space while performing a nonlinear projection to a lower-dimensional (here, two-dimensional) space, preserving both the local and global structures of the data set. Data having similar structures or features are clustered together in low dimensions. The use of UMAP here is twofold: (a) it provides a visual comparison of the predicted correlation matrices with the true correlation matrices, and (b) it helps us select an appropriate training data set to get a meaningful prediction (SI sec. 4).
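The forward pass and loss just described can be sketched with plain numpy as follows; the layer widths, initialization scheme, and random data are illustrative assumptions, and only the forward direction is shown (in practice the weights would be trained with Adam).

```python
import numpy as np

def selu(x, alpha=1.6732632423543772, scale=1.0507009873554805):
    """Scaled exponential linear unit (Klambauer et al., 2017)."""
    return scale * np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(3)

# Illustrative sizes: a flattened window of n*w = 50 values in,
# the N*(N-1)/2 = 45 upper-triangular correlations out (N = 10).
d_in, d_hidden, d_out = 50, 128, 45
W1 = rng.standard_normal((d_hidden, d_in)) * np.sqrt(1.0 / d_in)
W2 = rng.standard_normal((d_hidden, d_hidden)) * np.sqrt(1.0 / d_hidden)
W3 = rng.standard_normal((d_out, d_hidden)) * np.sqrt(1.0 / d_hidden)

def forward(x):
    h1 = selu(W1 @ x)           # first hidden layer (SELU)
    h2 = selu(W2 @ h1)          # second hidden layer (SELU)
    return sigmoid(W3 @ h2)     # output activation, values in (0, 1)

def mse_loss(pred, target):
    return np.mean((pred - target) ** 2)

x = rng.standard_normal(d_in)   # a flattened time series window
target = rng.random(d_out)      # upper-triangular part of a "true" R
loss = mse_loss(forward(x), target)
```

A tanh output layer would replace `sigmoid` when the target correlations lie in (-1, 1) rather than (0, 1).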

3 Methods and Results

We predict correlation matrices for cases that closely match real-world scenarios. A dynamical system with a fixed number of nodes can undergo two types of changes: (1) the coupling strength between nodes can increase or decrease, and (2) a small structural change can rearrange links between nodes. We prepare the time series data sets to match these cases.

3.1 Time series data generation and representation

To prepare the data sets, we use two different dynamical models (Rössler and FitzHugh-Nagumo oscillators) with different coupling strengths on two different model networks (ER and SF), each with different realizations of A. Hence, we have different time series data sets, as well as true correlation matrices, for each combination of network and dynamical model. To generate the time series data sets, we numerically solve the coupled dynamical systems while varying the coupling strength and the realization of the model networks. Although each node of the network carries three dynamical state variables (x, y, and z) for the Rössler oscillators (Eq. 1) and two dynamical state variables (v and w) for the FitzHugh-Nagumo oscillators (Eq. 2), we consider only the time evolution of the x and v variables, respectively, for the time series data sets. After removing the transient part of the time evolution of the state variables, we record the time series data sets and construct the corresponding true correlation matrices. For a network of size N, we obtain N time series, each of length T. The time evolution of the nodes can be represented as a time series matrix X, where each row represents an individual node and the time evolution of each node is recorded along the columns; X_it denotes the value of the i-th node at the t-th time step for a given data set. The influence of one node on another in a complex system can be represented as the correlation between their time series. We measure the correlation between each pair of nodal time series using the Pearson correlation coefficient and store it in a matrix R, referred to as the true correlation matrix, where R_ij represents the correlation between the time series of the i-th and j-th nodes for the corresponding data set pearson1 .

Figure 2: Illustration of the training and testing phases of the machine learning model. For each coupling strength and network realization, we have a corresponding time series data set and true correlation matrix. (a) To use the ML model, we generate time windows from the time series data set and assign the corresponding true correlation matrix as the desired output. (b) The time windows and correlation matrices are flattened to train the ML model; for the correlation matrix, only the upper triangular part is considered. The correlation matrix predicted from a specific time window is used for updating the model parameters by calculating the MSE between the model output (predicted correlation matrix) and the desired output (true correlation matrix). (c, d) During the test phase, for a given window as input, the ML model predicts the upper triangular part of the correlation matrix as output. The model performance is evaluated by generating time windows from time series not used for training and predicting the corresponding correlation matrices. All parameters are updated during the training phase and are fixed during the test phase.

Since our goal is to predict an entire correlation matrix from partial information, we roll a window of size n × w over X and create time series windows, where n is the number of selected nodes, w is the window length, and skip is the gap between two consecutive windows in the time series data (Fig. 2 (a) and (c)). We use these windows as inputs to the ML model, i.e., as the training and testing data sets (Fig. 2 (b) and (d)). Hence, by varying w and skip, we can create different training and testing data sets. Importantly, during the test phase, the ML model predicts a correlation matrix from a given window alone. Each time series window is a matrix of n rows and w columns, from which the model predicts the corresponding correlation matrix. To keep the notation simple, we drop the data-set and window indices where they are not needed. For our experiments, we consider fixed values of n, w, and skip.
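The windowing step can be sketched as follows; the array sizes are illustrative, and n, w, and skip play the roles defined above.

```python
import numpy as np

def make_windows(X, n, w, skip):
    """Slide a window of shape (n, w) along the time axis of the
    N x T time series matrix X, keeping only the first n rows
    (a subset of n selected nodes), and flatten each window."""
    T = X.shape[1]
    starts = range(0, T - w + 1, skip)
    return np.stack([X[:n, s:s + w].ravel() for s in starts])

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 500))    # N = 20 nodes, T = 500 time steps
windows = make_windows(X, n=5, w=50, skip=10)
# number of windows: floor((T - w) / skip) + 1 = floor(450 / 10) + 1 = 46
```

Each flattened window (here of length n·w = 250) is one input sample for the multi-layer perceptron.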

Note that the coupling strengths chosen to create the time series data sets lie in the semi-synchronous region, i.e., the time series are neither too strongly correlated nor uncorrelated 14 . If the coupling strength is such that the time series are too correlated, the prediction becomes trivial, and if the time series are uncorrelated, the predictions are of no value. We also choose the coupling strength values from the semi-synchronous region such that the correlation values of the time series data remain constant over time (Fig. S2). More details on choosing the coupling strength values are given in the SI. Further, note that a linear correlation coefficient is used to measure the correlation of the nodal time series, even though the underlying dynamics are nonlinear. Since there are many different correlation measures, choosing an appropriate one is essential. Previous research found the Pearson correlation coefficient to be the most relevant measure, providing the most information about the system tirabassi2015inferring . However, we also use nonlinear correlation measures (e.g., Spearman correlation wissler1905spearman ) for the real-world data sets, and the prediction results are the same.

3.2 Prediction of correlation matrix for varying coupling strengths and network realizations

Table 1: Different network realizations and chosen coupling strength values for the Rössler and FitzHugh-Nagumo models used in time series data generation.

For our experiment, we choose several realizations of the ER random networks and several coupling strength values for the Rössler oscillators (Table 1). Hence, we have a collection of time series data sets, split into training and test sets, with corresponding true correlation matrices for the Rössler oscillators on the ER network realizations. We also choose several coupling strength values for the FitzHugh-Nagumo oscillators on the ER network realizations, leading to another collection of time series data sets. Similarly, we choose several realizations of the SF networks with several coupling strength values for the Rössler and FitzHugh-Nagumo oscillators, giving two more collections of time series data sets on the SF network realizations. Finally, we create time series windows for each of the data sets, generating windows separately from the training and test data sets.

Figure 3: Average mean square error (MSE) between the true (R) and predicted correlation matrices associated with the test data sets for a varying number of nodes (n) and window size (w). (a, c) MSE vs. the number of higher-degree nodes (ER-HD, SF-HD) for ER and SF networks with both oscillators and varying window size, and (b, d) MSE vs. the number of lower-degree nodes (ER-LD, SF-LD) for ER and SF networks with both oscillators and varying window size. The plots show saturation in the MSE with an increase in n, asserting that only a limited number of nodes is required to predict the entire correlation matrix for both models. The network sizes for the ER and SF realizations and the window sizes are as described in the text.

In the training phase of the supervised ML algorithm, the inputs to the model are the time series windows, and the targets are the upper triangular parts of the true correlation matrices created from the training data sets (Fig. 2(a, b)).

Figure 4: UMAP embedding generated from the predicted and true correlation matrices of the test data sets. The label colors indicate different test data sets, while white circles represent true correlation matrices (R). The colored dots indicate predicted correlation matrices obtained from different time windows. UMAP embeddings for test data sets generated from (a, b) Rössler oscillators on different realizations of ER and SF networks, and (c, d) FHN oscillators on different realizations of ER and SF networks.

The input time window matrix and the true correlation matrix of the whole system are each reshaped into a column vector, with the information of each node stacked to form the vector (Fig. 2(b)). Thus, the input layer has n·w neurons and the output layer has N(N-1)/2 neurons, one for each element of the upper triangular part of the correlation matrix. Using the loss function (Eq. 4), the ML model updates the weights (Eq. 3) to minimize the loss.

During the testing phase, the ML model takes only a time series window and predicts the correlation matrix (Fig. 2(c, d)). Hence, the supervised learning method takes as input the time series of a few nodes in the form of a window and predicts the correlation matrix. The entire correlation matrix is constructed using the symmetry of the matrix, with the diagonal elements set to 1 since they are self-correlations. Further, to evaluate the performance at test time, we compare the correlation matrices predicted by the ML algorithm with the true correlation matrices using the mean square error (MSE). Figure 3 shows the MSE between the true and predicted correlation matrices associated with the test data sets. The observations show that the prediction accuracy saturates once the number of nodes increases beyond a certain point. This saturation shows that a limited time series subset is enough to make good correlation matrix predictions. The supervised algorithm predicts the correlation matrices using both higher- and lower-degree nodes of the network, and the prediction accuracy for both cases is observed to be the same (Fig. 3). Thus, the degree of the nodes used does not impact the prediction of the correlation matrices. One possible reason is that higher- and lower-degree nodes carry similar information due to the small-world effect: since the networks are connected, a path exists between every pair of nodes, so given sufficient time, information from the entire network reaches even the node of minimum degree. Because we removed the transient region from the time series, correlated information of the entire network is accumulated in both high- and low-degree nodes. To check the robustness of our prediction, we vary w and skip to create different training and testing data sets and observe that the results remain the same.
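The test-time reconstruction and MSE evaluation described above amount to the following sketch; the correlation values are made up for illustration.

```python
import numpy as np

def reconstruct_correlation(upper, N):
    """Rebuild a full N x N correlation matrix from its flattened
    upper-triangular part, using symmetry and a unit diagonal."""
    R = np.eye(N)
    iu = np.triu_indices(N, k=1)
    R[iu] = upper
    R[(iu[1], iu[0])] = upper   # mirror into the lower triangle
    return R

def mse(R_pred, R_true):
    return np.mean((R_pred - R_true) ** 2)

N = 4
upper = np.array([0.2, 0.5, -0.1, 0.3, 0.0, 0.7])  # N(N-1)/2 = 6 values
R_hat = reconstruct_correlation(upper, N)
# compare against the identity as a dummy "true" matrix for illustration
err = mse(R_hat, np.eye(N))
```

In the actual evaluation, `upper` is the network's output for a test window and the comparison is against the true correlation matrix of the corresponding data set.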
Furthermore, we use an unsupervised learning algorithm to show deeper insights into our prediction.

3.3 Unsupervised learning method to understand correlation matrix prediction

Here, we use UMAP to understand the similarities between the predicted and true correlation matrices by considering whole matrices as points in a high-dimensional space and embedding them in a 2D space. The input to UMAP is the flattened upper triangular part of all correlation matrices (predicted and true) corresponding to the test data sets. For instance, a test set contains time series data sets and corresponding true correlation matrices obtained from different ER network realizations with Rössler oscillators, and each data set generates several windows.

Figure 5: Average mean square error (MSE) between the true (R) and predicted correlation matrices for the EEG test data sets. (a) The performance array contains the MSE of the test data sets for varying window size (w) and number of channels (n), and (b) MSE vs. n for varying window size. The plots show decreasing MSE with an increase in n, asserting that EEG time series data from only a limited number of channels are required to predict the entire correlation matrix. (c) UMAP embedding generated from the true and predicted correlation matrices of the EEG test data sets. Among the EEG data of 65 subjects, two subjects were used as the test data set. The label colors indicate the two different subjects, while white circles represent the true correlation matrices R. The colored dots are embedded points of predicted correlation matrices corresponding to different windows.

Therefore, the total number of predicted correlation matrices equals the total number of test windows. For the UMAP algorithm, we take the upper triangular part of each correlation matrix and form a high-dimensional vector. UMAP takes a high-dimensional input vector and projects it to a point in the lower-dimensional plane. The process is repeated for all the predicted and true correlation matrices associated with the test data sets.

From Fig. 4(a), one can observe that different clusters correspond to different test time series data sets, each having a different color. In the 2D space, the true correlation matrices are marked with white circles in the cluster color, and the predicted correlation matrices form a cloud around the true correlation matrix. Importantly, the predicted and true correlation matrices are close in the 2D space: the predicted correlation matrices for a specific data set are distributed only near the corresponding true correlation matrix, inferring that the predictions are meaningful. The SF network realizations with the Rössler oscillator show behavior similar to that of the ER random network realizations (Fig. 4 (a) and (b)).

Moreover, one can notice that clusters from different true correlation matrices overlap significantly for the FHN oscillators on the ER and the SF network realizations. However, we can still observe that the predicted correlation matrices are located near the true correlation matrices. The fact that true correlation matrices constitute a cluster means that they share similar characteristics, and the correlation matrices predicted from the model also have these features. Therefore, we can assert that the model makes meaningful predictions (Fig. 4 (c) and (d)).

3.4 Experiment on EEG Data

To validate our model, we use a brain-computer interface data set liu2020beta . The database comprises multichannel electroencephalogram (EEG) data of subjects performing a 40-target cued-spelling task. The EEG data are stored as a 4-way tensor with dimensions channel × time point × block × condition. Our experiment considers the time point vs. channel data for 65 subjects, using the first block and the first condition. Hence, we have one time series data set per subject, each consisting of one time series per channel. From the time series data sets, we create the corresponding true correlation matrices.

Among the 65 EEG data sets, we use 63 as training data and 2 as test data. The division into training and test sets mirrors the prediction of unknown network realizations in the Rössler and FHN experiments. Further, we create n × w windows for the training and test data sets. We train the model in the training phase using the training windows and true correlation matrices. During the test phase, we use a window from the test data sets to predict the correlation matrix. We vary n and w to create different training and test data sets and repeat the experiment. As shown in Fig. 5 (a) and (b), the MSE decreases as n increases, although beyond a certain number of channels, increasing n does not improve the performance much. Looking at the UMAP of the predicted and true correlation matrices, we see that the predictions are distributed near the true R (Fig. 5(c)). In other words, the ML model predicts correlation matrices reflecting each subject's unique characteristics; if not, there would be a large overlap in the distributions. Although the predicted matrices are distributed near their own true R, distinct from the other subject's R, they are spread over a rather wide area. This suggests that the model predicts the coarse correlation structure but does not capture detailed information, owing to the limited training data size and noise in the EEG data set. With the experimental results obtained from the model and the EEG data sets, we expect that the framework can be applied to other real-world data sets as long as enough data exist.

4 Conclusion

We have developed an ML framework that predicts the correlation matrix of an entire system from the limited time series data available for a subset of nodes. The motivation behind this framework is twofold. The growing interest in exploiting machine learning techniques to predict time series on networks has led to various outcomes; however, the problem of using only limited time series data from a few nodes remained challenging. Our framework provides a way to address and resolve this problem.

From a methodological perspective, we present a framework that combines supervised and unsupervised learning with appropriately constructed training and testing data sets. The study focuses on making good predictions of the correlation matrices using the limited time series data of a few available nodes. Applying the framework is straightforward, provided the data satisfy the condition of being generated by oscillators in the semi-synchronous region, which is ensured by choosing appropriate coupling strengths.

Supervised learning has been applied to make the predictions, and its accuracy has been quantified by the mean square error between the true and predicted correlation matrices. The number of nodes and the length of the limited time series required to make good predictions have also been discussed. Unsupervised learning (UMAP) brought further insight into the prediction results by visualizing them. Finally, we used real-world EEG data sets to validate our model. Our work opens up a new avenue for machine learning in complex networks, where predictions can be made using only a finite time series length and a finite number of nodes, and may be relevant to several real-world applications.


NE and PL are thankful to Chittaranjan Hens (IIIT Hyderabad) for the useful discussion on the FHN model. PP is indebted to Kritiprassna Das (IIT Indore) for a detailed discussion on the EEG datasets. SJ acknowledges DST grant SPF/2021/000136.


  • (1) R. Xiao, L. W. Kong, Z. K. Sun, Y. C. Lai, Predicting amplitude death with machine learning, Phys. Rev. E 104 (2021) 014205.
  • (2) H. Fan, L. W. Kong, Y. C. Lai, X. Wang, Anticipating synchronization with machine learning, Phys. Rev. Research 3 (2021) 023237.
  • (3) Q. Ni, M. Tang, Y. Liu, Y.C. Lai, Machine learning dynamical phase transitions in complex networks, Phys. Rev. E 100 (2019) 052312.
  • (4) S. Ghosh, A. Senapati, A. Mishra, J. Chattopadhyay, S. K. Dana, C. Hens, D. Ghosh, Reservoir computing on epidemic spreading: A case study on covid-19 cases, Physical Review E 104 (1) (2021) 014308.
  • (5) J. Pathak, B. Hunt, M. Girvan, Z. Lu, E. Ott, Model-free prediction of large spatiotemporally chaotic systems from data: A reservoir computing approach, Phys. Rev. Lett. 120 (2018) 024102.
  • (6) H. Fan, J. Jiang, C. Zhang, X. Wang, Y.C. Lai, Long-term prediction of chaotic systems with machine learning, Phys. Rev. Research 2 (2020) 012080.
  • (7) K. Schindler, H. Leung, C. E. Elger, K. Lehnertz, Assessing seizure dynamics by analysing the correlation structure of multichannel intracranial eeg, Brain 130 (1) (2007) 65-77.
  • (8) V. Plerou, P. Gopikrishnan, B. Rosenow, L. A. Nunes Amaral, H. E. Stanley, Universal and nonuniversal properties of cross correlations in financial time series, Phys. Rev. Lett. 83 (1999) 1471–1474.
  • (9) S. Bialonski, M. T. Horstmann, K. Lehnertz, From brain to earth and climate systems: Small-world interaction networks or not?, Chaos: An Interdisciplinary Journal of Nonlinear Science 20 (1) (2010) 013134.
  • (10) V. M. Eguiluz, D. R. Chialvo, G. A. Cecchi, M. Baliki, A. V. Apkarian, Scale-free brain functional networks, Physical review letters 94 (1) (2005) 018102.
  • (11) J. Schiefer, A. Niederbuhl, V. Pernice, C. Lennartz, J. Hennig, P. LeVan, S. Rotter, From correlation to causation: Estimating effective connectivity from zero-lag covariances of brain signals, PLoS computational biology 14 (3) (2018) e1006056.
  • (12) J. Friedman, T. Hastie, R. Tibshirani, Sparse inverse covariance estimation with the graphical lasso, Biostatistics 9 (3) (2008) 432-441.
  • (13) A. L. Barabasi, R. Albert, Emergence of scaling in random networks, science 286 (5439) (1999) 509–512.
  • (14) A. Blum, J. Hopcroft, R. Kannan, Foundations of data science, Cambridge University Press, 2020.
  • (15) M. G. Rosenblum, A. S. Pikovsky, J. Kurths, Phase synchronization of chaotic oscillators, Physical review letters 76 (11) (1996) 1804.
  • (16) A. L. Hodgkin, A. F. Huxley, A quantitative description of membrane current and its application to conduction and excitation in nerve, The Journal of physiology 117 (4) (1952) 500.
  • (17) R.Q. Su, Y.C. Lai, X. Wang, Identifying chaotic fitzhugh-nagumo neurons using compressive sensing, Entropy 16 (7) (2014) 3889-3902.
  • (18) I. Goodfellow, Y. Bengio, A. Courville, Deep learning, MIT Press, 2016.

  • (19) G. Klambauer, T. Unterthiner, A. Mayr, S. Hochreiter, Self-normalizing neural networks, in: Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017, pp. 972-981.
  • (20) D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014).
  • (21) T. Fujiwara, N. Sakamoto, J. Nonaka, K. Yamamoto, K.L. Ma, et al., A visual analytics framework for reviewing multivariate time-series data with dimensionality reduction, IEEE transactions on visualization and computer graphics 27 (2) (2020) 1601-1611.
  • (22) D. Freedman, R. Pisani, R. Purves, Statistics (international student edition), Pisani, R. Purves, 4th edn. WW Norton and Company, New York (2007).
  • (23) O. I. Moskalenko, A. A. Koronovskii, A. E. Hramov, S. Boccaletti, Generalized synchronization in mutually coupled oscillators and complex networks, Physical Review E 86 (3) (2012) 036216.
  • (24) G. Tirabassi, R. Sevilla-Escoboza, J. M. Buldu, C. Masoller, Inferring the connectivity of coupled oscillators from time-series statistical similarity analysis, Scientific reports 5 (1) (2015) 1–14.
  • (25) C. Wissler, The spearman correlation formula, Science 22 (558) (1905) 309-311.
  • (26) B. Liu, X. Huang, Y. Wang, X. Chen, X. Gao, BETA: A large benchmark database toward SSVEP-BCI application, Frontiers in Neuroscience 14 (2020) 627.