Learning Latent Fractional dynamics with Unknown Unknowns

11/02/2018 ∙ by Gaurav Gupta, et al. ∙ University of Southern California 0

Despite significant effort in understanding complex systems (CS), we lack a theory for modeling, inference, analysis and efficient control of time-varying complex networks (TVCNs) in uncertain environments. From brain activity dynamics to microbiome, and even chromatin interactions within the genome architecture, many such TVCNs exhibits a pronounced spatio-temporal fractality. Moreover, for many TVCNs only limited information (e.g., few variables) is accessible for modeling, which hampers the capabilities of analytical tools to uncover the true degrees of freedom and infer the CS model, the hidden states and their parameters. Another fundamental limitation is that of understanding and unveiling of unknown drivers of the dynamics that could sporadically excite the network in ways that straightforward modeling does not work due to our inability to model non-stationary processes. Towards addressing these challenges, in this paper, we consider the problem of learning the fractional dynamical complex networks under unknown unknowns (i.e., hidden drivers) and partial observability (i.e., only partial data is available). More precisely, we consider a generalized modeling approach of TVCNs consisting of discrete-time fractional dynamical equations and propose an iterative framework to determine the network parameterization and predict the state of the system. We showcase the performance of the proposed framework in the context of task classification using real electroencephalogram data.



There are no comments yet.


page 1

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction

Time-varying complex networks (TVCNs) provides a comprehensive mathematical framework for modeling complex biological, social and technological systems such as the brain dynamic activity [1, 2, 3], the gene expression and interactions [4, 5], the bacteria dynamics [6, 7, 8], or the swarm robotics [9, 10]

. Many such time varying complex (biological) networks exhibit complex spatio-temporal interactions. For instance, the short- and long-range interactions among neurons contribute to the emergence of long-range memory and fractional dynamics at macroscopic brain regions. Moreover, the non-stationarity which arises in most of the bio-physical processes require modeling techniques supporting interaction among variables in space and time. A computationally efficient strategy for constructing compact yet accurate mathematical models of TVCNs relies on describing the self-activity of nodes in TVCNs through fractional order operators

[11, 12, 13, 14, 15].

The modeling of interactions across nodes require assumption of complete knowledge of the associated complex network (CN). However, due to the experimental limitations, (e.g., resource constraint, limiting probing capabilities111As Wigner argued once: ‘It is the skill and ingenuity of the experimenter which show him phenomena which depend on a relatively narrow set of relatively easily realizable and reproducible conditions’ [16].), only a part of the complete CN is available at most of the times. The influence of so-called latent nodes on the observed nodes can be captured by noise, which may not be a good approach as these latent nodes activities have specific patterns. For example, a sensor capturing brain region activity (known to be fractional dynamic), going to be relaxed in the future, and thereby making it latent. In the current work, we are concerned with fractional dynamical model partly because of their celebrated success in modeling several physiological signals like electroencephalogram (EEG), electrocardiogram (ECG), electromyogram (EMG), and blood oxygenation level dependent (BOLD) [17, 18]. In experimental setups, for example the brain, the recorded data is influenced by subcortical regions which are not probed. Instead of labeling them as unknown inputs to the observed system, it can be realized that they also obey the fractional dynamical patterns. In other situations, it is often required to relax some sensors, due to power limitations or sensor failure. In all such conditions, we have only partial information regarding the complete TVCN, and we wish to better predict the observed data in the presence of latent fractional dynamics as well as unknown drivers.

There is rich literature regarding study of networks with latent nodes. To list a few, the importance of realizing the inclusion of latent nodes in the context of linear time invariant (LTI) systems has been explored in [19, 20, 21]

, for Bayesian networks in


, and graphical models with Gaussian distribution of nodes in

[23], complex systems [24]. The Markovian assumption helps in the case of LTI systems, and in some sense to uncover the latent nodes. However, the LTI systems are not sufficient to accurately model physiological signals, such as EEG, ECG and BOLD (just to mention a few) due to their inability in capturing the long-range memory property of the biological signals. To the best of authors’ knowledge, the latent node framework is still non-existent in the context of discrete-time fractional dynamical systems. The closest work concerning fractional-order systems are [1] and [2], where complete knowledge of the CN is assumed. The work in [2] has generalized the work in [1], and included the contribution of unknown drivers in the system, but also with the assumption of complete knowledge of the nodes in the TVCN. The unknown unknowns were designated as external activities which does not obey the fractional dynamics of the CN.

Our contributions in this work are as follows: (i) we present a TVCN having fractional dynamics with latent nodes; (ii) jointly estimate the fractional latent node activities, and unknown drivers which does not obey fractional dynamics from observed data; (iii) iteratively estimate the complete model (latent

observed) parameters in the likelihood sense.

The rest of the paper is organized as follows. Section II introduces the system model with latent nodes considered in this paper, and then formally presents the main problem statement. In Section III, we provide solutions to the considered problems as the main results, and in Section IV, we evaluate the proposed methods on simulated and real-world datasets. Finally, we conclude the work and present future directions in Section V. The proofs are presented in the Appendix.

Ii Problem Formulation

We begin by introducing in Section II-A the TVCN system model obeying fractional-order dynamical evolution with latent nodes and unknown excitations. Next, in Section II-B, we describe the main problem addressed in this paper.

Ii-a System Model

We consider a time-varying complex network (TVCN) described by a linear discrete-time fractional-order model with latent nodes, which can be mathematically written as


where are the observed state variables, are the latent states, and are the unknown excitations. The system matrices are of appropriate dimensions. The noise variables are assumed to be uncorrelated across observed and latent nodes with and . The fractional-order derivative in equation (1) obeys the discrete form for every node, either observed or latent, as follows [25]:


where the matrices and with , and denotes the gamma function. The fractional-order coefficients corresponding to the th node of observed and latent variables are denoted by and , respectively.

Ii-B System Identification with Latent nodes

The physiological signals, e.g. EEG and ECG, display spatio-temporal behavior, where the temporal component shows long-range memory dependence as realized in [1, 2]. As a consequence, properly modeled by the systems described in Section II-A. When trying to obtain the system representation, i.e., to identify the system’s parameters, the model estimation is contingent on the complete knowledge of the assumed complex-network dynamics; specifically, their nodes’ activities in terms of time-series. Such assumptions may not hold in most of the cases where we are provided only with partial data. In such cases, to better predict the complex systems dynamics, it is beneficial to incorporate latent nodes. In this regard, the problem considered in this work can be stated as follows.

Problem Statement: Given the partial data (observed) in terms of time-series across a time-horizon , and knowledge of the fractional orders of latent nodes . Estimate the model parameters and latent states , and the unknown inputs .

This problem will build upon the models in [2, 1], but with striking difference of availability of only the partial data, and presence of latent nodes in the model. In contrast to [2], this work also relax the assumption of knowledge of the input matrices, and they will be computed as part of the system’s parameters. Notice that we are implicitly assuming that the fractional-order coefficients are constant over time since these have been shown to be empirically slowly time-varying. We will see in the Section IV that by considering latent nodes we can improve the prediction accuracy of the observed data. In the next section, we detail the assumptions required, and solution to this estimation problem.

Iii Latent Model Estimation

Due to notational convenience, we denote as

. The model estimation procedure begins with, first making an estimate of the latent node activities with assumptions that some approximation of the system’s parameters are known. A fractional Kalman filtering approach similar to

[26] under Bayesian approximation is used for this purpose in Section III-A. Subsequently, we will use the new data (estimated latent and the available data) to perform the system identification. In Section III-B, we present an iterative algorithm to jointly estimate the system’s parameters and the unknown unknowns.

Iii-a Fractional Kalman filtering

The fractional-order Kalman filtering aims at estimation of the latent states at each th time step with the available data. Using standard notations in the Kalman filtering (for the linear systems), we define the following estimates


Fig. 1: Bayesian network assumption for fractional Kalman filtering.

In contrast to the Kalman filtering for classical linear system, we have the ’s conditioned on the observed data from till . The reasoning behind this can be quickly seen in the definition of system model in equation (1). In the classical linear system, the observations and latent activities are indexed at the same time. While in the considered system model, with the observations being and latent nodes being , we can witness that the equation (1) relate the latent node activity with observations till . The Kalman filtering solutions can be mathematically intractable due to the complexities introduced by long-range dependence of the fractional operator . We will resort to the Bayesian network assumption (as depicted in Figure 1) for the latent state estimates which is described in the following lemma.

Lemma 1.

The Fractional-order Kalman filtering solution for the system described in (1) with the Bayesian Network assumption of Figure 1 is written as

where the conditional covariances are defined as and .

Next, we present an algorithm to determine the system’s parameters attaining maximum likelihood estimation.

Iii-B Maximum Likelihood Estimation

The MLE estimate of the system’s parameters has to be performed in the presence of latent variables. We propose to use an Expectation-Maximization (EM) like algorithm. The conditional distributions of the latent fractional nodes considered are as described in the Section 

III-A. Moreover, we have unknown unknowns in our system , and this work in contrast to [2] will make an estimation of as well as the input matrices . We will define the EM algorithm to make an estimate of the latent fractional nodes activities and unknown unknowns jointly.

The EM update of the system’s parameters at each iteration is performed via the following result.

Theorem 1.

An update of the system parameters used in (1) with given and the initial conditions , at each iteration index is (4), (5) and


where and .

Now, using these results, we can write the iterative algorithm for system’s parameter estimation. Intuitively, the algorithm starts with an initial guess of the system parameters, and then at each step, obtain the latent node activities using the approach described in Section III-A. Further, upon estimating the incurred error through unknown unknowns, an update of the system’s parameters is obtained and the process is repeated until convergence. The mentioned steps are formally detailed as Algorithm 1.

Input: and
Output: , and
Initialize . For , set and repeat
        ‘E-step’ (i) For obtain from Lemma 1 using and ;
        (ii) For obtain as
        (ii) ‘M-step’:
        obtain from Theorem 1;
until until converge;
Algorithm 1 EM algorithm

The proof of correctness of the Algorithm 1 follows from the Theorem 1, and further details of formulation can be found in the Appendix. In the next section, we discuss some applications of the proposed algorithm on variety of data.

Iv Experiments

We evaluate the performance of the proposed approach on a variety of datasets. First, we demonstrate on an artificially generated fractional-order time-series (see Section IV-A). Next, we use real-world physiological signals available in the form of -channel EEG time-series (see Section IV-B). The -channel electrode distribution is shown in Figure 2. The subjects were asked to perform various motor (actual and imagery) tasks, for example, left/right hand or feet, both hand or feet. The data was collected by BCI system with sampling rate of Hz [27, 28].

used sensors
Fig. 2: EEG Sensor montage for the collection of the measured data (i.e., the channels). The channel labels are shown with their corresponding number.

Iv-a Simulated Data

We consider a pedagogical fractional-order system with three nodes, and without unknown inputs with the following parameters:

For studying the latent node behavior, we remove one node and observe the rest, and , from which we use to recover the time-series generated by the accessible nodes (i.e., the nodes that are not latent) using our proposed method (i.e., with latent variables) versus those previously used in the literature [2] (i.e., without latent variables). The five-step prediction error results are summarized in Table I, where the relative error values are computed as

Observed without latent with latent
2 13.40% 7.94%
1 22.12% 21.12%
1 11.44 % 8.37%
TABLE I: A comparison of mean squared prediction error with and without using latent node model for various possibilities of observed nodes. The th row corresponds to making node  as latent.
Fig. 3: Simulated activities for all three nodes generated by selected model parameters.

where is the predicted value of the th node at time . The error percentage is consistently high for node , and the reason for this lies in the actual behavior of the node activity as seen in Figure 3. Specifically, node  activity –unlike other two nodes– stays very close to zero and vary frequently, which makes it difficult to use for accurate predictions using the proposed model. Next, we see the application on real-world EEG dataset.

Observed (112) + 13 1314 1315 1316 1317 1318 1319 1320
Hidden 1321 1421 1521 1621 1721 1821 1921 2021 21
Without latent (in %) 10.51 10.44 10.97 11.71 13.51 13.77 15.29 16.03 14.56
With latent (in %) 6.07 7.20 6.53 8.49 11.36 10.50 13.18 13.18 13.16
Observed (112) + 56 5657 5658 5659 5660 5661 5662 5663
Hidden 5664 5764 5864 5964 6064 6164 6264 6364 64
Without latent (in %) 10.51 12.09 15.99 13.45 15.29 16.59 20.40 20.89 21.52
With latent (in %) 6.31 7.92 11.27 9.25 9.90 9.69 13.28 19.87 20.07
TABLE II: Average prediction error with and without using latent model for two different set of sensors in (a) and (b). Each column has labeled hidden nodes, and observed nodes are union of and the corresponding column entry. The total number of observed and latent nodes change by and from left to right.

Iv-B Real-world data

The estimation of model’s parameters, such as the coupling matrices , the input matrices , and the latent states, is performed for the EEG data using Algorithm 1. We notice that the choice of initial conditions can play a great role in fast convergence, as well as the accuracy of the results. If some previous knowledge is available (for example, through experiments) about the coupling matrices, then it can be used to achieve better results. In this work, the matrices are initialized to entries selected uniformly at random between and . In the current experiment, we have used the data when the subject has executed ‘both feet movement’. While performing model estimation, the use of relevant EEG sensors are required for having accurate predictions. Therefore, we used neuro-physiological evidence-based sensors that capture the behavior associated with the peripheral nervous system (i.e., the motor cortex) and labeled from to , –see Figure 2. Specifically, different subregions in the motor region are activated when the feet move, so we have used this information to carefully reduce the number of sensors/nodes in our study.

The proposed latent model is tested in a comprehensive manner by performing the following steps: (i) first fixing the nodes to make prediction from sensor IDs to (denoted by ); (ii) second, consecutively reveal new nodes to increase the total observed nodes dimension (originally from ) by one and decrease total latent node dimension (from ) by one, in each step. The reported error values are computed from equation (8), and are averaged across the fixed twelve observed nodes. In Table I(a) we provide some evidence that the latent model with minimal information concerning fractional orders of the latent nodes perform better than without using any model on the unobserved nodes. We also observed the necessity of the relevant latent nodes, by considering the set of sensors which are placed on the region of brain least related to the undertaken situation of ‘both feet movement’. The same experiment is repeated to predict the activities of fixed nodes in consideration and varying the total observed/latent nodes, but this time from a set of sensor IDs . The prediction error values are reported in Table I(b). We notice that the error values are higher upon revealing nodes from the set . This raises an important and intuitive point that, revealing/hiding time-series that have less relation to the data under consideration are very likely to increase inaccuracies in the model.

The experiments, both simulated and with real EEG data, provided evidence that the inclusion of latent model is helpful in the context of the accuracy of the retrieved model and prediction accuracies.

V Conclusion

We have introduced and studied the framework of latent nodes in the TVCN with fractional dynamical model and additional unknown drivers. An iterative solution is proposed which jointly estimate the system’s parameters, the latent node states as they evolve over time, and the unknown unknowns (i.e., the unknown input matrices and inputs). We have shown that the minimal assumption of knowledge of fractional coefficients of the latent node allows us to better explain the observed data. We have applied the proposed concepts on both simulated and experimental data to see the gains in prediction accuracy of the observed data.

Future work will focus on the intriguing role of fractional-order coefficients of the latent nodes. Instead of the entire latent node data, it is observed that the associated fractional-order coefficients are enough for removing the effects of hidden nodes. Moreover, the fractional coefficients can be used to decide the required dimensions, or degrees-of-freedom, of the hidden part to make observed and hidden a complete system.

Appendix A Proof of Lemma 1


With the assumption of initial states and

being normal distributed, and

known, the Kalman filtering is application of the Bayes’ formula.


Let , , and . Upon comparing the terms within equation (9) and using (1), we can write


where using matrix inversion lemma [29]. For , we can write


where follows from equation (1), and is obtained using Bayesian network assumption of Figure 1. To obtain the recursion for covariance matrix , we begin by expressing using (11) and (1) as


Using (12), the covariance matrix is written as


From the Bayesian network assumption of Figure 1, it follows that


Therefore, using (14), equation (13) reduces to


Appendix B Proof of Theorem 1


Let, and the initial conditions be known. Let us also denote the set of variables . The ‘ function’ for the EM like algorithm (described in Algorithm 1) with latent variables being can be written as


where is written using Bayesian assumption of Figure 1, and can be written due to the uncorrelated noise assumption of and as assumed in (1). For notational convenience, we have dropped the constants at each step. The terms inside the summation can be written as


where we have dropped the constants for notational convenience, and are from equation (1). We now proceed to obtain the function which is described as follows: