1 Introduction
Time series data is data with a predefined time or sequential order [1] and is widely used in real-world applications such as signal processing [2], economics [3], and control theory [4]. Typical tasks for time series data include indexing, clustering, classification, and regression [5]. Among these tasks, causal reasoning sits at the top level of cognitive reasoning [6] and moves beyond the correlation fitting of traditional statistical machine learning and deep learning, which sits at the bottom level. Because of the temporal precedence that is naturally implied, time series data encapsulates not only empirical experience of trends but also prior knowledge of causalities between different channels [7].
Granger [8] first used a statistical hypothesis test to decide whether one time series channel is useful for predicting another, which is known as Granger causality (GC) and is widely used in various applications. Lynggaard and Walther [9] proposed dynamic interaction models based on the classical ‘LWF Markov property’ for chain graphs [10, 11]. Pearl and Robins [12] put forward the ‘back door’ temporal causal conditions and extended the traditional Granger causality to temporal sequences. Dahlhaus and Eichler [13] discussed an alternative approach that defines the graph according to the ‘AMP Markov property’ of [14]. Eichler [15] adopted mixed graph constraints, derived from ordinary time series, and used single vertices and directed edges to represent the component series and the causal relationships, respectively.
The traditional Granger causality methods are limited by the predictive capability of the autoregressive (AR) model, which uses only linear models. Eichler [7] restated the problems that arise when transferring Granger causality to the nonlinear setting: 1) the aggregation of the time-varying coefficients over time required by Granger causality tests, and 2) the instability of the causal structure. Chen et al. [16] used a delay embedding to obtain an extended nonlinear version of Granger causality. Later, Sun [17] proposed to use RKHS kernel embeddings to capture the nonlinearity.
The aforementioned works assume that the Granger causalities between time series channels remain unchanged over time [18], which we call “channel-level Granger causality”. In the real world, time series data is becoming massive, complicated, and uncertain, and the causal relations can change along the sliding windows of the time series (see Fig. 1 for an example).
According to the philosopher David Hume, “cause and effect have temporal precedence” [19]. This implies that causality, or precedence itself, is related to time and may even change dynamically with it. In this sense, the constant causality assumption [18] does not always hold; its purpose in traditional Granger causality is to allow a sufficiently long (but not strictly necessary) time series to be used so that random correlation can be distinguished from causality [18]. Some attempts at dynamic causality have already been made in neuroscience: the “dynamic causal modeling” (DCM) method detects real-time dynamic causal relationships among neuron clusters in the brain. However, it works by perturbing the “input-state-output” framework of the brain and observing its response, and it cannot directly extract causalities without a model whose relationships resemble those of the brain [20].
1.1 Our Proposal
In this paper, we present the dynamic window-level Granger causality method (DWGC) for time series data. To capture dynamic causal relations, we relax the assumption of sufficiently long time series and build the causality model on sliding windows of the time series. To capture nonlinearities in time series forecasting, we use a nonlinear autoregressive (NAR) model to fit the time series and extract nonlinear features. Based on the NAR model predictions, an F-test compares the prediction results of the NAR models on each sliding window. Further, to reduce the possible fluctuations caused by autocorrelation in the window-level F-test, we introduce a causality-index matrix and optimize the corresponding causality indexing loss. We theoretically prove that: 1) the traditional Granger causality is a special case of our dynamic window-level Granger method; 2) the dynamic window-level Granger method with causality indexing outperforms the traditional Granger causality.
In the experiments, we evaluate our DWGC model on three datasets: two synthetic datasets and one real-world meteorological dataset (with prior knowledge). For the synthetic data, we use AR/NAR simulators to generate several synthetic time series. For the real-world meteorological dataset, we examine the obtained dynamic causalities between El Niño values and the East Asian monsoon over the seasonal cycle, which have already been examined in the previous literature.
In summary, the contributions of this paper are as follows:

We propose and solve a new task: identifying the dynamic window-level causality of multi-channel time series.

We theoretically show that the proposed dynamic window-level Granger causality model contains the traditional Granger causality as a special case and is more accurate with the causality indexing.

We conduct numerical experiments on two synthetic datasets and one real-world dataset, and the results show that the proposed dynamic window-level Granger method can find accurate window-level causalities.
2 Related Works
2.1 Preliminaries of Granger Causality
Granger causality is the most widely used causality analysis method for time series data [8], and it is the main focus of this paper. We use the following sequence to represent the stationary sequence of random variables, i.e., the time series data:
(1) 
where and are the series data at and before time . Granger causality defines as the cause of , if the series provides useful information when predicting the future values of series :
(2) 
where is the accuracy expectation of the prediction function on time series. Besides, the premise of (2) is that channels i and j meet the “backdoor condition” [21], that is, there is no common interference from confounding factors in other channels. It is also worth pointing out that Granger causality is still a type of statistical association, as it does not go through the necessary causal identification process of [6].
Traditional Granger causality methods use a linear autoregressive model for . In this paper, we use the double-headed arrow to express the Granger causality to :
(3) 
Granger causality is considered as a “precedence” according to the informal fallacy “Post hoc ergo propter hoc”. This Latin fallacy means “after this, therefore because of this”. It shows the causality as the precedence that the followup event is caused by the previous event.
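To make the traditional linear test concrete, the following is a minimal sketch of a two-channel Granger F-test with NumPy. It is an illustration under assumptions rather than the procedure used in this paper: the function names (`lagged_design`, `rss`, `granger_f`), the lag order `p`, and the toy data are all hypothetical. The restricted model predicts a series from its own past, the full model adds the past of the candidate cause, and the F-statistic compares the two residual sums of squares.

```python
import numpy as np

def lagged_design(series_list, p):
    """Intercept plus p lags of every series in series_list, as columns."""
    n = len(series_list[0])
    cols = [np.ones(n - p)]
    for s in series_list:
        for lag in range(1, p + 1):
            cols.append(s[p - lag : n - lag])   # value at time t - lag
    return np.column_stack(cols)

def rss(features, target):
    """Residual sum of squares of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(features, target, rcond=None)
    return float(np.sum((target - features @ coef) ** 2))

def granger_f(x, y, p=2):
    """F-statistic for the hypothesis 'y Granger-causes x' with p lags."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = x[p:]
    rss_restricted = rss(lagged_design([x], p), target)   # own past only
    full = lagged_design([x, y], p)                        # own past + past of y
    rss_full = rss(full, target)
    dof = len(target) - full.shape[1]
    return ((rss_restricted - rss_full) / p) / (rss_full / dof)

# Toy data (assumed): y drives x with a one-step delay, but not vice versa.
rng = np.random.default_rng(0)
n = 500
y = rng.standard_normal(n)
noise = 0.1 * rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + 0.8 * y[t - 1] + noise[t]

f_y_to_x = granger_f(x, y)   # expected to be large
f_x_to_y = granger_f(y, x)   # expected to be small
print(f_y_to_x, f_x_to_y)
```

In practice the statistic would be compared against a critical value of the F distribution with `p` and `dof` degrees of freedom; the sketch only returns the raw statistic.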
The linear Granger causality can be extended to a nonlinear version to better fit nonlinear sequences. In [22], a nonlinear prediction model, such as the multilayer perceptron, is defined as . Then, we can say , if for all :
(4) 
 is represented as an MLP (Multilayer Perceptron) or LSTM (Long Short-Term Memory) network in [22]. Both linear and nonlinear Granger causality methods assume that the causalities between time series are constant, and they cannot model the dynamic causalities that occur in the real world.
2.2 Improvements of Granger method to generalize to the nonlinearity/window level
More recently, deep neural networks have been used to obtain nonlinear Granger causalities [23, 24, 25, 26, 22, 27]. In these works, the neural network replaces the original AR model for sequence fitting and prediction, and causal reasoning methods (such as the counterfactual principle or the Granger method) are then brought into the neural network framework.
On the other hand, Granger's approach has been extended from the channel level to the window level. Cekic et al. [28] use the KL (Kullback-Leibler) divergence instead of the F-test to extend the Granger method to the window level, first in neuroscience.
The sliding window method is a common way to deal with both the nonlinearity and the window-level challenges, and it is also applicable to Granger causality. Pagnotta et al. [29] extended Granger causality in neuroscience to the nonstationary case by applying wavelet transforms or multitapers on sliding windows. However, as far as we know, there is no related method that can address both the nonlinear and the window-level problems in a wider application scenario.
2.3 Causal Graph for Time Series
Besides Granger causality, the causal graph is another way to identify the causality between different channels of time series [25]. The causal graph model can be built on vector autoregressive (VAR) Granger analysis, generalized additive models (GAMs) [30], or certain pre-assumed regression models [31]. However, the causal graph is limited by its structure and is hard to extend to the dynamic window level for time series data.
2.4 Distinction between causal effects and noise
In each window, it is a subtle matter to distinguish whether variations in the time series trend are due to causal effects or to random noise. On the one hand, commonly used time series anomaly detection methods barely consider causal effects when detecting different abnormal noises. In [32], although an attempt is made to use anomaly detection to divide the actual sequence into two parts, a steady-state structure part and a causal influence part, the second part does not make a practical distinction between noise and causality. On the other hand, the traditional Granger causality method ignores noise intervention in most cases.
3 Dynamic Window-level Granger Causality
3.1 Naive Dynamic Window-level Granger Causality Model
To detect dynamic windowlevel causality between two data series and , we first use a sliding window of length on the same time position of the two series:
(5) 
Aiming at finding the dynamic causality at the window level, we consider two forms of time series fitting on each sliding window : predicting the future values of one series with and without the information from the other series channel, as in the traditional Granger causality method. Compared to equation (2), we use a nonlinear autoregressive (NAR) model and use the mean square error (MSE) [33] as to measure the two accuracies:
(6)  
(7) 
where means prediction on the sliding window . We determine the existence of causality by setting a reasonable threshold on the value of :
(8) 
The causality exists between the series and on the sliding window if is larger than the predefined threshold .
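The naive window-level test can be sketched as follows. This is a simplified illustration under stated assumptions, not the model used in the experiments: the NAR model is replaced by a least-squares fit on lagged values and their squares, the errors are in-sample MSEs, and the names (`nar_features`, `window_mse`, `naive_dwgc`), the lag order, the threshold, and the toy data are hypothetical. On every window of length `k`, the ratio of the MSE without the other channel to the MSE with it is compared against the threshold.

```python
import numpy as np

def nar_features(window_list, p):
    """Intercept, lagged values and squared lagged values (a small NAR stand-in)."""
    n = len(window_list[0])
    cols = [np.ones(n - p)]
    for s in window_list:
        for lag in range(1, p + 1):
            past = s[p - lag : n - lag]
            cols.extend([past, past ** 2])
    return np.column_stack(cols)

def window_mse(target, features):
    """In-sample mean square error of a least-squares fit on one window."""
    coef, *_ = np.linalg.lstsq(features, target, rcond=None)
    return float(np.mean((target - features @ coef) ** 2))

def naive_dwgc(x, y, k, p=1, threshold=2.0, step=None):
    """Return (start, mse_ratio, is_causal) for 'y -> x' on each window of length k."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    step = step or k                      # step length defaults to the window length
    results = []
    for start in range(0, len(x) - k + 1, step):
        xw, yw = x[start:start + k], y[start:start + k]
        mse_without = window_mse(xw[p:], nar_features([xw], p))
        mse_with = window_mse(xw[p:], nar_features([xw, yw], p))
        ratio = mse_without / max(mse_with, 1e-12)
        results.append((start, ratio, ratio > threshold))
    return results

# Toy data (assumed): y influences x only in the second half of the series.
rng = np.random.default_rng(1)
n = 400
y = rng.standard_normal(n)
e = 0.1 * rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    coupling = 0.9 if t >= 200 else 0.0
    x[t] = 0.3 * x[t - 1] + coupling * y[t - 1] + e[t]

results = naive_dwgc(x, y, k=50, threshold=2.0)
flags = [bool(is_causal) for _, _, is_causal in results]
print(flags)
```

Because the two fits are nested and evaluated in-sample, the ratio is never below one, so the threshold has to be chosen above one; a held-out error or a degrees-of-freedom correction would be a natural refinement.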
3.2 Dynamic Window-level Causality Model
In the naive version of the dynamic window-level causality method, autocorrelation easily arises from disturbances of the NAR model along the time series. To address this problem, we introduce the causality indexing matrix in our method and decompose the causal effects and the autoregressive correlations. The original time series includes two factors: the cross-correlation and the autocorrelation. When moving to the window level, the autocorrelation becomes more unstable and conceals the causal effects in the cross-correlation. The autoregressive correlation within a single series is represented by in Eqn. (6). Intuitively, when the autocorrelation is unusually large on a local window, it makes the window-level F-test result higher than its normal value, which may affect the overall accuracy of the model or, conversely, the overall recall rate. We use a scale function to scale down the autocorrelation and adopt a corresponding causal indexing matrix to measure the likelihood that each time point serves as a starting or ending point of causality. The indexing loss is defined as
(9) 
where are the prediction results and the real series respectively, is the channel index and is the starting point of each sliding window with length . By optimizing this loss, can be used to scale down the original series data with large autocorrelations as:
(10) 
where is Hadamard product. With the causality indexing , we scale down the autocorrelation and get the reweighted series . In this paper, we use the following scaling function:
(11) 
It is worth mentioning that the selection of the scaling function can be further improved by a regularization term, which is analyzed in Appendix C.
We test the dynamic windowlevel causality using the causality indexing as:
(12) 
We further obtain the starting and ending points in the sliding window by finding the maximum of the index matrix:
(13)  
(14) 
In our dynamic window-level Granger causality (DWGC) method, we alternately optimize the NAR forecasting model and the causality index loss in Eqn. (9), and extract causalities after the two loss functions converge. The final procedure is outlined in Algorithm 1.
4 Theoretical Analysis of DWGC
In this section, we give the theoretical analysis of the proposed dynamic window-level Granger causality method (DWGC). We first prepare the formulation and then prove that 1) DWGC without causality indexing reduces to the traditional Granger causality method as the window length grows, and 2) DWGC with causality indexing is more accurate than the traditional Granger causality method for those causality pairs.
We consider the sample pairs whose expectation of is larger than the predefined threshold . These samples are expected to be tested as causal in the traditional Granger causality method.
4.1 Formulation Preparation
The time series observations can be decomposed into two parts: real data and Gaussian noise. For simplicity, we use the standard Gaussian for the random noise and get the following decomposition of the time series :
(15) 
We denote the model predictions of the NAR model with/without another causality source series channel as and .
Then the F-statistic of Eqn. (8), i.e., the ratio between the MSE of the NAR models with/without the channel , can be written as
(16)  
The sum of squares of the standard normal noise follows the chi-square distribution , where is the length of the sliding window. When the window length is reasonably large, the third term in the above formulation can be omitted. The details of the derivation can be found in Appendix A.
4.2 DWGC without causality indexing degenerates to Traditional Granger causality
We analyze the distribution of the F-statistic of the naive DWGC and give the following result.
Theorem 1.
For the time series sliding windows with causalities, the probability that the F-statistic of naive DWGC is larger than the threshold is a monotonically increasing function of in the case where is sufficiently large.
We prove the above theorem using the series expansion of on each , and leave the details to Appendix B.
This theorem can be understood intuitively from the expression of the F-statistic in Eqn. (16): for the numerator, ; for the denominator, ; so when k is sufficiently large, Eqn. (16) can be approximated by .
Without the causality indexing, the larger the sliding window length is, the more accurate the DWGC method is. When goes to infinity, the DWGC method without the causality indexing degenerates to the traditional Granger causality method.
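The chi-square fact used in Section 4.1 can be checked numerically. The sum of k squared standard normal variables has mean k and standard deviation sqrt(2k), so its relative fluctuation sqrt(2/k) shrinks as the window grows, which is one way to see why noise terms matter less on longer windows. The short Monte Carlo sketch below illustrates this; the number of simulated windows, the seed, and the window lengths are arbitrary choices for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_windows = 20000          # number of simulated windows per length (assumed)
summary = {}

for k in (10, 30, 100):
    noise = rng.standard_normal((n_windows, k))
    sq_sum = np.sum(noise ** 2, axis=1)        # one chi-square(k) draw per window
    rel_mean = sq_sum.mean() / k               # should be close to 1
    rel_std = sq_sum.std() / k                 # should be close to sqrt(2 / k)
    summary[k] = (rel_mean, rel_std)
    print(k, rel_mean, rel_std, np.sqrt(2.0 / k))
```

The printed relative standard deviation decreases with k, matching the sqrt(2/k) prediction in the last column.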
Fig. 2 (left) is the scatter heatmap of the F-statistic of the naive DWGC method. The x-axis is the sliding window length and the y-axis is the value of the F-statistic. This figure is based on an experiment on synthetic AR simulation data with causalities, which is further described in the experiment section. As can be seen from the results, the expectation of the F-statistic of naive DWGC increases monotonically with the window length .
4.3 DWGC with causality indexing generates more accurate causality results
In this section, we give a sufficient condition on for improving our DWGC method.
Theorem 2.
There exists a causality indexing that improves the accuracy of the DWGC causality result for each window length k.
We prove the above theorem by adopting the series expansion of in (1), and taking the increase of each series term after adding as a sufficient condition for improving DWGC.
The existence of can be intuitively illustrated as follows. An important reason why Eqn. (16) is unstable before adding is that , and are all independently distributed Gaussian variables, which gives an unstable influence on . However, with the causal indexing , a causal reweighting method assigns specific weights to establish correlation between the items by . For example, when is observed to be significantly out of the normal range, we give to scale down both the fitting error and the noise, so as to offset the negative effect of abnormal fitting values at time point t on the whole .
Besides, Eqn. (9) gives another view to explain the existence of . During the experiment, we can include this theoretical sufficient condition in the regularization term of the loss function (9) to help the iterative optimization. In short, the effect of our DWGC method can be improved by satisfying a certain condition. The concrete form of this sufficient condition is shown in Appendix C.
5 Experiments
In this section, we present empirical comparisons of the dynamic window-level causality method.
5.1 Results on Synthetic AR/NAR Simulation Dataset
We first construct the dataset using AR and NAR simulations. The construction details are as follows: 1) The linear AR simulation construction: we simulate two linear AR time series with a lag value picked at random from one to nine. The initial value of is i=1,2.
(17)  
2) The nonlinear AR simulation construction: we simulate two nonlinear AR time series with a lag value picked at random from one to nine. The initial value of is i=1,2.
(18)  
where , , and takes the real part of the square root.
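As a rough illustration of this kind of construction, the sketch below generates a pair of series in which the first drives the second through a lag drawn uniformly from one to nine. It is a sketch under assumptions and does not reproduce Eqns. (17) and (18): the coefficients, the noise scale, the initial values, and the function name `simulate_pair` are hypothetical. The nonlinear variant passes the lagged driver through the real part of the complex square root.

```python
import numpy as np

def simulate_pair(n=500, nonlinear=False, seed=0):
    """Simulate x1 -> x2 with a random lag in {1, ..., 9}; returns (x1, x2, lag)."""
    rng = np.random.default_rng(seed)
    lag = int(rng.integers(1, 10))            # upper bound is exclusive
    x1 = np.zeros(n)
    x2 = np.zeros(n)
    x1[:lag] = rng.standard_normal(lag)       # assumed random initial values
    x2[:lag] = rng.standard_normal(lag)
    e1 = 0.1 * rng.standard_normal(n)
    e2 = 0.1 * rng.standard_normal(n)
    for t in range(lag, n):
        x1[t] = 0.5 * x1[t - 1] + e1[t]
        drive = x1[t - lag]
        if nonlinear:
            # real part of the complex square root: sqrt(v) for v >= 0, else 0
            drive = np.sqrt(complex(drive)).real
        x2[t] = 0.3 * x2[t - 1] + 0.6 * drive + e2[t]
    return x1, x2, lag

ar_x1, ar_x2, ar_lag = simulate_pair(nonlinear=False, seed=0)
nar_x1, nar_x2, nar_lag = simulate_pair(nonlinear=True, seed=1)
print(ar_lag, nar_lag)
```

The returned lag can serve as the ground truth when checking whether a window-level test recovers the direction and delay of the influence.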
5.1.1 Experimental Results
A fragment of the AR/NAR simulation data is shown in Fig. 3 (left: AR, right: NAR). The arrows indicate the lag relation between the two series, and we take them as the ground truth of the dynamic causalities.
For the two AR/NAR simulation datasets, we first preprocess the data to make sure that the time series are stationary. The reweight function for the causality index (Eqn. (11)) as , . The step length is set equal to the window length.
dataset         | method \ window length | 10         | 20         | 30         | 100
AR simulations  | naive DWGC             | 0.48(0.06) | 0.49(0.05) | 0.77(0.03) | 0.77(0.02)
AR simulations  | DWGC(ours)             | 0.72(0.03) | 0.72(0.05) | 0.80(0.03) | 0.88(0.05)
NAR simulations | naive DWGC             | 0.59(0.06) | 0.60(0.06) | 0.73(0.06) | 0.87(0.03)
NAR simulations | DWGC(ours)             | 0.84(0.05) | 0.90(0.02) | 0.84(0.08) | 0.88(0.02)

dataset         | method \ window length | 10         | 20         | 30         | 100
AR simulations  | naive DWGC             | 0.22(0.06) | 0.43(0.05) | 0.82(0.03) | 0.90(0.02)
AR simulations  | DWGC(ours)             | 0.23(0.03) | 0.45(0.05) | 0.84(0.03) | 0.92(0.05)
NAR simulations | naive DWGC             | 0.42(0.07) | 0.76(0.09) | 0.93(0.06) | 1.00(0.04)
NAR simulations | DWGC(ours)             | 0.44(0.05) | 0.79(0.04) | 0.94(0.05) | 1.00(0.03)
On each window, if we detect at least one set of causality pairs in the window, we count the causality extraction on this window as successful. On this basis, we calculate the accuracy/recall of the naive DWGC and DWGC methods on the data points with causalities in Table 1. As can be seen from the results, the DWGC method achieves a better recall/accuracy rate than the naive DWGC method when the sliding window length is small. Besides, the recall rate of the naive DWGC method increases with the sliding window length, which is consistent with the theoretical results.
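The window-level scoring described above can be sketched as follows. The text does not spell out the exact definitions, so this sketch assumes that accuracy is the fraction of windows labelled correctly and that recall is the fraction of truly causal windows that are detected; the function name `window_scores` and the toy labels are hypothetical.

```python
def window_scores(predicted, truth):
    """predicted, truth: sequences of booleans, one entry per sliding window."""
    pairs = list(zip(predicted, truth))
    true_pos = sum(1 for p, t in pairs if p and t)
    true_neg = sum(1 for p, t in pairs if not p and not t)
    positives = sum(1 for _, t in pairs if t)
    accuracy = (true_pos + true_neg) / len(pairs)
    recall = true_pos / positives if positives else float("nan")
    return accuracy, recall

# Four windows: one true positive, one miss, one false alarm, one true negative.
acc, rec = window_scores([True, False, True, False],
                         [True, True, False, False])
print(acc, rec)
```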
5.2 Results on Climate Dataset for El Niño
In this part, we verify the DWGC model on a real climate dataset, which has widely recognized seasonal causalities.
5.2.1 Academic Knowledge of El Niño and monsoon
Climate research defines El Niño as a climate phenomenon in the Pacific equatorial belt where the ocean and atmosphere interact with each other and lose their balance [34]. Monsoon generally refers to the seasonal conversion of atmospheric circulation and precipitation in tropical and subtropical areas, and two parameters are used to measure its strength: OLR (Outgoing Longwave Radiation) and MKE (Monsoon Kinetic Energy).
The causal interaction between ENSO and the East Asian monsoon has been extensively explored:

The causalities and exist, especially in autumn and winter.

The reverse causal effects and also exist in spring and summer: changes in the strength of the Asian monsoon in turn trigger the formation of ENSO events.
However, in recent decades, the accuracy of this ENSObased causal model seems to have a downward trend. Therefore, the DWGC method would be helpful for analyzing the data with dynamic causalities like this.
5.2.2 Experiment
We use ENSO and Asian monsoon data from 1992; the series trends can be seen from part of the raw data in Fig. 4. We take the first four months as training data and the rest as testing data [35]. For comparison with prior knowledge, we select a window length of one month.
We use our model (DWGC) to judge the window-level causalities between ENSO and the Asian monsoon in every month. In Fig. 5, we show the monthly F-statistic values of DWGC and of naive DWGC without causality indexing for the two series pairs (ENSO, MKE) and (ENSO, OLR). The x-axis is the month and the y-axis is the log-scale F-statistic value. The black dot-dash line at value zero is the F-statistic threshold. For naive DWGC without causality indexing, the causality between ENSO and the two parameters of the East Asian monsoon (MKE, OLR) can be successfully detected, but the difference in causality between May-Aug (spring and summer) and Aug-Dec (autumn and winter) is not significant. In contrast, with our DWGC method, after detecting the basic causal relationships from ENSO to MKE and from ENSO to OLR, we further find that for both of them the causal relationship in autumn and winter is more significant than that in spring and summer, with larger F-statistic values. This significant causality variation detected by our DWGC is consistent with academic knowledge [36].
In Fig. 6, we show the monthly F-statistic values of DWGC and naive DWGC for the two series pairs (MKE, ENSO) and (OLR, ENSO), with the same axes and F-statistic threshold as in Fig. 5. For naive DWGC without causality indexing, we detect the causality in Sep-Dec (autumn and winter) and the causality in May-Jul (spring and summer) and Oct-Dec (autumn and winter). However, with our DWGC method, we detect the causalities and in May-Aug (spring and summer). Compared to the naive DWGC method, the causality results of DWGC are closer to academic knowledge [37].
6 Conclusions and Future Work
In this paper, a new task is proposed: detecting the window-level dynamic causal relationships between time series. By directly conducting an F-test that compares the window-level forecasting predictions with and without the cause channel, the naive DWGC can detect window-level causalities. The naive DWGC reduces to the traditional Granger method as the window length grows and is not accurate enough when the sliding window length is small. We introduce a technique called “causal indexing” to reweight the original time series. The purpose of this technique is to decrease the effects of the autocorrelation noise and increase the cross-correlation causal effects. The improved DWGC method is proved to have better causality detection accuracy. As far as we know, this paper is the first to propose and solve the new task of dynamic Granger causal relationship detection at the window level. In the experiments on two synthetic datasets and one real dataset, we show that our DWGC method outperforms the traditional GC and the naive DWGC in terms of causality detection accuracy.
The dynamic causalities detected by DWGC are restricted to the same sliding window of every time series. In the future, we are interested in detecting dynamic causalities without prior knowledge of the sliding window length and between different sliding windows, and in further analyzing the effect of the step length on dynamic causality detection.
Acknowledgement
This work was done while the authors were working at RealAI.
References
 [1] James D Hamilton. Time series analysis, volume 2. Princeton New Jersey, 1994.
 [2] Louis L Scharf. Statistical signal processing, volume 98. Addison-Wesley, Reading, MA, 1991.
 [3] Clive William John Granger and Paul Newbold. Forecasting economic time series. Academic Press, 2014.
 [4] George EP Box, Gwilym M Jenkins, Gregory C Reinsel, and Greta M Ljung. Time series analysis: forecasting and control. John Wiley & Sons, 2015.
 [5] Eamonn Keogh and Shruti Kasetty. On the need for time series data mining benchmarks: a survey and empirical demonstration. Data Mining and knowledge discovery, 7(4):349–371, 2003.
 [6] Judea Pearl. Theoretical impediments to machine learning with seven sparks from the causal revolution. arXiv preprint arXiv:1801.04016, 2018.
 [7] Michael Eichler. Causal inference with multiple time series: principles and problems. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 371(1997):20110613, 2013.
 [8] Clive WJ Granger. Investigating causal relations by econometric models and cross-spectral methods. Econometrica: journal of the Econometric Society, pages 424–438, 1969.
 [9] Helle Lynggaard and Kirsten Honoré Walther. Dynamic Modelling with Mixed Graphical Association Models: Master’s Thesis. Aalborg University, Institute for Electronic Systems, Department of …, 1993.
 [10] Steffen Lilholt Lauritzen and Nanny Wermuth. Graphical models for associations between variables, some of which are qualitative and some quantitative. The annals of Statistics, pages 31–57, 1989.
 [11] Morten Frydenberg. The chain graph markov property. Scandinavian Journal of Statistics, pages 333–353, 1990.
 [12] Judea Pearl and James M Robins. Probabilistic evaluation of sequential plans from causal models with hidden variables. In UAI, volume 95, pages 444–453. Citeseer, 1995.
 [13] Rainer Dahlhaus and Michael Eichler. Causality and graphical models in time series analysis. Oxford Statistical Science Series, pages 115–137, 2003.
 [14] Steen A Andersson, David Madigan, and Michael D Perlman. Alternative markov properties for chain graphs. Scandinavian journal of statistics, 28(1):33–85, 2001.
 [15] Michael Eichler. Graphical modelling of multivariate time series. Probability Theory and Related Fields, 153(1-2):233–268, 2012.
 [16] Yonghong Chen, Govindan Rangarajan, Jianfeng Feng, and Mingzhou Ding. Analyzing multiple nonlinear time series with extended granger causality. Physics Letters A, 324(1):26–35, 2004.
 [17] Xiaohai Sun. Assessing nonlinear granger causality from multivariate time series. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 440–455. Springer, 2008.
 [18] C.W.J. Granger. Testing for causality: A personal viewpoint. Journal of Economic Dynamics and Control, 2:329 – 352, 1980.
 [19] Tom L Beauchamp and Alexander Rosenberg. Hume and the Problem of Causation. Oxford University Press, 1981.
 [20] K.J. Friston, L. Harrison, and W. Penny. Dynamic causal modelling. NeuroImage, 19(4):1273 – 1302, 2003.

 [21] Judea Pearl and James Robins. Probabilistic evaluation of sequential plans from causal models with hidden variables. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, UAI’95, pages 444–453, San Francisco, CA, USA, 1995. Morgan Kaufmann Publishers Inc.
 [22] Alex Tank, Ian Covert, Nicholas Foti, Ali Shojaie, and Emily Fox. Neural granger causality for nonlinear time series. arXiv preprint arXiv:1802.05842, 2018.
 [23] David Alan Jones and David Roxbee Cox. Nonlinear autoregressive processes. Proceedings of the Royal Society of London. A. Mathematical and Physical Sciences, 360(1700):71–95, 1978.
 [24] Aneesh Sreevallabh Chivukula, Jun Li, and Wei Liu. Discovering granger-causal features from deep learning networks. In Australasian Joint Conference on Artificial Intelligence, pages 692–705. Springer, 2018.
 [25] Chenxiao Xu, Hao Huang, and Shinjae Yoo. Scalable causal graph learning through a deep neural network. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pages 1853–1862. ACM, 2019.
 [26] Andrea Duggento, Maria Guerrisi, and Nicola Toschi. Echo state network models for nonlinear granger causality. bioRxiv, page 651679, 2019.

 [27] Miao He, Weixi Gu, Ying Kong, Lin Zhang, Costas J Spanos, and Khalid M Mosalam. Causalbg: Causal recurrent neural network for the blood glucose inference with iot platform. IEEE Internet of Things Journal, 2019.
 [28] Sezen Cekic, Didier Grandjean, and Olivier Renaud. Time, frequency, and time‐varying granger‐causality measures in neuroscience. Statistics in Medicine, 37(11):1910–1931, 2018.
 [29] Mattia F. Pagnotta, Mukesh Dhamala, and Gijs Plomp. Benchmarking nonparametric granger causality: Robustness against downsampling and influence of spectral decomposition parameters. NeuroImage, 183:478 – 494, 2018.
 [30] Helmut Lütkepohl. New introduction to multiple time series analysis. Springer Science & Business Media, 2005.
 [31] Linda Sommerlade, Marco Thiel, Bettina Platt, Andrea Plano, Gernot Riedel, Celso Grebogi, Jens Timmer, and Björn Schelter. Inference of granger causal time-dependent influences in noisy multivariate time series. Journal of neuroscience methods, 203(1):173–185, 2012.
 [32] Kay H Brodersen, Fabian Gallusser, Jim Koehler, Nicolas Remy, Steven L Scott, et al. Inferring causal impact using bayesian structural time-series models. The Annals of Applied Statistics, 9(1):247–274, 2015.
 [33] Aneesh Sreevallabh Chivukula, Jun Li, and Wei Liu. Discovering granger-causal features from deep learning networks. In Australasian Joint Conference on Artificial Intelligence, 2018.
 [34] CS Ramage. Monsoon meteorology. International Geophysics Series, 15, 1971.
 [35] Song Yang, Kaiqiang Deng, and Wansuo Duan. Selective interaction between monsoon and enso: Effects of annual cycle and spring predictability barrier. Chinese Journal of Atmospheric Sciences, 2018.
 [36] K Krishna Kumar, Balaji Rajagopalan, and Mark A Cane. On the weakening relationship between the indian monsoon and enso. Science, 284(5423):2156–2159, 1999.
 [37] T Yasunari. Impact of indian monsoon on the coupled atmosphere/ocean system in the tropical pacific. Meteorology and Atmospheric Physics, 44(1-4):29–41, 1990.
 [38] J Eamonn Nash and John V Sutcliffe. River flow forecasting through conceptual models part i—a discussion of principles. Journal of hydrology, 10(3):282–290, 1970.
Appendix A The Simplification of the F-test Result
Lemma 1.
Proof.
Because the sum and product of Gaussian variables still satisfy a Gaussian distribution N(0, ) (ignoring the systematic error), we have:
(20)  
So
(21)  
Its convergence rate is significantly faster than that of a Gaussian variable of the same form (whose rate is ), so it can be ignored as it tends to zero in most cases. ∎
Appendix B Proof of Theorem 1
B.1 Preliminary
The Nash efficiency coefficient [38] is often used to evaluate the performance of simulation prediction, which is expressed as:
If our model’s Nash efficiency coefficient is stable, that is: , we can get:
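For reference, the Nash-Sutcliffe efficiency in its standard form is one minus the ratio of the squared prediction error to the squared deviation of the observations from their mean. The sketch below computes it with NumPy; the function name `nash_sutcliffe` and the toy values are illustrative, and the symbols used in this appendix may differ from this standard form.

```python
import numpy as np

def nash_sutcliffe(observed, simulated):
    """Standard Nash-Sutcliffe efficiency: 1 - sum((o - s)^2) / sum((o - mean(o))^2)."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    residual = np.sum((observed - simulated) ** 2)
    spread = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - residual / spread

obs = [1.0, 2.0, 3.0, 4.0]
perfect = nash_sutcliffe(obs, obs)                 # a perfect prediction
baseline = nash_sutcliffe(obs, [2.5] * 4)          # always predicting the mean
print(perfect, baseline)
```

A value of one corresponds to a perfect fit, while a value of zero means the prediction is no better than the mean of the observations.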
B.2
Proof.
(22) 
Let us call each series , so . We recursively prove that is a monotonic function with respect to k (when k is greater than a certain value, ):
(23)  
For each , if we can find , , we have
(24) 
We can get , so
(25) 
This leads to the conclusion:
(26) 
∎
Also, we can give a simpler proof:
Proof.
(27)  
Let’s take the derivative of the next term:
(28)  
Therefore, equation (27) is monotonically increasing in the window length k. ∎
Appendix C Proof of Theorem 2
Lemma 2.
If are k independent, circularly symmetric complex Gaussian random variables with mean 0 and variance , we have:
(29) 
Theorem 3.
For each k, consider the window ; the sufficient condition for is:
(30) 
Proof.
The can be represented as follows:
(31) 
According to (31), two influencing factors, the prediction error and the noise, have changed. Using Lemma 2 and assuming , we perform the series expansion , so the sufficient condition for is:
(32) 
Comparing the series expansion of naive DWGC and DWGC, the sufficient condition of (32) is:
(33) 
The sufficient condition can be converted to :
(34) 
(35) 
(36) 