1. Introduction
Inferring causal relationships from data is a fundamental problem in statistics, economics, and science in general. The gold standard for assessing causal effects is the randomized controlled trial, which randomly assigns a treatment (e.g., a drug or a specific user interface) to a subset of a population of interest and randomly selects another subset as a control group that is not given the treatment, so that the outcome difference between the two groups can be attributed to the treatment. However, in many cases, running such trials may be unethical, expensive, or simply impossible (Varian, 2016). To address this issue, several methods have been developed to estimate causal effects from observational data (Pearl, 2000; Spirtes et al., 1993).
In the context of time series data, a well-known method that defines a causal relation in terms of predictability is Granger causality (Granger, 1969). $X$ Granger-causes $Y$ if past information on $X$ predicts the behavior of $Y$ better than $Y$'s past information alone (Arnold et al., 2007). In this work, when we refer to causality, we mean specifically the predictive causality defined by Granger causality. The key assumptions of Granger causality are that 1) the process of effect generation can be explained by a set of structural equations, and 2) the current realization of the effect at any time point is influenced by a set of causes in the past. Similar to other causal inference methods, Granger causality assumes unconfoundedness and that all relevant variables are included in the analysis. Several studies build on Granger causality (Liu et al., 2012; Atukeren et al., 2010; Peters et al., 2013). The typical operational definitions (Atukeren et al., 2010) and inference methods for Granger causality, including the common software packages (MLS, [n. d.]; RSo, [n. d.]), assume that the effect is influenced by the cause with a fixed and constant time delay.
Granger causality also assumes that the structural equations through which causes influence effects are linear. Transfer Entropy was developed as a nonlinear extension of Granger causality (Lee et al., 2012; Barnett et al., 2009). However, transfer entropy still assumes that the effect is influenced by the cause with a fixed lag.
This assumption of a fixed and constant time delay between the cause and effect is, in fact, too strong for many applications that aim to understand natural and social phenomena. In such domains, data is often in the form of a set of time series, and a common question of interest is which time series are the (causal) initiators of patterns of behavior captured by another set of time series. For example, who are the individuals who influence a group's direction in collective movement? What are the sectors that influence the stock market dynamics right now? Which part of the brain is critical in activating a response to a given action? In all of these cases, effects follow the causal time series with delays that can vary over time.
To address the remaining gap, we introduce the concepts of Variable-lag Granger causality and Variable-lag Transfer Entropy, together with methods to infer them in time series data. We prove that our definitions and the proposed inference methods can address the arbitrary-time-lag influence between cause and effect, while the traditional operationalizations of Granger causality, transfer entropy, and their corresponding inference methods cannot. We show that the traditional definitions are indeed special cases of the new relations we define. We demonstrate the applicability of the newly defined causal inference frameworks by inferring initiators of collective coordinated movement, a problem proposed in (Amornbunchornvej et al., 2018), as well as inferring causal relations in other real-world datasets.
We use Dynamic Time Warping (DTW) (Sakoe and Chiba, 1978) to align the cause to the effect time series while leveraging the power of Granger causality and transfer entropy. In the literature, there are many clustering-based Granger causality methods that use DTW to cluster time series and perform Granger causality only for time series within the same clusters (Yuan et al., 2016; Peng et al., 2007). Previous work on inferring causal relations using both Granger causality and DTW assumes that the smaller the warping distance between two time series, the stronger the causal relation (Sliva et al., 2015). If the minimum distance of elements within the DTW optimal warping path is below a given distance threshold, then the method considers that there is a causal relation between the two time series. However, their work assumes that Granger causality and DTW should run independently.
In contrast, our method formalizes the integration of Granger causality and DTW by generalizing the definition of Granger causality itself and using DTW as an instantiation of the optimal alignment requirement of the time series.
In addition to the standard uses of Granger causality, our method is capable of:

Inferring arbitrary-lag causal relations: our method can infer a Granger causal relation where a cause influences an effect with arbitrary delays that can change dynamically;

Quantifying variable-lag emulation: our method can report the similarity of time series patterns between the cause and the delayed effect, for arbitrary delays.
We also prove that when multiple time series cause the behavioral convergence of a set of time series, we can treat the set of these initiating causes in the aggregate, and there is a causal relation between this aggregate cause (of the set of initiating time series) and the aggregate of the rest of the time series. We provide experiments and examples using both simulated and real-world datasets to measure the performance of our approach in various causality settings and discuss the resulting domain insights. Our framework is general and can be used to analyze time series from any domain.
2. Related work
Many causal inference methods assume that the data is i.i.d. and rely on knowing a mechanism that generates that data, e.g., expressed through causal graphs or structural equations (Pearl, 2000). In time series data, the values of consecutive time steps violate the i.i.d. assumption. Another set of causal inference methods relaxes the strong i.i.d. assumption and instead assumes independence between the cause and the mechanism generating the effect (Janzing and Schölkopf, 2010; Schölkopf et al., 2012; Shajarisales et al., 2015). Specifically, knowing the cause never reveals information about the structural function and vice versa. This idea has been used in the context of time series data (Shajarisales et al., 2015) by relying on the Spectral Independence Criterion (SIC). If a cause is a stationary process that generates the effect via a linear time-invariant filter (the mechanism), then the cause and the filter should not contain any information about each other, whereas a dependency between the filter and the effect exists in the spectral sense.
Granger causality has inspired a lot of research since its introduction in 1969 (Granger, 1969). Recent work on Granger causality has focused on various generalizations, including ones based on information theory, such as transfer entropy (Schreiber, 2000; Shibuya et al., 2009) and directed information graphs (Quinn et al., 2015). Recent inference methods are able to deal with missing data (Iseki et al., 2019) and enable feature selection (Sun et al., 2015). Granger causality has even been explored as a method to offer explainability of machine learning models (Schwab et al., 2019). However, none of them study tests for variable-lag Granger causality, as we propose in this work.
There is a framework of causal inference in (Malinsky and Spirtes, 2018) based on conditional independence tests on time series generated from discrete-time stochastic processes that allows unknown latent variables. However, the approach in (Malinsky and Spirtes, 2018) still assumes that data points at any time step have been generated from a structural vector autoregression (SVAR). The recent work in (Griveau-Billion and Calderhead, 2019) models the causal relation between time series as a polynomial function and uses a stochastic block model to find a causal graph. Both works, however, still assume a fixed-lag influence from causes to effects.
Besides, no method studies a causal structure that is unstable over time (Eichler, 2013a). Moreover, Transfer Entropy, which is considered to be a nonlinear extension of Granger causality (Lee et al., 2012; Barnett et al., 2009), still has the fixed-lag assumption.
In our work, we also relax the stationarity assumption of time series.
3. Extension from previous work
This paper is an extension of our conference paper (Amornbunchornvej et al., 2019). In that previous work, we formalized VL-Granger causality and proposed a framework to infer a causal relation using BIC and the F-test as the main criteria for inferring whether $X$ causes $Y$. In this work, we instead propose to use a Bayesian Information Criterion difference ratio as the main criterion. Hence, we reran all experiments and do not reuse the results from the previous paper (Amornbunchornvej et al., 2019). Moreover, we formalize Variable-Lag Transfer Entropy and propose a framework to infer its causal relations. We also add two new real-world datasets in this work.
4. Granger causality and fixed lag limitation
Let $X$ be a time series. We will use $X(t)$ to denote an element of $X$ at time $t$. Given two time series $X$ and $Y$, it is said that $X$ Granger-causes $Y$ (Granger, 1969) if the information of $X$ in the past helps improve the prediction of the behavior of $Y$, over $Y$'s past information alone (Arnold et al., 2007). The typical way to operationalize this general definition of Granger causality (Atukeren et al., 2010) is to define it as follows:
Definition 4.1 (Granger causal relation).
Let $X$ and $Y$ be time series, and $\delta_{max}$ be a maximum time lag. We define two residuals of regressions of $X$ and $Y$, $r_Y$ and $r_{YX}$, below:
\[ r_Y(t) = Y(t) - \sum_{i=1}^{\delta_{max}} a_i Y(t-i) \tag{1} \]
\[ r_{YX}(t) = Y(t) - \sum_{i=1}^{\delta_{max}} \left( a_i Y(t-i) + b_i X(t-i) \right) \tag{2} \]
where $a_i$ and $b_i$ are constants that optimally minimize the residual from the regression. Then $X$ Granger-causes $Y$ if the variance of $r_{YX}$ is less than the variance of $r_Y$.
This definition assumes that, for all $t$, $Y(t)$ can be predicted by a fixed linear combination of $Y(t-1), \dots, Y(t-\delta_{max})$ and $X(t-1), \dots, X(t-\delta_{max})$ with some fixed $\delta_{max}$, and that every $a_i$ and $b_i$ is a fixed constant over time (Atukeren et al., 2010; Arnold et al., 2007). However, in reality, two time series might influence each other with a sequence of arbitrary, non-fixed time lags. For example, Fig. 1(a2) has $X$ as a cause time series and $Y$ as the effect time series that imitates the values of $X$ with arbitrary lags. Because $Y$ is not affected by $X$ with a fixed lag and the linear combination above can change over time, the standard Granger causality tests cannot appropriately infer a Granger-causal relation between $X$ and $Y$, even if $Y$ is just a slightly distorted version of $X$ with some lags. For a concrete example, consider a movement context where time series represent trajectories. Two people follow each other if they move in the same trajectory. Assuming that followers follow leaders with a fixed lag means that the followers walk in lockstep with the leader, which is not the natural way we walk. Imagine two people embarking on a walk. The first starts walking, and the second catches up a little later. They may walk together for a bit, then the second stops to tie a shoe and catches up again. The delay between the first and the second person keeps changing, yet there is no question that the first sets the course and is the cause of the second's choices of where to go. Fig. 1 illustrates this example.
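To make the fixed-lag comparison of Definition 4.1 concrete, the following is a minimal sketch in plain Python. It fits the two regressions by ordinary least squares (normal equations solved by Gaussian elimination) and returns the variances of the two residuals. The function names, the toy data, and the choice of a lag of 2 are illustrative assumptions, not the authors' implementation.

```python
import random

def solve_linear(a, b):
    """Solve the square system a * coef = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (m[i][n] - tail) / m[i][i]
    return coef

def residual_variance(rows, targets):
    """Least-squares fit of targets on rows; return the residual variance."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    coef = solve_linear(xtx, xty)
    res = [t - sum(c * v for c, v in zip(coef, r)) for r, t in zip(rows, targets)]
    mean = sum(res) / len(res)
    return sum((e - mean) ** 2 for e in res) / len(res)

def granger_residual_variances(x, y, max_lag):
    """Return (variance of r_Y, variance of r_YX) for a fixed maximum lag."""
    times = range(max_lag, len(y))
    targets = [y[t] for t in times]
    own = [[y[t - i] for i in range(1, max_lag + 1)] for t in times]
    both = [[y[t - i] for i in range(1, max_lag + 1)] +
            [x[t - i] for i in range(1, max_lag + 1)] for t in times]
    return residual_variance(own, targets), residual_variance(both, targets)

# Toy data (assumption): y copies x with a fixed lag of 2 plus small noise.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [rng.gauss(0.0, 0.1) for _ in range(2)] + \
    [x[t - 2] + rng.gauss(0.0, 0.1) for t in range(2, 200)]
var_y, var_yx = granger_residual_variances(x, y, max_lag=3)
print(var_y, var_yx)
```

On this toy pair the residual variance with the past of `x` included is much smaller than with the past of `y` alone, which is the fixed-lag situation the definition is designed for. When the lag drifts over time, no single set of coefficients can capture the dependence in the same way.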
5. Variable-lag Granger Causality
Here, we propose the concept of variable-lag Granger causality, VL-Granger causality for short, which generalizes the Granger causal relation of Definition 4.1 in a way that addresses the fixed-lag limitation. We demonstrate the application of the new causality relation for a specific application of inferring initiators and followers of collective behavior.
Definition 5.1 (Alignment of time series ).
An alignment between two time series and is a sequence of pairs of indices , aligning to , such that for any two pairs in the alignment and , if then (noncrossing condition). The alignment defines a sequence of delays , where and aligns to .
Definition 5.2 (VL-Granger causal relation).
Let $X$ and $Y$ be time series, and $\delta_{max}$ be a maximum time lag (this is an upper bound on the time lag between any two pairs of time series values to be considered as causal). We define the residual $r^{*}_{YX}$ of the regression:
\[ r^{*}_{YX}(t) = Y(t) - \sum_{i=1}^{\delta_{max}} \left( a_i Y(t-i) + b_i X(t-i) + c_i X^{*}(t-i) \right) \tag{3} \]
Here $X^{*}$ is the version of $X$ shifted by the delays $\Delta_t$, where $\Delta_t$ is a time delay constant in the optimal alignment sequence of $X$ and $Y$ that minimizes the residual of the regression. The constants $a_i$, $b_i$, and $c_i$ optimally minimize the residuals $r_Y$, $r_{YX}$, and $r^{*}_{YX}$, respectively. The terms $b_i$ and $c_i$ can be combined, but we keep them separate to clearly denote the difference between the original and proposed VL-Granger causality. We say that $X$ VL-Granger-causes $Y$ if the variance of $r^{*}_{YX}$ is less than the variances of both $r_Y$ and $r_{YX}$.
In order to make Definition 5.2 fully operational for this more general case (and to find the optimal constants values), we need a similarity function between two sequences which will define the optimal alignment. We propose such a similaritybased approach in Definition 5.5. Before defining this approach, we show that VLGranger causality is the proper generalization of the traditional operational definition of Granger causality stated in Definition 4.1. Clearly, if all the delays are constant then .
Proposition 5.3 ().
Let and be time series and be their alignment sequence. If , then .
We must also show that the variance of is no greater than the variance of .
Proposition 5.4 ().
Let and be time series, be their alignment sequence such that . If , such that and , then .
Proof.
Because , by setting for all , we have . In contrast, suppose and , so . Because must be constant for all time step to compute , at time , the regression must choose to match either 1) and or 2) and . Both 1) and 2) options make . Hence, . ∎
According to Propositions 5.3 and 5.4, VL-Granger causality is a generalization of Def. 4.1, and its residual always has lower or equal variance.
Of particular interest is the case when there is an explicit similarity relation defined over the domain of the input time series. The underlying alignment of VL-Granger causality should then incorporate that similarity measure and the methods for inferring the optimal alignment for the given similarity measure.
Definition 5.5 (Variable-lag emulation).
Let be a set of time series, , and be a similarity measure between two time series.
For a threshold , if there exists a sequence of numbers s.t. when , then we use the following notation:

if , then emulates , denoted by ,

if , then emulates , denoted by ,

if and , then .
We denote if and .
Note that the similarity function sim does not have to be a distance function that obeys, among other properties, the triangle inequality. It can be any function that quantitatively compares the two time series. For example, it may be that when one time series increases the other decreases. We provide a more concrete and realistic example in the application setting below.
Adding this similarity measure to Definition 5.2 allows us to instantiate the notion of the optimal alignment as the one that maximizes the similarity between and :
(4) 
where for any given and . With that addition, if , then VLGrangercauses . This allows us to operationalize VLGranger causality by checking for variablelag emulation, as we describe in the next section.
5.1. Example application: Initiators and followers
In this section, we demonstrate an application of the VL-Granger causal relation to finding initiators of collective behavior. The variable-lag emulation concept corresponds to the relation of following in the leadership literature (Amornbunchornvej et al., 2018). That is, a time series that emulates another is a follower of it. We are interested in the phenomenon of group convergence to a consensus behavior and in answering the question of which subset of individuals, if any, initiated that collective consensus behavior. With that in mind, we now define the concept of an initiator and provide a set of subsidiary definitions that allow us to formally show (in Proposition 5.9) that initiators of collective behavior are indeed the time series that VL-Granger-cause the collective pattern in the set of time series. In order to do this, we generalize our two-time-series definitions to the case of multiple time series by defining the notion of an aggregate time series, which is consistent with previous Granger causality generalizations to multiple time series (Siggiridou and Kugiumtzis, 2016; Eichler, 2013b; Chen et al., 2004).
Definition 5.6 (Initiators).
Let be a set of time series. We say that is a set of initiators if , , , and, conversely, . That is, every time series follows some initiator and every initiator has at least one follower.
Given a set of time series , and a set of time series , we can define an aggregate time series as a time series of means at each step:
(5) 
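As a minimal sketch of this aggregation step, the per-step mean over a set of equal-length series can be computed as follows. The function name `aggregate` and the list-of-lists representation of one-dimensional series are illustrative assumptions.

```python
def aggregate(series_set):
    """Return the time series of means, one mean per time step.

    series_set is a list of equal-length one-dimensional time series.
    """
    length = len(series_set[0])
    if any(len(s) != length for s in series_set):
        raise ValueError("all time series must have the same length")
    count = len(series_set)
    return [sum(s[t] for s in series_set) / count for t in range(length)]

# Two toy series (assumption) and their aggregate.
group = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
mean_series = aggregate(group)
print(mean_series)
```

For multi-dimensional trajectories, the same idea would be applied per coordinate.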
In order to identify the state of reaching a collective consensus of a time series, while allowing for some noise, we adopt the concept of convergence from (Chazelle, 2011).
Definition 5.7 (convergence).
Let and be time series, be a distance function, and . If for all time , then and converge toward each other in the interval . If then we say that and converge at time .
Definition 5.8 (convergence coordination set).
Given a set of time series , if all time series in converge toward , then we say that the set is an convergence coordination set.
We are finally ready to state the main connection between initiation of collective behavior and VLGranger causality.
Proposition 5.9 ().
Let be a distance function, be a set of time series, and be a set of initiators, which is an convergence coordination set converging towards in the interval . For any of length , let
If for any their similarity in the interval , then VLGrangercauses in that interval.
Proof.
Suppose , and converge toward each other in the interval , then, by definition, for all the times . By the definition of initiators, , such that , from some time . Thus, we have , s. t. , which means . Hence, we have . Since converges towards some constant line in the interval and converges towards the same line in the interval , hence , which means, by definition, that VLGrangercauses . ∎
We have now shown that a subset of time series are initiators of a pattern of collective behavior of an entire set if that subset VL-Granger-causes the behavior of the set. Thus, VL-Granger causality can solve the Coordination Initiator Inference Problem (Amornbunchornvej et al., 2018), which is the problem of determining whether a pattern of collective behavior was spurious or instigated by some subset of initiators and, if so, finding those initiators who initiate collective patterns that everyone follows.
6. Variable-lag Transfer Entropy Causality
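Transfer Entropy has been shown to be a nonlinear extension of Granger causality (Lee et al., 2012; Barnett et al., 2009). In this section, we generalize our concept of VL-Granger causality to cover the transfer entropy concept. Given two time series $X$ and $Y$, and a probability function $p(\cdot)$, the Transfer Entropy from $X$ to $Y$ can be defined below:
\[ \mathcal{T}_{X \to Y} = H\left(Y(t) \mid Y^{(k)}_{t-1}\right) - H\left(Y(t) \mid Y^{(k)}_{t-1}, X^{(l)}_{t-1}\right) \tag{6} \]
where $H(\cdot \mid \cdot)$ is a conditional entropy, $k, l$ are lag constants, $Y^{(k)}_{t-1} = Y(t-1), \dots, Y(t-k)$, and $X^{(l)}_{t-1} = X(t-1), \dots, X(t-l)$. For the Shannon entropy (Shannon, 1948), the function $H(\cdot \mid \cdot)$ can be defined as
\[ H\left(Y(t) \mid Y^{(k)}_{t-1}\right) = -\sum p\left(Y(t), Y^{(k)}_{t-1}\right) \log p\left(Y(t) \mid Y^{(k)}_{t-1}\right) \tag{7} \]
\[ H\left(Y(t) \mid Y^{(k)}_{t-1}, X^{(l)}_{t-1}\right) = -\sum p\left(Y(t), Y^{(k)}_{t-1}, X^{(l)}_{t-1}\right) \log p\left(Y(t) \mid Y^{(k)}_{t-1}, X^{(l)}_{t-1}\right) \tag{8} \]
Typically, we infer whether $X$ causes $Y$ by comparing $\mathcal{T}_{X \to Y}$ and $\mathcal{T}_{Y \to X}$. If $\mathcal{T}_{X \to Y} > \mathcal{T}_{Y \to X}$, then $X$ causes $Y$. However, the fixed-lag limitation still applies to transfer entropy; in Eq. 6, we still compare $Y(t)$ with $Y^{(k)}_{t-1}$ and $X^{(l)}_{t-1}$, and no variable lags are allowed. Therefore, we formalize the Variable-lag Transfer Entropy, or VL-Transfer entropy, function as below:
\[ \mathcal{T}^{VL}_{X \to Y}(P) = H\left(Y(t) \mid Y^{(k)}_{t-1}\right) - H\left(Y(t) \mid Y^{(k)}_{t-1}, X^{*(l)}_{t-1}\right) \tag{9} \]
where $X^{*(l)}_{t-1}$ is the lagged history of $X$ shifted by the delays $\Delta_t$ of a given alignment sequence $P$.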
Proposition 6.1 ().
Let and be time series and be their alignment sequence. If , then .
Hence, the Variable-lag Transfer Entropy function generalizes the transfer entropy function. To find an appropriate alignment sequence, we can use the optimal alignment in Eq. 4, which results from aligning the two time series.
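The fixed-lag transfer entropy comparison described in this section can be sketched with a simple plug-in estimator for discrete-valued series. This is only an illustration under stated assumptions: histories of length one ($k = l = 1$), empirical frequencies as probabilities, and hypothetical function names. It is not the estimator used by the authors.

```python
import math
import random
from collections import Counter

def entropy(samples):
    """Shannon entropy (in bits) of the empirical distribution of samples."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def transfer_entropy(source, target, lag=1):
    """Plug-in estimate of H(target_t | target_past) - H(target_t | target_past, source_past)."""
    now = target[lag:]
    target_past = target[:-lag]
    source_past = source[:-lag]
    # Conditional entropies written as differences of joint entropies.
    h_given_own = entropy(list(zip(now, target_past))) - entropy(list(target_past))
    h_given_both = (entropy(list(zip(now, target_past, source_past)))
                    - entropy(list(zip(target_past, source_past))))
    return h_given_own - h_given_both

# Toy data (assumption): y copies a random binary series x with a lag of one step.
rng = random.Random(1)
x = [rng.randint(0, 1) for _ in range(500)]
y = [0] + x[:-1]
te_xy = transfer_entropy(x, y)
te_yx = transfer_entropy(y, x)
print(te_xy, te_yx)
```

Here the estimate from `x` to `y` is close to one bit while the reverse direction is close to zero, so comparing the two directions points to `x` as the cause. A variable-lag variant would replace `source_past` by a history of the source re-aligned with the target's delays.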
7. VL-Granger and Transfer Entropy Causality Inference
7.1. VL-Granger Causality Inference
Given a target time series $Y$, a candidate causing time series $X$, a threshold, a significance level, and the max lag $\delta_{max}$, our framework evaluates whether $X$ VL-Granger-causes $Y$ (with a variable lag), $X$ Granger-causes $Y$ (with a fixed lag), or whether no causal conclusion can be drawn between $X$ and $Y$.
In Algorithm LABEL:algo:MainFunc lines 1-2, we have a fixlag parameter that controls whether we compute the normal Granger causality (fixlag is true) or VL-Granger causality (fixlag is false). We present the high-level logic of the algorithm; the actual implementation is more efficient because it removes the redundancies of the presented logic.
First, we compute Granger causality (line 1 in Algorithm LABEL:algo:MainFunc). The corresponding flag is true if $X$ Granger-causes $Y$, and false otherwise. Second, we compute VL-Granger causality (line 2 in Algorithm LABEL:algo:MainFunc). The corresponding flag is true if $X$ VL-Granger-causes $Y$, and false otherwise.
Based on the work in (Atukeren et al., 2010), we use the Bayesian Information Criterion (BIC) to compare the residual of regressing $Y$ on $Y$'s past information, $r_Y$, with the residual of regressing $Y$ on the past information of both $Y$ and $X$, $r_{YX}$. We use $r_{YX} \ll r_Y$ to represent that $r_{YX}$ is less than $r_Y$ with statistical significance according to some statistical test(s). If $r_{YX} \ll r_Y$, then we conclude that the prediction of $Y$ using the past information of $Y$ and $X$ is better than the prediction of $Y$ using $Y$'s past information alone. In this work, to determine $r_{YX} \ll r_Y$, we use the Bayesian Information Criterion difference ratio (see Section 7.4).
After obtaining both results, we report the conclusion about the causal relation between $X$ and $Y$ according to the following four conditions.

If both the Granger and VL-Granger flags are true, then we compare the residual of the variable-lag regression with both $r_Y$ and $r_{YX}$. If the variable-lag residual is significantly smaller, then we conclude that $X$ causes $Y$ with variable lags; otherwise, $X$ causes $Y$ with a fixed lag (line 4 in Algorithm LABEL:algo:MainFunc).

If the Granger flag is true but the VL-Granger flag is false, then we conclude that $X$ causes $Y$ with a fixed lag (line 5 in Algorithm LABEL:algo:MainFunc).

If the Granger flag is false but the VL-Granger flag is true, then we conclude that $X$ causes $Y$ with variable lags (line 6 in Algorithm LABEL:algo:MainFunc).

If both flags are false, then we cannot conclude whether $X$ causes $Y$ (line 7 in Algorithm LABEL:algo:MainFunc).
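The four cases above can be sketched as a small decision function. All names here are hypothetical, and the plain `<` comparison between residual variances is only a stand-in for the significance-based comparison that the text delegates to Section 7.4.

```python
def granger_conclusion(granger_flag, vl_granger_flag,
                       fixed_lag_residual, variable_lag_residual):
    """Map the two test outcomes to one of three conclusions.

    granger_flag / vl_granger_flag: outcomes of the fixed-lag and
    variable-lag tests. The residual arguments are only consulted when
    both flags are true.
    """
    if granger_flag and vl_granger_flag:
        # Assumption: a simple comparison replaces the statistical test.
        if variable_lag_residual < fixed_lag_residual:
            return "variable-lag"
        return "fixed-lag"
    if granger_flag:
        return "fixed-lag"
    if vl_granger_flag:
        return "variable-lag"
    return "inconclusive"

print(granger_conclusion(True, True, 0.8, 0.2))
```

Keeping the decision separate from the two tests makes it easy to swap in a different comparison criterion without touching the rest of the logic.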
Note that we assume the maximum lag value $\delta_{max}$ is given as an input, as it is for all definitions of Granger causality. For practical purposes, a large fraction (e.g., half) of the length of the time series can be used. However, there is, of course, a computational trade-off between the magnitude of $\delta_{max}$ and the time it takes to compute Granger causality by almost all methods.
In the next section, we describe the details of the VL-Granger function that we use in Algorithm LABEL:algo:MainFunc lines 1-2.
7.2. VL-Granger causality operationalization
Given time series $X$ and $Y$, a threshold, a significance level, the maximum possible lag $\delta_{max}$, and a flag indicating whether we want to check for a variable or a fixed lag, Algorithm LABEL:algo:VLGrangerCalFunc reports whether $X$ causes $Y$ by setting its result flag to true or false, and by reporting the two residuals.
First, we compute the residual $r_Y$ of regressing $Y$ on $Y$'s information in the past (line 1). Then, we regress $Y$ on the past information of $Y$ and $X$ to compute the residual $r_{YX}$ (line 2). If $r_{YX} \ll r_Y$, then $X$ Granger-causes $Y$ and we set the result flag to true (line 7). If the fixed-lag option is true, then we report the result of typical Granger causality. Otherwise, we consider VL-Granger causality (lines 3-5) by computing the emulation relation between $Y$ and a version of $X$ that is reconstructed through DTW to be most similar to $Y$, as we explain in Section 7.3.
Afterwards, we regress $Y$ on the past information of the reconstructed series to compute the variable-lag residual (line 4). Finally, we check whether the variable-lag residual is significantly smaller (lines 6-9) (see Section 7.4). If so, $X$ VL-Granger-causes $Y$. In the next section, we describe the details of how to construct the reconstructed series and how to estimate the emulation similarity value.
7.3. Dynamic Time Warping for inferring VL-Granger causality
In this work, we propose to use Dynamic Time Warping (DTW) (Sakoe and Chiba, 1978), which is a standard distance measure between two time series. DTW calculates the distance between two time series by aligning sufficiently similar patterns between them, while allowing for local stretching (see Figure 1). Thus, it is particularly well suited for calculating the variable-lag alignment.
Given time series $X$ and $Y$, Algorithm LABEL:algo:DTWReFunc reports a reconstructed time series based on $X$ that is most similar to $Y$, as well as the emulation similarity between the two series. First, we use DTW to find the optimal alignment sequence between $X$ and $Y$, as defined in Definition 5.1. Efficient algorithms for computing DTW exist, and they can incorporate various kernels between points (Mueen and Keogh, 2016; Sakoe and Chiba, 1978). Then, we use the alignment to construct the reconstructed series by shifting each value of $X$ by its aligned delay. However, we also use cross-correlation to normalize the delays, since DTW is sensitive to alignment noise (Algorithm LABEL:algo:DTWReFunc lines 3-5).
Afterwards, we use the reconstructed series to predict $Y$, instead of using only $X$'s information in the past, in order to infer a VL-Granger causal relation in Definition 5.2. The benefit of using DTW is that it can match time points of $Y$ and $X$ with non-fixed lags (see Figure 1). The DTW optimal warping path of $X$ and $Y$ pairs each value of $Y$ with the value of $X$ that is most similar to it under the alignment constraints.
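The alignment and reconstruction steps described above can be sketched with a textbook dynamic-programming DTW. This is a simplified illustration, not the authors' algorithm: it uses the absolute difference as the local cost, applies no window constraint, skips the cross-correlation normalization, and keeps the first index of `x` matched to each index of `y` when several are aligned. All function names are hypothetical.

```python
def dtw_path(x, y):
    """Return the optimal warping path as (i, j) pairs aligning x[i] to y[j]."""
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(cost[i - 1][j - 1],
                                     cost[i - 1][j],
                                     cost[i][j - 1])
    # Backtrack from the end of both series to the start.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return path

def reconstruct(x, path, length):
    """Build the re-aligned copy of x and the per-step delays."""
    aligned = {}
    for i, j in path:
        aligned.setdefault(j, i)  # keep the first x index matched to y index j
    delays = [j - aligned[j] for j in range(length)]
    x_star = [x[j - d] for j, d in enumerate(delays)]
    return x_star, delays

# Toy data (assumption): y repeats the bump in x two steps later.
x = [0, 1, 2, 3, 2, 1, 0, 0, 0, 0]
y = [0, 0, 0, 1, 2, 3, 2, 1, 0, 0]
path = dtw_path(x, y)
x_star, delays = reconstruct(x, path, len(y))
print(x_star, delays)
```

On this toy pair the reconstructed series coincides with `y`, and the delays over the bump are all equal to two steps. With a follower that pauses and catches up, the same procedure would yield delays that change over time.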
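In addition to finding the reconstructed series, the algorithm estimates the emulation similarity between $X$ and $Y$ in line 3. For that, we adopt the measure from (Amornbunchornvej et al., 2018) below:
\[ s(P) = \frac{\sum_{\Delta_t \in P} \mathrm{sign}(\Delta_t)}{|P|} \tag{10} \]
where $\mathrm{sign}(\Delta_t) = 1$ if $\Delta_t > 0$, $\mathrm{sign}(\Delta_t) = -1$ if $\Delta_t < 0$, and zero otherwise. Since $\mathrm{sign}(\Delta_t)$ represents whether $Y$ is similar to $X$ in the past ($\Delta_t > 0$) or $X$ is similar to $Y$ in the past ($\Delta_t < 0$), by comparing the sign of $\Delta_t$, we can infer whether $Y$ emulates $X$. The function $s(P)$ computes the average sign of $\Delta_t$ for the entire time series. If $s(P)$ is positive, then, on average, the number of times that $Y$ is similar to $X$ in the past is greater than the number of times that $X$ is similar to some values of $Y$ in the past. Hence, $s(P)$ can be used as a proxy to determine whether $Y$ emulates $X$ or vice versa. We use the dtw R package (Giorgino et al., 2009) for our DTW function.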
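A minimal sketch of this average-sign measure, assuming the warping path is given as a list of index pairs `(i, j)` that align index `i` of the candidate cause to index `j` of the effect, so that the delay of a pair is `j - i`. The function name and this representation are illustrative assumptions.

```python
def emulation_similarity(path):
    """Average sign of the delays along a warping path.

    A positive value means the second series mostly matches earlier
    values of the first one; a negative value means the opposite.
    """
    signs = []
    for i, j in path:
        delay = j - i
        signs.append((delay > 0) - (delay < 0))  # +1, -1, or 0
    return sum(signs) / len(signs)

# Toy path (assumption): after the first pair, the effect lags the cause.
toy_path = [(0, 0), (0, 1), (0, 2), (1, 3), (2, 4)]
score = emulation_similarity(toy_path)
print(score)
```

Swapping the roles of the two series flips the sign of the score, which is what makes it usable as a direction indicator.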
7.4. Bayesian Information Criterion difference ratio for VL-Granger causality
Given that $RSS_0$ is the restricted residual sum of squares from a regression of $Y$ on $Y$'s past, and $T$ is the length of the time series, the BIC of the null model can be defined below.
\[ BIC_0 = \frac{RSS_0}{T} \, T^{(\delta_{max}+1)/T} \tag{11} \]
For the unrestricted model, given that $RSS_1$ is the unrestricted residual sum of squares from a regression of $Y$ on the past of $Y$ and $X$, and $T$ is the length of the time series, the BIC of the alternative model can be defined below.
\[ BIC_1 = \frac{RSS_1}{T} \, T^{(2\delta_{max}+1)/T} \tag{12} \]
We use the Bayesian Information Criterion difference ratio as the main criterion to determine whether $X$ Granger-causes $Y$, i.e., to determine $r_{YX} \ll r_Y$ in Algorithm LABEL:algo:VLGrangerCalFunc line 6, which can be defined below:
\[ r(BIC_0, BIC_1) = \frac{BIC_0 - BIC_1}{BIC_0} \tag{13} \]
The ratio is within $(-\infty, 1]$. The closer it is to $1$, the better the performance of the alternative model compared to the null model. We can set the threshold $\gamma$ to determine whether $X$ Granger-causes $Y$, i.e., $r(BIC_0, BIC_1) \geq \gamma$ implies that $X$ Granger-causes $Y$. Other options for determining whether $X$ Granger-causes $Y$ are to use the F-test or the emulation similarity.
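A minimal sketch of a BIC difference ratio follows. The exact BIC expressions are an assumption here: the sketch uses a multiplicative form, (RSS / T) * T^(p / T), with p = max_lag + 1 parameters for the null model and p = 2 * max_lag + 1 for the alternative model. The function names and the example numbers are hypothetical.

```python
def bic(rss, length, n_params):
    """Multiplicative BIC form (assumed): smaller values indicate a better model."""
    return (rss / length) * length ** (n_params / length)

def bic_difference_ratio(rss_null, rss_alt, length, max_lag):
    """Relative BIC improvement of the alternative model over the null model."""
    bic_null = bic(rss_null, length, max_lag + 1)
    bic_alt = bic(rss_alt, length, 2 * max_lag + 1)
    return (bic_null - bic_alt) / bic_null

# Example (assumption): adding the cause's past cuts the RSS by a factor of ten.
ratio = bic_difference_ratio(rss_null=100.0, rss_alt=10.0, length=200, max_lag=5)
no_gain = bic_difference_ratio(rss_null=100.0, rss_alt=100.0, length=200, max_lag=5)
print(ratio, no_gain)
```

Under this form, an alternative model that does not reduce the residual at all gets a negative ratio, because it pays the penalty for its extra parameters without any gain in fit.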
7.5. VL-Transfer-Entropy Causality Inference
Given a target time series $Y$, a candidate causing time series $X$, and the max lag $\delta_{max}$, our framework evaluates whether $X$ VL-Transfer-Entropy-causes $Y$ (with a variable lag), $X$ Transfer-Entropy-causes $Y$ (with a fixed lag), or whether no causal conclusion can be drawn between $X$ and $Y$.
In Algorithm LABEL:algo:TransferEntropyMainFunc lines 1-2, we have a fixlag parameter that controls whether we compute the normal Transfer-Entropy causality (fixlag is true) or VL-Transfer-Entropy causality (fixlag is false).
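First, we compute Transfer Entropy causality (line 1 in Algorithm LABEL:algo:TransferEntropyMainFunc). The corresponding flag is true if $X$ Transfer-Entropy-causes $Y$, and false otherwise. Second, we compute VL-Transfer-Entropy causality (line 2 in Algorithm LABEL:algo:TransferEntropyMainFunc). The corresponding flag is true if $X$ VL-Transfer-Entropy-causes $Y$, and false otherwise.
To determine whether $X$ Transfer-Entropy-causes $Y$, we can use the Transfer Entropy Ratio.
\[ \mathcal{T}(X,Y)_{ratio} = \frac{\mathcal{T}_{X \to Y}}{\mathcal{T}_{Y \to X}} \tag{14} \]
The VL-Transfer Entropy Ratio is defined below:
\[ \mathcal{T}^{VL}(X,Y)_{ratio} = \frac{\mathcal{T}^{VL}_{X \to Y}}{\mathcal{T}^{VL}_{Y \to X}} \tag{15} \]
A ratio greater than one implies that $X$ Transfer-Entropy-causes $Y$. The higher the ratio, the stronger the causal influence of $X$ on $Y$. The same is true for the VL-Transfer Entropy Ratio.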
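A minimal sketch of a ratio-based check, assuming the ratio divides the transfer entropy from the candidate cause to the effect by the transfer entropy in the reverse direction. The function names and the handling of a zero denominator are illustrative choices that the text does not specify.

```python
import math

def transfer_entropy_ratio(te_forward, te_backward):
    """Ratio of forward to backward transfer entropy.

    A zero backward value gives infinity when the forward value is
    positive, and NaN when both are zero (assumed convention).
    """
    if te_backward == 0:
        return math.inf if te_forward > 0 else math.nan
    return te_forward / te_backward

def ratio_indicates_cause(te_forward, te_backward):
    """True when the ratio exceeds one; NaN compares as False."""
    return transfer_entropy_ratio(te_forward, te_backward) > 1

print(transfer_entropy_ratio(0.5, 0.25), ratio_indicates_cause(0.5, 0.25))
```

The same two functions apply unchanged to variable-lag transfer entropy values, since only the inputs differ.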
After obtaining both results, we report the conclusion about the causal relation between $X$ and $Y$ according to the following four conditions.

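If both the Transfer-Entropy and VL-Transfer-Entropy flags are true, then we compare the VL-Transfer Entropy Ratio with the Transfer Entropy Ratio. If the VL-Transfer Entropy Ratio is greater, then we conclude that $X$ causes $Y$ with variable lags; otherwise, $X$ causes $Y$ with a fixed lag (line 4 in Algorithm LABEL:algo:TransferEntropyMainFunc).

If the Transfer-Entropy flag is true but the VL-Transfer-Entropy flag is false, then we conclude that $X$ causes $Y$ with a fixed lag (line 5 in Algorithm LABEL:algo:TransferEntropyMainFunc).

If the Transfer-Entropy flag is false but the VL-Transfer-Entropy flag is true, then we conclude that $X$ causes $Y$ with variable lags (line 6 in Algorithm LABEL:algo:TransferEntropyMainFunc).

If both flags are false, then we cannot conclude whether $X$ causes $Y$ (line 7 in Algorithm LABEL:algo:TransferEntropyMainFunc).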
7.6. VL-Transfer-Entropy causality operationalization
Given time series $X$ and $Y$, the maximum possible lag $\delta_{max}$, and a flag indicating whether we want to check for a variable or a fixed lag, Algorithm LABEL:algo:VLTransferEFunc reports whether $X$ causes $Y$ by setting its result flag to true or false, and by reporting two transfer entropy values.
First, if the fixed-lag option is true, then we compute the transfer entropy (line 1) using RTransferEntropy() (Behrendt et al., 2019). If it is false, then we reconstruct the variable-lag version of $X$ using the DTW reconstruction in Section 7.3 (line 2) and compute the VL-transfer entropy (line 3) using RTransferEntropy().
If the ratio is greater than one, then $X$ causes $Y$ and we set the result flag to true (line 5); otherwise, we set it to false (line 7).
8. Experiments
We measured the performance of our framework on the task of inferring causal relations using both simulated and real-world datasets. The notations and symbols we use in this section are in Table 1.
8.1. Experimental setup
Term and notation | Description
$T$ | Length of time series.
$\gamma$ | Threshold of BIC difference ratio in Section 7.4.
$\delta_{max}$ | Parameter of the maximum length of time delay.
BIC | Bayesian Information Criterion, which is used as a proxy to compare the residuals of regressions of two time series.
$X \preceq Y$ | $Y$ emulates $X$.
$\mathcal{N}$ | Normal distribution.
ARMA or A. | AutoRegressive Moving Average model.
VLG | Variable-lag Granger causality with BIC difference ratio: $X$ causes $Y$ if BIC difference ratio $\geq \gamma$.
G | Granger causality (Atukeren et al., 2010)
CG | Copula-Granger method (Liu et al., 2012)
SIC | Spectral Independence Criterion method (Shajarisales et al., 2015)
TE | Transfer entropy (Behrendt et al., 2019)
VLTE | Variable-lag transfer entropy
We tested the performance of our method on synthetic datasets, where we explicitly embedded a variable-lag causal relation, as well as on biological datasets in the context of identifying initiators of collective behavior, and on two other real-world causal datasets.
We compared our methods, VL-Granger causality (VLG) and VL-Transfer entropy (VLTE), with several existing methods: Granger causality with F-test (G) (Atukeren et al., 2010), the Copula-Granger method (CG) (Liu et al., 2012), the Spectral Independence Criterion method (SIC) (Shajarisales et al., 2015), and transfer entropy (TE) (Behrendt et al., 2019).
In this paper, we explore a range of choices of $\delta_{max}$ for all methods to analyze the sensitivity of each method, expressed relative to the length $T$ of the time series, and use a single default value unless explicitly stated otherwise.
8.2. Datasets
8.2.1. Synthetic data: pairwise level
The main purpose of the synthetic data is to generate settings that explicitly illustrate the difference between the original Granger causality and transfer entropy methods and the proposed variable-lag approaches. We generated pairs of time series for which the fixed-lag causality methods would fail to find a relationship but the variable-lag approach would find the intended relationships.
We generated a set of synthetic time series of 200 time steps. We generated two sets of pairs of time series and . First, we generated either by drawing the value of each time step from a normal distribution with zero mean and a constant variance () or by AutoRegressive Moving Average model (ARMA or A.) with .
The first set we generated was of explicitly related pairs of time series $X$ and $Y$, where $Y$ emulates $X$ with some time lag. One way to ensure lag variability is to “turn off” the emulation for some time, for example by keeping the values constant between the 110th and 150th time steps. This makes $Y$ a variable-lag follower of $X$. Figure 3 shows examples of the generated time series.
The second set of time series pairs $X$ and $Y$ was generated independently and, as a result, has no causal relation. We used these pairs to ensure that our method does not infer spurious relations.
We set the significance level for both the F-test and the independence test at . We considered there to be a causal relation only if for our method.
8.2.2. Synthetic data: group level
This experiment explores the ability of causal inference methods to retrieve multiple causes of a time series , which is generated from multiple time series . Fig. 2 shows the ground truth causal graph we used to generate simulated datasets. The edges represent causal directions from the cause time series (e.g. ) to the effect time series (e.g. ). represents the time series generated by , where and with some fixed lag . The task is to infer edges of this causal graph from the time series. We generated time series for each generator model 100 times. We set in this experiment due to the weak signal of causes when there are multiple causes of . There are also two generators for : normal distribution and ARMA model.
8.2.3. Schools of fish
We used a publicly available dataset of golden shiners (Notemigonus crysoleucas). The dataset was collected for the study of information propagation over the visual fields of fish (Strandburg-Peshkin et al., 2013). A coordination event consists of two-dimensional time series of fish movement recorded by video. The time series of fish movement are around 600 time steps long. Each dataset contains around 70 individuals, of which 10 are “informed” fish that have been trained to go to a feeding site. Trained fish lead the group to feeding sites while the rest of the fish just follow the group. We represent the dataset as a pair of aggregated time series: being the aggregated time series of the directions of trained fish and being the aggregated time series of the directions of untrained fish (see Fig. 4). The task is to infer whether (trained fish) is a cause of (the rest of the group).
8.2.4. Troop of baboons
We used another publicly available dataset of animal behavior, the movement of a troop of olive baboons (Papio anubis). The dataset consists of GPS tracking information from 26 members of a troop, recorded at 1 Hz from 6 AM to 6 PM between August 01, 2012 and August 10, 2012. The troop lives in the wild at the Mpala Research Centre, Kenya (Crofoot et al., 2015; Strandburg-Peshkin et al., 2015). For the analysis, we selected the 16 members of the troop that have GPS information available for 10 consecutive days, with no missing data. We selected a set of trajectories of lat-long coordinates from a highly coordinated event with a length of 600 time steps (seconds) for each baboon. This known coordination event occurred on the morning of August 02, 2012, with baboon ID3 initiating the movement, followed by the rest of the troop (Amornbunchornvej et al., 2018). Again, the goal is to infer ID3 (time series ) as the cause of the movement of the rest of the group (aggregate time series ) (see Fig. 5).
8.2.5. Gas furnace
8.2.6. Old Faithful geyser eruption
This dataset consists of information regarding eruption durations and intervals between eruption events at Old Faithful geyser (Azzalini and Bowman, 1990). is time series of eruption duration and is time series of the interval between current eruption and the next eruption (see Fig. 7). Both have 298 time steps.
8.3. Time complexity and running time
Running time (sec)  

VLG  VLTE  
0.05  5.39  110.00  17.57  126.02 
0.10  7.90  128.19  17.42  121.38 
0.20  9.22  200.17  17.93  131.23 
The main computational cost of our approach is DTW. We used the “windowing technique” to restrict the warping search area (Keogh and Pazzani, 2001). The main parameter of the windowing technique is the maximum time delay . Hence, the time complexity of VLG is . The time complexity of TE can be at most (Shao et al., 2014), so VLTE has the same time complexity. Table 2 shows the running time of our approach on time series with varying length () and maximum time delay ().
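To illustrate why the maximum time delay governs the cost, the following is a minimal Python sketch of DTW restricted to a band around the diagonal. It is not the implementation used in the experiments: the function name windowed_dtw, the absolute-difference local cost, and the band constraint |i - j| <= max_delay are assumptions chosen as one common form of windowing.

```python
import math


def windowed_dtw(x, y, max_delay):
    """DTW distance between x and y, visiting only cells with |i - j| <= max_delay."""
    n, m = len(x), len(y)
    # The band must be at least as wide as the length difference,
    # otherwise the final cell (n, m) is unreachable.
    w = max(max_delay, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only columns inside the band are filled for row i.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # x advances, y repeats
                                 cost[i][j - 1],      # y advances, x repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


leader = [0, 1, 2, 3, 4, 5]
follower = [0, 0, 1, 2, 3, 4]  # the leader delayed by one step
print(windowed_dtw(leader, follower, max_delay=1))  # 1.0
print(windowed_dtw(leader, follower, max_delay=0))  # 5.0
```

For two series of length n, the inner loop visits at most 2 * max_delay + 1 cells per row, so the number of cell updates grows with n times the window width rather than with n squared. The sketch still allocates the full table for readability; a memory-efficient version would keep only the band.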
9. Results
We report the results of our proposed approaches and other methods on both synthetic and real-world datasets. We also explore how the performance of the methods depends on the basic parameter, .
9.1. Synthetic data: pairwise level
VLG  G  CG  SIC  TE  VLTE  
:  1.00  1.00  0.79  0.64  0.72  0.93 
:  0.99  0.88  0.67  0.34  0.52  0.68 
A.:  0.99  1.00  0.79  0.68  0.84  0.92 
A.:  0.99  0.66  0.50  0.30  0.50  0.76 
:, A.:  0.99  0.91  0.75  0.57  0.50  0.76 
Table 3 (1st-5th rows) shows the accuracy of inferring causal relations and directions. For each row, we repeated the experiment 100 times on simulated datasets and report the mean accuracy. The results show that our method, VLG, achieved an accuracy of at least 0.99 in every row, matching or exceeding the other methods in all but one case. VLTE also performed better than TE in every row. Moreover, we investigated the sensitivity of all methods to the value of the parameter . We aggregated the accuracy of inferring causal direction from the various cases that have the same value and report the result. The result in Fig. 8 shows that our approach, VLG, maintains high accuracy throughout the range of the values of .
9.2. Synthetic data: group level
Table 4 shows the results of causal graph inference. VLG performed the best overall, with the highest F1 score. This result reflects the fact that our approach can handle complicated time series in the causal inference task better than the other methods. VLTE also performed better than TE. In addition, we aggregated and the rest of time series , then we measured the ability of the methods to infer that is a cause of . The results, which are in the “Group: ” column in Table 4, show that VLG, G, TE and VLTE performed well in this task, while CG and SIC failed to infer causal relations.
Causal graph  Group:  

Methods  Precision  Recall  F1 score  Accuracy 
VLG  0.88  0.91  0.89  1.00 
G  0.63  0.84  0.69  1.00 
CG  0.41  0.64  0.47  0.13 
SIC  0.16  0.60  0.25  0.33 
TE  0.21  0.79  0.34  1.00 
VLTE  0.26  0.79  0.39  1.00 
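As a reminder of how precision, recall, and F1 score are computed for an inferred causal graph, here is a short Python sketch that treats both graphs as sets of directed edges. The function name edge_scores and the node labels in the example are hypothetical and do not correspond to the ground truth graph of Fig. 2.

```python
def edge_scores(true_edges, inferred_edges):
    """Precision, recall, and F1 of inferred directed edges against ground truth."""
    true_edges, inferred_edges = set(true_edges), set(inferred_edges)
    # An inferred edge counts as correct only if its direction matches.
    tp = len(true_edges & inferred_edges)
    precision = tp / len(inferred_edges) if inferred_edges else 0.0
    recall = tp / len(true_edges) if true_edges else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: (cause, effect) pairs.
truth = {("X1", "Y1"), ("X2", "Y1"), ("X3", "Y2")}
inferred = {("X1", "Y1"), ("X2", "Y1"), ("Y1", "X1")}
precision, recall, f1 = edge_scores(truth, inferred)
```

If the values in a table are means over repeated runs, which is an assumption about how Table 4 was produced, the reported F1 score need not equal the harmonic mean of the reported precision and recall.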
9.3. Realworld datasets
Case  VLG  G  CG  SIC  TE  VLTE 

Fish  1  0  1  0  1  1 
Baboon  1  1  1  1  1  1 
Gas furnace  1*  1  0  1  1  1 
Old Faithful geyser  1*  0  1  1  0  1 
Table 5 shows the results of inferring causal relations in real-world datasets. VLG performed better than G. However, the BIC difference ratio failed to infer causal relations in the gas furnace and Old Faithful geyser datasets, whereas the F-test successfully inferred causal relations in all datasets. Typically, a causal relation with a high BIC difference ratio is also detected by the F-test, but not vice versa. This suggests that the gas furnace and Old Faithful geyser datasets have weak causal relations. G failed to detect causal relations in the fish and Old Faithful geyser datasets, which suggests that both datasets have a high level of variable lags that violates the fixed-lag assumption of G. CG, SIC, and TE each failed on one dataset, which implies that each of these datasets violates some assumption of the method that failed on it. Lastly, VLTE was able to detect all causal relations.
For the Old Faithful geyser dataset, both G and TE failed to detect a causal relation, while both VLG and VLTE successfully inferred one. This implies that this dataset has a high level of variable lags that violates the fixed-lag assumption of G and TE.
9.4. Variable lags vs. fixed lag
9.4.1. VLGranger causality
To compare the performance of VLG and G, we simulated 100 datasets of with variable lags. Since , a higher BIC difference ratio implies a better result. Fig. 9 shows the results of BIC difference ratio for VLG and G. VLG has a clearly higher BIC difference ratio than G. This suggests that VLG was able to capture a stronger signal of the causes .
9.4.2. VLTransfer Entropy
To compare the performance of VLTE and TE, we also simulated 100 datasets of with variable lags. Since , a higher transfer entropy ratio implies a better result. Fig. 10 shows the results of transfer entropy ratio for VLTE and TE. VLTE has a clearly higher transfer entropy ratio than TE. This suggests that VLTE was able to capture a stronger signal of the causes .
10. Conclusions
In this work, we proposed a method to infer Granger and transfer entropy causal relations in time series where the causes influence effects with arbitrary time delays, which can change dynamically. We formalized a new Granger causal relation and a new transfer entropy causal relation, proving that they are true generalizations of the traditional Granger causality and transfer entropy, respectively. We demonstrated on both carefully designed synthetic datasets and noisy real-world datasets that the new causal relations can address the arbitrary-time-lag influence between cause and effect, while the traditional Granger causality and transfer entropy cannot. Moreover, in addition to improving and extending Granger causality and transfer entropy, our approach can be applied to infer leader-follower relations, as well as the dependency property between cause and effect. We have shown that, in many situations, the causal relations between time series do not have the lockstep connection of a fixed lag that the traditional Granger causality and transfer entropy assume. Hence, traditional Granger causality and transfer entropy missed true existing causal relations in such cases, while our methods correctly inferred them. Our approach can be applied in any domain of study where the causal relations between time series are of interest. The R package entitled VLTimeSeriesCausality is provided at (Amornbunchornvej, [n. d.]).
References
 MLS ([n. d.]) [n. d.]. Granger Causality Package in MATLAB. https://www.mathworks.com/matlabcentral/fileexchange/25467grangercausalitytest.
 RSo ([n. d.]) [n. d.]. Granger Causality Package in R. https://www.rdocumentation.org/packages/MSBVAR/versions/0.92/topics/granger.test.
 Amornbunchornvej ([n. d.]) Chainarong Amornbunchornvej. [n. d.]. VLTimeSeriesCausality: R package for variable-lag causal inference in time series. https://github.com/DarkEyes/VLTimeSeriesCausality. Accessed: 2019-12-10.
 Amornbunchornvej et al. (2018) Chainarong Amornbunchornvej, Ivan Brugere, Ariana Strandburg-Peshkin, Damien Farine, Margaret C Crofoot, and Tanya Y Berger-Wolf. 2018. Coordination Event Detection and Initiator Identification in Time Series Data. ACM Trans. Knowl. Discov. Data 12, 5, Article 53 (6 2018), 33 pages. https://doi.org/10.1145/3201406

 Amornbunchornvej et al. (2019) Chainarong Amornbunchornvej, Elena Zheleva, and Tanya Berger-Wolf. 2019. Variable-lag Granger Causality for Time Series Analysis. In 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA). IEEE, 21–30. https://doi.org/10.1109/DSAA.2019.00016
 Arnold et al. (2007) Andrew Arnold, Yan Liu, and Naoki Abe. 2007. Temporal Causal Modeling with Graphical Granger Methods. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’07). ACM, New York, NY, USA, 66–75. https://doi.org/10.1145/1281192.1281203

 Atukeren et al. (2010) Erdal Atukeren et al. 2010. The relationship between the F-test and the Schwarz criterion: implications for Granger-causality tests. Econ Bull 30, 1 (2010), 494–499.
 Azzalini and Bowman (1990) Adelchi Azzalini and Adrian W Bowman. 1990. A look at some data on the Old Faithful geyser. Journal of the Royal Statistical Society: Series C (Applied Statistics) 39, 3 (1990), 357–365.
 Barnett et al. (2009) Lionel Barnett, Adam B. Barrett, and Anil K. Seth. 2009. Granger Causality and Transfer Entropy Are Equivalent for Gaussian Variables. Phys. Rev. Lett. 103 (Dec 2009), 238701. Issue 23. https://doi.org/10.1103/PhysRevLett.103.238701
 Behrendt et al. (2019) Simon Behrendt, Thomas Dimpfl, Franziska J. Peter, and David J. Zimmermann. 2019. RTransferEntropy — Quantifying information flow between different time series using effective transfer entropy. SoftwareX 10 (2019), 100265. https://doi.org/10.1016/j.softx.2019.100265
 Box et al. (2015) George EP Box, Gwilym M Jenkins, Gregory C Reinsel, and Greta M Ljung. 2015. Time series analysis: forecasting and control. John Wiley & Sons.
 Chazelle (2011) Bernard Chazelle. 2011. The Total s-Energy of a Multiagent System. SIAM Journal on Control and Optimization 49, 4 (2011), 1680–1706. https://doi.org/10.1137/100791671
 Chen et al. (2004) Yonghong Chen, Govindan Rangarajan, Jianfeng Feng, and Mingzhou Ding. 2004. Analyzing multiple nonlinear time series with extended Granger causality. Physics Letters A 324, 1 (2004), 26–35.
 Crofoot et al. (2015) Margaret C Crofoot, Roland W Kays, and Martin Wikelski. 2015. Data from: Shared decision-making drives collective movement in wild baboons.
 Eichler (2013a) Michael Eichler. 2013a. Causal inference with multiple time series: principles and problems. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 371, 1997 (2013), 20110613. https://doi.org/10.1098/rsta.2011.0613
 Eichler (2013b) Michael Eichler. 2013b. Causal inference with multiple time series: principles and problems. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 371, 1997 (2013), 20110613.
 Giorgino et al. (2009) Toni Giorgino et al. 2009. Computing and visualizing dynamic time warping alignments in R: the dtw package. Journal of statistical Software 31, 7 (2009), 1–24.
 Granger (1969) Clive WJ Granger. 1969. Investigating causal relations by econometric models and cross-spectral methods. Econometrica: Journal of the Econometric Society (1969), 424–438.
 Griveau-Billion and Calderhead (2019) Théophile Griveau-Billion and Ben Calderhead. 2019. Efficient structure learning with automatic sparsity selection for causal graph processes. arXiv preprint arXiv:1906.04479 (2019).
 Iseki et al. (2019) Akane Iseki, Y. Mukuta, Y. Ushiki, and T. Harada. 2019. Estimating the causal effect from partially observed time series. In AAAI.
 Janzing and Scholkopf (2010) Dominik Janzing and Bernhard Scholkopf. 2010. Causal inference using the algorithmic Markov condition. IEEE Transactions on Information Theory 56, 10 (2010), 5168–5194.
 Keogh and Pazzani (2001) Eamonn J Keogh and Michael J Pazzani. 2001. Derivative dynamic time warping. In Proceedings of the 2001 SIAM international conference on data mining. SIAM, 1–11.
 Lee et al. (2012) Joon Lee, Shamim Nemati, Ikaro Silva, Bradley A Edwards, James P Butler, and Atul Malhotra. 2012. Transfer entropy estimation and directional coupling change detection in biomedical time series. Biomedical engineering online 11, 1 (2012), 19.
 Liu et al. (2012) Yan Liu, Taha Bahadori, and Hongfei Li. 2012. Sparsegev: Sparse latent space model for multivariate extreme value time serie modeling. In ICML.
 Malinsky and Spirtes (2018) Daniel Malinsky and Peter Spirtes. 2018. Causal structure learning from multivariate time series in settings with unmeasured confounding. In Proceedings of 2018 ACM SIGKDD Workshop on Causal Discovery. 23–47.
 Mueen and Keogh (2016) Abdullah Mueen and Eamonn Keogh. 2016. Extracting Optimal Performance from Dynamic Time Warping. In Proceedings of the 22Nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16). ACM, New York, NY, USA, 2129–2130. https://doi.org/10.1145/2939672.2945383
 Pearl (2000) J Pearl. 2000. Causality: Models, reasoning and inference Cambridge University Press. Cambridge, MA, USA, 9 (2000).
 Peng et al. (2007) Wei Peng, Tong Sun, Philip Rose, and Tao Li. 2007. A Semiautomatic System with an Iterative Learning Method for Discovering the Leading Indicators in Business Processes. In Proceedings of the 2007 International Workshop on Domain Driven Data Mining (DDDM ’07). ACM, New York, NY, USA, 33–42. https://doi.org/10.1145/1288552.1288557
 Peters et al. (2013) Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. 2013. Causal inference on time series using restricted structural equation models. In Advances in Neural Information Processing Systems. 154–162.
 Quinn et al. (2015) C. J. Quinn, N. Kiyavash, and T. P. Coleman. 2015. Directed Information Graphs. IEEE Transactions on Information Theory 61, 12 (Dec 2015), 6887–6909. https://doi.org/10.1109/TIT.2015.2478440
 Sakoe and Chiba (1978) Hiroaki Sakoe and Seibi Chiba. 1978. Dynamic programming algorithm optimization for spoken word recognition. IEEE transactions on acoustics, speech, and signal processing 26, 1 (1978), 43–49.
 Schölkopf et al. (2012) Bernhard Schölkopf, Dominik Janzing, Jonas Peters, Eleni Sgouritsa, Kun Zhang, and Joris Mooij. 2012. On causal and anticausal learning. In ICML.
 Schreiber (2000) Thomas Schreiber. 2000. Measuring information transfer. Physical review letters 85, 2 (2000), 461.

 Schwab et al. (2019) Patrick Schwab, Djordje Miladinovic, and Walter Karlen. 2019. Granger-causal attentive Mixtures of Experts: Learning Important Features with Neural Networks. In AAAI.
 Shajarisales et al. (2015) Naji Shajarisales, Dominik Janzing, Bernhard Schölkopf, and Michel Besserve. 2015. Telling cause from effect in deterministic linear dynamical systems. In ICML. 285–294.
 Shannon (1948) C. E. Shannon. 1948. A mathematical theory of communication. The Bell System Technical Journal 27, 3 (July 1948), 379–423. https://doi.org/10.1002/j.15387305.1948.tb01338.x
 Shao et al. (2014) Shengjia Shao, Ce Guo, Wayne Luk, and Stephen Weston. 2014. Accelerating transfer entropy computation. In 2014 International Conference on Field-Programmable Technology (FPT). IEEE, 60–67.
 Shibuya et al. (2009) Takashi Shibuya, Tatsuya Harada, and Yasuo Kuniyoshi. 2009. Causality quantification and its applications: structuring and modeling of multivariate time series. In KDD. ACM.

 Siggiridou and Kugiumtzis (2016) Elsa Siggiridou and Dimitris Kugiumtzis. 2016. Granger causality in multivariate time series using a time-ordered restricted vector autoregressive model. IEEE Transactions on Signal Processing 64, 7 (2016), 1759–1773.
 Sliva et al. (2015) Amy Sliva, Scott Neal Reilly, Randy Casstevens, and John Chamberlain. 2015. Tools for validating causal and predictive claims in social science models. Procedia Manufacturing 3 (2015), 3925–3932.
 Spirtes et al. (1993) Peter Spirtes, Clark Glymour, and Richard Scheines. 1993. Discovery Algorithms for Causally Sufficient Structures. Springer New York, New York, NY, 103–162. https://doi.org/10.1007/9781461227489_5
 Strandburg-Peshkin et al. (2013) A. Strandburg-Peshkin et al. 2013. Visual sensory networks and effective information transfer in animal groups. Current Biology 23, 17 (2013), R709–R711.
 Strandburg-Peshkin et al. (2015) Ariana Strandburg-Peshkin, Damien R Farine, Iain D Couzin, and Margaret C Crofoot. 2015. Shared decision-making drives collective movement in wild baboons. Science 348, 6241 (2015), 1358–1361.
 Sun et al. (2015) Youqiang Sun, Jiuyong Li, Jixue Liu, Christopher Chow, Bingyu Sun, and Rujing Wang. 2015. Using causal discovery for feature selection in multivariate numerical time series. Machine Learning 101, 1 (01 Oct 2015), 377–395. https://doi.org/10.1007/s1099401454601
 Varian (2016) Hal R. Varian. 2016. Causal inference in economics and marketing. Proceedings of the National Academy of Sciences 113, 27 (2016), 7310–7315. https://doi.org/10.1073/pnas.1510479113 arXiv:https://www.pnas.org/content/113/27/7310.full.pdf
 Yuan et al. (2016) Tao Yuan, Gang Li, Zhaohui Zhang, and S Joe Qin. 2016. Deep causal mining for plant-wide oscillations with multilevel Granger causality analysis. In American Control Conference (ACC), 2016. IEEE, 5056–5061.