Causal Inference in Non-linear Time-series using Deep Networks and Knockoff Counterfactuals

Estimating causal relations is vital in understanding the complex interactions in multivariate time series. Non-linear coupling of variables is one of the major challenges in the accurate estimation of cause-effect relations. In this paper, we propose to use deep autoregressive networks (DeepAR) in tandem with counterfactual analysis to infer nonlinear causal relations in multivariate time series. We extend the concept of Granger causality using probabilistic forecasting with DeepAR. Since deep networks can neither handle missing input nor out-of-distribution intervention, we propose to use the Knockoffs framework (Barber and Candès, 2015) for generating intervention variables and, consequently, counterfactual probabilistic forecasting. Knockoff samples are independent of their output given the observed variables and are exchangeable with their counterpart variables without changing the underlying distribution of the data. We test our method on synthetic as well as real-world time series datasets. Overall, our method outperforms the widely used vector autoregressive Granger causality and PCMCI in detecting nonlinear causal dependencies in multivariate time series.


I Introduction

Improving our understanding of cause-effect relationships when analyzing real-world complex systems, e.g., in economics, climate science, and neuroscience, helps in addressing questions such as: which factors affect our health, the economy, or the climate, and which actions need to be taken to mitigate adverse effects and enhance beneficial ones? One of the main challenges in inferring cause-effect relations in such systems is the non-linear dependency among variables, which often leads to inaccurate causal discovery. To mitigate this issue, we propose to use deep recurrent networks in combination with counterfactual analysis to infer nonlinear causal relations in multivariate time series. We apply the concept of Granger causality (GC) [36] using probabilistic forecasting with deep autoregressive (DeepAR) networks [6]. GC makes use of the following two properties: (i) temporal precedence: a cause is followed by its effect, and (ii) physical influence: changing the cause alters the effect. It estimates the extent to which a variable x_j helps in the prediction of another variable x_i. The impact of including x_j in the model is estimated by comparing the prediction of x_i with and without x_j. Due to its straightforward practical implementation, GC has been used intensively in various fields, e.g., neuroscience, climatology, and finance, for discovering causal graphs in time series [7, 8, 5, 25].

A major shortcoming of Granger causality, however, is its assumption of linear dependency in the observational data. Deep networks excel in learning complex and nonlinear relations. In this paper, we use DeepAR networks [6] to model non-linear relations in multivariate time series. DeepAR conditions on the observed variables and extracts hidden features from the time series, which helps in dealing with spurious associations that might occur due to confounders such as periodic cycles and trends. However, deep networks cannot handle missing input or out-of-distribution data. To deal with this issue, we propose to use the theoretically well-established Knockoffs framework [35] to generate intervention variables. The Knockoffs framework was originally developed as a variable selection tool with a controllable false discovery rate in a prediction model. Knockoff samples are statistical null-variables, i.e., independent of the model output, and can be swapped with the original variables without changing the underlying distribution of the data. The in-distribution nature of the knockoffs is important when they are used to generate counterfactuals for causal discovery, because a trained deep network expects test data to come from the same distribution as the training data. Moreover, knockoff samples help control the false discovery rate in causal inference, since they are constructed to be as decorrelated from their original counterparts as possible.

The causal hierarchy consists of three layers: (i) association, (ii) intervention, and (iii) counterfactual [40]. We work at the counterfactual level, where questions such as "what if a certain thing had been done differently?" or "is it a certain action that causes the change?" are addressed. A counterfactual outcome describes a cause-effect scenario of the form: if event A had not occurred, event B would not have happened. Counterfactuals have been used for feature selection, time series explanation, and model interpretability [21, 19]. We generate counterfactual output by substituting one observed variable at a time with its knockoff samples and estimate its causal impact by measuring its effect on the forecast error, assuming that the multivariate time series used are stationary and that there exist no hidden confounders. We compare the causal inference performance of the proposed method with the widely used vector autoregressive Granger causality (VAR-GC) and PCMCI [18] using synthetic as well as real river discharge datasets. We show that our method outperforms these methods in estimating nonlinear causal links. To highlight the advantage of using knockoff-based counterfactuals, we carry out a comparison using different intervention time series, i.e., the distribution mean, out-of-distribution samples, and knockoffs.

The remainder of this paper is organized as follows. Section II covers the related work. The methodological background is presented in Section III. In Section IV, we describe the proposed causal inference method in detail. Experimental results are presented and discussed in Section V. The work is concluded in Section VI.

II Related Work

Several methods in the literature have addressed the challenge of causal inference in nonlinear multivariate time series. The method of [14] uses structured multilayer perceptrons and recurrent neural networks combined with sparsity-inducing penalties on the weights. It monitors the weights assigned to each time series when predicting the target time series. The authors of [32] proposed a similar solution for causal discovery using deep learning, where the architecture of the neural network is partially learned from data to infer causality among variables. They interpret the weights of a neural network under certain constraints. The method trains as many neural network models as there are variables in the dataset, where one variable is the target and the remaining variables are inputs used to predict it. Analyzing the weights of the network reveals whether there is a relationship between a variable and the target: if all paths between a variable and the target are blocked by zero weights, the target does not have a causal link with that particular variable. However, it is challenging to extract from deep networks any clear structure of the data that can be used for interpretability [39]. Attention-based convolutional neural networks have also been used for causal discovery, where the internal parameters of the networks are utilized to find the time delay in the causal relationship between variables [15].

The PCMCI [18] method estimates causality in multivariate time series using linear and non-linear conditional independence tests. PCMCI works in two steps: first, it determines the causal parents of each variable, and then it applies conditional independence tests to estimate causality between variables. PCMCI has been applied in [16] to estimate causal networks in biosphere–atmosphere interaction. PCMCI has also been used together with dimensionality reduction to infer causality in ecological time series data [17], where conditional independence tests are applied on the reduced feature space for causal inference.

Cause-effect variational autoencoders (CEVAE) [33] estimate non-linear causal links in the presence of hidden confounders by training two separate networks, with and without intervention. This method, however, estimates the causal link between two variables only. An extension of CEVAE to time series has been developed by integrating domain-specific knowledge for inferring nonlinear causality in ecological time series [34]. Additive models are also used to estimate non-linear interactions in time series, where the past of each series may have an additive non-linear effect that decouples across time series [26]; however, they may overlook important non-linear relations between variables and therefore fail to properly identify Granger causal links [14]. Counterfactuals are used by [19] to estimate feature importance: they substitute features to obtain counterfactuals and estimate feature importance by measuring their impact on the output. Unlike our approach, this method does not estimate causal relations in time series and does not use in-distribution knockoff samples to reduce the false discovery rate. Knockoffs have been used to generate counterfactuals to explain the decisions of deep image classifiers [29]. To deal with spurious causal links that may result from periodic patterns and trends in time series, the methods in [37, 38] propose time-frequency causality analysis. However, these methods use the spectral representation of linear VAR-GC and hence may not be able to deal with nonlinear causal relations.

III Methodological Background

In this section, we briefly introduce the necessary background on DeepAR [6], the fundamental concepts of the Knockoffs framework [22], and VAR-GC.

III-A DeepAR

DeepAR [6] is a powerful method mainly used for forecasting multivariate, non-linear, non-stationary time series. It is based on autoregressive recurrent neural networks, which learn a global model from the history of all related time series in the dataset. DeepAR builds on previous work on deep learning for time series data [20] and applies a similar LSTM-based recurrent neural network architecture to the probabilistic forecasting problem. A few key attributes of DeepAR compared to classical approaches and other global forecasting methods are: it extracts hidden features in the form of seasonal patterns and trends in the data, and it conditions on observed variables and extracted features during inference. Moreover, as a probabilistic forecasting method, it requires less hand-crafted feature engineering to capture complex, group-dependent behavior.

The training and forecast processes of DeepAR are shown in Fig. 1. Let {z_1, ..., z_N} be an N-variate time series, where each series z_i = (z_{i,1}, ..., z_{i,T}) is a realization of a real-valued discrete stochastic process of length T. At each time step t, the inputs to the network are the observed target value z_{i,t-1} at the previous time step as well as the previous network output h_{i,t-1}. The network output h_{i,t} is then used to compute the parameters of the likelihood of z_{i,t}, which is used for training the model parameters. For prediction, the history of the time series is fed in for t < t_0, where t_0 denotes the first time step of the forecast horizon; within the forecast horizon (t >= t_0), a sample is drawn from the likelihood and fed back as input for the next time step until the end of the prediction range, generating one sample trace. Repeating this prediction process yields many traces that represent the joint predicted distribution.

Fig. 1: Architecture of DeepAR network showing training (left) and forecast (right) phases [6].
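To make the probabilistic forecasting step concrete, the following minimal sketch trains a DeepAR model and draws a forecast with the open-source GluonTS library. It is a sketch under assumptions: the toy dataset is a placeholder, module paths differ between GluonTS versions, and only the number of layers, cells per layer, training epochs, and prediction length follow the configuration reported in Section V.

# Minimal DeepAR forecasting sketch with GluonTS (MXNet backend assumed).
# The toy series and dataset construction are placeholders, not the data
# or the exact pipeline used in this paper.
import numpy as np
from gluonts.dataset.common import ListDataset
from gluonts.model.deepar import DeepAREstimator
from gluonts.mx.trainer import Trainer

prediction_length = 14
series = np.random.randn(3000).cumsum()        # placeholder target series

train_ds = ListDataset(
    [{"start": "2017-01-01", "target": series[:-prediction_length]}],
    freq="D",
)

estimator = DeepAREstimator(
    freq="D",
    prediction_length=prediction_length,
    num_layers=4,                               # layers, as in Section V
    num_cells=40,                               # cells per layer, as in Section V
    trainer=Trainer(epochs=150),                # training epochs, as in Section V
)
predictor = estimator.train(training_data=train_ds)

# Each forecast object holds Monte Carlo sample traces of the predictive
# distribution; their mean serves as a point forecast.
forecast = next(iter(predictor.predict(train_ds)))
point_forecast = forecast.mean                  # array of length prediction_length

The sample traces returned by the predictor correspond to the repeated ancestral sampling described above; summarizing them by their mean gives the point forecast used for the error measure in Section IV.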

III-B Knockoffs

The Knockoffs framework was developed by Barber and Candès in 2015 as a tool for estimating feature importance using conditional independence testing with a controllable false discovery rate [23]. Given a set of observed variables X = (X_1, ..., X_p), a target variable Y, and a predictive model, the knockoffs of the observed variables, denoted \tilde{X} = (\tilde{X}_1, ..., \tilde{X}_p), are constructed to be in-distribution null-variables that are as decorrelated as possible from the original data. They can be swapped with the original variables without changing the underlying distribution of the data. This is accomplished by ensuring that the correlation structure among the generated knockoffs matches the correlation structure among the original variables [23]. To be null-variables, knockoffs must not contain any information about the target variable Y. These conditions can be written formally as follows; a minimal Gaussian-case sampling sketch is given after the list.

  • Exchangeability: The joint vector (X, \tilde{X}) is equal in distribution to (X, \tilde{X})_{swap(S)} for any subset S of {1, ..., p}, where (X, \tilde{X})_{swap(S)} is obtained by swapping the features X_j, j in S, with their knockoffs \tilde{X}_j.

  • Null variables: \tilde{X} is independent of the target Y given X, which is guaranteed if \tilde{X} is generated without looking at the target variable Y.
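The following minimal sketch illustrates how such knockoffs can be sampled in the special case of jointly Gaussian features, using the equi-correlated second-order construction; the paper itself relies on DeepKnockoffs [22] and the implementation of [30], so this code only illustrates the two conditions above.

# Hedged sketch: second-order (Gaussian) model-X knockoffs with the
# equi-correlated construction; illustrative only, not the DeepKnockoffs
# sampler used in the paper. Assumes p >= 2 and a non-singular covariance.
import numpy as np

def gaussian_knockoffs(X, seed=0):
    """X: array of shape (n, p); returns knockoff samples of the same shape."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)

    # Equi-correlated choice of s: as large as possible (to decorrelate the
    # knockoffs from the originals) while keeping the joint covariance of
    # (X, X_knockoff) positive semi-definite.
    lam_min = np.linalg.eigvalsh(Sigma).min()
    s = np.full(p, 0.999 * min(2.0 * lam_min, Sigma.diagonal().min()))

    Sigma_inv_S = np.linalg.solve(Sigma, np.diag(s))        # Sigma^{-1} diag(s)
    cond_mean = X - (X - mu) @ Sigma_inv_S                  # E[X_knockoff | X]
    cond_cov = 2.0 * np.diag(s) - np.diag(s) @ Sigma_inv_S  # Cov[X_knockoff | X]
    L = np.linalg.cholesky(cond_cov + 1e-10 * np.eye(p))
    return cond_mean + rng.standard_normal((n, p)) @ L.T

Because the knockoffs are drawn without ever looking at the target, the null-variable condition holds by construction, and the matched first and second moments provide (second-order) exchangeability.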

III-C VAR Granger Causality

GC is based on the idea that a cause precedes its effects and helps in their prediction. Here we briefly describe how a VAR model is used to infer GC. Let X_t = [x_{1,t}, ..., x_{N,t}]^T be an N-variate time series, where each component series is a realization of a real-valued, discrete, stationary stochastic process of length T. These time series can be represented by a VAR model of order p of the form

X_t = \sum_{\tau=1}^{p} A_\tau X_{t-\tau} + \epsilon_t    (1)

The residuals \epsilon_t form a white-noise stationary process with covariance matrix \Sigma. The model parameters at time lags \tau = 1, ..., p comprise the matrices A_\tau. Let \Sigma_{ii} be the variance of the residual associated with x_i when using the full model in (1), and let \Sigma_{ii}^{(\setminus j)} denote the variance of this residual after omitting the j-th row and column in (1), i.e., after excluding x_j from the model. The VAR-GC of x_j on x_i conditioned on all other variables is then defined by [27]

\mathrm{GC}_{x_j \to x_i} = \ln \left( \Sigma_{ii}^{(\setminus j)} / \Sigma_{ii} \right)    (2)
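As a concrete illustration of (1)-(2), the sketch below estimates the conditional VAR-GC of x_j on x_i by fitting a full VAR and a reduced VAR (without the candidate cause) with statsmodels and comparing the residual variances of the target; the fixed lag order and the function name are assumptions made for illustration.

# Hedged sketch: conditional VAR-GC of x_j on x_i as in Eq. (2), computed
# from the residual covariances of a full and a reduced VAR model.
import numpy as np
from statsmodels.tsa.api import VAR

def var_gc(data, i, j, p=5):
    """data: array of shape (T, N); returns ln(Sigma_reduced_ii / Sigma_full_ii)."""
    assert i != j, "target and candidate cause must differ"
    full = VAR(data).fit(p)                        # VAR with all N variables
    sigma_full_ii = np.asarray(full.sigma_u)[i, i]

    keep = [k for k in range(data.shape[1]) if k != j]
    i_red = keep.index(i)                          # position of x_i after dropping x_j
    reduced = VAR(data[:, keep]).fit(p)            # VAR without the candidate cause
    sigma_red_ii = np.asarray(reduced.sigma_u)[i_red, i_red]

    return np.log(sigma_red_ii / sigma_full_ii)    # > 0 suggests x_j Granger-causes x_i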

IV Causal Effect Estimation using DeepAR and Knockoff Counterfactuals

Let X = {x_1, ..., x_N} be an N-variate time series, where each series x_i is a realization of a real-valued discrete stochastic process of length T. Throughout our study, we make use of the following two assumptions.

  • Stationarity: Each x_i is a stationary stochastic process.

  • Causal Sufficiency: The set of observed variables X includes all common causes of the pairs of variables in X.

We train and optimize DeepAR on the multivariate time series. For each realization of a time series x_i, we calculate the forecast error as the mean absolute percentage error (MAPE) using the trained model:

\mathrm{MAPE}_i = \frac{100}{T_f} \sum_{t=t_0}^{t_0+T_f-1} \left| \frac{x_{i,t} - \hat{x}_{i,t}}{x_{i,t}} \right|    (3)

where x_{i,t} is the actual value of time series x_i at time step t, and \hat{x}_{i,t} is the predicted value, defined as the mean of the predicted probability distribution at time step t in the forecast horizon of length T_f starting at t_0. To assess the cause-effect relationship of x_j on x_i, we intervene on variable x_j by replacing it with its knockoff \tilde{x}_j and observe the influence on the prediction of the target variable x_i by comparing the obtained counterfactual output with the actual output. In analogy to the definition of the VAR-GC of x_j on x_i in (2), we define our metric for estimating the non-linear Granger causality of x_j on x_i conditioned on all other variables, the causal significance score (CSS), as follows:

\mathrm{CSS}_{x_j \to x_i} = \ln \left( \mathrm{MAPE}_i^{\tilde{x}_j} / \mathrm{MAPE}_i \right)    (4)

where \mathrm{MAPE}_i and \mathrm{MAPE}_i^{\tilde{x}_j} denote the mean absolute percentage error when predicting x_i before and after the intervention on x_j, respectively. In this work, we investigate multiple ways to generate counterfactuals, i.e., the distribution mean, out-of-distribution samples, and knockoff samples, as described below. However, we mainly emphasize the use of knockoffs to generate counterfactuals for causal discovery in time series.

  • Distribution mean: Here we replace each value of the candidate time series x_j with its mean.

  • Out-of-distribution: In this type of intervention, the candidate variable x_j, which belongs to the data distribution P, is replaced with a variable drawn from a different distribution Q, with Q ≠ P. Moreover, the replacement is selected to be as uncorrelated with the original variable as possible.

  • Deep Knockoffs: We obtain knockoffs for our time series data using DeepKnockoffs [22], a framework for sampling approximate knockoffs using deep generative models. The knockoff samples are generated using a mixture of Gaussian models as implemented in [30].

We calculate the CSS using (4) from \mathrm{MAPE}_i and \mathrm{MAPE}_i^{\tilde{x}_j} for each realization of the stochastic process, i.e., before and after the intervention on x_j. Since we have multiple realizations of each time series, we obtain a distribution of CSS values for each pair of time series x_j and x_i. The null hypothesis H_0: x_j does not Granger-cause x_i, is accepted if the mean of the CSS distribution is close to zero. If the mean of the distribution is significantly different from zero, the alternative hypothesis H_1: x_j Granger-causes x_i, is accepted. The causal relationship for each pair of time series is estimated iteratively, and ultimately the full causal graph of the multivariate time series is extracted.
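The sketch below summarizes this procedure: for every realization it computes the CSS of (4) by swapping the candidate cause with its knockoff and then tests whether the mean CSS differs from zero. The forecasting and knockoff interfaces (forecast_fn, knockoff_fn) are hypothetical placeholders for the trained DeepAR predictor and the knockoff sampler, and the one-sample t-test is one plausible choice of significance test rather than a prescribed one.

# Hedged sketch of the knockoff-counterfactual causal test (Eqs. 3-4).
import numpy as np
from scipy import stats

def mape(actual, predicted):
    """Mean absolute percentage error over the forecast horizon (Eq. 3)."""
    return 100.0 * np.mean(np.abs((actual - predicted) / actual))

def css_test(realizations, i, j, forecast_fn, knockoff_fn, alpha=0.05):
    """Test whether x_j Granger-causes x_i via knockoff counterfactuals.

    realizations: list of arrays of shape (N, T) with the observed series.
    forecast_fn(X): hypothetical predictor returning forecasts of shape (N, T_f).
    knockoff_fn(x): hypothetical sampler returning a knockoff copy of series x.
    """
    css = []
    for X in realizations:
        pred = forecast_fn(X)
        T_f = pred.shape[1]
        actual_i = X[i, -T_f:]                     # ground truth in the horizon
        base = mape(actual_i, pred[i])             # error before intervention

        X_cf = X.copy()
        X_cf[j] = knockoff_fn(X[j])                # swap x_j with its knockoff
        counterfactual = mape(actual_i, forecast_fn(X_cf)[i])

        css.append(np.log(counterfactual / base))  # CSS for this realization

    # H0: mean CSS equals zero, i.e. x_j does not Granger-cause x_i.
    _, p_value = stats.ttest_1samp(css, popmean=0.0)
    return float(np.mean(css)), bool(p_value < alpha)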

V Experiments

We conducted experiments on synthetic as well as real datasets. The experiments are performed on multiple realizations of the time series. For the synthetic data, the length of each realization of the time series is 3000. The prediction length T_f is set to 14. We trained our model for 150 epochs. The number of layers and the number of cells per layer of the network are set to 4 and 40, respectively, and dropout is applied during training. We estimate the causal links among the time series by hypothesis testing on the distribution of the CSS as explained in Section IV. To quantify the performance of our method, we calculate two metrics, the false positive rate (FPR) and the F-score, where

\mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}}    (5)
\text{F-score} = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}    (6)

Here, TP is the number of causal links that exist in the true causal graph and are also detected by the method; TN is the number of causal links that neither exist in the true causal graph nor are detected by the method; FP is the number of causal links that do not exist in the true causal graph but are detected by the method; and FN is the number of causal links that exist in the true causal graph but are not detected by the method.
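As a small worked example of (5)-(6), the following sketch computes both metrics from ground-truth and estimated causal graphs encoded as binary adjacency matrices; this encoding is an assumption made purely for illustration.

# Hedged sketch: FPR and F-score (Eqs. 5-6) from binary adjacency matrices,
# where entry [j, i] = True encodes a causal link x_j -> x_i.
import numpy as np

def causal_metrics(true_adj, est_adj):
    true_adj = np.asarray(true_adj, dtype=bool)
    est_adj = np.asarray(est_adj, dtype=bool)
    tp = np.sum(true_adj & est_adj)        # links present and detected
    tn = np.sum(~true_adj & ~est_adj)      # links absent and not detected
    fp = np.sum(~true_adj & est_adj)       # links detected but absent
    fn = np.sum(true_adj & ~est_adj)       # links present but missed
    fpr = fp / (fp + tn)
    f_score = 2 * tp / (2 * tp + fp + fn)
    return fpr, f_score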

V-A Synthetic data

To test the proposed method on synthetic data, we generate four time series x_1, ..., x_4 with seasonal patterns and non-linear relationships among them as follows. This synthetic model simulates a test model of climate-ecosystem interactions as suggested in [16], where x_1 denotes global radiation R, x_2 represents air temperature T, while x_3 and x_4 denote gross primary production (GPP) and ecosystem respiration, respectively.

(7)

The coupling coefficients are denoted c_1, ..., c_4 and c_n, and the corresponding time lags are denoted \tau_1, \tau_2, ..., respectively. The noise terms \eta, termed intrinsic or dynamical noise, represent uncorrelated, normally distributed noise. We set \eta as Gaussian with zero mean and variances [0.30, 0.35, 0.25]. The values of the coupling coefficients c_1 to c_4 and of c_n were set to [0.95, 0.80, 0.50, 0.75] and [0.2, 0.4, … , 1.0], respectively. The lags were given only integer values in the range [0, 10], and T_ref is set to 10, which is referred to as the reference temperature in the test data model used in [16]. The seasonal part of x_1 is governed by a frequency parameter that takes the value 150 and a time index t that varies from 0 to 3000. In each simulation we change c_n, the coefficient of the non-linear relationship between x_2 and x_4, while the rest of the parameters are kept intact. We are interested in the consistency and causal estimation capability of our method in comparison with PCMCI and VAR-GC as c_n increases.

V-B Real data

As a real-world application of our method, we carry out a causal analysis of the average daily discharges of rivers in the upper Danube basin, provided by the Bavarian Environmental Agency (https://www.gkd.bayern.de). We use measurements of three years (2017-2019) from the Iller at Kempten (x_1), the Danube at Dillingen (x_2), and the Isar at Lenggries (x_3), as considered by [24]. While the Iller flows into the Danube upstream of Dillingen, with the water from Kempten reaching Dillingen within approximately one day, the Isar reaches the Danube downstream of Dillingen. Based on this scenario, we expect a contemporaneous link x_1 → x_2 and no direct causal links between the pairs (x_1, x_3) and (x_2, x_3). Since all variables may be confounded by rainfall or other weather conditions, this choice allows testing the ability of the methods to detect and distinguish directed and bidirected links.

V-C Results

For the synthetic data model, the results in Fig. 2 show the F-score values for PCMCI, VAR-GC, and DeepAR with all three types of substitution methods, denoted DeepAR-Knockoffs, DeepAR-OutDist, and DeepAR-Mean. The figure illustrates the performance of these methods in response to an increase in the coefficient c_n of the non-linearity between x_2 and x_4. VAR-GC assumes linearity and hence yields a lower F-score, since the synthetic data model contains non-linear relations among the variables. PCMCI achieves better results than VAR-GC because it uses a non-linear conditional independence test based on Gaussian process regression and distance correlation (GPDC). DeepAR-Knockoffs yields better results in terms of F-score compared to both VAR-GC and PCMCI. DeepAR-OutDist suffers from a high false discovery rate because it uses out-of-distribution variables as substitutes and hence yields a lower F-score throughout the experiment. DeepAR-Mean performs better than DeepAR-OutDist but falls behind DeepAR-Knockoffs. The plot in Fig. 3 shows the FPR in response to an increase in the non-linearity coefficient c_n. It can be seen that VAR-GC yields a high FPR due to its linearity assumption. DeepAR-OutDist also produces a high number of false positives, because out-of-distribution samples introduce bias in the model and hence lead to the detection of false causal links. PCMCI, DeepAR-Mean, and DeepAR-Knockoffs, however, achieve lower false positive rates.

Fig. 2: F-score of the compared methods in response to an increase in the coefficient c_n of the non-linear relationship between x_2 and x_4.
Fig. 3: False positive rate (FPR) of all methods for increasing c_n between x_2 and x_4 in the synthetic data model.

For the real data, the expected contemporaneous link x_1 → x_2 is correctly detected by DeepAR-Knockoffs, VAR-GC, and PCMCI, as shown in Table I. Our method does not report any spurious causal link for this pair, whereas VAR-GC and PCMCI wrongly detect one. PCMCI and VAR-GC also detect causal links involving x_3, which may be due to the weather acting as a confounder. DeepAR-Knockoffs does not report a causal link for these pairs in either direction, as the method handles confounders, such as seasonal patterns, in a better way. Our method wrongly finds one causal link that VAR-GC and PCMCI do not detect. Due to extreme events in the river discharge data caused by heavy rainfall, the assumption of stationarity is expected to be violated, which may explain the false links reported by VAR-GC and PCMCI. In comparison, DeepAR-Knockoffs correctly discovers the expected links in the river discharge data, except for a single false link.

Causal links    Expected    VAR-GC    PCMCI    DeepAR-Knockoffs
                Yes         Yes       Yes      Yes
                No          Yes       Yes      No
                No          Yes       Yes      No
                No          Yes       No       No
                No          Yes       Yes      No
                No          No        No       Yes
TABLE I: Expected and detected causal links in the river discharge data for VAR-GC, PCMCI, and DeepAR-Knockoffs.

VI Conclusion

We proposed a novel method for inferring cause-effect relations in non-linear multivariate time series. Since deep networks cannot handle out-of-distribution interventions, we proposed to use probabilistic forecasting with DeepAR in combination with knockoff-based counterfactuals for estimating nonlinear Granger causality. We applied our method to synthetic and real river discharge datasets. Our results confirm that using knockoff samples to generate counterfactuals yields better results than using the mean or out-of-distribution samples as interventions. The results also indicate that the proposed method outperforms the VAR-GC and PCMCI methods in estimating non-linear relations and dealing with confounders. It should be noted, however, that the better performance of the proposed method comes with the higher computational load associated with using deep networks, especially for large multivariate time series.

VII Acknowledgment

This work is funded by the Carl Zeiss Foundation within the scope of the program line "Breakthroughs: Exploring Intelligent Systems for Digitization — explore the basics, use applications" and by the DFG grant SH 1682/1-1.

References

  • [1] Pearl, Judea. Causal diagrams for empirical research. Biometrika 82.4 (1995): 669-688.
  • [2] Pearl, Judea. Causality: Models, reasoning and inference cambridge university press. Cambridge, MA, USA, 9 (2000): 10-11.
  • [3] Lauritzen, Steffen L. Causal inference from graphical models. Complex stochastic systems (2001): 63-107.
  • [4] Dawid, A. Philip. Influence diagrams for causal modelling and inference. International Statistical Review 70.2 (2002): 161-189.
  • [5] Granger, Clive WJ. Testing for causality: a personal viewpoint. Journal of Economic Dynamics and control 2 (1980): 329-352.
  • [6] Salinas, David, et al. DeepAR: Probabilistic forecasting with autoregressive recurrent networks. International Journal of Forecasting 36.3 (2020): 1181-1191.
  • [7] Goebel, Rainer, et al. Investigating directed cortical interactions in time-resolved fMRI data using vector autoregressive modeling and Granger causality mapping. Magnetic Resonance Imaging 21.10 (2003): 1251-1261.
  • [8] Brovelli, Andrea, et al. Beta oscillations in a large-scale sensorimotor cortical network: directional influences revealed by Granger causality. Proceedings of the National Academy of Sciences 101.26 (2004): 9849-9854.
  • [9] Lütkepohl, Helmut. New introduction to multiple time series analysis. Springer Science & Business Media, 2005.
  • [10] Lozano, Aurelie C., Naoki Abe, Yan Liu, and Saharon Rosset. Grouped graphical Granger modeling methods for temporal causal modeling. In Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, pp. 577-586. 2009.
  • [11] Teräsvirta, Timo, Dag Tjøstheim, and Clive William John Granger. Modelling nonlinear economic time series. Oxford: Oxford University Press, 2010.
  • [12] Tong, Howell. Nonlinear Time Series Analysis. (2011): 955-958.
  • [13] Bose, Eliezer, Marilyn Hravnak, and Susan M. Sereika. Vector autoregressive (VAR) models and granger causality in time series analysis in nursing research: dynamic changes among vital signs prior to cardiorespiratory instability events as an example. Nursing research 66.1 (2017): 12.
  • [14] Tank, A., Covert, I., Foti, N., Shojaie, A. and Fox, E., 2018. Neural granger causality for nonlinear time series. stat, 1050, p.16.
  • [15] Nauta, Meike, Doina Bucur, and Christin Seifert. Causal discovery with attention-based convolutional neural networks. Machine Learning and Knowledge Extraction 1.1 (2019): 312-340.
  • [16] Krich, Christopher, Jakob Runge, Diego G. Miralles, Mirco Migliavacca, Oscar Perez-Priego, Tarek El-Madany, Arnaud Carrara, and Miguel D. Mahecha. Estimating causal networks in biosphere–atmosphere interaction with the PCMCI approach. Biogeosciences 17, no. 4 (2020): 1033-1061.
  • [17] Krich, Christopher, Mirco Migliavacca, Diego G. Miralles, Guido Kraemer, Tarek S. El-Madany, Markus Reichstein, Jakob Runge, and Miguel D. Mahecha. Functional convergence of biosphere–atmosphere interactions in response to meteorological conditions. Biogeosciences 18, no. 7 (2021): 2379-2404.
  • [18] Runge, J., Nowack, P., Kretschmer, M., Flaxman, S. and Sejdinovic, D., 2019. Detecting and quantifying causal associations in large nonlinear time series datasets. Science Advances, 5(11), p.eaau4996.
  • [19] Tonekaboni, Sana, Shalmali Joshi, David Duvenaud, and Anna Goldenberg. Explaining Time Series by Counterfactuals. (2019).
  • [20] Graves, Alex. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850 (2013).
  • [21] Delaney, Eoin, Derek Greene, and Mark T. Keane. Instance-Based Counterfactual Explanations for Time Series Classification. arXiv preprint arXiv:2009.13211 (2020).
  • [22] Romano, Yaniv, Matteo Sesia, and Emmanuel Candès. Deep knockoffs. Journal of the American Statistical Association 115.532 (2020): 1861-1872.
  • [23] Barber, Rina Foygel, and Emmanuel J. Candès. Controlling the false discovery rate via knockoffs. Annals of Statistics 43.5 (2015): 2055-2085.
  • [24] Gerhardus, Andreas, and Jakob Runge. High-recall causal discovery for autocorrelated time series with latent confounders. arXiv preprint arXiv:2007.01884 (2020).
  • [25] Granger, Clive WJ. Some recent development in a concept of causality. Journal of econometrics 39.1-2 (1988): 199-211.
  • [26] Vantas, Konstantinos, Epaminondas Sidiropoulos, and Chris Evangelides. Estimating Rainfall Erosivity from Daily Precipitation Using Generalized Additive Models. Environmental Sciences Proceedings. Vol. 2. No. 1. Multidisciplinary Digital Publishing Institute, 2020.
  • [27] Geweke, John. Measurement of linear dependence and feedback between multiple time series. Journal of the American statistical association 77.378 (1982): 304-313.
  • [28] Lawrence, Andrew R., Marcus Kaiser, Rui Sampaio, and Maksim Sipos. Data generating process to evaluate causal discovery techniques for time series data. arXiv preprint arXiv:2104.08043 (2021).
  • [29] Popescu, Oana-Iuliana, Maha Shadaydeh, and Joachim Denzler. Counterfactual Generation with Knockoffs. arXiv preprint arXiv:2102.00951 (2021).
  • [30] Gimenez, Jaime Roquero, Amirata Ghorbani, and James Zou. Knockoffs for the mass: new feature importance statistics with false discovery guarantees. The 22nd International Conference on Artificial Intelligence and Statistics. PMLR, 2019.
  • [31] Zheng, X., Aragam, B., Ravikumar, P. K. & Xing, E. P. in Advances in Neural Information Processing Systems Vol. 31 (eds. Bengio, S. et al.) 9472–9483 (Curran Associates, 2018).
  • [32] Lachapelle, S., Brouillard, P., Deleu, T. & Lacoste-Julien, S. in Proc. Eighth Int. Conf. Learning Representations (ICLR, 2020).
  • [33] Louizos C, Shalit U, Mooij J, Sontag D, Zemel R, Welling M. Causal effect inference with deep latent-variable models. arXiv preprint arXiv:1705.08821. 2017 May 24.
  • [34] Trifunov, Violeta Teodora, Maha Shadaydeh, Jakob Runge, Veronika Eyring, Markus Reichstein, and Joachim Denzler. Nonlinear causal link estimation under hidden confounding with an application to time series anomaly detection. In German Conference on Pattern Recognition, pp. 261-273. Springer, Cham, 2019.
  • [35] Barber, R.F., Candès, E.J. and Samworth, R.J., 2020. Robust inference with knockoffs. The Annals of Statistics, 48(3), pp.1409-1431.
  • [36] Granger, C. W. J. Investigating causal relations by econometric models and cross-spectral methods. Econometrica 37.3 (1969): 424-438.
  • [37] Shadaydeh, Maha, Yanira Guanche Garcia, Miguel D. Mahecha, Markus Reichstein, and Joachim Denzler. ”Causality analysis of ecological time series: a time-frequency approach.” In 8th International Workshop on Climate Informatics: CI 2018, pp. 111-114. 2018.
  • [38] Shadaydeh, Maha, Joachim Denzler, Yanira Guanche García, and Miguel Mahecha. ”Time-frequency causal inference uncovers anomalous events in environmental systems.” In German Conference on Pattern Recognition, pp. 499-512. Springer, Cham, 2019.
  • [39] Luo Y, Peng J, Ma J. When causal inference meets deep learning. Nature Machine Intelligence. 2020 Aug;2(8):426-7.
  • [40] Pearl, J. and Mackenzie, D.. The Book of Why. Harlow, England: Penguin Books, 2019.