Causal and anti-causal learning in pattern recognition for neuroimaging

December 15, 2015 ∙ Sebastian Weichwald et al. ∙ Max Planck Society ∙ Universitätsklinikum Freiburg

Pattern recognition in neuroimaging distinguishes between two types of models: encoding- and decoding models. This distinction is based on the insight that brain state features that are found to be relevant in an experimental paradigm carry a different meaning in encoding- than in decoding models. In this paper, we argue that this distinction is not sufficient: Relevant features in encoding- and decoding models carry a different meaning depending on whether they represent causal- or anti-causal relations. We provide a theoretical justification for this argument and conclude that causal inference is essential for interpretation in neuroimaging.


I Introduction

Pattern recognition in neuroimaging aims to provide insights into the neural basis of cognitive processes. Two types of models are used in this endeavor: encoding- and decoding models. Encoding models predict a subject’s brain state for a given experimental condition, while decoding models aim to reconstruct experimental conditions from neuroimaging data. This difference has important consequences for the interpretation of brain state features that are found to be relevant in each type of model.

It has been argued that only encoding models can provide a complete functional description of a region of interest [1]. Decoding models, on the other hand, may determine brain state features as relevant that are statistically independent of the experimental condition [2]. While in linear decoding models potential misinterpretations can be avoided by converting them into encoding models [3], this is a substantially more difficult problem for non-linear decoding models. As decoding models are becoming ever more popular in the analysis of neuroimaging data [4], the correct interpretation of such models is of considerable importance.

In this paper, we argue that the distinction between encoding- and decoding models is not sufficient to determine the meaning of relevant features in each type of model: Pattern recognition models need to be further distinguished with respect to whether they learn causal- or anti-causal relations [5]. In general, neuroimaging studies are based on the following causal structure: stimulus → brain activity → response. We note that more complex experimental paradigms, in which responses again act as stimuli [6], can also be modeled in this way by considering time-resolved variables, e. g. stimulus$_t$ → brain activity$_t$ → response$_t$ → stimulus$_{t+1}$ → …. Depending on whether experimental conditions are chosen to represent stimuli or responses, encoding- and decoding models then model causal- or anti-causal relations. In the following, we argue that this has important consequences for the interpretation of relevant features in each type of model. Furthermore, we argue that interpretation of neuroimaging data de facto requires causal inference problems to be solved.

The remainder of this article is organized as follows. In section II we introduce the necessary notation and terminology to formulate our proposed distinction of pattern recognition models in section II-D. Next, we theoretically investigate the interpretability of relevant features in each type of pattern recognition model (sections III-A to III-D) and briefly summarize our findings in section III-E. In section IV we argue that interpreting encoding- and decoding models is only a first step towards solving causal inference problems in the interpretation of neuroimaging data. We close with a conclusion in section V.

II Pattern recognition models

II-A Notation

By $X = [X_1, \ldots, X_d]^\top$ we denote the brain states represented by $d$ features obtained from neuroimaging data, i. e. $X_i \in \mathbb{R}$; by $C$ we denote the (usually discrete) experimental conditions. Throughout this paper we use the notations $p(x)$, $p(x \mid c)$ and $p(x, c)$ for (conditional or joint) probability density functions (PDFs). All PDFs are assumed to be known.

Independence is denoted by $X \perp C$ and conditional independence by $X \perp C \mid Z$. Dependence and conditional dependence are denoted by $X \not\perp C$ and $X \not\perp C \mid Z$, respectively. Causal relations in a directed acyclic graph are denoted by $X \to C$ [7].

II-B Encoding and decoding models

An encoding model $p(x \mid c)$ represents how various experimental conditions are encoded in different brain states. We ask “What does the brain state look like given a certain experimental condition?”. Examples for encoding models are the general linear model [8] or the class-conditional mean $\mathbb{E}[X \mid C = c]$.

A decoding model $p(c \mid x)$ represents how different experimental conditions can be inferred from different brain states [9]. We ask “Which experimental condition is most likely given a certain brain state?”. Decoding models are for example obtained using support vector machines or linear regression.

Note that this distinction solely reflects the direction in which the relation between brain state and experimental condition is modeled; it neglects any causal relation between brain state and experimental condition that might be known a priori.
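To make the two modeling directions concrete, consider the following minimal sketch (ours, not part of the original paper; it assumes NumPy and scikit-learn are available). It fits both model types to the same simulated data: the encoding model is summarized by class-conditional feature means, the decoding model by a logistic-regression classifier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000

# Binary experimental condition C and two brain state features X1, X2.
C = rng.integers(0, 2, size=n)
X1 = 1.5 * C + rng.normal(size=n)   # X1 varies with the condition
X2 = rng.normal(size=n)             # X2 does not
X = np.column_stack([X1, X2])

# Encoding model p(x|c), summarized by the class-conditional means E[X | C=c].
for c in (0, 1):
    print(f"E[X | C={c}] ~", X[C == c].mean(axis=0).round(2))

# Decoding model p(c|x): predict the condition from the brain state.
clf = LogisticRegression().fit(X, C)
print("decoding accuracy ~", round(clf.score(X, C), 2))
```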

II-C Causal and anti-causal learning

The brain is constantly exposed to the world’s stimuli and processes them, e. g. giving rise to perceptions. As such, stimuli can only be causes but not effects of brain states. The brain also constantly generates responses, e. g. movements, that are caused by the brain states. This gives rise to the following causal structure in neuroimaging studies: stimulus → brain state → response. Note that we are not necessarily able to observe all stimuli that cause a certain brain state or all features of the brain state which are causal for a given response. The causal structure enables us to distinguish between the following two scenarios:

II-C1 Stimulus-based experiments

In a stimulus-based experiment the experimental conditions correspond to stimuli presented to the subject. In general, we can control the stimulus presentation procedure and are thus able to randomize the presentation of stimuli. An example of a stimulus-based experiment is the randomized presentation of auditory stimuli to either the left or right ear. The causal structure of this setup is given by $C \to X$, i. e. stimuli cause brain activity.

In this case the encoding model represents a causal relation, while the decoding model models an anti-causal relation.

II-C2 Response-based experiments

In a response-based experiment the experimental conditions represent subjects’ responses that we observe. An example of a response-based experiment is the recording of volitional movements of either the left or right hand. The causal structure of this setup is given by $X \to C$, i. e. brain activity causes responses. Note that in this setting we are not able to control for and randomize the experimental conditions.

In contrast to a stimulus-based experiment, the encoding model of a response-based experiment represents an anti-causal relation, while the decoding model models a causal relation.

II-D Distinction of pattern recognition models

Considering both the distinction of encoding- and decoding models and the distinction of stimulus- and response-based experiments we obtain the following four types of models:

  1. Causal encoding models – $p(x \mid c)$ with $C \to X$

  2. Anti-causal decoding models – $p(c \mid x)$ with $C \to X$

  3. Anti-causal encoding models – $p(x \mid c)$ with $X \to C$

  4. Causal decoding models – $p(c \mid x)$ with $X \to C$

In the following section we provide theoretical justifications why this distinction needs to be considered before interpreting encoding- or decoding models. As we show, interpretability of relevant features depends on whether the model represents causal or anti-causal relations.

III Interpretation of relevant features

When interpreting an encoding model $p(x \mid c)$, we want to link features relevant for encoding to the experimental condition. Relevant here means that we determine the set of brain state features that the encoding model deems dependent on the experimental condition, i. e. the features $X_i$ for which $p(x_i \mid c) \neq p(x_i)$ and hence $X_i \not\perp C$. The remaining features are independent of $C$. One way to do this in practice is to test the class-conditional sample means of each feature for statistically significant differences. Features that, according to this univariate test, significantly vary with $C$ are considered relevant for the encoding model.
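As an illustration, the univariate procedure just described might be implemented as follows (a sketch assuming two conditions and SciPy’s two-sample t-test; the function name and the threshold of 0.05 are our choices, not from the paper):

```python
import numpy as np
from scipy.stats import ttest_ind

def encoding_relevant_features(X, C, alpha=0.05):
    """Indices of features whose class-conditional means differ
    significantly between conditions C=0 and C=1 (univariate tests)."""
    relevant = []
    for i in range(X.shape[1]):
        _, p = ttest_ind(X[C == 0, i], X[C == 1, i])
        if p < alpha:
            relevant.append(i)
    return relevant
```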

When interpreting a decoding model $p(c \mid x)$, we want to determine which features are relevant for decoding the experimental condition. Relevant here means that we determine if a brain state feature $X_i$ or a set of features helps in decoding the experimental condition, i. e. it is tested whether $p(c \mid x) \neq p(c \mid x_{\setminus i})$ and hence $X_i \not\perp C \mid X_{\setminus i}$, where $X_{\setminus i}$ denotes all features except $X_i$. One way to do this in practice is recursive feature elimination, i. e. permuting or removing $X_i$ from the feature set and testing whether this significantly decreases decoding accuracy [10]. It is common to remove all features that are irrelevant for decoding to reduce dimensionality and obtain the minimal set of features that yields an optimal decoding model. Features of that set are considered relevant for the decoding model. We note that there might be other ways of identifying relevant features of a decoding model which might lead to different conclusions.
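A permutation-based stand-in for this procedure might look as follows (a sketch assuming scikit-learn; each feature is shuffled in turn and the drop in cross-validated accuracy of a logistic-regression decoder is recorded; a full analysis would add a significance test over repeated permutations):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def decoding_relevance(X, C, seed=0):
    """Per-feature drop in decoding accuracy when that feature is permuted."""
    rng = np.random.default_rng(seed)
    base = cross_val_score(LogisticRegression(), X, C, cv=5).mean()
    drops = []
    for i in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, i] = rng.permutation(Xp[:, i])
        drops.append(base - cross_val_score(LogisticRegression(), Xp, C, cv=5).mean())
    return drops
```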

For our theoretical arguments we assume that we can identify all relevant features for each type of model. We now show that relevant features in encoding- and decoding models carry a different meaning depending on the causal structure.

III-A Causal encoding models

From the encoding model of a stimulus-based experiment we obtain the set $X_+$ of features that are dependent on $C$, i. e. for every $X_i \in X_+$ we have $X_i \not\perp C$. We denote the complementary set as $X_- := \{X_1, \ldots, X_d\} \setminus X_+$.

According to Reichenbach’s principle [11], the dependency between $C$ and $X_i \in X_+$ implies that $C \to X_i$, $X_i \to C$, or $C \leftarrow H \to X_i$ with $H$ a joint common cause of $C$ and $X_i$. In the stimulus-based setting we can control for and randomize the stimulus. This enables us to rule out the last two cases and conclude $C \to X_i$, i. e. the features in $X_+$ are genuine effects of $C$ [12].

In addition, we have $X_- \perp C$, which allows us to conclude that features in $X_-$ are not genuine effects of $C$.

As such, all relevant features in a causal encoding model are genuine effects of $C$, while irrelevant features are not effects of $C$.

III-B Anti-causal decoding models

From the decoding model of a stimulus-based experiment we obtain the minimal set $X_+$ of features that allows us to decode the stimulus, i. e. $p(c \mid x) = p(c \mid x_+)$. It hence holds that $C \perp X_- \mid X_+$, where $X_-$ is the set of features that do not further improve decoding.

We now describe two counterexamples that show that one can neither conclude that features in $X_-$ are not genuine effects of $C$ nor that features in $X_+$ are indeed genuine effects of $C$. First, assume $C \to X_1 \to X_2$. Since $C \perp X_2 \mid X_1$, i. e. $X_2 \in X_-$, we have $X_2 \notin X_+$ although $X_2$ is actually a genuine effect of $C$. Second, assume $C \to X_1 \leftarrow X_2$. Since $C \not\perp X_2 \mid X_1$, i. e. $X_2 \in X_+$, we obtain $X_2 \in X_+$ although $X_2$ is not a genuine effect of $C$.

This establishes that interpreting anti-causal decoding models in this way has two drawbacks. First, features in $X_+$ can only be considered as potential effects of $C$. Second, genuine effects of $C$ might be missed.
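Both counterexamples can be verified on simulated linear-Gaussian data. The sketch below (ours, assuming NumPy; a vanishing partial correlation stands in for conditional independence under the Gaussian assumption) implements the chain and the collider case:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
C = rng.integers(0, 2, size=n).astype(float)   # randomized stimulus

def residualize(a, b):
    """Residual of a after linear regression on b."""
    slope, intercept = np.polyfit(b, a, 1)
    return a - (slope * b + intercept)

# Chain C -> X1 -> X2: X2 is a genuine effect of C but lands in X_-.
X1 = C + rng.normal(size=n)
X2 = X1 + rng.normal(size=n)
print("chain:    corr(C, X2 | X1) ~",
      np.corrcoef(C, residualize(X2, X1))[0, 1])   # ~ 0

# Collider C -> X1 <- X2: X2 is no effect of C but lands in X_+.
X2b = rng.normal(size=n)
X1b = C + X2b + rng.normal(size=n)
print("collider: corr(C, X2 | X1) ~",
      np.corrcoef(C, residualize(X2b, X1b))[0, 1])  # clearly nonzero
```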

III-C Anti-causal encoding models

From the encoding model of a response-based experiment we obtain the set $X_+$ of features that are dependent on $C$, i. e. for every $X_i \in X_+$ we have $X_i \not\perp C$. We denote the complementary set as $X_-$ (overloading notation).

According to Reichenbach’s principle, the dependency between $C$ and $X_i \in X_+$ implies that $C \to X_i$, $X_i \to C$, or $X_i \leftarrow H \to C$ with $H$ a joint common cause of $X_i$ and $C$. A priori we know that brain activity → response. This enables us to rule out the case $C \to X_i$. As we show next, we can not uniquely determine which of the last two scenarios is the case, i. e. features in $X_+$ are potential but not necessarily genuine causes of $C$.

Consider $X_2 \leftarrow X_1 \to C$: we have $X_1 \not\perp C$ and $X_2 \not\perp C$ and therefore $X_1, X_2 \in X_+$. But note that while $X_1 \to C$, $X_2$ is not a cause of $C$. This shows that features in $X_+$ are not necessarily genuine causes of $C$.

Features in $X_-$, on the other hand, are independent of $C$ and can hence be considered to be no causes of $C$.

As such, not all relevant features in anti-causal encoding models are genuine causes of $C$, while irrelevant features are indeed not causal for $C$.
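A quick simulation of the counterexample $X_2 \leftarrow X_1 \to C$ (ours, assuming NumPy; the response is a thresholded linear function of $X_1$ so that $C$ is binary, as in the hand-movement example):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

X1 = rng.normal(size=n)                                 # genuine cause of C
X2 = X1 + rng.normal(size=n)                            # caused by X1, no link to C
C = (X1 + 0.5 * rng.normal(size=n) > 0).astype(float)   # response: X1 -> C

# The encoding view finds both features dependent on C ...
print("corr(X1, C) ~", np.corrcoef(X1, C)[0, 1])   # clearly nonzero
print("corr(X2, C) ~", np.corrcoef(X2, C)[0, 1])   # also nonzero
# ... although X2 has no causal effect on C; it merely shares the cause X1.
```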

III-D Causal decoding models

From the decoding model of a response-based experiment we obtain the minimal set $X_+$ of features that allows us to decode the response, i. e. $p(c \mid x) = p(c \mid x_+)$. It hence holds that $C \perp X_- \mid X_+$, where $X_-$ is the set of features that do not further improve decoding.

We now describe two counterexamples that show that one can neither conclude that features in $X_-$ are not genuine causes of $C$ nor that features in $X_+$ are genuine causes of $C$. First, assume $X_2 \to X_1 \to C$. Since $C \perp X_2 \mid X_1$, i. e. $X_2 \in X_-$, we have $X_2 \notin X_+$ although $X_2$ is a cause of $C$. Second, assume the graph shown in figure 1, where $H$ is a hidden common cause of $X_1$, $X_2$, and $C$ which is not observable as a brain state feature. Since $X_1 \not\perp C \mid X_2$ and $X_2 \not\perp C \mid X_1$ we have $X_1, X_2 \in X_+$ although both $X_1$ and $X_2$ are not causes of $C$.

Fig. 1: Causal graph of an exemplary response-based experiment: $H$ is not observable as a brain state feature and hence denotes a hidden common cause of the observed brain state features $X_1$ and $X_2$ and the response $C$.

This establishes that interpreting causal decoding models this way has two drawbacks. First, features in $X_+$ are not necessarily causes of $C$. Second, genuine causes of $C$ might be missed.
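The hidden-confounder case of figure 1 can be simulated in the same fashion (again a sketch assuming NumPy; $H$ drives both features and the response but is never recorded):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

H = rng.normal(size=n)                                  # hidden common cause
X1 = H + rng.normal(size=n)
X2 = H + rng.normal(size=n)
C = (H + 0.5 * rng.normal(size=n) > 0).astype(float)    # response driven by H only

def residualize(a, b):
    slope, intercept = np.polyfit(b, a, 1)
    return a - (slope * b + intercept)

# Each feature stays informative about C even given the other ...
print("corr(C, X1 | X2) ~", np.corrcoef(C, residualize(X1, X2))[0, 1])  # nonzero
print("corr(C, X2 | X1) ~", np.corrcoef(C, residualize(X2, X1))[0, 1])  # nonzero
# ... so both end up in X_+, yet neither is a cause of the response C.
```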

III-E Subsumption

In the previous sections we showed that the interpretation of relevant features in encoding- and decoding models depends on the underlying causal structure. This justifies our argument that the distinction of encoding- and decoding models is not sufficient. In particular we argued that, without employing further assumptions,

  1. causal encoding models allow us to identify genuine effects of $C$.

  2. anti-causal decoding models allow us to identify some potential effects of $C$.

  3. anti-causal encoding models allow us to identify potential causes of $C$.

  4. causal decoding models allow us to identify some potential causes of $C$.

IV Causal inference in neuroimaging

So far, we have argued that the causal structure of a neuroimaging study, i. e. whether we learn in causal- or anti-causal direction, has to be taken into account when interpreting relevant features in encoding- and decoding models. In particular, we have shown that, with the exception of the causal encoding model, the meaning of relevant features in encoding- and decoding models is ambiguous. In the following, we demonstrate on two examples that such ambiguities can be resolved by means of causal inference [7, 13]. Throughout this section we assume faithfulness, i. e. we assume that all observed (conditional) independence relations are implied by the causal structure [13]. In the following examples, we additionally assume causal sufficiency, i. e. we assume that there are no hidden confounders.

IV-A Causal inference in stimulus-based experiments

Consider two brain state features $X_1$ and $X_2$ in a stimulus-based experiment with $C \to X_1$, $X_2 \to X_1$, and $C \perp X_2$.

If we learn an encoding model on this data, we find that $X_1 \not\perp C$ and $X_2 \perp C$. We can thus conclude that $X_1$ is an effect of $C$, i. e. $C \to X_1$ (cf. section III-A). We can not, however, determine the causal relation between $X_1$ and $X_2$.

If we learn a decoding model, on the other hand, we find that $X_+ = \{X_1, X_2\}$, i. e. we find both features to be relevant for decoding $C$, as $X_1 \not\perp C \mid X_2$ and $X_2 \not\perp C \mid X_1$.

Under the assumptions of faithfulness and causal sufficiency, the only causal structure that can give rise to these observations is $C \to X_1 \leftarrow X_2$, i. e. $X_2$ is a cause of $X_1$ [7].

By learning both an encoding- and a decoding model on the same data, and comparing relevant features, we have thus determined the causal relations between the observed variables. An example of this inference procedure, known as the inference rule for potential causation [7], is given in [14].
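This inference can be reproduced numerically. The sketch below (ours, assuming NumPy) simulates $C \to X_1 \leftarrow X_2$ with a randomized stimulus and prints the three (conditional) dependence measures that correspond to the observations above:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

C = rng.integers(0, 2, size=n).astype(float)   # randomized stimulus
X2 = rng.normal(size=n)                        # independent of C
X1 = C + X2 + rng.normal(size=n)               # C -> X1 <- X2

def residualize(a, b):
    slope, intercept = np.polyfit(b, a, 1)
    return a - (slope * b + intercept)

print("corr(X1, C)      ~", np.corrcoef(X1, C)[0, 1])                 # nonzero
print("corr(X2, C)      ~", np.corrcoef(X2, C)[0, 1])                 # ~ 0
print("corr(X2, C | X1) ~", np.corrcoef(residualize(X2, X1),
                                        residualize(C, X1))[0, 1])    # nonzero
# Encoding: only X1 depends on C; decoding: both features are relevant.
# Under faithfulness and causal sufficiency this pattern implies X2 -> X1.
```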

IV-B Causal inference in response-based experiments

Consider two brain state features $X_1$ and $X_2$ in a response-based experiment with $X_1 \to C$ and $X_1 \to X_2$.

If we learn an encoding model on this data, we find that $X_1, X_2 \in X_+$ as $X_1 \not\perp C$ and $X_2 \not\perp C$. We thus conclude that both $X_1$ and $X_2$ are potential but not necessarily genuine causes of $C$ (cf. section III-C).

If we learn a decoding model, on the other hand, we find that only $X_1 \in X_+$, as $X_2$ does not help for decoding if $X_1$ is already known due to $X_2 \perp C \mid X_1$. By only looking at the decoding model, we would only identify $X_1$ as a potential cause of $C$.

Taken together, however, the only causal structures that can give rise to these observations, again assuming faithfulness and causal sufficiency, are $X_2 \to X_1 \to C$ or $X_2 \leftarrow X_1 \to C$ [7]. As in both structures $X_1 \to C$, we can conclude that $X_1$ is a direct cause of $C$. The role of $X_2$, however, remains ambiguous.

By learning both an encoding- and a decoding model on the same data, and comparing relevant features, we have thus again identified a causal relation between observed variables.
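Again, the example can be checked numerically. The sketch below (ours, assuming NumPy) simulates one of the two compatible structures, $X_2 \leftarrow X_1 \to C$; the printed quantities mirror the observations of this example:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000

X1 = rng.normal(size=n)
X2 = X1 + rng.normal(size=n)                            # X1 -> X2
C = (X1 + 0.5 * rng.normal(size=n) > 0).astype(float)   # X1 -> C (response)

def residualize(a, b):
    slope, intercept = np.polyfit(b, a, 1)
    return a - (slope * b + intercept)

print("corr(X1, C)      ~", np.corrcoef(X1, C)[0, 1])                 # nonzero
print("corr(X2, C)      ~", np.corrcoef(X2, C)[0, 1])                 # nonzero
print("corr(X2, C | X1) ~", np.corrcoef(residualize(X2, X1),
                                        residualize(C, X1))[0, 1])    # ~ 0
# Encoding: both features are relevant; decoding: only X1 remains relevant.
# Both compatible structures contain X1 -> C, so X1 is a cause of C.
```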

V Conclusion

In the previous section, we have demonstrated on two examples how the combination of encoding- and decoding models can resolve ambiguities that can not be decided when only looking at one type of model. This is due to the fact that relevant features are determined by univariate independence tests in encoding models and by multivariate conditional independence tests in decoding models. Both types of tests provide complementary information on the underlying causal structure.

As we have shown in section IV-B, however, these tests do not always uniquely determine the causal structure of a given set of observed variables. In general, conditional independence tests on all subsets of observed variables may provide further information [7, 13]. An exhaustive description of the causal inference rules based on conditional independence tests is beyond the scope of the present paper.

We conclude by emphasizing that the causal structure, as determined by a priori knowledge and/or causal inference methods, has to be taken into account when interpreting neuroimaging data.  

References

  • [1] T. Naselaris, K. N. Kay, S. Nishimoto, and J. L. Gallant, “Encoding and decoding in fMRI,” NeuroImage, vol. 56, no. 2, pp. 400–410, 2011.
  • [2] M. T. Todd, L. E. Nystrom, and J. D. Cohen, “Confounds in multivariate pattern analysis: Theory and rule representation case study.” NeuroImage, vol. 77, pp. 157–165, 2013.
  • [3] S. Haufe, F. Meinecke, K. Görgen, S. Dähne, J.-D. Haynes, B. Blankertz, and F. Bießmann, “On the interpretation of weight vectors of linear models in multivariate neuroimaging,” NeuroImage, vol. 87, pp. 96–110, 2014.
  • [4] F. Pereira, T. Mitchell, and M. Botvinick, “Machine learning classifiers and fMRI: a tutorial overview,” NeuroImage, vol. 45, no. 1, pp. S199–S209, 2009.
  • [5] B. Schölkopf, D. Janzing, J. Peters, E. Sgouritsa, K. Zhang, and J. Mooij, “On causal and anticausal learning,” in 29th International Conference on Machine Learning (ICML 2012), 2012, pp. 1255–1262.
  • [6] M. Gomez-Rodriguez, J. Peters, J. Hill, B. Schölkopf, A. Gharabaghi, and M. Grosse-Wentrup, “Closing the sensorimotor loop: haptic feedback facilitates decoding of motor imagery,” Journal of Neural Engineering, vol. 8, no. 3, p. 036005, 2011.
  • [7] J. Pearl, Causality: models, reasoning and inference.   Cambridge University Press, 2000.
  • [8] K. J. Friston, A. P. Holmes, K. J. Worsley, J.-P. Poline, C. D. Frith, and R. S. Frackowiak, “Statistical parametric maps in functional imaging: a general linear approach,” Human Brain Mapping, vol. 2, no. 4, pp. 189–210, 1994.
  • [9] T. M. Mitchell, R. Hutchinson, R. S. Niculescu, F. Pereira, X. Wang, M. Just, and S. Newman, “Learning to decode cognitive states from brain images,” Machine Learning, vol. 57, no. 1-2, pp. 145–175, 2004.
  • [10] F. De Martino, G. Valente, N. Staeren, J. Ashburner, R. Goebel, and E. Formisano, “Combining multivariate voxel selection and support vector machines for mapping and classification of fMRI spatial patterns,” NeuroImage, vol. 43, no. 1, pp. 44–58, 2008.
  • [11] H. Reichenbach, The Direction of Time.   University of California Press, Berkeley, 1956.
  • [12] P. W. Holland, “Statistics and causal inference,” Journal of the American Statistical Association, vol. 81, no. 396, pp. 945–960, 1986.
  • [13] P. Spirtes, C. N. Glymour, and R. Scheines, Causation, prediction, and search.   MIT press, 2000.
  • [14] M. Grosse-Wentrup, B. Schölkopf, and J. Hill, “Causal influence of gamma oscillations on the sensorimotor rhythm,” NeuroImage, vol. 56, no. 2, pp. 837–842, 2011.