Causal Direction of Data Collection Matters: Implications of Causal and Anticausal Learning in NLP

10/07/2021
by Zhijing Jin, et al.

The principle of independent causal mechanisms (ICM) states that generative processes of real-world data consist of independent modules that do not influence or inform each other. While this idea has led to fruitful developments in the field of causal inference, it is not widely known in the NLP community. In this work, we argue that the causal direction of the data collection process bears nontrivial implications that can explain a number of published NLP findings, such as differences in semi-supervised learning (SSL) and domain adaptation (DA) performance across different settings. We categorize common NLP tasks according to their causal direction and empirically assay the validity of the ICM principle for text data using minimum description length. We conduct an extensive meta-analysis of over 100 published SSL and 30 DA studies, and find that the results are consistent with our expectations based on causal insights. This work presents the first attempt to analyze the ICM principle in NLP, and provides constructive suggestions for future modeling choices. Code is available at https://github.com/zhijing-jin/icm4nlp.

research
10/09/2019

MixMatch Domain Adaptaion: Prize-winning solution for both tracks of VisDA 2019 challenge

We present a domain adaptation (DA) system that can be used in multi-sou...
research
06/08/2023

Causal normalizing flows: from theory to practice

In this work, we deepen on the use of normalizing flows for causal reaso...
research
02/22/2017

Causal Inference by Stochastic Complexity

The algorithmic Markov condition states that the most likely causal dire...
research
02/10/2020

Few-shot Domain Adaptation by Causal Mechanism Transfer

We study few-shot supervised domain adaptation (DA) for regression probl...
research
04/05/2022

Dynatask: A Framework for Creating Dynamic AI Benchmark Tasks

We introduce Dynatask: an open source system for setting up custom NLP t...
research
05/02/2023

Psychologically-Inspired Causal Prompts

NLP datasets are richer than just input-output pairs; rather, they carry...
research
05/18/2020

An Analysis of the Adaptation Speed of Causal Models

We consider the problem of discovering the causal process that generated...
