Counterfactual Maximum Likelihood Estimation for Training Deep Networks

06/07/2021
by Xinyi Wang, et al.

Although deep learning models have driven state-of-the-art performance on a wide array of tasks, they are prone to learning spurious correlations that should not serve as predictive cues. To mitigate this problem, we propose a causality-based training framework that reduces the spurious correlations caused by observable confounders. We provide a theoretical analysis of the underlying general Structural Causal Model (SCM) and propose performing Maximum Likelihood Estimation (MLE) on the interventional distribution instead of the observational distribution, an approach we call Counterfactual Maximum Likelihood Estimation (CMLE). Because the interventional distribution is, in general, hidden from the observational data, we derive two different upper bounds on the expected negative log-likelihood and propose two general algorithms, Implicit CMLE and Explicit CMLE, for causal prediction with deep learning models using observational data. We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning. The results show that the CMLE methods outperform regular MLE in out-of-domain generalization and in reducing spurious correlations, while maintaining comparable performance on the regular evaluations.
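
The distinction between the observational and the interventional distribution can be made concrete with a small discrete toy model. The sketch below is only an illustration and is not the Implicit or Explicit CMLE algorithm from the paper: the variable names (X for an observed confounder, T for the input feature, Y for the label) and all probabilities are hypothetical, and the interventional quantity is computed with the standard backdoor adjustment, which assumes X is fully observed. In this toy model Y depends only on X, so any association between T and Y in the observational data is spurious.

```python
# Toy model with hypothetical numbers (assumptions, not taken from the paper):
# X is an observed binary confounder, T a binary input feature, Y a binary label.
P_X = {0: 0.5, 1: 0.5}           # P(X = x)
P_T1_GIVEN_X = {0: 0.1, 1: 0.9}  # P(T = 1 | X = x): X strongly influences T
P_Y1_GIVEN_X = {0: 0.2, 1: 0.8}  # P(Y = 1 | X = x): Y is driven by X alone


def p_t_given_x(t, x):
    """P(T = t | X = x)."""
    p1 = P_T1_GIVEN_X[x]
    return p1 if t == 1 else 1.0 - p1


def p_y1_given_tx(t, x):
    """P(Y = 1 | T = t, X = x); t is ignored because T has no causal effect here."""
    return P_Y1_GIVEN_X[x]


def observational(t):
    """P(Y = 1 | T = t) = sum_x P(Y = 1 | t, x) P(x | t)."""
    p_t = sum(p_t_given_x(t, x) * P_X[x] for x in P_X)
    joint = sum(p_y1_given_tx(t, x) * p_t_given_x(t, x) * P_X[x] for x in P_X)
    return joint / p_t


def interventional(t):
    """P(Y = 1 | do(T = t)) = sum_x P(Y = 1 | t, x) P(x) (backdoor adjustment)."""
    return sum(p_y1_given_tx(t, x) * P_X[x] for x in P_X)


if __name__ == "__main__":
    for t in (0, 1):
        print(f"t={t}: observational={observational(t):.2f}, "
              f"interventional={interventional(t):.2f}")
```

Under these assumed numbers the observational conditional differs sharply between T = 0 and T = 1, while the interventional one is identical for both values. A model fit by regular MLE on the observational data would therefore be rewarded for using T as a predictive cue, whereas a model fit to the interventional distribution would not.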

01/26/2022

Improved Maximum Likelihood Estimation of ARMA Models

In this paper we propose a new optimization model for maximum likelihood...
06/13/2019

Efficiency of maximum likelihood estimation for a multinomial distribution with known probability sums

For a multinomial distribution, suppose that we have prior knowledge of ...
08/24/2021

Maximum Likelihood Estimation for Multimodal Learning with Missing Modality

Multimodal learning has achieved great successes in many scenarios. Comp...
07/24/2019

Some computational aspects of maximum likelihood estimation of the skew-t distribution

Since its introduction, the skew-t distribution has received much attent...
11/02/2020

Noise-Contrastive Estimation for Multivariate Point Processes

The log-likelihood of a generative model often involves both positive an...
03/08/2022

Neural Contextual Bandits via Reward-Biased Maximum Likelihood Estimation

Reward-biased maximum likelihood estimation (RBMLE) is a classic princip...
06/12/2020

Learning Causal Models Online

Predictive models – learned from observational data not covering the com...