Counterfactual Maximum Likelihood Estimation for Training Deep Networks

by Xinyi Wang, et al.

Although deep learning models have driven state-of-the-art performance on a wide array of tasks, they are prone to learning spurious correlations that should not be used as predictive clues. To mitigate this problem, we propose a causality-based training framework that reduces spurious correlations caused by observable confounders. We give a theoretical analysis of the underlying general Structural Causal Model (SCM) and propose performing Maximum Likelihood Estimation (MLE) on the interventional distribution rather than the observational distribution, which we call Counterfactual Maximum Likelihood Estimation (CMLE). Since the interventional distribution is, in general, hidden from observational data, we derive two different upper bounds on the expected negative log-likelihood and propose two general algorithms, Implicit CMLE and Explicit CMLE, for making causal predictions with deep learning models trained on observational data. We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning. The results show that CMLE methods outperform regular MLE in out-of-domain generalization and in reducing spurious correlations, while maintaining comparable performance on standard evaluations.
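The gap between the observational and interventional distributions that motivates CMLE can be made concrete on a toy discrete SCM. The sketch below (illustrative only; the variable names, probability tables, and the backdoor-adjustment formulation are assumptions for exposition, not the paper's setup) contrasts the observational likelihood P(y | x), which mixes over P(z | x), with the interventional likelihood P(y | do(x)), which mixes over the confounder's marginal P(z):

```python
import numpy as np

# Toy SCM with an observable binary confounder Z: Z -> X, Z -> Y, X -> Y.
# All tables are made-up illustrative numbers.
p_z = np.array([0.7, 0.3])                  # P(Z)
p_x_given_z = np.array([[0.8, 0.2],         # P(X | Z), rows indexed by z
                        [0.3, 0.7]])
p_y_given_xz = np.array([[[0.9, 0.1],       # P(Y | X, Z), indexed [z][x][y]
                          [0.6, 0.4]],
                         [[0.5, 0.5],
                          [0.2, 0.8]]])

def observational(y, x):
    # P(y | x) = sum_z P(y | x, z) P(z | x), with P(z | x) via Bayes' rule
    joint_xz = p_x_given_z[:, x] * p_z      # P(x, z) for each value of z
    p_z_given_x = joint_xz / joint_xz.sum()
    return float(np.sum(p_y_given_xz[:, x, y] * p_z_given_x))

def interventional(y, x):
    # Backdoor adjustment: P(y | do(x)) = sum_z P(y | x, z) P(z)
    return float(np.sum(p_y_given_xz[:, x, y] * p_z))

print(observational(1, 1))   # 0.64 -- inflated by the X-Z correlation
print(interventional(1, 1))  # 0.52 -- the causal effect of setting x = 1
```

Regular MLE maximizes the log of the first quantity over observed pairs, so a model can exploit the Z-X correlation; CMLE instead targets the second, interventional quantity, for which the paper derives tractable upper bounds since P(y | do(x)) is not directly observed.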

