Combining Linear Non-Gaussian Acyclic Model with Logistic Regression Model for Estimating Causal Structure from Mixed Continuous and Discrete Data

02/16/2018
by Chao Li, et al.

Estimating causal models from observational data is a crucial task in data analysis. For continuous-valued data, Shimizu et al. proposed a linear non-Gaussian acyclic model to describe the data-generating process and showed that the model is identifiable when the sample size is sufficiently large. In practice, however, continuous and discrete variables often coexist in the same problem. Most existing causal discovery methods either ignore the discrete variables and apply a continuous-valued algorithm, or discretize all continuous variables and then apply a discrete Bayesian network approach. The first approach may lose important information carried by the discrete data; the second introduces approximation error through discretization. In this paper, we define a novel hybrid causal model that contains both continuous and discrete variables. The model assumes that (1) each continuous variable is a linear function of its parent variables plus non-Gaussian noise, and (2) each discrete variable is a logistic variable whose distribution parameters depend on the values of its parent variables. We also derive the BIC scoring function for model selection. The resulting discovery algorithm learns causal structures from mixed continuous and discrete data without discretization. We demonstrate the effectiveness of the method empirically through extensive simulations.
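The two modelling assumptions and the BIC-based comparison of parent sets can be illustrated with a small simulation. This is a minimal sketch, not the authors' implementation: the three-variable DAG x1 -> y -> x2, its coefficients, the uniform noise, and the helper names (`simulate`, `fit_logistic`, `score_discrete_node`) are all hypothetical. It scores only the discrete node's local logistic model, using the standard BIC form log L - (k/2) log n; how the paper scores continuous nodes under non-Gaussian noise is not reproduced here.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def simulate(n, seed=0):
    """Draw n samples from a hypothetical hybrid DAG x1 -> y -> x2."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Assumption (1): continuous root with non-Gaussian (uniform) noise.
        x1 = rng.uniform(-1.0, 1.0)
        # Assumption (2): binary y is logistic in its parent x1 (illustrative coefficients).
        y = 1 if rng.random() < sigmoid(-0.5 + 3.0 * x1) else 0
        # Assumption (1): continuous x2 is linear in its parent y plus non-Gaussian noise.
        x2 = 2.0 * y + rng.uniform(-1.0, 1.0)
        rows.append((x1, y, x2))
    return rows


def fit_logistic(xs, ys, iters=25):
    """Newton-Raphson MLE of P(y=1 | x) = sigmoid(a + b*x); returns (a, b, loglik)."""
    a = b = 0.0
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            ga += y - p          # gradient of the log-likelihood w.r.t. a
            gb += (y - p) * x    # gradient of the log-likelihood w.r.t. b
            haa += w             # entries of the Fisher information (negative Hessian)
            hab += w * x
            hbb += w * x * x
        det = haa * hbb - hab * hab
        if det < 1e-12:
            break
        # Solve the 2x2 system I * step = g and take the Newton step.
        a += (hbb * ga - hab * gb) / det
        b += (haa * gb - hab * ga) / det
    loglik = 0.0
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(a + b * x), 1e-12), 1.0 - 1e-12)
        loglik += math.log(p if y == 1 else 1.0 - p)
    return a, b, loglik


def bic(loglik, k, n):
    # BIC in "higher is better" form: log-likelihood minus (k/2) log n.
    return loglik - 0.5 * k * math.log(n)


def score_discrete_node(xs, ys):
    """Return (BIC with parent x, BIC with no parents, fitted slope b)."""
    n = len(ys)
    _, b, ll_parent = fit_logistic(xs, ys)
    q = sum(ys) / n  # MLE of P(y=1) when y has no parents
    ll_empty = sum(math.log(q if y == 1 else 1.0 - q) for y in ys)
    return bic(ll_parent, 2, n), bic(ll_empty, 1, n), b


if __name__ == "__main__":
    rows = simulate(2000, seed=1)
    with_x1, empty, b_hat = score_discrete_node([r[0] for r in rows], [r[1] for r in rows])
    print(f"BIC(y | x1) = {with_x1:.1f}, BIC(y | none) = {empty:.1f}, b_hat = {b_hat:.2f}")
```

Because x1 strongly influences y in this simulated model, the parent set {x1} should obtain the higher BIC despite its extra parameter; a score-based search would make such comparisons for every node and candidate parent set.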
