Investigating the effect of binning on causal discovery

02/23/2022
by Andrew Colt Deckert, et al.

Binning (a.k.a. discretization) of numerically continuous measurements is a widespread but controversial practice in data collection, analysis, and presentation. The consequences of binning have been evaluated for many different kinds of data analysis methods; however, the effect of binning on causal discovery algorithms has not yet been directly investigated. This paper reports the results of a simulation study that examined the effect of binning on the Greedy Equivalence Search (GES) causal discovery algorithm. Our findings suggest that unbinned continuous data often result in the highest search performance, but some exceptions are identified. We also found that binned data are more sensitive to changes in sample size and tuning parameters, and we identified some interaction effects among sample size, binning, and tuning parameter on performance.
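As a rough illustration of what binning does to a continuous measurement, the sketch below applies equal-width binning to simulated data. The abstract does not say which binning scheme, number of bins, or data-generating model the study used, so the function name `equal_width_bin`, the three-bin choice, and the linear relationship between `x` and `y` are all assumptions made for illustration only.

```python
import random


def equal_width_bin(values, n_bins):
    """Map each continuous value to an integer bin index in 0..n_bins-1.

    Hypothetical helper for illustration; not the procedure from the paper.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so everything falls in a single bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # min(...) keeps the largest value inside the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Simulate a simple dependence x -> y (assumed model, for illustration).
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(1000)]
y = [2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

# Replace the continuous y with a coarse three-level version.
binned_y = equal_width_bin(y, 3)
```

After binning, `binned_y` carries only three distinct levels in place of 1000 continuous values. That loss of resolution is the kind of change whose effect on a search algorithm a simulation study can measure.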

research
02/25/2022

Causal discovery for observational sciences using supervised machine learning

Causal inference can estimate causal effects, but unless data are collec...
research
03/15/2012

Causal Conclusions that Flip Repeatedly and Their Justification

Over the past two decades, several consistent procedures have been desig...
research
10/04/2019

Simulations evaluating resampling methods for causal discovery: ensemble performance and calibration

Causal discovery can be a powerful tool for investigating causality when...
research
04/11/2023

KGS: Causal Discovery Using Knowledge-guided Greedy Equivalence Search

Learning causal relationships solely from observational data provides in...
research
03/12/2020

Power and Sample Size for Marginal Structural Models

Marginal structural models fit via inverse probability of treatment weig...
research
09/20/2022

Effects of Influential Points and Sample Size on the Selection and Replicability of Multivariable Fractional Polynomial Models

The multivariable fractional polynomial (MFP) procedure combines variabl...
research
06/02/2020

Unsupervised Discretization by Two-dimensional MDL-based Histogram

Unsupervised discretization is a crucial step in many knowledge discover...
