Selecting Robust Features for Machine Learning Applications using Multidata Causal Discovery

04/11/2023
by Saranya Ganesh S., et al.

Robust feature selection is vital for creating reliable and interpretable Machine Learning (ML) models. When designing statistical prediction models in cases where domain knowledge is limited and underlying interactions are unknown, choosing the optimal set of features is often difficult. To mitigate this issue, we introduce a Multidata (M) causal feature selection approach that simultaneously processes an ensemble of time series datasets and produces a single set of causal drivers. This approach uses the causal discovery algorithms PC1 or PCMCI, as implemented in the Tigramite Python package. These algorithms use conditional independence tests to infer parts of the causal graph. Our causal feature selection approach filters out causally spurious links before passing the remaining causal features as inputs to ML models (multiple linear regression, random forest) that predict the targets. We apply our framework to the statistical intensity prediction of Western Pacific Tropical Cyclones (TC), for which it is often difficult to accurately choose drivers and their dimensionality reduction (time lags, vertical levels, and area-averaging). Using more stringent significance thresholds in the conditional independence tests helps eliminate spurious causal relationships, thus helping the ML model generalize better to unseen TC cases. M-PC1 with a reduced number of features outperforms M-PCMCI, non-causal ML, and other feature selection methods (lagged correlation, random), and even slightly outperforms feature selection based on eXplainable Artificial Intelligence. The optimal causal drivers obtained from our causal feature selection help improve our understanding of underlying relationships and suggest new potential drivers of TC intensification.
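
The selection step described above can be illustrated with a small, self-contained sketch. This is not the Tigramite implementation and not the authors' code. It assumes a Gaussian partial-correlation test (Fisher z), stands in for the multidata step by pooling lagged samples from all datasets into one sample set, and runs a simplified PC1-style loop in which each candidate is tested conditional on the p strongest other surviving candidates, with p growing each round. The toy system (`make_dataset`) and every function name here are hypothetical.

```python
import math
import random
from functools import lru_cache


def make_dataset(seed, length=300):
    """Hypothetical toy system: X drives Y at lag 1; Z is a noisy copy of X (no direct effect on Y)."""
    rng = random.Random(seed)
    x, y, z = [0.0] * length, [0.0] * length, [0.0] * length
    for t in range(length):
        x[t] = rng.gauss(0, 1)
        z[t] = x[t] + 0.5 * rng.gauss(0, 1)
        if t > 0:
            y[t] = 0.3 * y[t - 1] + 0.6 * x[t - 1] + 0.5 * rng.gauss(0, 1)
    return {"X": x, "Y": y, "Z": z}


def pooled_samples(datasets, target, candidates, max_lag):
    """Assumed multidata step: stack lagged samples from every dataset into one sample set."""
    cols = [[] for _ in range(len(candidates) + 1)]  # column 0 = target
    for data in datasets:
        for t in range(max_lag, len(data[target])):
            cols[0].append(data[target][t])
            for i, (var, lag) in enumerate(candidates, start=1):
                cols[i].append(data[var][t - lag])
    return cols


def corr_matrix(cols):
    """Pearson correlation matrix of the columns, plus the sample size."""
    n = len(cols[0])
    centered = []
    for c in cols:
        mean = sum(c) / n
        centered.append([v - mean for v in c])
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(cols)
    corr = [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]
    return corr, n


def fisher_z_pvalue(r, n, n_cond):
    """Two-sided p-value of H0: partial correlation is zero (assumes Gaussian data)."""
    r = max(min(r, 0.999999), -0.999999)
    z = math.atanh(r) * math.sqrt(n - n_cond - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def pc1_select(datasets, target, candidates, alpha=0.01):
    """Simplified PC1-style selection of lagged drivers of `target` (not Tigramite's code)."""
    max_lag = max(lag for _, lag in candidates)
    corr, n = corr_matrix(pooled_samples(datasets, target, candidates, max_lag))

    @lru_cache(maxsize=None)
    def pcorr(i, j, cond):
        # Recursive partial correlation of columns i and j given the columns in `cond`.
        if not cond:
            return corr[i][j]
        k, rest = cond[-1], cond[:-1]
        r_ij, r_ik, r_jk = pcorr(i, j, rest), pcorr(i, k, rest), pcorr(j, k, rest)
        return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))

    parents = list(range(1, len(candidates) + 1))  # start with every candidate
    min_stat = {i: float("inf") for i in parents}
    p = 0
    while len(parents) - 1 >= p:
        removed = []
        for i in parents:
            # Condition on the p strongest other surviving candidates.
            cond = tuple(sorted([j for j in parents if j != i][:p]))
            r = pcorr(0, i, cond)
            min_stat[i] = min(min_stat[i], abs(r))
            if fisher_z_pvalue(r, n, p) > alpha:
                removed.append(i)
        # Apply removals after the sweep, then rank by the smallest |r| seen so far (largest first).
        parents = [i for i in parents if i not in removed]
        parents.sort(key=lambda i: min_stat[i], reverse=True)
        p += 1
    return [candidates[i - 1] for i in parents]


if __name__ == "__main__":
    datasets = [make_dataset(seed) for seed in (1, 2, 3)]
    candidates = [(v, lag) for v in ("X", "Y", "Z") for lag in (1, 2)]
    print(pc1_select(datasets, "Y", candidates, alpha=0.01))
```

In this toy setup, `Z` at lag 1 is strongly correlated with `Y` but adds no information once `X` at lag 1 is known, so with a strict `alpha` it should typically be filtered out while `X` at lag 1 and `Y` at lag 1 are retained. The surviving features would then be passed to a downstream regression model. Lowering `alpha` mirrors the abstract's point that stricter thresholds prune more spurious links.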
