Nonparametric Causal Feature Selection for Spatiotemporal Risk Mapping of Malaria Incidence in Madagascar
Modern disease mapping uses high-resolution environmental and socioeconomic data as covariates, or 'features', within a geostatistical framework to improve predictions of disease risk. Feature selection is an important step in building these models: it helps to reduce overfitting and computational complexity, and to improve model interpretability. Selecting features that have a causal relationship with the response variable (not just an association) could improve predictions and generalisability, but identifying these causal features from non-interventional, spatiotemporal data is a challenging problem. Here we apply a causal inference algorithm – the PC algorithm with spatiotemporal prewhitening and nonparametric independence tests – to explore the performance of causal feature selection for predicting malaria incidence in Madagascar. In comparison with no feature selection and LASSO feature selection, this case study reveals a clear advantage for the causal feature selection approach in the out-of-sample predictive accuracy of forward temporal forecasting, but not in spatiotemporal interpolation.
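To make the idea of selecting causal rather than merely associated features concrete, the following is a minimal sketch of a PC-style screening step, not the pipeline used in the study. It assumes a linear-Gaussian partial-correlation test (Fisher z) in place of the nonparametric independence tests named above, omits spatiotemporal prewhitening, and only tests each feature against the response rather than learning a full graph. The function names (`partial_corr`, `is_independent`, `causal_feature_select`) and the parameters `max_cond` and `alpha` are hypothetical.

```python
import itertools
import math

import numpy as np


def partial_corr(x, y, Z):
    """Correlation of x and y after linearly regressing out the columns of Z."""
    if Z.shape[1] > 0:
        A = np.column_stack([np.ones(len(x)), Z])
        x = x - A @ np.linalg.lstsq(A, x, rcond=None)[0]
        y = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    return float(np.corrcoef(x, y)[0, 1])


def is_independent(x, y, Z, alpha=0.01):
    """Fisher z-test: True if we fail to reject 'x independent of y given Z'.

    Assumption: this parametric test stands in for a nonparametric one.
    """
    r = partial_corr(x, y, Z)
    r = min(max(r, -0.999999), 0.999999)  # keep the log transform finite
    dof = len(x) - Z.shape[1] - 3
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(dof)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return p_value > alpha


def causal_feature_select(X, y, max_cond=2, alpha=0.01):
    """Drop feature j if some subset S of the remaining features, with
    |S| <= max_cond, renders it conditionally independent of y."""
    keep = list(range(X.shape[1]))
    for k in range(max_cond + 1):
        for j in list(keep):  # iterate over a copy; keep shrinks as we go
            others = [i for i in keep if i != j]
            for S in itertools.combinations(others, k):
                if is_independent(X[:, j], y, X[:, list(S)], alpha):
                    keep.remove(j)
                    break
    return keep


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 2000
    driver = rng.normal(size=n)           # directly drives the response
    proxy = driver + rng.normal(size=n)   # associated with y only via driver
    noise = rng.normal(size=n)            # unrelated to y
    y = 2.0 * driver + rng.normal(size=n)
    X = np.column_stack([driver, proxy, noise])
    print(causal_feature_select(X, y, alpha=0.001))
```

In this synthetic setup the proxy column is correlated with the response, so a purely associational filter would tend to retain it, whereas conditioning on the driver is what allows the screening step to discard it.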