Predicting Census Survey Response Rates via Interpretable Nonparametric Additive Models with Structured Interactions

by Shibal Ibrahim, et al.

Accurate and interpretable prediction of survey response rates is important from an operational standpoint. The US Census Bureau's well-known ROAM application uses principled statistical models trained on the US Census Planning Database to identify hard-to-survey areas. An earlier crowdsourcing competition revealed that an ensemble of regression trees gave the best performance in predicting survey response rates; however, the corresponding models could not be adopted for the intended application because of their limited interpretability. In this paper, we present new interpretable statistical methods that predict survey response rates with high accuracy. We study sparse nonparametric additive models with pairwise interactions via ℓ_0-regularization, as well as hierarchically structured variants that provide enhanced interpretability. Despite strong methodological underpinnings, such models can be computationally challenging, so we present new scalable algorithms for learning them. We also establish novel non-asymptotic error bounds for the proposed estimators. Experiments based on the US Census Planning Database demonstrate that our methods lead to high-quality predictive models that permit actionable interpretability for different segments of the population. Interestingly, our methods provide significant gains in interpretability without sacrificing predictive performance relative to state-of-the-art black-box machine learning methods based on gradient boosting and feedforward neural networks. Our code implementation in Python is available at
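
For orientation, a sparse additive model with pairwise interactions and an ℓ_0-type penalty is often written in a form like the one below. This is a generic sketch, not necessarily the exact formulation used in the paper: the squared-error loss, the symbols f_j, f_{jk}, λ, and α, and the particular statement of the hierarchy constraint are all assumptions introduced here for illustration.

```latex
% Generic sketch (assumed notation, not taken from the paper):
% n observations (x_i, y_i), p features, main effects f_j, pairwise effects f_{jk}.
\min_{\{f_j\},\,\{f_{jk}\}} \;
  \frac{1}{n} \sum_{i=1}^{n}
  \Big( y_i - \sum_{j=1}^{p} f_j(x_{ij})
            - \sum_{j<k} f_{jk}(x_{ij}, x_{ik}) \Big)^{2}
  \;+\; \lambda \Big( \sum_{j=1}^{p} \mathbf{1}[f_j \neq 0]
  \;+\; \alpha \sum_{j<k} \mathbf{1}[f_{jk} \neq 0] \Big)

% One common (assumed) form of a hierarchical constraint, "strong hierarchy":
% an interaction may be active only if both of its main effects are active.
f_{jk} \neq 0 \;\Longrightarrow\; f_j \neq 0 \ \text{and} \ f_k \neq 0
```

In this reading, the penalty counts the nonzero component functions, so λ controls how many main effects and interactions enter the model, and the hierarchy constraint keeps every selected interaction tied to features that also appear as main effects.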




