Local calibration of verbal autopsy algorithms

by Abhirup Datta, et al.

Computer-coded verbal autopsy (CCVA) algorithms used to generate burden-of-disease estimates rely on non-local training data and yield inaccurate estimates in local contexts. We present a general calibration framework to improve estimates of cause-specific mortality fractions (CSMFs) from CCVA when limited local training data are available. We formulate a Bayesian hierarchical local calibration of discrete classifiers that updates a non-locally trained CCVA estimate using estimates of the CCVA algorithm's misclassification rates on the local training data. This involves a novel transition-matrix shrinkage for the misclassification matrix, which theoretically guarantees that, in the absence of any local data or when the CCVA algorithm is perfect, the calibrated estimate coincides with its uncalibrated analog, thereby subsuming the default practice as a special case. A novel Gibbs sampler using data augmentation enables fast implementation. We also present an ensemble calibration that uses predictions from multiple CCVA algorithms as inputs to produce a unified estimate. A theoretical result demonstrates how the ensemble calibration favors the most accurate algorithm. Simulations and a real-data analysis demonstrate the improvement achieved by calibration. We present extensions that model the etiology distribution as a function of demographic covariates, and an EM-algorithm-based MAP estimation as an alternative to MCMC. An R package implementing this calibration is publicly available.
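The core calibration idea can be illustrated with a minimal, non-Bayesian sketch. It assumes the usual misclassification model in which the fraction of deaths the CCVA algorithm assigns to cause j is q_j = Σ_i p_i M[i, j], where p is the true local CSMF and M[i, j] is the probability that a death from true cause i is assigned to cause j. Everything below is an illustrative assumption, not the authors' method or their R package: the function names are hypothetical, the diagonal pseudo-count `shrink` is only a crude stand-in for the paper's transition-matrix shrinkage, and a plain maximum-likelihood EM update with M held fixed replaces the Bayesian hierarchical model and Gibbs sampler.

```python
import numpy as np


def estimate_misclassification(true_causes, predicted_causes, n_causes, shrink=1.0):
    """Estimate M[i, j] = P(CCVA predicts cause j | true cause i) from local labelled deaths.

    Hypothetical shrinkage: `shrink` pseudo-counts are added to the diagonal, pulling
    M toward the identity. With no labelled data, M is exactly the identity matrix.
    """
    if shrink <= 0:
        raise ValueError("shrink must be positive so every row of M is well defined")
    if len(true_causes) != len(predicted_causes):
        raise ValueError("true and predicted cause lists must have equal length")
    counts = np.zeros((n_causes, n_causes))
    for i, j in zip(true_causes, predicted_causes):
        counts[i, j] += 1                      # cross-tabulate true vs predicted cause
    counts += shrink * np.eye(n_causes)        # diagonal pseudo-counts toward identity
    return counts / counts.sum(axis=1, keepdims=True)


def calibrate_csmf(predicted_causes, M, n_iter=1000, tol=1e-10):
    """EM estimate of the true CSMF p, assuming P(predict j) = sum_i p[i] * M[i, j].

    Returns (p, q), where q is the uncalibrated CSMF (raw predicted-cause fractions).
    Assumes a non-empty prediction list and a positive diagonal in M (guaranteed
    when M comes from estimate_misclassification with shrink > 0).
    """
    n_causes = M.shape[0]
    counts = np.bincount(np.asarray(predicted_causes), minlength=n_causes)
    q = counts / counts.sum()                  # uncalibrated CSMF
    p = np.full(n_causes, 1.0 / n_causes)      # start from a uniform CSMF
    for _ in range(n_iter):
        implied = p @ M                        # predicted-cause fractions implied by p
        # Guard against 0/0 for causes that are never predicted and have p == 0.
        ratio = np.divide(q, implied, out=np.zeros_like(q), where=implied > 0)
        # Combined E- and M-step: reallocate each predicted fraction q[j]
        # to true causes i in proportion to p[i] * M[i, j].
        p_new = p * (M @ ratio)
        if np.max(np.abs(p_new - p)) < tol:
            return p_new, q
        p = p_new
    return p, q
```

With no local labelled deaths, `estimate_misclassification` returns the identity matrix and `calibrate_csmf` returns p equal to q, which mirrors, in this simplified setting, the abstract's guarantee that the calibrated estimate reduces to the uncalibrated one. With an informative M, for example rows `[0.8, 0.2]` and `[0.4, 0.6]` and 68 of 100 deaths predicted as cause 0, the sketch recovers p close to `[0.7, 0.3]` instead of the raw `[0.68, 0.32]`.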




Diverse Ensembles Improve Calibration

Modern deep neural networks can produce badly calibrated predictions, es...

Better Classifier Calibration for Small Data Sets

Classifier calibration does not always go hand in hand with the classifi...

Pólygamma Data Augmentation to address Non-conjugacy in the Bayesian Estimation of Mixed Multinomial Logit Models

The standard Gibbs sampler of Mixed Multinomial Logit (MMNL) models invo...

Area-covering postprocessing of ensemble precipitation forecasts using topographical and seasonal conditions

Probabilistic weather forecasts from ensemble systems require statistica...

Spline Analysis of Biomarker Data Pooled From Multiple Matched/Nested Case-Control Studies

Pooling biomarker data across multiple studies enables researchers to ge...

Do We Still Need Non-Maximum Suppression? Accurate Confidence Estimates and Implicit Duplication Modeling with IoU-Aware Calibration

Object detectors are at the heart of many semi- and fully autonomous dec...

Better Boosting with Bandits for Online Learning

Probability estimates generated by boosting ensembles are poorly calibra...
