Gaussian Process filtering for calibration of low-cost air-pollution sensor network data
Low-cost air pollution sensors, offering hyper-local characterization of pollutant concentrations, are becoming increasingly prevalent in environmental and public health research. However, low-cost air pollution data can be noisy and biased by environmental conditions, and usually need to be field-calibrated by co-locating low-cost sensors with reference-grade instruments. We show, theoretically and empirically, that the common procedure of regression-based calibration using co-located data systematically underestimates high air-pollution concentrations, which are critical to diagnose from a health perspective. Current calibration practices also often fail to utilize the spatial correlation in pollutant concentrations. We propose a novel spatial filtering approach to co-location-based calibration of low-cost networks. It mitigates the underestimation issue by using an inverse regression, and it incorporates spatial correlation through second-stage modeling of the true pollutant concentrations using a conditional Gaussian Process. Our approach works with one or more co-located sites in the network and is dynamic, leveraging spatial correlation with the latest available reference data. Through extensive simulations, we demonstrate that the spatial filtering substantially improves estimation of pollutant concentrations and captures peak concentrations with greater accuracy. We apply the methodology to the calibration of a low-cost PM2.5 network in Baltimore, Maryland, and diagnose air-pollution peaks that are missed by regression calibration.
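The underestimation claim can be illustrated with a small simulation. The sketch below is not the authors' implementation: the linear sensor model, the gamma-distributed "true" concentrations, and all numeric values are assumptions chosen only for illustration, and the second-stage Gaussian Process filtering is omitted. It compares regression calibration (regressing reference values on sensor readings) with inverse regression (regressing sensor readings on reference values, then inverting the fit) at the highest true concentrations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical co-location data (all values are illustrative assumptions).
n = 5000
x = rng.gamma(shape=2.0, scale=5.0, size=n)        # "true" reference concentration
y = 2.0 + 1.5 * x + rng.normal(0.0, 6.0, size=n)   # noisy, biased low-cost reading

# Regression calibration: fit x ~ c + d*y, then predict x directly from y.
d, c = np.polyfit(y, x, 1)   # polyfit returns [slope, intercept] for degree 1
x_reg = c + d * y

# Inverse regression: fit y ~ a + b*x, then invert the fitted line.
b, a = np.polyfit(x, y, 1)
x_inv = (y - a) / b

# Compare the mean error on the top 5% of true concentrations (the "peaks").
high = x >= np.quantile(x, 0.95)
bias_reg = float(np.mean(x_reg[high] - x[high]))
bias_inv = float(np.mean(x_inv[high] - x[high]))

print(f"mean error at peaks, regression calibration: {bias_reg:.2f}")
print(f"mean error at peaks, inverse regression:     {bias_inv:.2f}")
```

Under these assumptions, regression calibration shrinks its predictions toward the overall mean, so its mean error at the peaks is negative, while the inverse-regression estimate stays close to unbiased there at the cost of higher variance. Borrowing strength from spatial correlation, as the abstract describes, is one way to address that extra variance.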