Spatial Matrix Completion for Spatially-Misaligned and High-Dimensional Air Pollution Data

04/11/2020
by Phuong T. Vu, et al.

In health-pollution cohort studies, accurate predictions of pollutant concentrations at new locations are needed because the locations of fixed monitoring sites and study participants are often spatially misaligned. For multi-pollutant data, principal component analysis (PCA) is often used to obtain a low-rank (LR) structure of the data prior to spatial prediction. The recently developed predictive PCA modifies the traditional algorithm to improve overall predictive performance by leveraging both the LR and the spatial structure within the data. However, predictive PCA requires complete data or an initial imputation step. Nonparametric imputation techniques that do not account for spatial information may distort the underlying structure of the data and thus further reduce predictive performance. We propose a convex optimization problem inspired by the LR matrix completion framework and develop a proximal algorithm to solve it. Missing data are imputed and handled concurrently within the algorithm, which eliminates the need for a separate imputation step. We show that our algorithm has a low computational burden and leads to reliable predictive performance as the severity of missing data increases.
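To make the "LR matrix completion plus proximal algorithm" idea concrete, here is a minimal sketch of generic nuclear-norm-regularized matrix completion solved by proximal gradient steps (singular value soft-thresholding). It is not the authors' method: it ignores spatial structure entirely, and the function names, the penalty `lam`, and the stopping rule are assumptions made for illustration only.

```python
import numpy as np


def svd_soft_threshold(Z, tau):
    """Proximal operator of tau * (nuclear norm): shrink each singular value by tau."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt


def complete_matrix(X, lam=0.1, n_iter=200, tol=1e-6):
    """Hypothetical helper: proximal gradient for
        min_M  0.5 * ||P_obs(X - M)||_F^2 + lam * ||M||_*
    where X is a 2-D array and np.nan marks missing entries.
    """
    observed = ~np.isnan(X)
    X_filled = np.where(observed, X, 0.0)
    M = np.zeros_like(X_filled)
    for _ in range(n_iter):
        # Gradient step with step size 1: observed cells take the data,
        # missing cells keep the current estimate (imputation happens here).
        Z = np.where(observed, X_filled, M)
        # Proximal step: enforce low-rank structure.
        M_new = svd_soft_threshold(Z, lam)
        converged = np.linalg.norm(M_new - M) <= tol * max(1.0, np.linalg.norm(M))
        M = M_new
        if converged:
            break
    return M


if __name__ == "__main__":
    # Toy rank-1 "sites x pollutants" matrix with one missing cell.
    X = np.array([[1.0, 1.0], [1.0, np.nan]])
    print(complete_matrix(X, lam=0.05, n_iter=500))
```

In this sketch the missing cells are filled inside each iteration rather than in a separate preprocessing pass, which mirrors the concurrent imputation described above. Note that a smaller `lam` reduces shrinkage bias but slows convergence, since each iteration can move a missing entry by at most `lam`.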


research
05/10/2023

Blockwise Principal Component Analysis for monotone missing data imputation and dimensionality reduction

Monotone missing data is a common problem in data analysis. However, imp...
research
02/11/2018

PCA-Based Missing Information Imputation for Real-Time Crash Likelihood Prediction Under Imbalanced Data

The real-time crash likelihood prediction has been an important research...
research
11/26/2018

Sparse spectral estimation with missing and corrupted measurements

Supervised learning methods with missing data have been extensively stud...
research
08/07/2018

Generalized Integrative Principal Component Analysis for Multi-Type Data with Block-Wise Missing Structure

High-dimensional multi-source data are encountered in many fields. Despi...
research
03/30/2022

A Shared Parameter Model for Systolic Blood Pressure Accounting for Data Missing Not at Random in the HUNT Study

In this work, blood pressure eleven years ahead is modeled using data fr...
research
11/11/2020

Bayes Optimal Informer Sets for Early-Stage Drug Discovery

An important experimental design problem in early-stage drug discovery i...
