Fed-MIWAE: Federated Imputation of Incomplete Data via Deep Generative Models

by Irene Balelli, et al.

Federated learning allows machine learning models to be trained on multiple decentralized local datasets without requiring explicit data exchange. However, data pre-processing, including strategies for handling missing data, remains a major bottleneck in real-world federated learning deployment, and is typically performed locally. Local pre-processing may be biased, since the subpopulation observed at each center may not be representative of the overall population. To address this issue, this paper first proposes a more consistent approach to data standardization through a federated model. Additionally, we propose Fed-MIWAE, a federated version of the state-of-the-art imputation method MIWAE, a deep latent variable model for missing data imputation based on variational autoencoders. MIWAE has the great advantage of being easily trainable with classical federated aggregators. Furthermore, it can handle MAR (Missing At Random) data, in which the missingness of a variable can depend on the observed ones, a more challenging missing-data mechanism than MCAR (Missing Completely At Random). We evaluate our method on multi-modal medical imaging data and clinical scores from a simulated federated scenario with the ADNI dataset. We compare Fed-MIWAE against classical imputation methods, performed either locally or in a centralized fashion. Fed-MIWAE achieves imputation accuracy comparable with the best centralized method, even when local data distributions are highly heterogeneous. In addition, thanks to the variational nature of Fed-MIWAE, our method is designed to perform multiple imputation, allowing the imputation uncertainty to be quantified in the federated scenario.
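
The abstract does not spell out how the federated standardization works, so the following is only a minimal sketch of one plausible approach, not the authors' method. It assumes each center shares per-feature counts, sums, and sums of squares computed over its observed entries, and that a server pools them into a global mean and standard deviation. The function names (`local_stats`, `aggregate`, `standardize`), the NaN encoding of missing values, and the toy data are all illustrative assumptions.

```python
import math

def local_stats(rows):
    """Client side: per-feature (count, sum, sum of squares) over observed
    entries only. Missing values are assumed to be encoded as NaN."""
    d = len(rows[0])
    n, s, ss = [0] * d, [0.0] * d, [0.0] * d
    for row in rows:
        for j, x in enumerate(row):
            if not math.isnan(x):
                n[j] += 1
                s[j] += x
                ss[j] += x * x
    return n, s, ss

def aggregate(stats):
    """Server side: pool the client statistics into a global mean and
    (population) standard deviation per feature. Assumes every feature is
    observed at least once across all centers."""
    d = len(stats[0][0])
    means, stds = [], []
    for j in range(d):
        n = sum(c[0][j] for c in stats)
        s = sum(c[1][j] for c in stats)
        ss = sum(c[2][j] for c in stats)
        mean = s / n
        var = max(ss / n - mean * mean, 0.0)  # clamp tiny negative round-off
        means.append(mean)
        stds.append(math.sqrt(var))
    return means, stds

def standardize(rows, means, stds):
    """Client side: apply the shared global statistics; NaN entries stay NaN.
    A zero standard deviation is replaced by 1 to avoid division by zero."""
    return [
        [x if math.isnan(x) else (x - m) / (sd or 1.0)
         for x, m, sd in zip(row, means, stds)]
        for row in rows
    ]

# Toy data for two hypothetical centers with different local distributions.
nan = float("nan")
center_a = [[1.0, 10.0], [2.0, nan]]
center_b = [[3.0, 30.0], [nan, 20.0], [6.0, nan]]

means, stds = aggregate([local_stats(center_a), local_stats(center_b)])
scaled_a = standardize(center_a, means, stds)
```

In this sketch only aggregate statistics leave each center, and every center ends up scaling its data with the same global mean and standard deviation rather than with statistics from its own, possibly unrepresentative, subpopulation. The sum-of-squares formula is kept for brevity; a numerically more stable pooling of local means and variances would be preferable for features with large magnitudes.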




Leveraging variational autoencoders for multiple data imputation

Missing data persists as a major barrier to data analysis across numerou...

missIWAE: Deep Generative Modelling and Imputation of Incomplete Data

We present a simple technique to train deep latent variable models (DLVM...

MIWAE: Deep Generative Modelling and Imputation of Incomplete Data

We consider the problem of handling missing data with deep latent variab...

Deep Generative Imputation Model for Missing Not At Random Data

Data analysis usually suffers from the Missing Not At Random (MNAR) prob...

Regression-based imputation of explanatory discrete missing data

Imputation of missing values is a strategy for handling non-responses in...

Distributed learning optimisation of Cox models can leak patient data: Risks and solutions

Medical data are often highly sensitive, and frequently there are missin...

Unsupervised Data Imputation via Variational Inference of Deep Subspaces

A wide range of systems exhibit high dimensional incomplete data. Accura...
