Heterogeneous Datasets for Federated Survival Analysis Simulation

01/28/2023
by Alberto Archetti, et al.

Survival analysis studies techniques for modeling the time until an event of interest occurs in a population. It has found widespread application in healthcare, engineering, and the social sciences. However, the data needed to train survival models are often distributed, incomplete, censored, and confidential. In this context, federated learning can improve the quality of models trained on distributed data while preserving user privacy. Federated survival analysis is still in its early development, though, and there is no common benchmarking dataset for testing federated survival models. This work proposes a novel, reproducible technique for constructing realistic heterogeneous datasets from existing non-federated datasets. Specifically, we provide two novel dataset-splitting algorithms based on the Dirichlet distribution that assign each data sample to a carefully chosen client: quantity-skewed splitting and label-skewed splitting. These algorithms yield different levels of heterogeneity by changing a single hyperparameter. Finally, numerical experiments provide a quantitative evaluation of the heterogeneity level using log-rank tests and a qualitative analysis of the generated splits. The implementation of the proposed methods is publicly available to support reproducibility and to encourage common practices for simulating federated environments for survival analysis.
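The abstract does not spell out the two splitting algorithms, so the following is only a minimal sketch of what Dirichlet-based splitting could look like, not the authors' implementation. All function names, the use of equal-sized time bins as "labels", and the parameter `n_bins` are assumptions made for illustration; the sketch also ignores the censoring indicator. It relies only on the standard fact that normalized Gamma(alpha, 1) draws follow a symmetric Dirichlet(alpha) distribution.

```python
import random
from typing import List, Sequence


def dirichlet_proportions(alpha: float, k: int, rng: random.Random) -> List[float]:
    """Sample from a symmetric Dirichlet(alpha) by normalizing Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def partition(indices: List[int], proportions: Sequence[float]) -> List[List[int]]:
    """Cut `indices` into consecutive chunks whose sizes follow `proportions`."""
    n = len(indices)
    parts, start, cum = [], 0, 0.0
    for c, p in enumerate(proportions):
        cum += p
        is_last = c == len(proportions) - 1
        # The last chunk takes whatever remains, so every index is assigned.
        end = n if is_last else min(n, max(start, round(cum * n)))
        parts.append(indices[start:end])
        start = end
    return parts


def quantity_skewed_split(n_samples: int, n_clients: int, alpha: float,
                          seed: int = 0) -> List[List[int]]:
    """Hypothetical quantity skew: client sizes follow one Dirichlet draw."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return partition(indices, dirichlet_proportions(alpha, n_clients, rng))


def label_skewed_split(times: Sequence[float], n_clients: int, alpha: float,
                       n_bins: int = 4, seed: int = 0) -> List[List[int]]:
    """Hypothetical label skew: one Dirichlet draw over clients per time bin.

    Assumption: event times are grouped into equal-sized bins that act as labels.
    """
    rng = random.Random(seed)
    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])
    clients: List[List[int]] = [[] for _ in range(n_clients)]
    for b in range(n_bins):
        bin_idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        rng.shuffle(bin_idx)
        chunks = partition(bin_idx, dirichlet_proportions(alpha, n_clients, rng))
        for client, chunk in zip(clients, chunks):
            client.extend(chunk)
    return clients
```

In this sketch, `alpha` plays the role of the single heterogeneity hyperparameter: small values concentrate samples (or time bins) on a few clients, while large values push the split toward uniform. Fixing `seed` makes a split repeatable.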

