Federated Learning of Molecular Properties in a Heterogeneous Setting

09/15/2021
by   Wei Zhu, et al.

Experiments in chemistry research carry high material and computational costs. Institutions therefore treat chemical data as valuable, and there have been few efforts to construct large public datasets for machine learning. Another challenge is that different institutions are interested in different classes of molecules, which creates heterogeneous data that cannot be easily combined by conventional distributed training. In this work, we introduce federated heterogeneous molecular learning to address these challenges. Federated learning allows end-users to build a global model collaboratively while the training data stays distributed across isolated clients. Because related research is lacking, we first simulate a federated heterogeneous benchmark called FedChem. FedChem is constructed by jointly performing scaffold splitting and Latent Dirichlet Allocation on existing datasets. Our results on FedChem show that significant learning challenges arise when working with heterogeneous molecules. We then propose a method to alleviate the problem, namely Federated Learning by Instance reweighTing (FLIT). FLIT aligns local training across heterogeneous clients by improving performance on uncertain samples. Comprehensive experiments on FedChem validate the advantages of this method over other federated learning schemes. FedChem should enable a new type of collaboration for improving AI in chemistry that mitigates concerns about sharing valuable chemical data.
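
To make the idea of a simulated heterogeneous split more concrete, the following is a minimal sketch of a generic Dirichlet-based partition, a common way to simulate non-IID clients. It is not the FedChem construction itself: the function name `dirichlet_partition`, the parameter `alpha`, and the use of precomputed integer group ids as a stand-in for molecular scaffolds are all assumptions made for illustration.

```python
import numpy as np

def dirichlet_partition(group_ids, num_clients, alpha, seed=0):
    """Assign sample indices to clients, one group at a time.

    group_ids: one integer id per sample (hypothetical stand-in for a
    scaffold id). For each group, client shares are drawn from a
    symmetric Dirichlet(alpha); smaller alpha gives more skewed clients.
    """
    rng = np.random.default_rng(seed)
    group_ids = np.asarray(group_ids)
    clients = [[] for _ in range(num_clients)]
    for g in np.unique(group_ids):
        idx = np.flatnonzero(group_ids == g)
        rng.shuffle(idx)
        shares = rng.dirichlet(alpha * np.ones(num_clients))
        # Cumulative shares become cut points inside this group.
        cuts = (np.cumsum(shares)[:-1] * len(idx)).astype(int)
        for client, part in zip(clients, np.split(idx, cuts)):
            client.extend(part.tolist())
    return clients

# Toy usage: 22 samples in 3 groups, split across 3 clients.
group_ids = [0] * 10 + [1] * 7 + [2] * 5
parts = dirichlet_partition(group_ids, num_clients=3, alpha=0.5)
```

Every sample index lands on exactly one client, and lowering `alpha` concentrates each group on fewer clients, which is one simple way to vary how heterogeneous the simulated clients are.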

research
10/03/2020

HeteroFL: Computation and Communication Efficient Federated Learning for Heterogeneous Clients

Federated Learning (FL) is a method of training machine learning models ...
research
01/27/2023

FedHP: Heterogeneous Federated Learning with Privacy-preserving

Federated Learning is a distributed machine learning environment, which ...
research
08/22/2022

FedOS: using open-set learning to stabilize training in federated learning

Federated Learning is a recent approach to train statistical models on d...
research
06/15/2021

On Large-Cohort Training for Federated Learning

Federated learning methods typically learn a model by iteratively sampli...
research
08/01/2023

Data Collaboration Analysis applied to Compound Datasets and the Introduction of Projection data to Non-IID settings

Given the time and expense associated with bringing a drug to market, nu...
research
07/20/2022

Multigraph Topology Design for Cross-Silo Federated Learning

Cross-silo federated learning utilizes a few hundred reliable data silos...
research
03/24/2022

Optimal MIMO Combining for Blind Federated Edge Learning with Gradient Sparsification

We provide the optimal receive combining strategy for federated learning...
