FedMatch: Federated Learning Over Heterogeneous Question Answering Data

08/11/2021
by Jiangui Chen, et al.

Question Answering (QA), a popular and promising technique for intelligent information access, faces the same data dilemma as most other AI techniques. On one hand, modern QA methods rely on deep learning models, which are typically data-hungry. Ideally, one would therefore collect and fuse all available QA datasets at a common site to develop a powerful QA model. On the other hand, real-world QA datasets are typically distributed as isolated islands belonging to different parties. Given the increasing awareness of privacy and security, integrating the scattered data is almost impossible, or the cost of doing so is prohibitive. A possible solution to this dilemma is federated learning, a privacy-preserving machine learning technique over distributed datasets. In this work, we propose to adopt federated learning for QA, with special attention to the statistical heterogeneity of QA data. Here, heterogeneity refers to the fact that, in practice, annotated QA data are typically not independent and identically distributed (non-IID) across parties and are unbalanced in size. Traditional federated learning methods may sacrifice the accuracy of individual models in this heterogeneous setting. To tackle this problem, we propose a novel Federated Matching framework for QA, named FedMatch, with a backbone-patch architecture. The shared backbone distills the common knowledge of all participants, while the private patch is a compact and efficient module that retains the domain information of each participant. To facilitate evaluation, we build a benchmark collection based on several QA datasets from different domains to simulate the heterogeneous situation found in practice. Empirical studies demonstrate that our model achieves significant improvements over the baselines on all datasets.
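
The backbone-patch split described above can be illustrated with a minimal sketch. The abstract does not state how the shared backbone is aggregated, so the size-weighted, FedAvg-style averaging below is an assumption, not the authors' method. The names `weighted_average`, `federated_round`, `backbone`, `patch`, and `num_examples` are hypothetical and chosen only for illustration, and parameters are plain lists of floats rather than real model weights.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def weighted_average(client_params: List[Params], sizes: List[int]) -> Params:
    """Size-weighted average of parameter vectors (assumed FedAvg-style rule)."""
    total = sum(sizes)
    averaged: Params = {}
    for name in client_params[0]:
        dim = len(client_params[0][name])
        averaged[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, sizes)) / total
            for i in range(dim)
        ]
    return averaged


def federated_round(clients: List[dict]) -> Params:
    """Aggregate only the shared backbone; every private patch stays local."""
    new_backbone = weighted_average(
        [c["backbone"] for c in clients],
        [c["num_examples"] for c in clients],
    )
    for c in clients:
        # Each client receives its own copy of the shared backbone.
        c["backbone"] = {k: list(v) for k, v in new_backbone.items()}
        # c["patch"] is deliberately left untouched: it never leaves the client.
    return new_backbone


# Two hypothetical participants with unbalanced dataset sizes.
clients = [
    {"backbone": {"w": [1.0, 2.0]}, "patch": {"p": [0.1]}, "num_examples": 100},
    {"backbone": {"w": [3.0, 6.0]}, "patch": {"p": [0.9]}, "num_examples": 300},
]
shared = federated_round(clients)
print(shared["w"])          # backbone is pulled toward the larger participant
print(clients[0]["patch"])  # private patch is unchanged
```

In this sketch the larger participant contributes more to the shared backbone, while each patch keeps its local values, which mirrors the stated goal of sharing common knowledge without overwriting domain-specific information.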


research
08/03/2023

Towards Fair and Privacy Preserving Federated Learning for the Healthcare Domain

Federated learning enables data sharing in healthcare contexts where it ...
research
08/26/2020

Performance Optimization for Federated Person Re-identification via Benchmark Analysis

Federated learning is a privacy-preserving machine learning technique th...
research
08/07/2020

LotteryFL: Personalized and Communication-Efficient Federated Learning with Lottery Ticket Hypothesis on Non-IID Datasets

Federated learning is a popular distributed machine learning paradigm wi...
research
08/17/2020

An Isolated Data Island Benchmark Suite for Federated Learning

Federated learning (FL) is a new machine learning paradigm, the goal of ...
research
09/07/2022

Modular Federated Learning

Federated learning is an approach to train machine learning models on th...
research
07/09/2021

Lithography Hotspot Detection via Heterogeneous Federated Learning with Local Adaptation

As technology scaling is approaching the physical limit, lithography hot...
research
01/28/2023

Heterogeneous Datasets for Federated Survival Analysis Simulation

Survival analysis studies time-modeling techniques for an event of inter...
