Privately Customizing Prefinetuning to Better Match User Data in Federated Learning

02/17/2023
by Charlie Hou, et al.
In Federated Learning (FL), accessing private client data incurs communication and privacy costs. As a result, FL deployments commonly prefinetune pretrained foundation models on a (large, possibly public) dataset held by the central server, and then FL-finetune the model on a private, federated dataset held by clients. Evaluating prefinetuning dataset quality reliably and privately is therefore of high importance. To this end, we propose FreD (Federated Private Fréchet Distance), a privately computed distance between a prefinetuning dataset and federated datasets. Intuitively, it privately computes a Fréchet distance between the embeddings that a large language model generates for the central (public) dataset and those it generates for the federated private client data. To make this computation privacy-preserving, we use distributed, differentially-private mean and covariance estimators. We show empirically that FreD accurately predicts the best prefinetuning dataset at minimal privacy cost. Altogether, using FreD we demonstrate a proof-of-concept for a new approach in private FL training: (1) customize a prefinetuning dataset to better match user data, (2) prefinetune, and (3) perform FL-finetuning.
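
The computation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one embedding per client, a publicly known client count, L2 clipping followed by Gaussian noise on the summed first and second moments, and a server-side simulation in place of a real distributed protocol. The names `dp_mean_cov`, `frechet_distance`, `clip_norm`, and `noise_multiplier` are hypothetical, and no privacy accounting (epsilon, delta) is performed. The distance is the standard squared Fréchet distance between two Gaussians, ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2)).

```python
import numpy as np

def psd_sqrt(mat):
    """Symmetric square root; negative eigenvalues (e.g. from noise) are clipped to 0."""
    sym = (mat + mat.T) / 2
    vals, vecs = np.linalg.eigh(sym)
    vals = np.clip(vals, 0.0, None)
    return (vecs * np.sqrt(vals)) @ vecs.T

def frechet_distance(mu1, cov1, mu2, cov2):
    """Squared Frechet distance between N(mu1, cov1) and N(mu2, cov2)."""
    root1 = psd_sqrt(cov1)
    root2 = psd_sqrt(cov2)
    # Project both (possibly noisy) covariances onto the PSD cone.
    cov1, cov2 = root1 @ root1, root2 @ root2
    # Tr((cov1 cov2)^(1/2)) equals Tr((root1 cov2 root1)^(1/2)), which is symmetric.
    cross = psd_sqrt(root1 @ cov2 @ root1)
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2) - 2 * np.trace(cross))

def dp_mean_cov(embeddings, clip_norm, noise_multiplier, rng):
    """Noisy mean and covariance of per-client embeddings (assumed mechanism)."""
    n, d = embeddings.shape
    # Clip each client's embedding to L2 norm at most clip_norm.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    clipped = embeddings * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Gaussian noise scaled to each sum's sensitivity (clip_norm and clip_norm**2).
    sum1 = clipped.sum(axis=0) + rng.normal(0.0, noise_multiplier * clip_norm, size=d)
    noise2 = rng.normal(0.0, noise_multiplier * clip_norm**2, size=(d, d))
    sum2 = clipped.T @ clipped + (noise2 + noise2.T) / 2  # keep the matrix symmetric
    mean = sum1 / n
    cov = sum2 / n - np.outer(mean, mean)
    return mean, cov

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    private = rng.normal(0.0, 1.0, size=(5000, 8))   # stand-in for client embeddings
    near = rng.normal(0.1, 1.0, size=(5000, 8))      # candidate dataset A
    far = rng.normal(1.0, 2.0, size=(5000, 8))       # candidate dataset B
    mu_p, cov_p = dp_mean_cov(private, clip_norm=6.0, noise_multiplier=1.0, rng=rng)
    for name, data in [("near", near), ("far", far)]:
        mu_c, cov_c = data.mean(axis=0), np.cov(data.T, bias=True)
        print(name, frechet_distance(mu_c, cov_c, mu_p, cov_p))
```

In this sketch only the client-side statistics are noised; the central dataset's mean and covariance are computed exactly because the server holds that data directly. A candidate prefinetuning dataset with a smaller distance to the private statistics would be preferred.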
