TextHide: Tackling Data Privacy in Language Understanding Tasks

10/12/2020
by Yangsibo Huang, et al.

An unsolved challenge in distributed or federated learning is how to effectively mitigate privacy risks without slowing down training or reducing accuracy. In this paper, we propose TextHide to address this challenge for natural language understanding tasks. It requires all participants to add a simple encryption step to prevent an eavesdropping attacker from recovering private text data. This encryption step is efficient and affects task performance only slightly. In addition, TextHide fits well with the popular framework of fine-tuning pre-trained language models (e.g., BERT) for any sentence or sentence-pair task. We evaluate TextHide on the GLUE benchmark, and our experiments show that it can effectively defend against attacks on shared gradients or representations, with an average accuracy reduction of only 1.9%. We also present an analysis of the security of TextHide based on a conjecture about the computational intractability of a mathematical problem. Our code is available at https://github.com/Hazelsuko07/TextHide
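For intuition, here is a minimal PyTorch sketch of what such an encryption step might look like, modeled on the InstaHide-style mixing that TextHide builds on: each sentence representation is mixed with a few others from the batch and then multiplied entrywise by a random sign mask from a shared pool. The function name, coefficient sampling, and mask-pool handling below are illustrative assumptions, not the authors' exact implementation.

```python
import torch

def texthide_encrypt(reps, sigma_pool, k=4):
    """Encrypt a batch of sentence representations (e.g., BERT [CLS]
    vectors) with an InstaHide-style step: mix each representation
    with k-1 others from the batch, then flip entries with a random
    sign mask drawn from a shared pool.

    reps:       (batch, dim) tensor of private representations
    sigma_pool: (m, dim) tensor of {-1, +1} masks
    k:          number of representations mixed into each ciphertext
    """
    batch, dim = reps.shape
    # Random convex mixing coefficients, one combination per example.
    coeffs = torch.rand(batch, k)
    coeffs = coeffs / coeffs.sum(dim=1, keepdim=True)
    # For each example, pick k-1 random partners (plus itself).
    partners = torch.randint(0, batch, (batch, k))
    partners[:, 0] = torch.arange(batch)  # always include the example itself
    mixed = torch.einsum('bk,bkd->bd', coeffs, reps[partners])
    # Hide the mixture behind a random sign mask from the shared pool.
    mask_idx = torch.randint(0, sigma_pool.shape[0], (batch,))
    return mixed * sigma_pool[mask_idx]

# Example: encrypt a batch of 32 hypothetical 768-dim representations
# using a pool of 16 shared sign masks.
sigma_pool = torch.randint(0, 2, (16, 768)).float() * 2 - 1
ciphertexts = texthide_encrypt(torch.randn(32, 768), sigma_pool)
```

The pool size and mixing width (the paper's (m, k) parameters) trade security against accuracy: larger values make recovering any single private text harder, at the cost of a noisier training signal.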


Related research

05/17/2022  Recovering Private Text in Federated Learning of Language Models
10/06/2020  InstaHide: Instance-hiding Schemes for Private Distributed Learning
05/24/2023  Privacy Implications of Retrieval-Based Language Models
02/17/2020  Incorporating BERT into Neural Machine Translation
05/28/2023  Robust Natural Language Understanding with Residual Attention Debiasing
09/03/2019  Transfer Fine-Tuning: A BERT Case Study
05/15/2023  Memorization for Good: Encryption with Autoregressive Language Models
