Q-LSTM Language Model – Decentralized Quantum Multilingual Pre-Trained Language Model for Privacy Protection

10/06/2022
by Shuyue Stella Li, et al.

Large-scale language models are trained on massive amounts of natural language data that may encode or reflect private information. With careful manipulation, malicious agents can reverse-engineer the training data even when data sanitization and differential privacy algorithms were used during pre-training. In this work, we propose a decentralized training framework to address privacy concerns in training large-scale language models. The framework consists of a cloud quantum language model built with Variational Quantum Classifiers (VQC) for sentence embedding and a local Long Short-Term Memory (LSTM) model. We evaluate the quantum language model with both intrinsic metrics (loss, perplexity) and an extrinsic one (a downstream sentiment analysis task). Our quantum model was comparable to its classical counterpart on all of these metrics. We also perform ablation studies on how the size of the VQC and the size of the training data affect model performance. Our approach solves privacy concerns without sacrificing downstream task performance. The intractability of simulating quantum operations on classical hardware ensures the confidentiality of the training data and makes it impossible for any adversary to recover.
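For readers unfamiliar with the intrinsic metrics named in the abstract, the usual relationship between loss and perplexity can be sketched as follows. This is a minimal illustration, assuming the loss is the mean per-token negative log-likelihood in nats; the function name `perplexity` and the toy probabilities are hypothetical and are not taken from the paper.

```python
import math


def perplexity(token_log_probs):
    """Return perplexity given per-token natural-log probabilities.

    Assumes the loss is the mean negative log-likelihood per token,
    so perplexity = exp(loss). This is the standard definition, not
    necessarily the exact evaluation code used by the authors.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical model that assigns probability 0.25 to each of four tokens.
log_probs = [math.log(0.25)] * 4
print(perplexity(log_probs))  # close to 4: as uncertain as a uniform 4-way choice
```

Under this definition, a lower loss always means a lower perplexity, so comparing the quantum and classical models on either metric gives the same ranking.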


