MERGE: Fast Private Text Generation

05/25/2023
by Zi Liang, et al.

Recent years have seen increasing concerns about the private inference of NLP services and Transformer models. However, existing two-party privacy-preserving methods consider only NLU scenarios, while the private inference of text generation tasks such as translation, dialogue, and code completion remains unsolved. Moreover, when migrated to NLG models, existing privacy-preserving methods perform poorly in terms of inference speed and suffer from convergence problems during training. To address these issues, we propose MERGE, a fast private text generation framework for Transformer-based language models. Specifically, MERGE reuses the output hidden state as the word embedding to bypass the embedding computation, and reorganizes the linear operations in the Transformer module to accelerate the forward procedure. Based on these two optimizations, extensive experiments show that MERGE achieves a 26.5x speedup at a sequence length of 512 and reduces communication by 80%, with up to a 10x speedup over existing state-of-the-art models.
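
The embedding-reuse idea can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch illustration, not the authors' implementation: the names decoder_layer and lm_head are stand-ins for a Transformer decoder stack and its output projection. It shows how each step's output hidden state can be appended directly as the next input embedding, so the per-step vocabulary projection and embedding-table lookup (both expensive under secure two-party computation) are skipped. MERGE combines this reuse with a reorganization of the Transformer's linear operations; the sketch covers only the first optimization.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model, n_heads, vocab_size = 64, 4, 100

# Stand-ins for a Transformer decoder stack and its output projection
# (hypothetical, for illustration only).
decoder_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
lm_head = nn.Linear(d_model, vocab_size)

@torch.no_grad()
def generate_with_embedding_reuse(prompt_emb: torch.Tensor, steps: int) -> torch.Tensor:
    """prompt_emb: (1, prompt_len, d_model), an already-embedded prompt."""
    seq = prompt_emb
    for _ in range(steps):
        # Causal mask so each position only attends to its predecessors.
        mask = nn.Transformer.generate_square_subsequent_mask(seq.size(1))
        hidden = decoder_layer(seq, src_mask=mask)  # (1, len, d_model)
        # Key idea: feed the last hidden state back in as the next input
        # "embedding" -- no per-step vocabulary projection, no argmax, and
        # no private embedding-table lookup.
        seq = torch.cat([seq, hidden[:, -1:, :]], dim=1)
    # Tokens are read out only once, after generation finishes.
    return lm_head(seq).argmax(dim=-1)

tokens = generate_with_embedding_reuse(torch.randn(1, 3, d_model), steps=5)
print(tokens.shape)  # torch.Size([1, 8])
```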

Related research

03/23/2023 - Primer: Fast Private Transformer Inference on Encrypted Data
It is increasingly important to enable privacy-preserving inference for ...

09/22/2022 - DFX: A Low-latency Multi-FPGA Appliance for Accelerating Transformer-based Text Generation
Transformer is a deep learning language model widely used for natural la...

10/16/2021 - Sparse Distillation: Speeding Up Text Classification by Using Bigger Models
Distilling state-of-the-art transformer models into lightweight student ...

06/01/2022 - THE-X: Privacy-Preserving Transformer Inference with Homomorphic Encryption
As more and more pre-trained language models adopt on-cloud deployment, ...

02/02/2019 - CodedPrivateML: A Fast and Privacy-Preserving Framework for Distributed Machine Learning
How to train a machine learning model while keeping the data private and...

02/15/2023 - Big Little Transformer Decoder
The recent emergence of Large Language Models based on the Transformer a...

09/04/2022 - Joint Linear and Nonlinear Computation across Functions for Efficient Privacy-Preserving Neural Network Inference
While it is encouraging to witness the recent development in privacy-pre...
