PUMA: Secure Inference of LLaMA-7B in Five Minutes

07/24/2023
by Ye Dong, et al.
With ChatGPT as a representative example, many companies have begun to provide services based on large Transformer models. However, using such a service inevitably leaks users' prompts to the model provider. Previous work has studied secure inference for Transformer models using secure multiparty computation (MPC), where model parameters and clients' prompts are kept secret. Despite this, these frameworks are still limited in terms of model performance, efficiency, and deployment. To address these limitations, we propose the PUMA framework to enable fast and secure Transformer model inference. Our framework designs high-quality approximations for expensive functions, such as GeLU and Softmax, which significantly reduce the cost of secure inference while preserving model performance. Additionally, we design secure Embedding and LayerNorm procedures that faithfully implement the desired functionality without undermining the Transformer architecture. PUMA is about 2x faster than the state-of-the-art MPC framework MPCFORMER (ICLR 2023) and achieves accuracy similar to that of plaintext models without fine-tuning, which previous works failed to achieve. In addition, PUMA can evaluate LLaMA-7B in around 5 minutes to generate 1 token. To the best of our knowledge, this is the first time that a model of this parameter size has been evaluated under MPC. PUMA has been open-sourced in the GitHub repository of SecretFlow-SPU.
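
To make the MPC setting above more concrete, the following is a minimal sketch of additive secret sharing, one common building block for keeping inputs hidden from every individual party. It is an illustration only: the abstract does not say which sharing scheme PUMA uses, and the names `MODULUS`, `share`, `reconstruct`, `add_shared`, and `scale_shared` are hypothetical, not part of PUMA or SecretFlow-SPU.

```python
import secrets

# Ring size for the shares; 2**64 is an assumption made for this sketch.
MODULUS = 2**64


def share(value, n_parties=3):
    """Split an integer into n additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recombine all shares; any strict subset reveals nothing about the value."""
    return sum(shares) % MODULUS


def add_shared(a_shares, b_shares):
    """Add two shared values: each party adds its own shares locally."""
    return [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]


def scale_shared(shares, public_constant):
    """Multiply a shared value by a public constant, again without communication."""
    return [(s * public_constant) % MODULUS for s in shares]


if __name__ == "__main__":
    x = share(5)
    y = share(7)
    print(reconstruct(add_shared(x, y)))   # sum of the two secrets
    print(reconstruct(scale_shared(x, 3)))  # secret times a public constant
```

Additions and multiplications by public constants are local and cheap in this kind of scheme, whereas non-linear functions need extra interactive protocols. This is one way to see why approximating functions such as GeLU and Softmax with cheaper operations can reduce the cost of secure inference.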

research
08/19/2023

East: Efficient and Accurate Secure Transformer Framework for Inference

Transformer has been successfully used in practical applications, such a...
research
11/25/2022

MPCViT: Searching for MPC-friendly Vision Transformer with Heterogeneous Attention

Secure multi-party computation (MPC) enables computation directly on enc...
research
09/27/2022

MPC-Pipe: an Efficient Pipeline Scheme for Secure Multi-party Machine Learning Inference

Multi-party computing (MPC) has been gaining popularity over the past ye...
research
09/09/2023

Compact: Approximating Complex Activation Functions for Secure Computation

Secure multi-party computation (MPC) techniques can be used to provide d...
research
07/01/2021

Secure Quantized Training for Deep Learning

We have implemented training of neural networks in secure multi-party co...
research
09/14/2022

SEEK: model extraction attack against hybrid secure inference protocols

Security concerns about a machine learning model used in a prediction-as...
research
10/28/2019

Secure Evaluation of Quantized Neural Networks

Image classification using Deep Neural Networks that preserve the privac...
