Petals: Collaborative Inference and Fine-tuning of Large Models

09/02/2022
by Alexander Borzunov, et al.

Many NLP tasks benefit from using large language models (LLMs) that often have more than 100 billion parameters. With the release of BLOOM-176B and OPT-175B, everyone can download pretrained models of this scale. Still, using these models requires high-end hardware unavailable to many researchers. In some cases, LLMs can be used more affordably via RAM offloading or hosted APIs. However, these techniques have innate limitations: offloading is too slow for interactive inference, while APIs are not flexible enough for research. In this work, we propose Petals, a system for collaborative inference and fine-tuning of large models that pools the resources of multiple parties trusted to process the client's data. We demonstrate that this strategy significantly outperforms offloading for very large models, running inference of BLOOM-176B on consumer GPUs at ≈ 1 step per second. Unlike most inference APIs, Petals also natively exposes the hidden states of served models, allowing its users to train and share custom model extensions based on efficient fine-tuning methods.
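To make the client-side workflow concrete, here is a minimal usage sketch in Python, modeled on the project's early public README; the DistributedBloomForCausalLM class and the bigscience/bloom-petals checkpoint name are taken from that release and may differ in later versions. The tokenizer and embeddings run locally, while the transformer blocks are executed by remote servers in the swarm.

    from transformers import BloomTokenizerFast
    from petals import DistributedBloomForCausalLM

    # Checkpoint name from the early public Petals release (may have changed since)
    MODEL_NAME = "bigscience/bloom-petals"

    # The tokenizer and input embeddings live on the local machine
    tokenizer = BloomTokenizerFast.from_pretrained(MODEL_NAME)

    # Transformer blocks are executed remotely by servers in the swarm
    model = DistributedBloomForCausalLM.from_pretrained(MODEL_NAME)

    inputs = tokenizer("A cat sat on", return_tensors="pt")["input_ids"]
    # Each generated token is one pipeline step through the swarm
    # (the abstract reports ≈ 1 step per second for BLOOM-176B)
    outputs = model.generate(inputs, max_new_tokens=8)
    print(tokenizer.decode(outputs[0]))

Because Petals returns hidden states to the client, the same interface can in principle support parameter-efficient fine-tuning: trainable modules such as soft prompts or adapters stay on the local device, while activations flow through the frozen remote blocks.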


research · 05/02/2023
Distill or Annotate? Cost-Efficient Fine-Tuning of Compact Models
Fine-tuning large models is highly effective, however, inference using t...

research · 05/27/2023
Fine-Tuning Language Models with Just Forward Passes
Fine-tuning language models (LMs) has yielded success on diverse downstr...

research · 06/16/2023
Full Parameter Fine-tuning for Large Language Models with Limited Resources
Large Language Models (LLMs) have revolutionized Natural Language Proces...

research · 04/03/2022
PERFECT: Prompt-free and Efficient Few-shot Learning with Language Models
Current methods for few-shot fine-tuning of pretrained masked language m...

research · 11/16/2022
Towards Robust Low-Resource Fine-Tuning with Multi-View Compressed Representations
Due to the huge amount of parameters, fine-tuning of pretrained language...

research · 09/16/2023
Exploring the impact of low-rank adaptation on the performance, efficiency, and regularization of RLHF
During the last stage of RLHF, a large language model is aligned to huma...

research · 08/31/2021
T3-Vis: a visual analytic framework for Training and fine-Tuning Transformers in NLP
Transformers are the dominant architecture in NLP, but their training an...
