Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT)

09/16/2023
by Parsa Kavehzadeh, et al.

The rapid advancement of large language models (LLMs) has revolutionized natural language processing (NLP). While these models excel at understanding and generating human-like text, their widespread deployment can be prohibitively expensive. SortedNet is a recent training technique for enabling dynamic inference in deep neural networks. It leverages network modularity to create sub-models with varying computational loads, sorting them by their computation/accuracy characteristics in a nested manner. We extend SortedNet to generative NLP tasks, making large language models dynamic without any additional pretraining, simply by replacing standard Supervised Fine-Tuning (SFT) with Sorted Fine-Tuning (SoFT) at the same cost. Our approach boosts model efficiency and eliminates the need for multiple models to cover different inference scenarios. We show that this approach unlocks the potential of the intermediate layers of transformers for generating the target output. Our sub-models remain integral components of the original model, minimizing storage requirements and the cost of transitioning between different computational/latency budgets. By applying this approach to LLaMA 2 13B, tuning on the Stanford Alpaca dataset, and comparing against standard tuning and early exit on the PandaLM benchmark, we show that Sorted Fine-Tuning can deliver sub-models twice as fast as the original model while maintaining or exceeding its performance.
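To make the idea concrete, below is a minimal, hedged sketch of the multi-exit training objective the abstract describes: nested sub-models are the first k transformer layers, each reusing the shared LM head, and the fine-tuning loss is averaged over the chosen exit depths. The tiny decoder, the exit-layer choices, and all names (TinyDecoder, soft_loss, exit_layers) are illustrative assumptions, not the authors' implementation; causal masking and other LLaMA details are omitted for brevity.

    # Sketch of Sorted Fine-Tuning (SoFT): train nested sub-models (first k layers)
    # with a shared output head by summing their next-token losses.
    # All module names and sizes are illustrative, not the paper's code.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyDecoder(nn.Module):
        def __init__(self, vocab=32000, d_model=256, n_layers=8, n_heads=4):
            super().__init__()
            self.embed = nn.Embedding(vocab, d_model)
            self.layers = nn.ModuleList(
                [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
                 for _ in range(n_layers)]
            )
            self.lm_head = nn.Linear(d_model, vocab, bias=False)  # shared head

        def hidden_states(self, tokens):
            # Return the hidden state after every layer (one per sub-model depth).
            # Causal masking is omitted here to keep the sketch short.
            h = self.embed(tokens)
            states = []
            for layer in self.layers:
                h = layer(h)
                states.append(h)
            return states

    def soft_loss(model, tokens, targets, exit_layers=(4, 6, 8)):
        """Average the LM loss of every nested sub-model (first k layers)."""
        states = model.hidden_states(tokens)
        loss = 0.0
        for k in exit_layers:
            logits = model.lm_head(states[k - 1])  # reuse the original LM head
            loss = loss + F.cross_entropy(
                logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
            )
        return loss / len(exit_layers)

    # Toy usage: one optimization step on random data.
    model = TinyDecoder()
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    tokens = torch.randint(0, 32000, (2, 16))
    targets = torch.randint(0, 32000, (2, 16))
    opt.zero_grad()
    soft_loss(model, tokens, targets).backward()
    opt.step()

At inference time, a latency budget would then pick one of the trained exit depths and stop the forward pass there, since every prefix of the layer stack has been optimized to feed the shared head directly.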

