DeepAI AI Chat
Log In Sign Up

Performance-Efficiency Trade-Offs in Adapting Language Models to Text Classification Tasks

by   Laura Aina, et al.

Pre-trained language models (LMs) obtain state-of-the-art performance when adapted to text classification tasks. However, when using such models in real-world applications, efficiency considerations are paramount. In this paper, we study how different training procedures that adapt LMs to text classification perform, as we vary model and train set size. More specifically, we compare standard fine-tuning, prompting, and knowledge distillation (KD) when the teacher was trained with either fine-tuning or prompting. Our findings suggest that even though fine-tuning and prompting work well to train large LMs on large train sets, there are more efficient alternatives that can reduce compute or data cost. Interestingly, we find that prompting combined with KD can reduce compute and data cost at the same time.


page 8

page 9


Which Model Shall I Choose? Cost/Quality Trade-offs for Text Classification Tasks

Industry practitioners always face the problem of choosing the appropria...

Does QA-based intermediate training help fine-tuning language models for text classification?

Fine-tuning pre-trained language models for downstream tasks has become ...

CultureBERT: Fine-Tuning Transformer-Based Language Models for Corporate Culture

This paper introduces supervised machine learning to the literature meas...

GPT-D: Inducing Dementia-related Linguistic Anomalies by Deliberate Degradation of Artificial Neural Language Models

Deep learning (DL) techniques involving fine-tuning large numbers of mod...

Pre-train, Interact, Fine-tune: A Novel Interaction Representation for Text Classification

Text representation can aid machines in understanding text. Previous wor...

Prompt-Learning for Short Text Classification

In the short text, the extreme short length, feature sparsity and high a...

Improving Generalization of Pre-trained Language Models via Stochastic Weight Averaging

Knowledge Distillation (KD) is a commonly used technique for improving t...