Impossible Triangle: What's Next for Pre-trained Language Models?

04/13/2022
by Chenguang Zhu, et al.

The recent development of large-scale pre-trained language models (PLMs) has significantly improved model capabilities on various NLP tasks, both in performance after task-specific fine-tuning and in zero-shot / few-shot learning. However, many such models are so large that few institutions can afford to pre-train, fine-tune, or even deploy them, while moderate-sized models usually lack strong generalized few-shot learning capabilities. In this paper, we first characterize the current obstacles to using PLMs in terms of an Impossible Triangle: 1) moderate model size, 2) state-of-the-art few-shot learning capability, and 3) state-of-the-art fine-tuning capability. We argue that every existing PLM lacks one or more of these properties. To compensate for the missing properties, various techniques have been proposed, such as knowledge distillation, data augmentation, and prompt learning, which inevitably bring additional work to the application of PLMs in real scenarios. We then offer insights into future research directions for achieving the Impossible Triangle, and break the task down into several key phases.
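For context, knowledge distillation, one of the remediation techniques named above, trains a compact student model to match a large teacher's output distribution. Below is a minimal PyTorch sketch of the standard distillation loss; the temperature and weighting values are illustrative assumptions, not settings from this paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend a soft-target loss (match the teacher) with the usual hard-label loss.

    temperature > 1 softens both distributions; alpha balances the two terms.
    Both hyperparameters are illustrative, not values taken from the paper.
    """
    # KL divergence between the softened student and teacher distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Standard cross-entropy against the gold labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```

In practice the student is a smaller PLM fine-tuned with this loss against a frozen teacher's logits on the same inputs, trading some capability for a model that is cheap enough to deploy.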


