
Zero-shot Learning by Generating Task-specific Adapters

by Qinyuan Ye, et al.

Pre-trained text-to-text transformers achieve impressive performance across a wide range of NLP tasks, and they naturally support zero-shot learning (ZSL) by using the task description as a prompt in the input. However, this approach has a potential limitation: it learns from input-output pairs at the instance level, rather than learning to solve tasks at the task level. Meanwhile, applying existing ZSL methods to text-to-text transformers is non-trivial due to their text generation objective and huge size. To address these issues, we introduce Hypter, a framework that improves zero-shot transferability by training a hypernetwork to generate task-specific adapters from task descriptions. This formulation enables learning at the task level, and greatly reduces the number of trainable parameters by using lightweight adapters. Experiments on two datasets demonstrate that Hypter improves upon fine-tuning baselines.
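The core mechanism can be illustrated with a minimal sketch (hypothetical names and shapes, not the authors' code): a hypernetwork maps an embedding of the task description to the parameters of a lightweight bottleneck adapter, which is then inserted into a frozen pre-trained layer.

```python
# Minimal sketch of the Hypter idea, assuming a toy frozen layer and a
# one-layer hypernetwork; all names and dimensions are illustrative.
import numpy as np

rng = np.random.default_rng(0)

D = 16      # hidden size of the frozen transformer layer
R = 4       # adapter bottleneck (low-rank) size
T = 8       # task-description embedding size

# Frozen main-model layer (stands in for a pre-trained transformer block).
W_frozen = rng.standard_normal((D, D)) * 0.1

# Hypernetwork: one linear map from the task embedding to all adapter params.
n_adapter = D * R + R * D              # down-projection + up-projection
W_hyper = rng.standard_normal((T, n_adapter)) * 0.01

def generate_adapter(task_emb):
    """Generate adapter weights (down, up) from a task-description embedding."""
    flat = task_emb @ W_hyper
    down = flat[: D * R].reshape(D, R)
    up = flat[D * R:].reshape(R, D)
    return down, up

def forward(x, task_emb):
    """Frozen layer output plus the generated adapter's residual branch."""
    down, up = generate_adapter(task_emb)
    h = x @ W_frozen
    return h + np.maximum(x @ down, 0.0) @ up   # ReLU bottleneck, residual add

x = rng.standard_normal((2, D))          # a batch of two hidden vectors
task_emb = rng.standard_normal(T)        # embedding of some task description
y = forward(x, task_emb)
print(y.shape)
```

Only `W_hyper` would be trained here; the frozen layer is untouched, and each new task description yields its own adapter at inference time, which is what makes the approach a task-level (rather than instance-level) learner.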



