ZeroPrompt: Scaling Prompt-Based Pretraining to 1,000 Tasks Improves Zero-Shot Generalization

01/18/2022
by Hanwei Xu, et al.

We propose ZeroPrompt, a multitask pretraining approach for zero-shot generalization that focuses on task scaling and zero-shot prompting. While previous models have been trained on only a few dozen tasks, we scale to 1,000 tasks for the first time using real-world data. This leads to a crucial discovery: task scaling can be an efficient alternative to model scaling, i.e., model size has little impact on performance once the number of tasks is sufficiently large. Our results show that task scaling can improve training efficiency by a factor of 30 in FLOPs. Moreover, we present a prompting method that uses a genetic algorithm to automatically search for the best prompt for unseen tasks, along with a few other improvements. Empirically, ZeroPrompt substantially improves both the efficiency and the performance of zero-shot learning across a variety of academic and production datasets.
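
The genetic-algorithm prompt search is described only at a high level in this abstract. As a rough illustration, the sketch below shows one plausible shape of such a search over discrete prompt tokens; `score_prompt`, `VOCAB`, and all hyperparameters are hypothetical placeholders (in practice, fitness would come from the pretrained model's zero-shot score on a small held-out set), not the authors' implementation.

```python
# Minimal sketch of a genetic-algorithm prompt search (illustrative only).
# score_prompt is a hypothetical stand-in for the zero-shot metric that
# would be computed with the pretrained model on held-out examples.
import random

random.seed(0)

VOCAB = ["Answer", "the", "question", "Given", "text", "classify",
         "following", "label", "Is", "this", "correct", ":"]

def score_prompt(tokens):
    # Placeholder fitness: rewards prompts containing "classify" and
    # penalizes length. A real fitness would be dev-set accuracy.
    bonus = 1.0 if "classify" in tokens else 0.0
    return bonus - 0.01 * len(tokens)

def mutate(tokens, rate=0.2):
    # Randomly replace each token with probability `rate`.
    return [random.choice(VOCAB) if random.random() < rate else t
            for t in tokens]

def crossover(a, b):
    # Single-point crossover between two parent prompts.
    cut = random.randint(1, min(len(a), len(b)) - 1)
    return a[:cut] + b[cut:]

def search(pop_size=20, generations=30, prompt_len=6):
    population = [[random.choice(VOCAB) for _ in range(prompt_len)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score_prompt, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=score_prompt)

print(" ".join(search()))
```

In a realistic setting, `score_prompt` is the expensive step (a forward pass over labeled dev examples per candidate prompt), which is why the population size and generation count would be kept small.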


