Go-tuning: Improving Zero-shot Learning Abilities of Smaller Language Models

12/20/2022
by   Jingjing Xu, et al.

With increasing scale, large language models such as GPT-3 demonstrate both quantitative improvements and new qualitative capabilities, especially as zero-shot learners. However, these results rely heavily on delicate prompt design and large amounts of computation. In this work, we explore whether strong zero-shot ability can be achieved at a smaller model scale without any external supervised data. To this end, we revisit masked language modeling and present a geometry-guided self-supervised learning method (Go-tuning for short) that uses a small amount of task-aware self-supervised data to further update language models. Experiments show that Go-tuning enables T5-small (80M) to achieve zero-shot results competitive with large language models such as T5-XL (3B). We also apply Go-tuning in multi-task settings and develop a multi-task model, mgo-T5 (250M), which reaches the average performance of OPT (175B) on 9 datasets.

research
09/10/2021

Towards Zero-shot Commonsense Reasoning with Self-supervised Refinement of Language Models

Can we get existing language models and refine them for zero-shot common...
research
02/09/2023

Toolformer: Language Models Can Teach Themselves to Use Tools

Language models (LMs) exhibit remarkable abilities to solve new tasks fr...
research
10/02/2020

MultiCQA: Zero-Shot Transfer of Self-Supervised Text Matching Models on a Massive Scale

We study the zero-shot transfer capabilities of text matching models on ...
research
06/08/2023

ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases

Enabling large language models to effectively utilize real-world tools i...
research
05/19/2023

LLM-Pruner: On the Structural Pruning of Large Language Models

Large language models (LLMs) have shown remarkable capabilities in langu...
research
02/10/2022

Distilling Hypernymy Relations from Language Models: On the Effectiveness of Zero-Shot Taxonomy Induction

In this paper, we analyze zero-shot taxonomy learning methods which are ...
research
06/13/2023

NoCoLA: The Norwegian Corpus of Linguistic Acceptability

While there has been a surge of large language models for Norwegian in r...
