Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers

05/21/2023
by Linyuan Gong, et al.

This paper explores the effectiveness of model-generated signals in improving zero-shot generalization of text-to-text Transformers such as T5. We study various designs to pretrain T5 using an auxiliary model to construct more challenging token replacements for the main model to denoise. Key aspects under study include the decoding target, the location of the RTD head, and the masking pattern. Based on these studies, we develop a new model, METRO-T0, which is pretrained using the redesigned ELECTRA-style pretraining strategies and then prompt-finetuned on a mixture of NLP tasks. METRO-T0 outperforms all similar-sized baselines on prompted NLP benchmarks, such as T0 Eval and MMLU, and rivals the state-of-the-art T0-11B model with only 8% of its parameters. Our analysis of the model's neural activation and parameter sensitivity reveals that the effectiveness of METRO-T0 stems from a more balanced contribution of parameters and better utilization of their capacity. The code and model checkpoints are available at https://github.com/gonglinyuan/metro_t0.
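The abstract describes ELECTRA-style pretraining in which an auxiliary model generates token replacements that the main model must denoise and detect. The sketch below illustrates that general idea in PyTorch with toy sizes; the module name `TinyEncoder`, the dimensions, and the encoder-only stand-in for both models are illustrative assumptions, not the actual METRO-T0 implementation (which pairs an auxiliary model with a full text-to-text encoder-decoder; see the linked repository for the real code).

```python
# Illustrative sketch only (not the official METRO-T0 code): ELECTRA-style
# pretraining with model-generated signals. An auxiliary generator fills
# masked positions with plausible tokens, and the main model learns to
# detect the replacements (RTD head) and to recover the original tokens.
# All sizes, names, and the encoder-only architecture are toy assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HIDDEN, SEQ_LEN, MASK_ID = 1000, 64, 16, 0

class TinyEncoder(nn.Module):
    """Stand-in for both the small auxiliary model and the main model."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        layer = nn.TransformerEncoderLayer(HIDDEN, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(HIDDEN, VOCAB)   # token-level predictions
        self.rtd_head = nn.Linear(HIDDEN, 2)      # replaced-token detection

    def forward(self, ids):
        h = self.encoder(self.embed(ids))
        return self.lm_head(h), self.rtd_head(h)

generator, main_model = TinyEncoder(), TinyEncoder()

tokens = torch.randint(1, VOCAB, (2, SEQ_LEN))   # toy input batch
mask = torch.rand(tokens.shape) < 0.15           # masking pattern
mask[:, 0] = True                                # ensure >=1 masked position
corrupted = tokens.masked_fill(mask, MASK_ID)

# 1) The auxiliary model proposes replacements at the masked positions,
#    producing a harder corrupted input than random masking alone.
with torch.no_grad():
    gen_logits, _ = generator(corrupted)
    sampled = gen_logits.argmax(-1)
main_input = torch.where(mask, sampled, tokens)

# 2) The main model is trained to flag replaced tokens (RTD loss) and to
#    decode the original tokens at the corrupted positions (denoising loss).
lm_logits, rtd_logits = main_model(main_input)
is_replaced = (main_input != tokens).long()
rtd_loss = F.cross_entropy(rtd_logits.view(-1, 2), is_replaced.view(-1))
denoise_loss = F.cross_entropy(lm_logits[mask], tokens[mask])
(rtd_loss + denoise_loss).backward()
print(f"RTD loss {rtd_loss.item():.3f}, denoise loss {denoise_loss.item():.3f}")
```

The key design questions studied in the paper (what the decoding target is, where the RTD head sits, and how the masking pattern is chosen) correspond to the choices hard-coded in this toy example.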


