Exploiting the Potential of Seq2Seq Models as Robust Few-Shot Learners

07/27/2023
by Jihyeon Lee, et al.

In-context learning, which offers substantial advantages over fine-tuning, is predominantly observed in decoder-only models, while encoder-decoder (i.e., seq2seq) models excel in methods that rely on weight updates. Recently, a few studies have demonstrated the feasibility of few-shot learning with seq2seq models; however, this has been limited to tasks that align well with the seq2seq architecture, such as summarization and translation. Inspired by these initial studies, we present the first extensive experiment comparing the in-context few-shot learning capabilities of decoder-only and encoder-decoder models on a broad range of tasks. Furthermore, we propose two methods to more effectively elicit the in-context learning ability of seq2seq models: objective-aligned prompting and a fusion-based approach. Remarkably, our approach outperforms a decoder-only model that is six times larger and exhibits significant performance improvements compared to conventional seq2seq models across a variety of settings. We posit that, with the right configuration and prompt design, seq2seq models can be highly effective few-shot learners for a wide spectrum of applications.
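
To make the setting concrete, below is a minimal sketch of in-context few-shot prompting with an off-the-shelf encoder-decoder model via the Hugging Face transformers API. The checkpoint (google/flan-t5-base), the sentiment-classification task, and the prompt layout are illustrative assumptions only; they are not the objective-aligned prompting or fusion-based configuration proposed in the paper.

```python
# Minimal sketch: in-context few-shot prompting with a seq2seq (encoder-decoder) model.
# Model choice, task, and prompt format are illustrative assumptions, not the paper's method.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-base"  # any T5-style seq2seq checkpoint could be substituted
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Few-shot demonstrations are concatenated into the encoder input;
# the decoder then generates the label for the final query.
demonstrations = [
    ("The movie was a delight from start to finish.", "positive"),
    ("I regret spending money on this.", "negative"),
]
query = "The plot dragged, but the acting saved it."

prompt = "\n".join(f"Review: {text}\nSentiment: {label}" for text, label in demonstrations)
prompt += f"\nReview: {query}\nSentiment:"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

No weights are updated here: the demonstrations influence the prediction only through the encoder input, which is the in-context few-shot setting the paper studies for seq2seq models.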

Related research

Revisiting Automated Prompting: Are We Actually Doing Better? (04/07/2023)
Current literature demonstrates that Large Language Models (LLMs) are gr...

AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model (08/02/2022)
In this work, we demonstrate that multilingual large-scale sequence-to-s...

MerA: Merging Pretrained Adapters For Few-Shot Learning (08/30/2023)
Adapter tuning, which updates only a few parameters, has become a mainst...

Entailment as Few-Shot Learner (04/29/2021)
Large pre-trained language models (LMs) have demonstrated remarkable abi...

An empirical study of Conv-TasNet (02/20/2020)
Conv-TasNet is a recently proposed waveform-based deep neural network th...

Towards Abstraction from Extraction: Multiple Timescale Gated Recurrent Unit for Summarization (07/04/2016)
In this work, we introduce temporal hierarchies to the sequence to seque...

Few-Shot Learning of an Interleaved Text Summarization Model by Pretraining with Synthetic Data (03/08/2021)
Interleaved texts, where posts belonging to different threads occur in a...
