Few-shot Sequence Learning with Transformers

12/17/2020
by Lajanugen Logeswaran, et al.

Few-shot algorithms aim at learning new tasks provided only a handful of training examples. In this work we investigate few-shot learning in the setting where the data points are sequences of tokens, and propose an efficient learning algorithm based on Transformers. In the simplest setting, we append to an input sequence a token that represents the particular task to be performed, and show that the embedding of this token can be optimized on the fly given only a few labeled examples. Our approach requires neither complicated changes to the model architecture, such as adapter layers, nor the computation of second-order derivatives, as is currently popular in the meta-learning and few-shot learning literature. We demonstrate our approach on a variety of tasks and analyze the generalization properties of several model variants and baseline approaches. In particular, we show that compositional task descriptors can improve performance. Experiments show that our approach works at least as well as other methods while being more computationally efficient.
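
As a rough illustration of the idea, the sketch below (not the authors' code; the model sizes, support set, and training loop are illustrative assumptions) appends a learnable task-token embedding to each input sequence and optimizes only that embedding on a handful of labeled examples while the Transformer itself stays frozen.

```python
# Minimal sketch: few-shot adaptation by optimizing only the embedding of an
# appended task token, with the Transformer weights frozen. Sizes and data
# below are placeholders for illustration.
import torch
import torch.nn as nn

class TaskTokenTransformer(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, tokens, task_embedding):
        x = self.embed(tokens)                          # (B, T, d_model)
        task = task_embedding.expand(x.size(0), 1, -1)  # (B, 1, d_model)
        x = torch.cat([x, task], dim=1)                 # append the task token
        h = self.encoder(x)
        return self.head(h[:, -1])                      # predict from the task-token position

# Few-shot adaptation: freeze the (pretrained) model, optimize only the task embedding.
model = TaskTokenTransformer()
model.eval()  # disable dropout while adapting the task embedding
for p in model.parameters():
    p.requires_grad_(False)

task_embedding = nn.Parameter(torch.zeros(1, 1, 64))
opt = torch.optim.Adam([task_embedding], lr=1e-2)

support_x = torch.randint(0, 1000, (8, 16))  # 8 labeled examples, length-16 sequences
support_y = torch.randint(0, 2, (8,))

for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(support_x, task_embedding), support_y)
    loss.backward()
    opt.step()
```

Because only the task embedding receives gradient updates, adapting to a new task touches a single d_model-sized vector rather than the full network, which is where the computational advantage over second-order meta-learning methods would come from in this reading of the abstract.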

Related research

06/03/2021  Reordering Examples Helps during Priming-based Few-Shot Learning
The ability to learn from limited data, or few-shot learning, is a desir...

07/01/2021  Few-Shot Learning with a Strong Teacher
Few-shot learning (FSL) aims to train a strong classifier using limited ...

03/14/2022  Self-Promoted Supervision for Few-Shot Transformer
The few-shot learning ability of vision transformers (ViTs) is rarely in...

03/21/2022  HyperShot: Few-Shot Learning by Kernel HyperNetworks
Few-shot models aim at making predictions using a minimal number of labe...

05/02/2023  Accelerating Neural Self-Improvement via Bootstrapping
Few-shot learning with sequence-processing neural networks (NNs) has rec...

11/17/2022  Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers
Large-scale transformer models have become the de-facto architectures fo...

03/17/2022  Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-shot Learning
This paper presents new hierarchically cascaded transformers that can im...