Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks

10/01/2022
by   Zhenhailong Wang, et al.

Although large language models have achieved impressive zero-shot ability, their huge size generally incurs high cost. Recently, semi-parametric language models, which augment a smaller language model with an external retriever, have demonstrated promising language-modeling capabilities. However, it remains unclear whether such semi-parametric language models can perform as competitively as their fully parametric counterparts in zero-shot generalization to downstream tasks. In this work, we introduce Zemi, a zero-shot semi-parametric language model. To the best of our knowledge, this is the first semi-parametric language model to demonstrate strong zero-shot performance on a wide range of held-out unseen tasks. We train Zemi with a novel semi-parametric multitask prompted training paradigm, which shows significant improvement over the parametric multitask training proposed by T0. Specifically, we augment both multitask training and zero-shot evaluation with retrieval from a large-scale, task-agnostic unlabeled corpus. To incorporate multiple potentially noisy retrieved augmentations, we further propose a novel augmentation fusion module that leverages a perceiver resampler and gated cross-attention. Notably, our proposed Zemi_LARGE outperforms T0-3B by 16% on evaluation tasks while being 3.9x smaller in model size.
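To make the gated cross-attention idea concrete, below is a minimal NumPy sketch of how retrieved augmentation tokens could be fused into a model's hidden states. This is an illustrative assumption, not the paper's implementation: it uses a single attention head with no learned projections, and the function name `gated_cross_attention` and the scalar `gate` parameter are hypothetical. The key property it demonstrates is that with the gate initialized at zero, the module reduces to the identity, so the base language model's behavior is preserved at the start of training and the retrieved (potentially noisy) augmentations are blended in only as the gate opens.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_cross_attention(hidden, retrieved, gate=0.0):
    """Single-head cross-attention from hidden states (queries) to
    retrieved augmentation tokens (keys/values), added back through a
    residual connection scaled by a tanh gate.

    hidden:    (n_tokens, d) hidden states of the language model
    retrieved: (n_retrieved, d) encoded retrieved-augmentation tokens
    gate:      scalar gate parameter; tanh(0) = 0 means the module
               starts as an identity over the hidden states
    """
    d = hidden.shape[-1]
    scores = hidden @ retrieved.T / np.sqrt(d)   # (n_tokens, n_retrieved)
    attended = softmax(scores) @ retrieved       # (n_tokens, d)
    return hidden + np.tanh(gate) * attended     # gated residual fusion

# toy shapes: 4 query tokens, 6 retrieved tokens, dimension 8
rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))
r = rng.normal(size=(6, 8))
out = gated_cross_attention(h, r, gate=0.0)
# with gate=0 the output equals the input hidden states
```

In a full model, the retrieved passages would first be compressed to a fixed number of latent tokens (the role the perceiver resampler plays in the paper) before this fusion step, and the gate would be a learned per-layer parameter rather than a fixed scalar.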

