Concept-aware Training Improves In-context Learning Ability of Language Models

05/23/2023
by Michal Štefánik, et al.

Many recent language models (LMs) of the Transformer family exhibit so-called in-context learning (ICL) ability, manifested in their capacity to modulate their function according to a task described in natural-language input. Previous work curating these models assumes that ICL emerges from vast over-parametrization or from the scale of multi-task training. However, a complementary branch of recent theoretical work attributes the emergence of ICL to specific properties of training data and creates functional in-context learners in small-scale, synthetic settings. Inspired by these findings on the data properties driving the emergence of ICL, we propose a method for creating LMs that better utilize in-context information, by constructing training scenarios in which it is beneficial for the LM to capture analogical reasoning concepts. We find that the data sampling of Concept-aware Training (CoAT) consistently improves models' reasoning ability. As a result, in-context learners trained with CoAT on only two datasets of a single task (QA) perform comparably to larger models trained on 1600+ tasks.
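The core idea, concept-aware sampling of in-context demonstrations, can be illustrated with a minimal sketch. This is not the authors' implementation: the `concept` annotation, field names, and prompt format below are illustrative assumptions. The sketch assembles training prompts so that the few-shot demonstrations share the target example's annotated reasoning concept, making it beneficial for the model to attend to the demonstrations rather than ignore them.

```python
import random
from collections import defaultdict

def build_concept_index(examples):
    """Group training examples by their annotated reasoning concept."""
    index = defaultdict(list)
    for ex in examples:
        index[ex["concept"]].append(ex)
    return index

def sample_coat_prompt(target, concept_index, num_demos=3, rng=random):
    """Assemble a training prompt whose demonstrations share the target's
    concept (excluding the target itself), so the in-context demonstrations
    carry information useful for predicting the target's answer."""
    pool = [ex for ex in concept_index[target["concept"]] if ex is not target]
    demos = rng.sample(pool, min(num_demos, len(pool)))
    parts = [f"Q: {d['question']}\nA: {d['answer']}" for d in demos]
    parts.append(f"Q: {target['question']}\nA:")
    return "\n\n".join(parts), target["answer"]
```

A fine-tuning loop would then train the LM on (prompt, answer) pairs produced this way, in contrast to conventional sampling that draws demonstrations at random from the whole dataset.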


Related research:

- Teaching Probabilistic Logical Reasoning to Transformers (05/22/2023): Recent research on transformer-based language models investigates their ...
- What is Not in the Context? Evaluation of Few-shot Learners with Informative Demonstrations (12/03/2022): Large language models demonstrate an emergent ability to learn a new tas...
- Taken out of context: On measuring situational awareness in LLMs (09/01/2023): We aim to better understand the emergence of 'situational awareness' in ...
- Resources and Few-shot Learners for In-context Learning in Slavic Languages (04/04/2023): Despite the rapid recent progress in creating accurate and compact in-co...
- Efficient Induction of Language Models Via Probabilistic Concept Formation (12/22/2022): This paper presents a novel approach to the acquisition of language mode...
- Examining the Emergence of Deductive Reasoning in Generative Language Models (05/31/2023): We conduct a preliminary inquiry into the ability of generative transfor...
- Large Language Models Can be Lazy Learners: Analyze Shortcuts in In-Context Learning (05/26/2023): Large language models (LLMs) have recently shown great potential for in-...
