Continual Learning for Instruction Following from Realtime Feedback

12/19/2022
by Alane Suhr, et al.

We study the problem of continually training an instruction-following agent through feedback provided by users during collaborative interactions. During interaction, human users instruct an agent using natural language and provide realtime binary feedback as they observe the agent executing their instructions. We cast learning as a contextual bandit problem, converting the user feedback to immediate reward. We evaluate through multiple rounds of human-agent interactions, demonstrating a 15.4% absolute improvement in instruction execution accuracy over time. We also show that our approach is robust to several design variations, and that the feedback signal is roughly equivalent to the learning signal of supervised demonstration data.
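
The contextual bandit framing can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the linear softmax policy, the +1/-1 reward mapping in `feedback_to_reward`, the learning rate, and the simulated user in `simulate` are all assumptions chosen for illustration. The sketch shows only the general idea of turning binary feedback on a single action into an immediate reward and taking a policy-gradient step on it.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def feedback_to_reward(positive):
    """Map binary user feedback to an immediate reward (assumed +1 / -1)."""
    return 1.0 if positive else -1.0


class LinearBanditPolicy:
    """Hypothetical softmax policy with one weight vector per action."""

    def __init__(self, n_features, n_actions, lr=0.1):
        self.w = [[0.0] * n_features for _ in range(n_actions)]
        self.lr = lr

    def probs(self, context):
        scores = [sum(wi * xi for wi, xi in zip(row, context)) for row in self.w]
        return softmax(scores)

    def act(self, context, rng):
        # Sample an action from the current policy distribution.
        p = self.probs(context)
        return rng.choices(range(len(p)), weights=p)[0]

    def update(self, context, action, reward):
        # Single-step policy gradient: reward * grad log pi(action | context).
        p = self.probs(context)
        for a, row in enumerate(self.w):
            coeff = (1.0 if a == action else 0.0) - p[a]
            for i, xi in enumerate(context):
                row[i] += self.lr * reward * coeff * xi


def simulate(rounds=2000, seed=0):
    """Toy loop: a simulated user approves the action matching the context."""
    rng = random.Random(seed)
    policy = LinearBanditPolicy(n_features=2, n_actions=2)
    contexts = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(rounds):
        target = rng.randrange(2)
        x = contexts[target]
        a = policy.act(x, rng)
        r = feedback_to_reward(a == target)  # stand-in for realtime feedback
        policy.update(x, a, r)
    return policy
```

Because each update uses only the reward for the action actually taken, the learner never needs to see the correct action, which is what distinguishes this bandit setting from supervised learning on demonstrations.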

research
08/10/2021

Continual Learning for Grounded Instruction Generation by Observing Human Following Behavior

We study continual learning for natural language instruction generation,...
research
03/14/2023

CB2: Collaborative Natural Language Interaction Research Platform

CB2 is a multi-agent platform to study collaborative natural language in...
research
06/01/2017

Teaching Machines to Describe Images via Natural Language Feedback

Robots will eventually be part of every household. It is thus critical t...
research
09/19/2023

MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback

To solve complex tasks, large language models (LLMs) often require multi...
research
10/21/2019

Self-Educated Language Agent With Hindsight Experience Replay For Instruction Following

Language creates a compact representation of the world and allows the de...
research
05/22/2023

Yes, this Way! Learning to Ground Referring Expressions into Actions with Intra-episodic Feedback from Supportive Teachers

The ability to pick up on language signals in an ongoing interaction is ...
research
03/18/2022

Simulating Bandit Learning from User Feedback for Extractive Question Answering

We study learning from user feedback for extractive question answering b...
