Yes, this Way! Learning to Ground Referring Expressions into Actions with Intra-episodic Feedback from Supportive Teachers

05/22/2023
by Philipp Sadler, et al.

The ability to pick up on language signals in an ongoing interaction is crucial for future machine learning models to collaborate and interact with humans naturally. In this paper, we present an initial study that evaluates intra-episodic feedback given in a collaborative setting. We use a referential language game as a controllable example of a task-oriented collaborative joint activity. A teacher utters a referring expression generated by a well-known symbolic algorithm (the "Incremental Algorithm") as an initial instruction and then monitors the follower's actions, possibly intervening with intra-episodic feedback (which the follower does not have to request explicitly). We frame this task as a reinforcement learning problem with sparse rewards and learn a follower policy for a heuristic teacher. Our results show that intra-episodic feedback allows the follower to generalize on aspects of scene complexity, and that a follower trained with it performs better than one given only the initial statement.
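The abstract names the "Incremental Algorithm" as the source of the teacher's initial referring expression. As a rough illustration only, the following is a minimal sketch of the textbook form of that algorithm: walk through attributes in a fixed preference order and keep each attribute value that rules out at least one remaining distractor. The attribute names (`color`, `shape`, `position`), the example scene, and the function name `incremental_algorithm` are assumptions made for this sketch and are not taken from the paper; the full algorithm also treats the head noun specially, which is omitted here.

```python
from typing import Dict, List, Tuple

# Hypothetical scene representation: each object is a dict of attribute -> value.
Entity = Dict[str, str]


def incremental_algorithm(
    target: Entity,
    distractors: List[Entity],
    preference_order: List[str],
) -> Tuple[List[Tuple[str, str]], bool]:
    """Select attribute values that distinguish `target` from `distractors`.

    Returns the chosen (attribute, value) pairs and a flag that is True
    when every distractor has been ruled out.
    """
    description: List[Tuple[str, str]] = []
    remaining = list(distractors)

    for attribute in preference_order:
        if not remaining:
            break  # the target is already uniquely identified
        value = target.get(attribute)
        if value is None:
            continue  # the target has no value for this attribute
        # Keep the attribute only if it excludes at least one distractor.
        still_matching = [d for d in remaining if d.get(attribute) == value]
        if len(still_matching) < len(remaining):
            description.append((attribute, value))
            remaining = still_matching

    return description, not remaining


# Illustrative scene (assumed, not from the paper).
target = {"color": "red", "shape": "T", "position": "top left"}
distractors = [
    {"color": "blue", "shape": "T", "position": "top left"},
    {"color": "red", "shape": "Z", "position": "bottom right"},
]
description, is_unique = incremental_algorithm(
    target, distractors, ["color", "shape", "position"]
)
```

In this example, color excludes the blue piece and shape excludes the remaining red piece, so position is never needed. The preference order determines which attributes are tried first, so a different order can yield a different expression for the same scene.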

research
04/04/2021

Influencing Reinforcement Learning through Natural Language Guidance

Interactive reinforcement learning agents use human feedback or instruct...
research
11/12/2022

The Expertise Problem: Learning from Specialized Feedback

Reinforcement learning from human feedback (RLHF) is a powerful techniqu...
research
09/30/2020

Learning Rewards from Linguistic Feedback

We explore unconstrained natural language feedback as a learning signal ...
research
04/02/2021

Learning Online from Corrective Feedback: A Meta-Algorithm for Robotics

A key challenge in Imitation Learning (IL) is that optimal state actions...
research
12/19/2022

Continual Learning for Instruction Following from Realtime Feedback

We study the problem of continually training an instruction-following ag...
research
09/09/2021

Analysis of Language Change in Collaborative Instruction Following

We analyze language change over time in a collaborative, goal-oriented i...
research
02/07/2023

Learning Manner of Execution from Partial Corrections

Some actions must be executed in different ways depending on the context...
