Modular Framework for Visuomotor Language Grounding

09/05/2021
by Kolby Nottingham, et al.

Natural language instruction following tasks serve as a valuable test-bed for grounded language and robotics research. However, data collection for these tasks is expensive, and end-to-end approaches suffer from data inefficiency. We propose structuring language, action, and vision into separate modules that can be trained independently. Using a Language, Action, and Vision (LAV) framework removes the dependence of the action and vision modules on instruction following datasets, making them more efficient to train. We also present a preliminary evaluation of LAV on the ALFRED task for visual and interactive instruction following.
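The abstract describes a three-way decomposition in which only the language module needs instruction following data, while the action and vision modules can be trained on other sources. As a rough illustration only, the sketch below shows how such independently trained modules might be composed through narrow interfaces. All class and method names here (VisionModule, LanguageModule, ActionModule, LAVAgent, and their methods) are hypothetical and not taken from the paper; the internals are trivial placeholders standing in for learned models.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class State:
    """Latent state passed between modules (an assumed interface)."""
    features: List[float]


class VisionModule:
    """Maps raw observations to latent state; trainable without instructions."""

    def encode(self, image: List[List[float]]) -> State:
        # Placeholder: flatten the image into a short feature vector.
        flat = [px for row in image for px in row]
        return State(features=flat[:8])


class LanguageModule:
    """Maps an instruction to a sequence of subgoals (an assumed interface)."""

    def parse(self, instruction: str) -> List[str]:
        # Placeholder: treat each token as a subgoal.
        return instruction.lower().split()


class ActionModule:
    """Chooses an action given the latent state and the current subgoal."""

    def act(self, state: State, subgoal: str) -> str:
        # Placeholder policy: a fixed mapping, for illustration only.
        return f"interact({subgoal})"


class LAVAgent:
    """Composes the three independently trained modules into one agent."""

    def __init__(self) -> None:
        self.vision = VisionModule()
        self.language = LanguageModule()
        self.action = ActionModule()

    def step(self, image: List[List[float]], instruction: str) -> str:
        state = self.vision.encode(image)
        subgoals = self.language.parse(instruction)
        return self.action.act(state, subgoals[0])


if __name__ == "__main__":
    agent = LAVAgent()
    print(agent.step([[0.1, 0.2], [0.3, 0.4]], "pick up the mug"))
```

The point of such a decomposition, as the abstract suggests, is that only the interfaces between modules are fixed, so the vision and action modules can presumably be retrained or swapped without touching the instruction following dataset.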


Related Research

10/24/2020
Modularity Improves Out-of-Domain Instruction Following
We propose a modular architecture for following natural language instruc...

12/06/2020
MOCA: A Modular Object-Centric Approach for Interactive Instruction Following
Performing simple household tasks based on language directives is very n...

02/25/2022
SGL: Symbolic Goal Learning in a Hybrid, Modular Framework for Human Instruction Following
This paper investigates robot manipulation based on human instruction wi...

05/24/2022
Sim-To-Real Transfer of Visual Grounding for Human-Aided Ambiguity Resolution
Service robots should be able to interact naturally with non-expert huma...

01/16/2018
Grounded Language Understanding for Manipulation Instructions Using GAN-Based Classification
The target task of this study is grounded language understanding for dom...

09/09/2021
Analysis of Language Change in Collaborative Instruction Following
We analyze language change over time in a collaborative, goal-oriented i...
