Embodied Concept Learner: Self-supervised Learning of Concepts and Mapping through Instruction Following

04/07/2023
by Mingyu Ding, et al.

Humans, even at a very early age, can learn visual concepts and understand geometry and layout through active interaction with the environment, and can generalize compositions of these concepts to complete tasks described in natural language in novel scenes. To mimic this capability, we propose the Embodied Concept Learner (ECL), situated in an interactive 3D environment. Specifically, a robot agent can ground visual concepts, build semantic maps, and plan actions to complete tasks by learning purely from human demonstrations and language instructions, without access to ground-truth semantic or depth supervision from the simulator. ECL consists of: (i) an instruction parser that translates natural language into executable programs; (ii) an embodied concept learner that grounds visual concepts based on language descriptions; (iii) a map constructor that estimates depth and constructs semantic maps by leveraging the learned concepts; and (iv) a program executor with deterministic policies that executes each program. ECL's modularized design yields several appealing benefits. First, it enables the robotic agent to learn semantics and depth in an unsupervised manner, much as infants do, e.g., grounding concepts through active interaction and perceiving depth from disparities while moving forward. Second, ECL is fully transparent and step-by-step interpretable in long-term planning. Third, ECL benefits embodied instruction following (EIF), outperforming previous works on the ALFRED benchmark when semantic labels are not provided. Moreover, the learned concepts can be reused for other downstream tasks, such as reasoning about object states. Project page: http://ecl.csail.mit.edu/
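The four-module pipeline described above can be sketched as follows. This is a minimal, hypothetical skeleton based only on the abstract: all module internals (the clause-splitting parser, dictionary-based concept store, grid map, and lookup-based executor) are illustrative placeholders, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ECLAgent:
    """Hypothetical skeleton of ECL's four modules (sketch, not the paper's code)."""
    concepts: Dict[str, list] = field(default_factory=dict)

    # (i) Instruction parser: natural language -> executable program.
    def parse(self, instruction: str) -> List[Dict[str, str]]:
        program = []
        for clause in instruction.lower().split(", then "):
            action, _, obj = clause.partition(" ")
            program.append({"op": action, "arg": obj.strip()})
        return program

    # (ii) Concept learner: associate a visual feature with a word.
    def ground(self, word: str, feature) -> None:
        self.concepts.setdefault(word, []).append(feature)

    # (iii) Map constructor: place only *learned* concepts on a 2D grid.
    def build_map(self, detections) -> Dict[Tuple[int, int], str]:
        semantic_map = {}
        for word, (x, y) in detections:
            if word in self.concepts:
                semantic_map[(x, y)] = word
        return semantic_map

    # (iv) Program executor: resolve each step deterministically on the map.
    def execute(self, program, semantic_map):
        log = []
        for step in program:
            targets = [p for p, w in semantic_map.items() if w == step["arg"]]
            log.append((step["op"], step["arg"], targets[0] if targets else None))
        return log


agent = ECLAgent()
agent.ground("mug", feature=[0.1, 0.9])                 # concept learned from interaction
prog = agent.parse("pick mug, then goto sink")          # language -> program
smap = agent.build_map([("mug", (3, 4)), ("lamp", (1, 1))])  # only "mug" is grounded
print(agent.execute(prog, smap))
# [('pick', 'mug', (3, 4)), ('goto', 'sink', None)]
```

The design point this sketch illustrates is the modularity claimed in the abstract: each stage exposes a plain data interface (program steps, a concept store, a semantic map), so any single module can be inspected or replaced without retraining the others.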


