A General Purpose Supervisory Signal for Embodied Agents

12/01/2022
by Kunal Pratap Singh, et al.

Training effective embodied AI agents often involves manual reward engineering, expert imitation, specialized components such as maps, or leveraging additional sensors for depth and localization. Another approach is to use neural architectures alongside self-supervised objectives which encourage better representation learning. In practice, there are few guarantees that these self-supervised objectives encode task-relevant information. We propose the Scene Graph Contrastive (SGC) loss, which uses scene graphs as general-purpose, training-only, supervisory signals. The SGC loss does away with explicit graph decoding and instead uses contrastive learning to align an agent's representation with a rich graphical encoding of its environment. The SGC loss is generally applicable, simple to implement, and encourages representations that encode objects' semantics, relationships, and history. Using the SGC loss, we attain significant gains on three embodied tasks: Object Navigation, Multi-Object Navigation, and Arm Point Navigation. Finally, we present studies and analyses which demonstrate the ability of our trained representation to encode semantic cues about the environment.
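
The contrastive alignment described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the SGC loss, so everything below is an assumption made for illustration: the function name `info_nce`, the InfoNCE-style formulation, the cosine similarity, and the `temperature` value are hypothetical and are not taken from the paper. The sketch treats row i of the agent embeddings and row i of the scene-graph embeddings as a positive pair and every other row in the batch as a negative.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce(agent_embs, graph_embs, temperature=0.1):
    """InfoNCE-style loss between paired embeddings (illustrative assumption,
    not the paper's exact SGC loss).

    agent_embs[i] is the agent's representation at step i; graph_embs[i] is an
    embedding of the scene graph at the same step. Matching indices are
    positives, all other graph embeddings in the batch are negatives.
    """
    if not agent_embs or len(agent_embs) != len(graph_embs):
        raise ValueError("need two non-empty batches of equal size")

    a = [l2_normalize(v) for v in agent_embs]
    g = [l2_normalize(v) for v in graph_embs]

    total = 0.0
    for i, a_i in enumerate(a):
        # Cosine similarity of agent embedding i with every graph embedding.
        logits = [sum(x * y for x, y in zip(a_i, g_j)) / temperature for g_j in g]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the matching graph embedding.
        total += log_denom - logits[i]
    return total / len(a)


# Toy usage: aligned pairs should score a lower loss than mismatched pairs.
agent = [[1.0, 0.0], [0.0, 1.0]]
aligned_graph = [[1.0, 0.0], [0.0, 1.0]]
swapped_graph = [[0.0, 1.0], [1.0, 0.0]]
loss_aligned = info_nce(agent, aligned_graph)
loss_swapped = info_nce(agent, swapped_graph)
```

In this toy batch, `loss_aligned` is smaller than `loss_swapped`, which is the behaviour a contrastive objective of this kind rewards. In a training setup, a term like this would presumably be added to the task loss as an auxiliary signal, and the graph encoder would be needed only during training, consistent with the abstract's description of scene graphs as training-only supervision.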

research
04/26/2022

Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representation

Recent general-purpose audio representations show state-of-the-art perfo...
research
12/10/2021

Concept Representation Learning with Contrastive Self-Supervised Learning

Concept-oriented deep learning (CODL) is a general approach to meet the ...
research
02/03/2021

General-Purpose Speech Representation Learning through a Self-Supervised Multi-Granularity Framework

This paper presents a self-supervised learning framework, named MGF, for...
research
11/18/2019

Vision-Language Navigation with Self-Supervised Auxiliary Reasoning Tasks

Vision-Language Navigation (VLN) is a task where agents learn to navigat...
research
02/22/2023

Saliency Guided Contrastive Learning on Scene Images

Self-supervised learning holds promise in leveraging large numbers of un...
research
03/27/2018

Mittens: An Extension of GloVe for Learning Domain-Specialized Representations

We present a simple extension of the GloVe representation learning model...
research
06/08/2023

SNAP: Self-Supervised Neural Maps for Visual Positioning and Semantic Understanding

Semantic 2D maps are commonly used by humans and machines for navigation...
