Offline Visual Representation Learning for Embodied Navigation

04/27/2022
by Karmesh Yadav, et al.

How should we learn visual representations for embodied agents that must see and move? The status quo is tabula rasa in vivo, i.e. learning visual representations from scratch while also learning to move, potentially augmented with auxiliary tasks (e.g. predicting the action taken between two successive observations). In this paper, we show that an alternative two-stage strategy is far more effective: (1) offline pretraining of visual representations with self-supervised learning (SSL) on large-scale pre-rendered images of indoor environments (Omnidata), and (2) online finetuning of visuomotor representations on specific tasks with image augmentations under long learning schedules. We call this method Offline Visual Representation Learning (OVRL). We conduct large-scale experiments on 3 different 3D datasets (Gibson, HM3D, MP3D), 2 tasks (ImageNav, ObjectNav), and 2 policy learning algorithms (RL, IL), and find that the OVRL representations lead to significant across-the-board improvements over the state of the art, on ImageNav from 29.2% […] (86% relative). Importantly, both results were achieved by the same visual encoder generalizing to datasets that were not seen during pretraining. While the benefits of pretraining sometimes diminish (or entirely disappear) with long finetuning schedules, we find that OVRL's performance gains continue to increase (not decrease) as the agent is trained for 2 billion frames of experience.
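The abstract attributes part of OVRL's gains to online finetuning "with image augmentations" but does not name the transforms. As an illustration only, the Python sketch below implements random shift (pad with replicated edge pixels, then crop back to the original size at a random offset), a common augmentation in visuomotor RL. Whether OVRL uses this exact transform is an assumption, and the helper names `pad_replicate` and `random_shift` are hypothetical.

```python
import random
from typing import List, Optional

# Hypothetical helpers illustrating one possible finetuning augmentation;
# this is not OVRL's code. A frame is a grayscale image stored as rows of
# pixel intensities (a stand-in for an RGB observation).
Image = List[List[float]]


def pad_replicate(img: Image, pad: int) -> Image:
    """Pad an H x W image by `pad` pixels on every side by repeating edge pixels."""
    h, w = len(img), len(img[0])
    padded = []
    for i in range(-pad, h + pad):
        src = img[min(max(i, 0), h - 1)]  # clamp the row index into [0, h - 1]
        padded.append([src[min(max(j, 0), w - 1)] for j in range(-pad, w + pad)])
    return padded


def random_shift(img: Image, pad: int = 4, rng: Optional[random.Random] = None) -> Image:
    """Translate `img` by up to `pad` pixels in each direction (pad, then random crop).

    The output has the same H x W shape as the input, so the encoder sees
    a slightly shifted view of the same observation.
    """
    rng = rng or random.Random()
    h, w = len(img), len(img[0])
    padded = pad_replicate(img, pad)  # shape: (h + 2*pad) x (w + 2*pad)
    top = rng.randint(0, 2 * pad)  # randint bounds are inclusive
    left = rng.randint(0, 2 * pad)
    return [row[left:left + w] for row in padded[top:top + h]]


if __name__ == "__main__":
    frame = [[float(10 * r + c) for c in range(6)] for r in range(5)]
    # During online finetuning, each observation would get a fresh random
    # shift before being passed to the pretrained visual encoder.
    augmented = random_shift(frame, pad=2, rng=random.Random(0))
    print(len(augmented), len(augmented[0]))  # 5 6
```

A pad of only a few pixels keeps the shifted view close to the original frame, so the transform perturbs pixel positions without changing what the agent sees; evaluation episodes would typically use unaugmented frames.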
