Semantic Visual Navigation by Watching YouTube Videos

06/17/2020
by   Matthew Chang, et al.
10

Semantic cues and statistical regularities in real-world environment layouts can improve efficiency for navigation in novel environments. This paper learns and leverages such semantic cues for navigating to objects of interest in novel environments, by simply watching YouTube videos. This is challenging because YouTube videos don't come with labels for actions or goals, and may not even showcase optimal behavior. Our proposed method tackles these challenges through the use of Q-learning on pseudo-labeled transition quadruples (image, action, next image, reward). We show that such off-policy Q-learning from passive data is able to learn meaningful semantic cues for navigation. These cues, when used in a hierarchical navigation policy, lead to improved efficiency at the ObjectGoal task in visually realistic simulations. We improve upon end-to-end RL methods by 66 will be made available.

READ FULL TEXT

page 2

page 4

page 9

page 19

page 20

page 21

page 22

page 23

research
05/29/2019

Learning Navigation Subroutines by Watching Videos

Hierarchies are an effective way to boost sample efficiency in reinforce...
research
05/15/2018

Visual Representations for Semantic Target Driven Navigation

What is a good visual representation for autonomous agents? We address t...
research
04/10/2023

Reinforcement Learning from Passive Data via Latent Intentions

Passive observational data, such as human videos, is abundant and rich i...
research
06/29/2021

Learning to Map for Active Semantic Goal Navigation

We consider the problem of object goal navigation in unseen environments...
research
07/22/2023

Learning Vision-and-Language Navigation from YouTube Videos

Vision-and-language navigation (VLN) requires an embodied agent to navig...
research
10/16/2018

Composable Action-Conditioned Predictors: Flexible Off-Policy Learning for Robot Navigation

A general-purpose intelligent robot must be able to learn autonomously a...
research
06/28/2015

Unsupervised Semantic Parsing of Video Collections

Human communication typically has an underlying structure. This is refle...

Please sign up or login with your details

Forgot password? Click here to reset