Embodied Question Answering in Photorealistic Environments with Point Cloud Perception

04/06/2019
by Erik Wijmans, et al.

To help bridge the gap between internet-style vision problems and the goal of vision for embodied perception, we instantiate a large-scale navigation task, Embodied Question Answering [1], in photo-realistic environments (Matterport 3D). We thoroughly study navigation policies that use 3D point clouds, RGB images, or their combination. Our analysis of these models reveals several key findings. First, two seemingly naive navigation baselines, forward-only and random, are strong navigators and challenging to outperform, due to the specific choice of evaluation setting presented by [1]. Second, a novel loss-weighting scheme we call Inflection Weighting is important when training recurrent models for navigation with behavior cloning, and with this technique we are able to outperform the baselines. Third, point clouds provide a richer signal than RGB images for learning obstacle avoidance, motivating the use (and continued study) of 3D deep learning models for embodied navigation.
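
The abstract names Inflection Weighting but does not define it. As a rough illustration only, the sketch below assumes the scheme upweights timesteps at which the expert's action differs from its previous action (an "inflection") in a behavior-cloning loss, and normalizes by the sum of the weights. The function names, the treatment of the first timestep, and the `inflection_coef` parameter are hypothetical choices made for this example, not taken from the abstract.

```python
import math


def inflection_weights(actions, inflection_coef):
    """Return one weight per timestep.

    Assumption: a timestep is an inflection when its expert action differs
    from the previous one. The first timestep has no predecessor and is
    treated here as a non-inflection (weight 1.0).
    """
    weights = [1.0]
    for prev, cur in zip(actions, actions[1:]):
        weights.append(inflection_coef if cur != prev else 1.0)
    return weights


def inflection_weighted_nll(log_probs, actions, inflection_coef):
    """Weighted negative log-likelihood of the expert actions.

    log_probs[t][a] is the policy's log-probability of action a at step t.
    The loss is normalized by the sum of the weights (an assumption).
    """
    weights = inflection_weights(actions, inflection_coef)
    total = sum(
        -w * log_probs[t][a]
        for t, (w, a) in enumerate(zip(weights, actions))
    )
    return total / sum(weights)


# Toy trajectory with two actions: 0 = forward, 1 = turn.
expert_actions = [0, 0, 1, 1, 0]
uniform = [[math.log(0.5), math.log(0.5)] for _ in expert_actions]
weights = inflection_weights(expert_actions, inflection_coef=3.0)
loss = inflection_weighted_nll(uniform, expert_actions, inflection_coef=3.0)
```

In this toy trajectory the action changes at the third and fifth steps, so only those two steps receive the larger weight. One plausible way to set `inflection_coef` would be the ratio of total timesteps to inflection timesteps in the training data, which is again an assumption of this sketch.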
