Footprints and Free Space from a Single Color Image

04/14/2020
by Jamie Watson, et al.

Understanding the shape of a scene from a single color image is a formidable computer vision task. However, most methods aim to predict the geometry of surfaces that are visible to the camera, which is of limited use when planning paths for robots or augmented reality agents. Such agents can only move when grounded on a traversable surface, which we define as the set of classes which humans can also walk over, such as grass, footpaths and pavement. Models which predict beyond the line of sight often parameterize the scene with voxels or meshes, which can be expensive to use in machine learning frameworks. We introduce a model to predict the geometry of both visible and occluded traversable surfaces, given a single RGB image as input. We learn from stereo video sequences, using camera poses, per-frame depth and semantic segmentation to form training data, which is used to supervise an image-to-image network. We train models from the KITTI driving dataset, the indoor Matterport dataset, and from our own casually captured stereo footage. We find that a surprisingly low bar for spatial coverage of training scenes is required. We validate our algorithm against a range of strong baselines, and include an assessment of our predictions for a path-planning task.
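The abstract says that camera poses, per-frame depth and semantic segmentation are combined to form training data. One way to picture that step is the minimal sketch below. It is an illustration under stated assumptions, not the authors' pipeline. It back-projects the traversable pixels of one source frame to 3D, moves them into a target camera, and marks the pixels they land on, so ground hidden in the target view can still be labeled. The function name `reproject_traversable`, the pinhole intrinsics `K`, and the 4x4 source-to-target pose `T_src_to_tgt` are all hypothetical.

```python
import numpy as np

def reproject_traversable(depth, traversable, K, T_src_to_tgt, out_shape):
    """Mark target-view pixels hit by traversable source pixels.

    Hypothetical helper: depth is (H, W) in metres, traversable is a boolean
    (H, W) mask, K is a 3x3 pinhole intrinsics matrix shared by both views,
    and T_src_to_tgt is a 4x4 rigid transform from source to target camera.
    """
    # Pixel coordinates of traversable source pixels with valid depth.
    v, u = np.nonzero(traversable & (depth > 0))
    z = depth[v, u]

    # Back-project pixels to 3D points in the source camera frame.
    pix = np.stack([u, v, np.ones_like(u)], axis=0).astype(np.float64)
    pts_src = (np.linalg.inv(K) @ pix) * z

    # Move the points into the target camera frame.
    pts_h = np.vstack([pts_src, np.ones((1, pts_src.shape[1]))])
    pts_tgt = (T_src_to_tgt @ pts_h)[:3]

    # Keep only points in front of the target camera.
    pts_tgt = pts_tgt[:, pts_tgt[2] > 1e-6]

    # Project into the target image and round to the nearest pixel.
    proj = K @ pts_tgt
    u_t = np.round(proj[0] / proj[2]).astype(int)
    v_t = np.round(proj[1] / proj[2]).astype(int)

    # Discard projections that fall outside the target image.
    h, w = out_shape
    inside = (u_t >= 0) & (u_t < w) & (v_t >= 0) & (v_t < h)

    mask = np.zeros(out_shape, dtype=bool)
    mask[v_t[inside], u_t[inside]] = True
    return mask
```

In this sketch, taking the logical OR of the masks from many source frames of the same sequence would accumulate ground that the target view itself cannot see. The actual procedure used to build training labels is described in the full paper.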

research
12/11/2020

A Dark Flash Normal Camera

Casual photography is often performed in uncontrolled lighting that can ...
research
03/13/2019

Putting Humans in a Scene: Learning Affordance in 3D Indoor Environments

Affordance modeling plays an important role in visual understanding. In ...
research
11/03/2021

Panoptic 3D Scene Reconstruction From a Single RGB Image

Understanding 3D scenes from a single image is fundamental to a wide var...
research
06/15/2023

Neural World Models for Computer Vision

Humans navigate in their environment by learning a mental model of the w...
research
03/29/2019

FrameNet: Learning Local Canonical Frames of 3D Surfaces from a Single RGB Image

In this work, we introduce the novel problem of identifying dense canoni...
research
03/23/2022

NeuMan: Neural Human Radiance Field from a Single Video

Photorealistic rendering and reposing of humans is important for enablin...
research
07/17/2010

A Machine Learning Approach to Recovery of Scene Geometry from Images

Recovering the 3D structure of the scene from images yields useful infor...
