Mid-Level Visual Representations Improve Generalization and Sample Efficiency for Learning Active Tasks

12/31/2018 ∙ by Alexander Sax, et al.

One of the ultimate promises of computer vision is to help robotic agents perform active tasks, like delivering packages or doing household chores. However, the conventional approach to solving "vision" is to define a set of offline recognition problems (e.g. object detection) and solve those first. This approach faces a challenge from the recent rise of Deep Reinforcement Learning frameworks that learn active tasks from scratch using images as input. This poses a set of fundamental questions: what is the role of computer vision if everything can be learned from scratch? Could intermediate vision tasks actually be useful for performing arbitrary downstream active tasks? We show that proper use of mid-level perception confers significant advantages over training from scratch. We implement a perception module as a set of mid-level visual representations and demonstrate that learning active tasks with mid-level features is significantly more sample-efficient than learning from scratch and able to generalize in situations where the from-scratch approach fails. However, we show that realizing these gains requires careful selection of the particular mid-level features for each downstream task. Finally, we put forth a simple and efficient perception module based on the results of our study, which can be adopted as a rather generic perception module for active frameworks.




1 Introduction

The renaissance of deep Reinforcement Learning (RL) started with the Atari DQN paper [28], which showed that a wide set of video games could be learned directly from pixels using RL. Robotics quickly adopted deep RL for learning control from the frames of an onboard camera, an approach commonly referred to as pixel-to-torque. This interdisciplinary success has led to remarkable recent progress in RL research and has implications for various other fields, in particular perception.

The basic premise of this approach, pertinent to perception, is: performing an active task can be effectively learned from scratch directly from images. The premise poses an existential question for computer vision: what is the use of computer vision, if all one needs from images can be learned from scratch using RL? In this paper, we focus on this question and try to identify what the consequences of this paradigm are, if and when vision can be learned from scratch using RL, and how computer vision could actually help with learning active tasks.

Figure 1: Mid-level perception module in an end-to-end framework for learning active robotic tasks. We systematically study if/how a set of generic mid-level vision features can help with learning arbitrary downstream active tasks. We report significant advantages in sample efficiency and generalization.

To take stock of the situation, it is important to note two common themes when learning from scratch using RL: I. the policies are sample inefficient, requiring a massive number of data points to learn (e.g. DQN requires tens of millions of frames [28], and even state-of-the-art Q-learning methods like Rainbow [18] still require millions of samples). II. the policies are often tested in the same environment as training, as they exhibit difficulties generalizing across environments with even modest differences. Yet generalization and sample efficiency are essential requirements for any practically useful system operating in the real world; biological organisms, for example, are known to learn active tasks from few samples and effortlessly generalize them to places other than where learning occurred. We show that the lack of an appropriate perception module is indeed one of the primary causes of these two phenomena, and that both can consequently be alleviated by adopting a perception solution. We recognize that conventional computer vision consists of defining a set of offline recognition problems, e.g. object classification, and tackling them on their own without a clear path towards effective integration in active frameworks. We will demonstrate that such standard vision tasks can be turned into mid-level vision skills and then integrated into RL frameworks for learning arbitrary active tasks.

To be more specific: our goal is to learn an arbitrary downstream active task, like visual navigation. We assume a set of standard imperfect visual estimators (e.g. depth, orientation, objects, etc.) are available; we refer to them as mid-level vision tasks. We then study if and how mid-level vision can provide benefits towards learning the downstream active task, compared to not adopting a perception module. Our metrics are how quickly the active task is learned and how well the policies generalize to unseen test spaces. We do not care about the task-specific performance of mid-level visual estimators or their vision-based metrics, as our sole goal is the downstream active task and mid-level vision is only in service to that.

We test three core hypotheses: I. whether mid-level vision provides an advantage in terms of sample efficiency of learning an active task (answer: yes); II. whether mid-level vision provides an advantage towards generalization to unseen spaces (answer: yes); III. whether a single fixed mid-level vision feature could suffice, or a set of features is essential to support arbitrary active tasks (answer: a set is essential). We use statistical tests where appropriate. Finally, we put forth a simple and practical perception module based on the findings of our study, which can be adopted in lieu of raw pixels to gain the advantages of mid-level vision.

To perform our study, we needed to adopt an end-to-end framework for learning arbitrary active tasks, and we chose deep RL—however, any of the common alternatives, such as imitation learning or classic control, would be viable choices as well. In our experiments, we use neural networks from existing vision techniques [54, 58, 6, 52] trained on real images for each mid-level task and use their internal representations as the observed state provided to the RL policy. We do not use synthetic data to train the visual estimators and do not assume they are perfect. The code and trained models are available on our website.

2 Related Work

Our study has connections to a broad set of topics, including lifelong learning, un/self-supervised learning, transfer learning, reinforcement and imitation learning, control theory, active vision, and several others. We overview the most relevant ones within the constraints of space.

Figure 2: Illustration of experimental setup. Left: Plate-notation view of the transfer learning setup where internal representations from the encoder network(s) are used as inputs to various RL policies. Right: Illustrations of the hypotheses. Features (also illustrated by the readout images) are ranked by performance on the downstream task. Red lines identify features that have a higher rank for task 1 while blue lines connect features that have a higher rank for task 2. For HI and HII: some features are ranked significantly above scratch. For HIII: the feature ranking reorders between tasks.

Conventional (Offline) Computer Vision encompasses a wide set of approaches—e.g. fully supervised learning [23], self-supervised learning [32, 35, 7, 56, 55, 33], unsupervised learning [9, 4, 7, 56, 8, 41]—adopted to solve various standard perception tasks, e.g. depth estimation [24], object detection [23] and segmentation [43], pose estimation [57], etc. The common characteristic shared across these methods is that they are offline (i.e. trained and tested on pre-recorded datasets) and evaluated in terms of their immediate tasks. In this paper we study how such methods can be plugged into a bigger framework targeted towards solving downstream active tasks. In recent years the computer vision community has become increasingly interested in robotic tasks.

Reinforcement Learning [46] and its variants like Meta-RL [10, 14, 30, 12, 21], or its sister fields such as Imitation Learning [2], commonly focus on the last part of the end-to-end active task pipeline: how to choose an action given a “state” from the world. These methods can be viewed as users of our study, as we essentially upgrade their input state from pixels (or single fixed features) to a set of generic mid-level vision features.

Feature Learning literature shares its goal with our study: how to encode images in a way that provides benefits over just using raw pixels. There are a number of successful works in this area. Compression techniques like autoencoders [19] squeeze images into a lower-dimensional representation. Extensions [49, 22] to autoencoders may enforce desirable properties on the latent space. Another family, Generative Adversarial Networks [47, 13], formulates the problem as a game where the empirical data distribution is an optimal solution. Another method that often reduces the sample complexity compared to using raw pixels is “self-supervised” learning, which uses a novel loss function designed to encourage learning a single useful feature [56, 33, 32, 45, 50]. This can be considered a learned variant of hand-designed features like [25, 3]. Self-supervised approaches in particular have been used to reduce sample complexity for active tasks [16, 34, 29, 15]. Here we show that the appropriate choice of feature for active tasks actually depends on the desired downstream task; hence no single feature was able to sufficiently support arbitrary tasks, and a set of mid-level features is necessary. This is consistent with recent work in computer vision showing that no single vision feature is the perfect transfer source for all other vision tasks, echoing the need for a set [54].

Transfer Learning reuses pre-learned knowledge to benefit learning a new task outside of the original training distribution [48, 36, 44, 11, 27, 31, 26, 37, 53, 38]. Our study is a case of transfer learning, with the important characteristic that we transfer from mid-level static recognition tasks to downstream sequential active tasks.

3 Methodology

3.1 Transferring from Mid-Level Vision to Downstream Active Tasks

How might we use mid-level features to support a downstream task? We choose a typical transfer learning setup in which we use a pretrained neural network as a feature extractor. For example, we might take a network trained for reshading estimation, pass in the observed image, and use the intermediate “reshaded” representation as the sole input to a second neural network; only the second network is updated during training. This setup is shown graphically in figure 2. Other network configurations are possible, so our observed performance gain is a lower bound, and any better method would only increase the significance of our conclusions. Freezing the features has the advantage that they can be reused without degrading performance on already-learned tasks, in order to learn multiple tasks over the lifetime of the agent.
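The frozen-encoder setup can be sketched in a few lines. This is a minimal, hypothetical stand-in (not the paper's architecture): the "encoder" is a fixed random nonlinear projection standing in for a pretrained mid-level network, and only the policy head receives gradient updates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a pretrained mid-level encoder: its weights are
# frozen and never receive gradient updates during policy training.
W_enc = rng.standard_normal((8, 16))

def encode(x):
    """Frozen feature extractor: a fixed nonlinear projection of the observation."""
    return np.maximum(W_enc @ x, 0.0)          # ReLU features, shape (8,)

# Only the policy head is trainable.
W_pol = np.zeros((3, 8))                        # 3 discrete actions

def policy_logits(x):
    return W_pol @ encode(x)                    # action scores, shape (3,)

def update_policy(x, grad_logits, lr=0.1):
    """A schematic gradient step: W_pol changes, W_enc stays frozen."""
    global W_pol
    W_pol = W_pol - lr * np.outer(grad_logits, encode(x))
```

In the actual experiments the encoder is a pretrained ResNet-50 and the policy is trained with PPO; the point here is only the division of labor between a frozen perception module and a trainable policy.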

Figure 3: Feature readouts in Gibson. Sample outputs from the trained perception networks on input from Gibson. See more frame-by-frame results in the supplementary material.

While the features could be trained on the same image distribution that the agent will see at test time, our features were trained on standard computer vision datasets. This induces a problem of domain shift, where the perception networks are not necessarily well-calibrated in the new environment. However, our network outputs are still informative, and we show qualitative examples in figure 3 and the supplementary material.

3.2 Hypothesis Testing

The following section details how we test our three core hypotheses (see Fig. 2). We use nonparametric significance tests where possible, since nonparametric approaches avoid unnecessary assumptions on the shape or type of the population distributions. For pairwise tests, we use the Wilcoxon rank-sum test, and we correct for multiple comparisons by controlling the False Discovery Rate (FDR) through Benjamini-Hochberg [5].¹

¹ All RL evaluation suffers from some additional estimation error stemming from the fact that we estimate the average performance of a particular random seed, and then use that to further estimate the expected performance of a particular training approach. We use enough episodes so that these cluster effects are small, but we perform a more sophisticated analysis using [1] in the supplementary material, and include the code on our website.
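The Benjamini-Hochberg procedure used here is short enough to sketch directly. Sort the m p-values ascending; the largest rank k with p_(k) ≤ (k/m)·FDR sets the rejection threshold, and every hypothesis with a p-value at or below it is declared significant.

```python
def benjamini_hochberg(p_values, fdr=0.20):
    """Benjamini-Hochberg FDR control: return one significance flag per
    hypothesis. Rejects all hypotheses with p <= p_(k), where k is the
    largest rank such that p_(k) <= (k/m) * fdr."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    passed = False
    threshold = 0.0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * fdr:
            passed = True
            threshold = p_values[i]   # largest qualifying p-value so far
    return [passed and p <= threshold for p in p_values]
```

With an FDR of 20%, as in the paper, this admits more features as "significant" than a family-wise correction like Bonferroni would, at the cost of allowing roughly one in five rejections to be a false discovery in expectation.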

Hypothesis I: Does mid-level vision provide an advantage in terms of sample efficiency when learning an active task? We want to examine whether using an agent equipped with mid-level vision can learn faster than a comparable agent learning tabula rasa, from images alone. Since there is significant variation in the asymptotic performance between different random seeds, any given agent may never achieve a given level of performance. We could not come up with a satisfying metric of relative sample efficiency that could be robustly evaluated from just a few seeds. We therefore provide the curves and ask the readers to use their judgment, which is currently standard practice. Unless designated otherwise, all of our curves are evaluated in unseen test environments.

Hypothesis II: Can mid-level vision features generalize better to unseen spaces? If our mid-level perception networks provide standard encodings of the visual observations that are less space-specific, then we might expect that agents using these features will also learn policies that are robust to differences between the training and testing environments and consequently generalize better. We test this in H2 by evaluating the performance of scratch and feature-based agents at the end of training, after convergence. Specifically we test which, if any, feature-based agents outperform scratch in unseen test environments, correcting for multiple hypothesis testing.

Hypothesis III: Can a single feature support all arbitrary downstream tasks, or is a set of features required? We show there is no universal visual task that provides the best features regardless of the downstream activity. We show this by demonstrating rank reversal: the features that are best for one task are not ideal for another (and vice versa). Concretely, we find that depth estimation features perform best for exploration while object classification features are ideal for target-driven navigation, and we demonstrate rank reversal by showing that the depth features significantly outperform the object classification features for exploration while the ordering flips for navigation (performing hypothesis tests for both of these).

3.3 Mid-Level Feature Selection Module

If the three formulated hypotheses are correct, as we will demonstrate in Sec. 4.4, then a proper perceptual support demands integrating one or a few mid-level visual representations from a larger set, conditional on the actual downstream task. Though it is viable and probably advantageous to define a complicated model, we propose an extremely simple one as a jumping-off point. Our feature-selection module simply takes a sparse linear combination of the pretrained features.

Specifically, the module learns a percept φ(I) of an input image I. The learned percept is a sparse blend of the outputs from k pretrained feature extractors f_1, ..., f_k. Our module chooses blending weights w_1, ..., w_k for the mid-level features, and we enforce that at most j of these are nonzero. This sparsity reduces noise in the features and allows us to evaluate only a few of the f_i at any given time, saving significant computational complexity.

The learned percept is then the weighted combination of the features: φ(I) = Σ_i w_i f_i(I).


There are a myriad of ways to train such a module. For example, one could train the selection and blending weights via policy gradients, with gradient boosting, as a bandit problem (Thompson sampling or upper confidence bounds), or using supervised learning (e.g. noisy gates or the Gumbel-Softmax trick). We choose supervised learning with the noisy gating formulation.


The module can additionally be conditioned on the input image I, so that the blending weights vary per observation. The formula is then: φ(I) = Σ_i w_i(I) f_i(I).

Figure 4: Visualization of training and test buildings from Gibson database [52]. The training space (highlighted in red and zoomed on the left) and the testing spaces (remaining on the right). Actual sample observations from agents virtualized in Gibson framework [52] are shown in the bottom of each box.

Such a setup encourages the agent to learn perception dynamics, because it must choose a limited number of percepts to use for a given observation. One can imagine myriad useful improvements, such as finetuning the feature extractors to support a set of tasks, or designing the gating function to be especially adaptable (e.g. meta-learning the gating).

4 Experiments

In this section we describe our experimental setup and present the results of our hypothesis tests and of the selection module. With 20 vision features and 4 baselines, we train between 3 and 8 seeds per scenario in order to control the false discovery rate. The total number of policies used in the evaluation is about 600, which took 90,000 GPU hours to train.

4.1 Experimental Setup

Environments: We are ultimately interested in how to reduce sample complexity for agents learning in the real world. There are two options for training—either on a real robot or in simulation. Training on a real robot is slow and tedious in practice, but more importantly does not provide an easy way to control experiments and reproduce results for a proper statistical hypothesis test. Thus, due to the scale of the study, we opt to train in simulation.

The downside of training in simulation is that there is a realism gap between the simulator and the physical world. We attempt to mitigate this issue in two ways. First, we choose a recent simulator (Gibson [52]) that is designed to be perceptually similar to the real world as it operates by virtualizing scans of real buildings. Gibson is also integrated with the PyBullet physics engine which contains a fast collision-handling system used to simulate dynamics.

Second, we also perform universality experiments in a second simulator, VizDoom [20]. VizDoom [20] is based on the 1992 game Doom and is one of the simplest examples of a 3D environment. It allows the agent to move around and contains a rudimentary physics engine that handles momentum and enables some basic interactions. The latter is of interest to us since it unlocks certain tasks that are not currently feasible in Gibson (e.g. opening a door, removing an enemy, etc). VizDoom is visually distinct from Gibson and we include it to show that our findings are rather robust to the idiosyncrasies of the particular environment.

4.1.1 Train/Test split

For each environment we define a clear train/test split. In Gibson we train in one building and test in different and completely unseen buildings used only for evaluation (fig. 4). We also test in 10 additional unseen spaces of comparable size and the results are in section 4.5. In Gibson, the training space for the visual navigation task covers 40.2 square meters and the testing space covers 415.6 square meters. The training space for the local planning and exploration tasks covers 154.9 square meters and the testing space covers 1270.1 square meters. The universality experiments in Doom also use a train/test split of textures which is provided in the supplementary material.

4.1.2 Downstream Active Tasks

We try to choose practically useful tasks in order to test our hypotheses. The tasks are visual target-driven local navigation, visual exploration, and local planning. These are depicted in figure 5 and described below.

Figure 5: Task definitions. Visual descriptions of the selected active tasks and their implementations in Gibson (right two columns). Reward functions () and max episode lengths are shown in the left-hand column. Additional observations besides the RGB image are shown in the obs column. Exploration receives only the revealed occupancy grid and not the actual mesh boundaries.

Visual Target-Driven Local Navigation:

In this scenario the agent must locate a target object as fast as possible with only sparse rewards. Upon touching the target there is a large positive reward and the episode ends; otherwise there is a small negative reward for living. The target remains visually the same between episodes, although the locations and orientations of both the agent and the target are randomized according to a uniform distribution over a predefined boundary within the floor plan of the space. In Gibson the target is a box, and in Doom the target is a green torch, but the agent must learn to identify the target during the course of training. The maximum episode length is 400 timesteps, and the shortest path averages around 30 steps.

Visual Exploration: For Visual Exploration, the agent is tasked with visiting as many new parts of the space as quickly as possible. The environment is partitioned into small occupancy cells, which are “unlocked” upon being seen by the agent. The reward at each timestep is proportional to the number of newly revealed occupancy cells. The episode ends after 1000 timesteps. The agent is equipped with a myopic range scanner that reveals the area directly in front of the agent for up to 1.5 meters. Since our agents are memoryless, we provide them with an odometric map of the unlocked cells.

Local Planning: In Local Planning the agent must direct itself to a given nonvisual target destination using visual inputs, avoiding obstacles and walls as it navigates to the target. Since our agent is memoryless, we keep the problem well-posed by specifying the current target direction.²

² This problem formulation is equivalent to assuming that the initial coordinates of the target are given and the robot has a perfect localization system (ideal IMU). In a deployment setting, noise could be added to the target vector to simulate real-world conditions.

The agent receives dense positive reward proportional to the progress it makes (in Euclidean distance) towards the goal, and is penalized for colliding with walls and objects. There is also a small negative reward for living, as in visual navigation. This task represents the practical skill of local planning, where an agent may be given sparse waypoints along a desired path and must navigate gracefully along it in a cluttered space. The maximum episode length is 400 timesteps, and the target distance is sampled from a Gaussian distribution with a mean of 5 meters and a standard deviation of 2 meters.
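The local-planning reward structure can be sketched as below. All coefficients are illustrative placeholders, not the paper's values; the shape of the reward (distance progress, collision penalty, cost of living) follows the text.

```python
import numpy as np

def planning_reward(prev_pos, pos, goal, collided,
                    progress_coef=1.0, collision_pen=0.1, living_pen=0.01):
    """Dense local-planning reward: Euclidean progress toward the goal,
    minus a collision penalty and a small negative reward for living.
    Coefficients are illustrative assumptions."""
    progress = (np.linalg.norm(np.subtract(prev_pos, goal))
                - np.linalg.norm(np.subtract(pos, goal)))
    return progress_coef * progress - collision_pen * collided - living_pen
```

Because the progress term can be negative, moving away from the goal is directly penalized, which makes the signal dense relative to the sparse navigation reward.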

4.1.3 State Space

In all tasks, the state space contains the RGB image and the minimum amount of side information needed for the task to be solvable. We stack the most recent 4 RGB frames as input and do not share weights between these frames, to allow the agent to infer its local dynamics. Unlike the common practice in reinforcement learning, we do not include any proprioceptive information such as the agent’s joint positions or velocities, or any other side information that could be useful but is not essential to solving the task, such as a map of obstacles or the floor layout. For visual navigation, the state space is only the image. For local planning, the agent also receives the target in its own inertial reference frame as a vector (d, θ), where d is the Euclidean distance to the target and θ is the angle relative to the agent’s heading in the ground plane. For visual exploration, the task requires some form of memory for the agent to know where it has already been. Since our neural network architecture is memoryless (aside from the frame stacking), we encode the memory as an occupancy grid, translated and rotated to align with the agent’s inertial reference frame, whose cell values are 1 if the agent has already observed a given cell and 0 otherwise. The occupancy grid contains no global information about the scene such as walls or obstacles; it is only the previous output of the robot’s laser sensor.
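Computing the (d, θ) target vector in the agent's frame is a standard transform; a minimal sketch (function name is ours):

```python
import math

def target_in_agent_frame(agent_xy, agent_heading, target_xy):
    """Express the goal as (d, theta) in the agent's inertial frame:
    d is the Euclidean distance to the target, theta the bearing
    relative to the agent's heading in the ground plane."""
    dx = target_xy[0] - agent_xy[0]
    dy = target_xy[1] - agent_xy[1]
    d = math.hypot(dx, dy)
    theta = math.atan2(dy, dx) - agent_heading
    theta = (theta + math.pi) % (2 * math.pi) - math.pi   # wrap to [-pi, pi)
    return d, theta
```

An agent at the origin facing along +x sees a goal at (1, 1) as distance √2 at bearing π/4; the same goal straight ahead of a rotated agent has bearing 0.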

Figure 6: Agent trajectories in test environment. Left: Average rewards in the test environments in Doom and Gibson for features and scratch. Feature-based policies generalize better in Gibson; in Doom, the features generalize better to novel texture variation (see Fig. 10 for additional generalization results in Doom). Right: Visualizations of the paths that agents take in Gibson. Top: The policy trained with object detection features learns to recognize the target and, once it does so, heads for the goal, but fails to cover the entire space in exploration. Middle: Distance estimation features learn a rough approximation of the target location—the agent runs around until it is nearly on top of the target—while covering the entire space for exploration. Bottom: The scratch policy completely fails to generalize to the test space and wanders about almost randomly. More visualizations are available on the website.

4.1.4 Action Space

In all tasks in this section we assume that there is a low-level controller for robot actuation. Therefore the policies have a discrete action space of {move_forward, turn_left, turn_right}: move_forward corresponds to a 0.1 m translation in the direction of the robot’s heading in the ground plane, and turn_left and turn_right correspond to in-place rotations of the robot’s heading by 0.14 radians. No frame skipping is used for Gibson; for Doom, actions are selected and repeated for 4 frames. All actions are available at every timestep, with the physics engines responsible for enforcing the physical boundaries of the spaces.
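The effect of each action on a planar pose can be sketched directly from the step sizes above (collision handling, which the physics engine provides, is omitted):

```python
import math

# Discrete action set; step sizes follow the text: 0.1 m forward, 0.14 rad turns.
ACTIONS = ("move_forward", "turn_left", "turn_right")

def step_pose(x, y, heading, action):
    """Apply one action to a planar pose (x, y, heading). A simplified
    kinematic sketch; the simulator's physics engine enforces boundaries."""
    if action == "move_forward":
        x += 0.1 * math.cos(heading)
        y += 0.1 * math.sin(heading)
    elif action == "turn_left":
        heading += 0.14
    elif action == "turn_right":
        heading -= 0.14
    return x, y, heading
```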

4.2 Learning Setup

In all experiments we use the common Proximal Policy Optimization (PPO) [40] algorithm with Generalized Advantage Estimation [39]. PPO is a stable and well-tested algorithm. Due to the computational load of rendering perceptually realistic images in Gibson, we are only able to use a single rollout worker, and we therefore decorrelate our batches using experience replay and an off-policy variant of PPO. The formulation is similar to Actor-Critic with Experience Replay (ACER) [51] in that full trajectories are sampled from the replay buffer and reweighted using the first-order approximation for importance sampling. We include the full formulation in the supplementary material. For the universality experiments in Doom, we use standard PPO with 16 rollout environments. The standard PPO objective is


L^CLIP(θ) = E_τ [ min( r_t(θ) Â_t, clip(r_t(θ), 1−ε, 1+ε) Â_t ) ],

where r_t(θ) = π_θ(a_t | s_t) / π_θ_old(a_t | s_t) is the probability ratio, Â_t is the advantage function at timestep t (some sufficient statistic for the value of the policy at timestep t; in our experiments we choose the generalized advantage estimator [39]), and τ is a trajectory drawn from the current policy.
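The clipped surrogate can be sketched numerically (a minimal illustration of the standard objective, not the paper's full off-policy variant):

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Clipped PPO surrogate (to be maximized): per-sample
    min(r * A, clip(r, 1-eps, 1+eps) * A), averaged over the batch.

    ratio:     pi_new(a|s) / pi_old(a|s) for each sampled transition.
    advantage: advantage estimates (e.g. GAE) for the same transitions.
    """
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(unclipped, clipped).mean()
```

The elementwise minimum makes the objective pessimistic: large policy ratios cannot inflate the surrogate beyond the clip range, which is what keeps PPO updates stable.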

For each task and each environment we conduct a hyperparameter search optimized for the scratch baseline (see section 4.3). We then fix this setting and reuse it for every feature. This setup should favor scratch, and possibly other baselines that use the same architecture.

For our experiments we use a set of 20 different computer vision tasks. This set covers various common modes of computer vision tasks, from texture-based tasks like denoising, to 3D pixel-level tasks like depth estimation, to low-dimensional geometric tasks like room layout estimation, to semantic tasks like object classification. For a full list of the tasks as well as descriptions and some sample videos, please see the supplementary material.

Our feature networks were trained on a dataset of 4 million static images of indoor scenes [54]. We use the pretrained networks of [54]. Each network encoder consists of a ResNet-50 [17] without a global average-pooling layer, which preserves spatial information in the image. The feature networks were all trained using identical hyperparameters.

All network architectures and full experimental details, as well as videos of the pretrained networks evaluated in our environments are included in the supplementary material.

4.3 Baselines

We include several controls to provide a baseline for the visual-feature-based agents and to address possible confounding factors.

Navigation
  Feature         r       p-val
  Obj. Cls.       5.91    .001
  Sem. Segm.      5.87    .001
  Curvature       4.75    .002
  Scene Cls.      3.07    .003
  2.5D Segm.      3.01    .002
  2D Segm.        1.99    .003
  Distance        1.74    .003
  Occ. Edges       .38    .009
  Vanish. Pts.     .39    .019
  Reshading        .21    .021
  2D Edges         .12    .006
  Normals         -.50    .035
  Jigsaw          -.86    .122
  3D Keypts.     -1.08    .112
  Layout         -1.14    .057
  Autoenc.       -1.16    .043
  Rand. Proj.    -2.12    .083
  Blind          -3.20    .755
  Pix-as-state   -4.30    .856
  2D Keypts.     -6.10    .922
  In-painting    -6.57    .971
  Denoising      -6.47    .981

Exploration
  Feature         r       p-val
  Distance        5.90    .015
  Reshading       5.79    .003
  3D Keypts.      5.27    .004
  Curvature       5.12    .027
  2.5D Segm.      5.60    .056
  Layout          4.78    .108
  2D Edges        4.87    .120
  Normals         5.26    .143
  Scene Cls.      4.67    .152
  Obj. Cls.       4.80    .187
  2D Segm.        4.47    .406
  Jigsaw          4.47    .455
  Rand. Proj.     4.33    .500
  Vanish. Pts.    4.24    .500
  Pix-as-state    4.20    .531
  Blind           4.21    .545
  2D Keypts.      4.21    .682
  In-painting     4.30    .697
  Autoenc.        4.11    .815
  Sem. Segm.      3.67    .857
  Occ. Edges      3.85    .864
  Denoising       3.59    .962

Local Planning
  Feature         r       p-val
  3D Keypts.     15.45    .015
  Normals        15.10    .000
  Curvature      14.84    .003
  Distance       14.56    .001
  2.5D Segm.     14.50    .001
  Sem. Segm.     14.49    .000
  Scene Cls.     14.20    .001
  Occ. Edges     14.20    .001
  Reshading      14.12    .000
  Layout         14.12    .015
  Obj. Cls.      13.95    .000
  2D Segm.       13.86    .001
  Denoising      13.54    .000
  In-painting    13.28    .000
  Jigsaw         13.17    .012
  2D Edges       13.16    .008
  Vanish. Pts.   12.14    .028
  2D Keypts.     11.99    .050
  Autoenc.       11.39    .155
  Pix-as-state   10.22    .654
  Rand. Proj.     8.93    .892
  Blind           9.83    .929
Figure 7: Features vs. scratch. The plots show training and test performance of scratch vs. selected features throughout training: for all tasks there is a significant gap between train and test performance for scratch, and a much smaller one for the best feature. Scratch often fails to generalize (bottom), while feature-based agents generalize better (top); sometimes models appear to learn in the training environment but fail at test time, underscoring the importance of a good test environment in RL. The tables show significance tests of the performance of feature-based agents vs. scratch in Gibson. P-values come from a Wilcoxon rank-sum test, adjusted for multiple hypothesis testing with an FDR of 20%. Significant rows are in white, and rows are ordered by average episode reward.

Scratch Learning: Learning from scratch, or “vanilla” RL for the perception aspect, is among the common practices today. In this condition the agent starts with an appropriate random initialization and receives the raw RGB image as input. This baseline uses the common AtariNet [28] tower.

Blind Intelligent Actor: The Blind Intelligent Actor (blind) baseline is the same as scratch except that the visual input is constant and does not depend on the state of the environment. The blind agent indicates how much performance can be squeezed out of the nonvisual biases, correlations, and overall structure of the environment. If our tasks were essentially nonvisual, e.g. a narrow maze where the layout leads the agent to the target, then that would manifest as a small performance gap between blind and scratch.

Random Nonlinear Projections: To rule out the possibility that the perceptual architecture, not the source task, is the primary factor for good representations we include the Random Nonlinear Projection (random) baseline. This condition is the same as the pretrained features condition, except that this network is randomly initialized and then frozen. As a result, the policy network learns from a random nonlinear projection of the input image. These features contain much of the information in the original image.

Pixels as State: This baseline considers the possibility that a smaller representation size is simply easier to learn from. Pixels-as-state downsamples the input image to a 16x16x3 image, then stacks two copies of it with two copies of the greyscale version to produce a 16x16x8 tensor, the same shape as the pretrained activations. This tensor is then passed as the representation.
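The construction above can be sketched as follows (a minimal numpy version; the exact downsampling filter used in the paper is an assumption here — we use simple block averaging):

```python
import numpy as np

def pixels_as_state(img):
    """Downsample an HxWx3 image to 16x16x3 by block averaging, then
    stack two RGB copies and two greyscale copies into a 16x16x8 tensor."""
    h, w, _ = img.shape
    assert h % 16 == 0 and w % 16 == 0
    # Block-average downsample to 16x16x3.
    small = img.reshape(16, h // 16, 16, w // 16, 3).mean(axis=(1, 3))
    grey = small.mean(axis=-1, keepdims=True)  # 16x16x1 greyscale
    # Two RGB copies (6 channels) + two greyscale copies (2 channels) = 8.
    return np.concatenate([small, small, grey, grey], axis=-1)

state = pixels_as_state(np.zeros((256, 256, 3)))
```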

4.4 Experimental results on hypothesis testing I-III

We report our findings for the effect of intermediate representations on sample efficiency and generalization. All the results are evaluated in the test environment with multiple random seeds, unless otherwise explicitly stated.

4.4.1 Hypothesis I: Sample Complexity Results

In this experiment we check whether an agent can learn faster using pretrained visual features than it can from scratch. We evaluate 20 different features against the four control groups on each of our tasks: visual target-driven local navigation, visual exploration, and local planning. As shown in Fig. 6, we find that in all cases the feature-based agents learn significantly faster and may achieve higher final performance than agents trained from scratch, even after averaging over many random seeds. We explore when an agent may not achieve higher test performance in section 4.6.

4.4.2 Hypothesis II: Generalization Results

Do policies trained with pretrained features generalize better to unseen test environments? The previous experiment tested how quickly learning saturated; this experiment tests for superior generalization performance at a given level of data. We find that specific feature-based policies exhibit superior generalization compared to scratch when tested in environments unseen at training time.

Generalization Significance Analysis: As shown in Fig. 7, for each of the tasks there are some features that generalize significantly better than scratch. We used a nonparametric significance test and adjusted for multiple comparisons using a False Discovery Rate of 20%. If there were no actual difference, the probability of all of these results being spurious would be negligible for navigation and local planning (fig. 7: left, right); after including the additional seeds from the follow-up experiment in the next section, the p-value for exploration (fig. 7, center) is also negligible.
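The multiple-comparison correction can be sketched with the standard Benjamini-Hochberg procedure (the p-values below are hypothetical; in the paper they come from Wilcoxon rank-sum tests on episode rewards):

```python
import numpy as np

def benjamini_hochberg(p_values, fdr=0.20):
    """Return a boolean mask of hypotheses rejected at the given FDR."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order]
    # Find the largest k with p_(k) <= (k/m) * fdr; reject hypotheses 1..k.
    below = ranked <= (np.arange(1, m + 1) / m) * fdr
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

# Hypothetical p-values for four feature-vs-scratch comparisons.
mask = benjamini_hochberg([0.001, 0.02, 0.04, 0.30], fdr=0.20)
```

With these inputs the first three comparisons survive the 20% FDR threshold while the fourth does not.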

Generalization Gap: We found a large generalization gap between agent performance in the training vs. test environments, shown in the plots in fig. 7. All our policies exhibit some gap, but agents trained from scratch seem to overfit completely—they show no test improvement during training. We note that the Autoencoder feature learns quickly in the training environment but fails to carry that over to the test set. Since Variational Autoencoders are a commonly used form of perception in RL, we caution that such methods must be evaluated on performance in a test environment, not solely on the training set.

Qualitative Generalization Results: Feature-based policies behave qualitatively differently than those trained from scratch, and different features exhibit different types of behavior. Fig. 6 highlights these differences by plotting the trajectories of random rollouts for various policies. We see that the navigation agent trained with semantic features is able to effectively identify and then navigate to the target. However, the semantic agent is not very adept at exploring through hallways and doors, as evidenced in the exploration task. This is where the depth-based agent shines: despite lacking fine-grained knowledge about the objects in the scene, as is clear in the navigation task, the depth-based agent can cover much more ground in the exploration task by seeking paths that lead to wide open spaces. Both agents perform noticeably better than scratch, which wanders about the test environment seemingly at random.

4.4.3 Hypothesis III: Rank Reversal Results

It is well-known that ImageNet-based features transfer well; indeed, such pretraining is often the default choice. We find, however, that no single feature (or pair of features) consistently outperforms all the others. Instead, the choice of pretrained features should depend upon the downstream task. This experiment exhibits a case of rank reversal, where features that work well on one task are not ideal for another, and vice versa.

Figure 8: Rank reversal in visual tasks and no universal feature. Scatterplots of the rank of feature performance on navigation (x-axis) and exploration (y-axis) in both Gibson (right) and Doom (left). That no feature lies in the bottom-left corner means there is no single universal feature—and the blank region there means there is no almost-universal feature either. The feature with the maximum F-score requires giving up 3-4 ranks on each task.

Figure 9: Rank reversal significance graphs. Arrows indicate which features are better for a given downstream task. Heavier arrows indicate more significant results (lower α-level). Blue arrows point toward features that are better for navigation and red arrows toward features that better support exploration. The absence of an arrow indicates the performance difference was not statistically significant. That no node has all incoming arrows demonstrates the lack of a universal feature. The essentially complete bipartite structure in the Gibson graph shows that navigation is characteristically semantic while exploration is geometric.

Rank Reversal Significance Analysis: We compare the top-performing navigation feature against the top-performing exploration feature. The best feature for Gibson navigation was indeed an Object Classification network based on ImageNet, but the best feature for Gibson exploration was Distance Estimation. For navigation, Object Classification was better than Distance Estimation at a statistically significant level; for exploration the order is reversed, also at a statistically significant level.

Rank Reversal Among Related Visual Tasks: The trend of rank reversal appears to be a widespread phenomenon. Fig. 9 shows that semantic features seem to be useful for navigation while geometric features are useful for exploration. In Gibson, the graph is nearly complete bipartite (indicating that this distinction is quite useful). In the universality experiments (partially shown in figure 9), a similar trend holds despite the fact that Doom is visually quite distinct from Gibson.

Similarly, we see in figure 8 that the trend is not just among the top few features, but holds for families of computer vision tasks found in [54].

4.5 Robustness in Additional Environments

We repeated our testing in 9 other buildings to account for the possibility that our main test building is anomalous in some way. We found that the average reward in our main test building and in the 9 other buildings was very strongly correlated, with a Spearman's rho of 0.93 for navigation and 0.85 for exploration. The full experimental setup and results are included in the supplementary material.
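Spearman's rho is simply the Pearson correlation of the rank vectors; a minimal version with illustrative data (assuming no ties, as when ranking distinct average rewards):

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks.
    Assumes no ties among the values."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))

# Monotonically related rewards yield rho = 1 regardless of scale.
rho = spearman_rho([1.0, 3.0, 2.0, 5.0], [10.0, 30.0, 20.0, 50.0])
```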

4.6 Universality Experiments in Doom

Figure 10: Features generalize to new axes of variation. In ViZDoom, feature-based agents (right two columns) generalize to new textures even when not exposed to texture variation in training (top row), while agents trained from scratch suffer a significant drop in performance (left, top).

We also implemented navigation and exploration in Doom to evaluate whether similar effects hold across environments. We found (fig. 8) that features which perform well in Gibson also tend to perform well in Doom, and that similar tradeoffs exist between tasks regardless of environment. Specifically, the geometric/semantic distinction from Gibson appears again in Doom, and the results are highly statistically significant (figure 9). We also find that there is no universal feature in either Doom or Gibson, and that maximizing the combined score (see figure 8) requires choosing the third- or fourth-best feature for any given task.
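If the combined score is taken to be the harmonic mean (F-score) of a feature's per-task scores, as figure 8 suggests, the tradeoff can be sketched as follows (the feature names and score values are hypothetical, for illustration only):

```python
def best_combined_feature(nav_scores, exp_scores):
    """Pick the feature maximizing the harmonic mean (F-score) of its
    navigation and exploration scores (values in [0, 1], higher is better)."""
    def f_score(a, b):
        return 2 * a * b / (a + b) if (a + b) > 0 else 0.0
    return max(nav_scores, key=lambda k: f_score(nav_scores[k], exp_scores[k]))

# Hypothetical normalized scores: the balanced feature wins the F-score
# even though it tops neither individual ranking.
nav = {"semantic": 0.9, "depth": 0.2, "curvature": 0.6}
exp = {"semantic": 0.2, "depth": 0.9, "curvature": 0.6}
winner = best_combined_feature(nav, exp)
```

This mirrors the paper's observation: the feature that maximizes the combined score is not the top-ranked feature on either task alone.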

We also found that features were more robust to changes in texture than learning from scratch. While scratch achieves the highest final performance when the agent learns in a video game environment where many training textures emulate the test textures (fig. 6), scratch fails to generalize when there is little or no texture variation during training. Feature-based agents, on the other hand, generalize even without texture randomization, as shown in fig. 10.

4.7 Evaluation of Representation Selection Module

Since the choice of feature has a significant impact on the agent's test performance, we show that the feature selection can be learned stably using our perception module. We evaluated the stability of feature selection in Doom, since we could not fit all 20 perception networks together with the Gibson environment onto a V100. This is not a fundamental limitation of the method, and the feature rankings in Doom were shown in section 4.6 to be similar to those in Gibson. To fit the model into memory in Doom, we used a single rollout worker; we also reduced the learning rate to give the policy network time to adapt to the changing perception.

In figure 11 we examine which feature the module eventually selects from among 20 options. The module selects the same final feature even when presented with smaller subsets of perception features (11, 6, and 3 options). We note that with only one rollout worker (the setup for this experiment), scratch was wholly unable to learn either task, while the module-based agent improved meaningfully during training.
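One simple way to realize such a selection module—a sketch under our own assumptions, not the paper's exact implementation—is to keep a learnable logit per frozen feature bank, combine the banks with softmax weights during training, and read off the argmax as the selected feature:

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - np.max(logits))
    return z / z.sum()

class FeatureSelector:
    """Soft attention over K frozen feature banks; the logits are the
    only trainable part (illustrative sketch)."""
    def __init__(self, num_features):
        self.logits = np.zeros(num_features)

    def combine(self, feature_stack):
        # feature_stack: K x D array of per-feature activations.
        w = softmax(self.logits)
        return w @ feature_stack

    def selected(self):
        # The feature the module has committed to so far.
        return int(np.argmax(self.logits))

sel = FeatureSelector(num_features=3)
sel.logits = np.array([0.1, 2.0, -1.0])  # e.g. after some training
combined = sel.combine(np.ones((3, 8)))
```

Because the combination is a convex mixture, a weighted average of identical banks returns the bank itself, and training can sharpen the logits toward a single winner.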

Figure 11: Feature selection. The charts show which feature the model selects during training in Doom. The model selects scene classification for navigation (left) and room layout for exploration (right).

5 Conclusion and Limitations

We investigated the role of mid-level perception for learning active tasks using an RL platform. Our study suggested a notable benefit to adopting proper perceptual features, in contrast to using raw pixels as the state of the world or learning perception entirely from scratch via RL. The benefits were particularly stark in terms of sample complexity and generalization to unseen spaces. We consequently put forth a simple and efficient module for mid-level feature selection based on the findings of our study.

In retrospect, the finding that mid-level vision improves learning speed and generalization to new places is somewhat expected. Mid-level features encapsulate a readable, easy-to-interpret state of the world (e.g. 3D features discard shadows and texture to convey the true underlying geometry) and are designed to provide generic, abstract information about the world that is not specific to any particular place. We found that by kickstarting the perception of an active system with such representations, instead of bewildering the entire system with raw unprocessed sensory data that is full of information but hard to parse, the system develops rewarding behavior faster.

It is worth noting a number of limitations of our framework. Our selection of active tasks was primarily oriented around locomotion. Though locomotion is a significant problem in its own right, our study does not necessarily support conclusions about other important active tasks, such as manipulation. Also, given that RL was our experimental platform, our findings are bounded by the limitations of existing RL methods, e.g. difficulties with long-range exploration or credit assignment under sparse reward functions. We also use a fixed dictionary of mid-level features during learning.

What exactly these mid-level tasks should be, how to learn their estimators efficiently, and how to incrementally expand the dictionary or improve each element are important research questions. Answering them would have benefits towards adapting the agents for the perceptual characteristics of new spaces, reducing computational cost, and would bring the problem closer to true life-long learning.

Acknowledgements We gratefully acknowledge the support of ONR MURI (N00014-14-1-0671), NVIDIA NGC beta, and TRI. Toyota Research Institute (“TRI”) provided funds to assist the authors with their research but this article solely reflects the opinions and conclusions of its authors and not TRI or any other Toyota entity.