Unsupervised Learning of Important Objects from First-Person Videos

11/16/2016
by Gedas Bertasius, et al.

A first-person camera, placed on a person's head, captures which objects are important to the camera wearer. Most prior methods learn to detect such important objects from manually labeled first-person data in a supervised fashion. However, important objects are strongly tied to the camera wearer's internal state, such as their intentions and attention, so only the person wearing the camera can provide the importance labels. This constraint makes the annotation process costly and limits its scalability. In this work, we show that we can detect important objects in first-person images without supervision from the camera wearer or even from third-person labelers. We formulate important object detection as an interplay between 1) a segmentation agent and 2) a recognition agent. The segmentation agent first proposes a candidate important object segmentation mask for each image and feeds it to the recognition agent, which learns to predict an important object mask from visual semantics and spatial features. We implement this interplay via an alternating cross-pathway supervision scheme inside our proposed Visual-Spatial Network (VSN). The VSN consists of a visual ("what") pathway and a spatial ("where") pathway: one learns common visual semantics while the other focuses on spatial location cues. Unsupervised learning is accomplished via cross-pathway supervision, where one pathway feeds its predictions to the segmentation agent, which proposes a candidate important object segmentation mask that the other pathway then uses as a supervisory signal. We demonstrate our method on two important object datasets, where it achieves results similar to or better than supervised methods.
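The alternating cross-pathway supervision described in the abstract can be illustrated with a toy sketch: two tiny logistic models stand in for the VSN's visual and spatial pathways, and a thresholding function stands in for the segmentation agent. Each pathway's binarized proposal serves as the training target for the other pathway. All names, the toy data, and the models here are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class Pathway:
    """A tiny logistic model standing in for one VSN pathway."""
    def __init__(self, n_features):
        self.w = np.zeros(n_features)

    def predict(self, X):
        return sigmoid(X @ self.w)

    def fit_step(self, X, target, lr=0.5):
        # one gradient step of binary cross-entropy toward the target mask
        p = self.predict(X)
        self.w -= lr * X.T @ (p - target) / len(target)

def segmentation_agent(soft_pred):
    """Propose a binary 'important object' mask from soft predictions."""
    return (soft_pred > 0.5).astype(float)

# toy data: 200 'pixels' with 5 features; a hidden rule defines importance
X = rng.normal(size=(200, 5))
true_mask = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

visual, spatial = Pathway(5), Pathway(5)
visual.w[0] = 1.0  # weak initial cue so the first proposals are non-trivial

for _ in range(50):
    # the visual pathway's proposal supervises the spatial pathway ...
    spatial.fit_step(X, segmentation_agent(visual.predict(X)))
    # ... and the spatial pathway's proposal supervises the visual pathway
    visual.fit_step(X, segmentation_agent(spatial.predict(X)))

agreement = np.mean(segmentation_agent(visual.predict(X)) == true_mask)
```

Starting from a weak cue, the two pathways pull each other toward a consistent mask that correlates with the hidden importance rule, mirroring how cross-pathway supervision replaces human labels in the paper's setup.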


Related research

03/15/2016 · First Person Action-Object Detection with EgoNet
Unlike traditional third-person cameras mounted on robots, a first-perso...

09/05/2017 · Using Cross-Model EgoSupervision to Learn Cooperative Basketball Intention
We present a first-person method for cooperative basketball intention pr...

12/11/2018 · Grounded Human-Object Interaction Hotspots from Video
Learning how to interact with objects is an important step towards embod...

11/23/2020 · Prior to Segment: Foreground Cues for Novel Objects in Partially Supervised Instance Segmentation
Instance segmentation methods require large datasets with expensive inst...

04/09/2019 · Embodied Visual Recognition
Passive visual systems typically fail to recognize objects in the amodal...

11/20/2019 · Exploiting Spatial Invariance for Scalable Unsupervised Object Tracking
The ability to detect and track objects in the visual world is a crucial...

03/31/2017 · Unsupervised learning from video to detect foreground objects in single images
Unsupervised learning from visual data is one of the most difficult chal...
