Learning to Segment Human by Watching YouTube

10/04/2017
by   Xiaodan Liang, et al.
0

An intuition on human segmentation is that when a human is moving in a video, the video-context (e.g., appearance and motion clues) may potentially infer reasonable mask information for the whole human body. Inspired by this, based on popular deep convolutional neural networks (CNN), we explore a very-weakly supervised learning framework for human segmentation task, where only an imperfect human detector is available along with massive weakly-labeled YouTube videos. In our solution, the video-context guided human mask inference and CNN based segmentation network learning iterate to mutually enhance each other until no further improvement gains. In the first step, each video is decomposed into supervoxels by the unsupervised video segmentation. The superpixels within the supervoxels are then classified as human or non-human by graph optimization with unary energies from the imperfect human detection results and the predicted confidence maps by the CNN trained in the previous iteration. In the second step, the video-context derived human masks are used as direct labels to train CNN. Extensive experiments on the challenging PASCAL VOC 2012 semantic segmentation benchmark demonstrate that the proposed framework has already achieved superior results than all previous weakly-supervised methods with object class or bounding box annotations. In addition, by augmenting with the annotated masks from PASCAL VOC 2012, our method reaches a new state-of-the-art performance on the human segmentation task.

READ FULL TEXT

page 3

page 7

research
03/23/2016

Weakly-Supervised Semantic Segmentation using Motion Cues

Fully convolutional neural networks (FCNNs) trained on a large number of...
research
09/30/2022

Dual Progressive Transformations for Weakly Supervised Semantic Segmentation

Weakly supervised semantic segmentation (WSSS), which aims to mine the o...
research
06/08/2021

Affinity Attention Graph Neural Network for Weakly Supervised Semantic Segmentation

Weakly supervised semantic segmentation is receiving great attention due...
research
12/15/2020

Robots Understanding Contextual Information in Human-Centered Environments using Weakly Supervised Mask Data Distillation

Contextual information in human environments, such as signs, symbols, an...
research
07/02/2014

Deep Poselets for Human Detection

We address the problem of detecting people in natural scenes using a par...
research
07/30/2019

Weakly Supervised Body Part Parsing with Pose based Part Priors

Human body part parsing refers to the task of predicting the semantic se...
research
11/20/2020

Recovering the Imperfect: Cell Segmentation in the Presence of Dynamically Localized Proteins

Deploying off-the-shelf segmentation networks on biomedical data has bec...

Please sign up or login with your details

Forgot password? Click here to reset