BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation

by   Jifeng Dai, et al.

Recent leading approaches to semantic segmentation rely on deep convolutional networks trained with human-annotated, pixel-level segmentation masks. Such pixel-accurate supervision demands expensive labeling effort and limits the performance of deep networks that usually benefit from more training data. In this paper, we propose a method that achieves competitive accuracy but only requires easily obtained bounding box annotations. The basic idea is to iterate between automatically generating region proposals and training convolutional networks. These two steps gradually recover segmentation masks for improving the networks, and vise versa. Our method, called BoxSup, produces competitive results supervised by boxes only, on par with strong baselines fully supervised by masks under the same setting. By leveraging a large amount of bounding boxes, BoxSup further unleashes the power of deep convolutional networks and yields state-of-the-art results on PASCAL VOC 2012 and PASCAL-CONTEXT.


page 2

page 3

page 4

page 7

page 8


Box-driven Class-wise Region Masking and Filling Rate Guided Loss for Weakly Supervised Semantic Segmentation

Semantic segmentation has achieved huge progress via adopting deep Fully...

Towards Closing the Gap in Weakly Supervised Semantic Segmentation with DCNNs: Combining Local and Global Models

Generating training sets for deep convolutional neural networks is a bot...

Learn to Segment Organs with a Few Bounding Boxes

Semantic segmentation is an import task in the medical field to identify...

Self-taught Object Localization with Deep Networks

This paper introduces self-taught object localization, a novel approach ...

Please sign up or login with your details

Forgot password? Click here to reset