Interpretable R-CNN

11/14/2017
by Tianfu Wu, et al.

This paper presents a method of learning qualitatively interpretable models in object detection using popular two-stage region-based ConvNet detection systems (i.e., R-CNN). R-CNN consists of a region proposal network and a RoI (Region-of-Interest) prediction network. By interpretable models, we focus on weakly-supervised extractive rationale generation, that is, learning to unfold latent discriminative part configurations of object instances automatically and simultaneously in detection, without using any supervision for part configurations. We utilize a top-down hierarchical and compositional grammar model embedded in a directed acyclic AND-OR Graph (AOG) to explore and unfold the space of latent part configurations of RoIs. We propose an AOGParsing operator to substitute for the RoIPooling operator widely used in R-CNN, so the proposed method is applicable to many state-of-the-art ConvNet-based detection systems. The AOGParsing operator aims to harness both the explainable rigor of top-down hierarchical and compositional grammar models and the discriminative power of bottom-up deep neural networks through end-to-end training. In detection, a bounding box is interpreted by the best parse tree derived from the AOG on the fly, which is treated as the extractive rationale generated for interpreting the detection. In learning, we propose a folding-unfolding method to train the AOG and ConvNet end-to-end. In experiments, we build on top of R-FCN and test the proposed method on the PASCAL VOC 2007 and 2012 datasets, with performance comparable to state-of-the-art methods.
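
The idea of reading off a best parse tree from an AND-OR Graph can be illustrated with a small sketch. The abstract does not specify how the AOG is built or how nodes are scored, so everything below is an assumption made for illustration: the RoI is a small grid of cell scores, each rectangle is an OR node that picks the better of a terminal node (the rectangle as one part) and AND nodes (binary cuts into two sub-rectangles), a terminal scores as the mean of its cells, and an AND node scores as the sum of its children. The name `best_parse` is hypothetical and is not the paper's AOGParsing operator.

```python
from functools import lru_cache


def best_parse(score_map):
    """Return (score, parts) for the best parse of the whole grid.

    score_map: list of rows of floats, one score per grid cell. This is a
        hypothetical stand-in for ConvNet responses inside one RoI.
    parts: list of (x, y, w, h) terminal rectangles in the best parse tree.
    """
    grid_h = len(score_map)
    grid_w = len(score_map[0])

    def terminal_score(x, y, w, h):
        # Assumption: a part's score is the mean response over its cells.
        cells = [score_map[j][i]
                 for j in range(y, y + h)
                 for i in range(x, x + w)]
        return sum(cells) / len(cells)

    @lru_cache(maxsize=None)
    def parse_or(x, y, w, h):
        # OR node: start with the terminal alternative for this rectangle.
        best = (terminal_score(x, y, w, h), ((x, y, w, h),))

        # AND nodes from vertical cuts (left | right).
        for k in range(1, w):
            left_score, left_parts = parse_or(x, y, k, h)
            right_score, right_parts = parse_or(x + k, y, w - k, h)
            cand = (left_score + right_score, left_parts + right_parts)
            if cand[0] > best[0]:
                best = cand

        # AND nodes from horizontal cuts (top / bottom).
        for k in range(1, h):
            top_score, top_parts = parse_or(x, y, w, k)
            bottom_score, bottom_parts = parse_or(x, y + k, w, h - k)
            cand = (top_score + bottom_score, top_parts + bottom_parts)
            if cand[0] > best[0]:
                best = cand

        return best

    score, parts = parse_or(0, 0, grid_w, grid_h)
    return score, list(parts)


if __name__ == "__main__":
    score, parts = best_parse([[1.0, 1.0], [-1.0, -1.0]])
    print(score, parts)
```

In this toy input the top row responds strongly and the bottom row weakly, so the selected configuration splits the top row into two single-cell parts and keeps the bottom row as one part. The list of terminal rectangles plays the role of the extractive rationale in this sketch; the memoized recursion reflects that sub-rectangles are shared across parses, which is what makes a directed acyclic graph a natural container for the configuration space.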


research
09/11/2019

WSOD^2: Learning Bottom-up and Top-down Objectness Distillation for Weakly-supervised Object Detection

We study on weakly-supervised object detection (WSOD) which plays a vita...
research
05/22/2020

KL-Divergence-Based Region Proposal Network for Object Detection

The learning of the region proposal in object detection using the deep n...
research
11/15/2017

AOGNets: Deep AND-OR Grammar Networks for Visual Recognition

This paper presents a method of learning deep AND-OR Grammar (AOG) netwo...
research
11/28/2018

Deep Regionlets: Blended Representation and Deep Learning for Generic Object Detection

In this paper, we propose a novel object detection algorithm named "Deep...
research
07/08/2018

Auto-Context R-CNN

Region-based convolutional neural networks (R-CNN) fast_rcnn,faster_rcnn...
research
06/25/2014

Weakly-supervised Discovery of Visual Pattern Configurations

The increasing prominence of weakly labeled data nurtures a growing dema...
research
07/02/2014

Deep Poselets for Human Detection

We address the problem of detecting people in natural scenes using a par...
