PGTRNet: Two-phase Weakly Supervised Object Detection with Pseudo Ground Truth Refining

08/25/2021
by Jun Wang, et al.

Weakly Supervised Object Detection (WSOD), which aims to train detectors with only image-level annotations, has attracted increasing attention. Current state-of-the-art approaches mainly follow a two-stage training strategy which integrates a fully supervised detector (FSD) with a pure WSOD model. Two main problems hinder the performance of two-phase WSOD approaches: insufficient learning, and the strict reliance of the FSD on the pseudo ground truth (PGT) generated by the WSOD model. This paper proposes the pseudo ground truth refinement network (PGTRNet), a simple yet effective method that introduces no extra learnable parameters, to cope with these problems. PGTRNet uses multiple bounding boxes to establish the PGT, mitigating the insufficient learning problem. In addition, we propose a novel online PGT refinement approach that steadily improves the quality of the PGT by taking full advantage of the power of the FSD during second-phase training, decoupling the first- and second-phase models. Extensive experiments are conducted on the PASCAL VOC 2007 benchmark to verify the effectiveness of our methods. Experimental results demonstrate that PGTRNet boosts the backbone model by 2.074% mAP, showing the significant potential of second-phase training.
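
The abstract names two ideas without giving their rules: building the PGT from multiple boxes, and refining the PGT online with the FSD. The Python sketch below is one possible reading, not the authors' implementation. The function names (`select_pgt`, `refine_pgt`), the relative-score rule, the IoU-based replacement rule, and all thresholds are assumptions introduced here for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_pgt(
    proposals: Sequence[Box],
    scores: Sequence[Sequence[float]],  # scores[i][c]: WSOD score of proposal i for class c
    image_labels: Sequence[int],        # classes known to be present (image-level labels)
    ratio: float = 0.8,                 # assumed: keep boxes scoring >= ratio * top score
    max_iou: float = 0.3,               # assumed: skip boxes overlapping an already kept box
) -> Dict[int, List[Box]]:
    """Keep several high-scoring, weakly overlapping boxes per present class."""
    pgt: Dict[int, List[Box]] = {}
    for c in image_labels:
        ranked = sorted(range(len(proposals)), key=lambda i: scores[i][c], reverse=True)
        if not ranked:
            continue
        top = scores[ranked[0]][c]
        kept: List[Box] = []
        for i in ranked:
            if scores[i][c] < ratio * top:
                break  # ranked is sorted, so every later box scores lower still
            if all(iou(proposals[i], k) <= max_iou for k in kept):
                kept.append(proposals[i])
        pgt[c] = kept
    return pgt


def refine_pgt(
    pgt: Dict[int, List[Box]],
    detections: Dict[int, List[Tuple[Box, float]]],  # FSD output: class -> [(box, score)]
    min_score: float = 0.8,  # assumed confidence threshold
    min_iou: float = 0.5,    # assumed overlap needed to match a PGT box
) -> Dict[int, List[Box]]:
    """Replace a PGT box with the most confident matching FSD detection, if any."""
    refined: Dict[int, List[Box]] = {}
    for c, boxes in pgt.items():
        new_boxes: List[Box] = []
        for b in boxes:
            matches = [
                (d, s) for d, s in detections.get(c, [])
                if s >= min_score and iou(b, d) >= min_iou
            ]
            new_boxes.append(max(matches, key=lambda m: m[1])[0] if matches else b)
        refined[c] = new_boxes
    return refined
```

In a two-phase setup of this kind, `select_pgt` would run once on the first-phase WSOD outputs, and `refine_pgt` would be called periodically during second-phase training with the FSD's current detections. Because refinement relies only on detector outputs, a scheme like this adds no learnable parameters, which is consistent with the abstract's claim.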
