Detective: An Attentive Recurrent Model for Sparse Object Detection

by   Amine Kechaou, et al.

In this work, we present Detective - an attentive object detector that identifies objects in images in a sequential manner. Our network is based on an encoder-decoder architecture, where the encoder is a convolutional neural network, and the decoder is a convolutional recurrent neural network coupled with an attention mechanism. At each iteration, our decoder focuses on the relevant parts of the image using an attention mechanism, and then estimates the object's class and the bounding box coordinates. Current object detection models generate dense predictions and rely on post-processing to remove duplicate predictions. Detective is a sparse object detector that generates a single bounding box per object instance. However, training a sparse object detector is challenging, as it requires the model to reason at the instance level and not just at the class and spatial levels. We propose a training mechanism based on the Hungarian algorithm and a loss that balances the localization and classification tasks. This allows Detective to achieve promising results on the PASCAL VOC object detection dataset. Our experiments demonstrate that sparse object detection is possible and has a great potential for future developments in applications where the order of the objects to be predicted is of interest.


page 1

page 8


Attentional Network for Visual Object Detection

We propose augmenting deep neural networks with an attention mechanism f...

Reverse-engineering Bar Charts Using Neural Networks

Reverse-engineering bar charts extracts textual and numeric information ...

Action-Driven Object Detection with Top-Down Visual Attentions

A dominant paradigm for deep learning based object detection relies on a...

Learning to detect and localize many objects from few examples

The current trend in object detection and localization is to learn predi...

Knowledge-based Recurrent Attentive Neural Network for Traffic Sign Detection

Accurate Traffic Sign Detection (TSD) can help drivers make better decis...

Revisiting the Onsets and Frames Model with Additive Attention

Recent advances in automatic music transcription (AMT) have achieved hig...

Attention-based Joint Detection of Object and Semantic Part

In this paper, we address the problem of joint detection of objects like...

Please sign up or login with your details

Forgot password? Click here to reset