Bridging the Performance Gap between DETR and R-CNN for Graphical Object Detection in Document Images

06/23/2023
by Tahira Shehzadi, et al.

This paper takes an important step toward bridging the performance gap between DETR and R-CNN for graphical object detection. Existing graphical object detection approaches have benefited from recent advances in CNN-based object detection and achieved remarkable progress. More recently, Transformer-based detectors have considerably boosted generic object detection performance; by using object queries, they eliminate the need for hand-crafted features and post-processing steps such as Non-Maximum Suppression (NMS). However, the effectiveness of these transformer-based detectors has yet to be verified for graphical object detection. Inspired by the latest advancements in DETR, we employ an existing detection transformer with a few modifications for graphical object detection. We modify the object queries in several ways: using points, using anchor boxes, and adding positive and negative noise to the anchors to boost performance. These modifications allow better handling of objects with varying sizes and aspect ratios, more robustness to small variations in object position and size, and improved discrimination between objects and non-objects. We evaluate our approach on four graphical datasets: PubTables, TableBank, NTable and PubLayNet. After integrating the query modifications into DETR, we outperform prior works and achieve new state-of-the-art results, with mAP of 96.9%, 95.7% and 99.3% on TableBank, PubLayNet and PubTables, respectively. Results from extensive ablations show that transformer-based methods are more effective for document analysis, as they are in other applications. We hope this study draws more attention to research on detection transformers in document image analysis.
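
The abstract does not spell out how positive and negative noise is added to the anchors, so the following is only a minimal sketch under stated assumptions. It assumes boxes are normalized `(cx, cy, w, h)` tuples, that a "positive" anchor is a small perturbation of a ground-truth box (to be reconstructed by the detector), and that a "negative" anchor is a larger perturbation (to be rejected as a non-object). The function names and the two thresholds `pos_scale` and `neg_scale` are hypothetical and are not taken from the paper.

```python
import random


def _jitter(box, lo, hi, rng):
    """Perturb a normalized (cx, cy, w, h) box.

    Each coordinate is moved by a fraction drawn uniformly from [lo, hi],
    with a random sign. Centers shift relative to the box size; width and
    height are rescaled. Results are clamped to [0, 1].
    """
    cx, cy, w, h = box

    def signed_fraction():
        frac = rng.uniform(lo, hi)
        return frac if rng.random() < 0.5 else -frac

    noised = (
        cx + signed_fraction() * w,
        cy + signed_fraction() * h,
        w * (1.0 + signed_fraction()),
        h * (1.0 + signed_fraction()),
    )
    return tuple(min(max(v, 0.0), 1.0) for v in noised)


def make_noised_anchors(gt_box, pos_scale=0.2, neg_scale=0.4, seed=0):
    """Return one (positive, negative) pair of noised anchors.

    Assumption: positive noise lies in [0, pos_scale] and negative noise
    lies in [pos_scale, neg_scale], so negatives are always at least as
    far from the ground truth as positives.
    """
    rng = random.Random(seed)
    positive = _jitter(gt_box, 0.0, pos_scale, rng)
    negative = _jitter(gt_box, pos_scale, neg_scale, rng)
    return positive, negative


if __name__ == "__main__":
    table_box = (0.5, 0.5, 0.2, 0.1)  # hypothetical ground-truth table box
    pos, neg = make_noised_anchors(table_box)
    print("positive anchor:", pos)
    print("negative anchor:", neg)
```

In a training setup of this kind, the positive anchor would typically be paired with the original box as its regression target, while the negative anchor would be supervised as "no object"; how the paper actually assigns these targets is not stated in the abstract.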

