AO2-DETR: Arbitrary-Oriented Object Detection Transformer

by Linhui Dai, et al.

Arbitrary-oriented object detection (AOOD) is the challenging task of detecting objects in the wild that appear at arbitrary orientations and in cluttered arrangements. Existing approaches are mainly based on anchor boxes or dense points, and they rely on complicated hand-designed processing steps and inductive biases such as anchor generation, box transformation, and non-maximum suppression. Recently, transformer-based approaches have emerged that view object detection as a direct set prediction problem, which effectively removes the need for these hand-designed components and inductive biases. In this paper, we propose an Arbitrary-Oriented Object DEtection TRansformer framework, termed AO2-DETR, which comprises three dedicated components. First, an oriented proposal generation mechanism explicitly generates oriented proposals, which provide better positional priors for pooling features to modulate the cross-attention in the transformer decoder. Second, an adaptive oriented proposal refinement module extracts rotation-invariant region features and eliminates the misalignment between region features and objects. Third, a rotation-aware set matching loss ensures one-to-one matching for direct set prediction, so that no duplicate predictions are produced. Our method considerably simplifies the overall pipeline and presents a new AOOD paradigm. Comprehensive experiments on several challenging datasets show that our method achieves superior performance on the AOOD task.
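To make the idea of one-to-one set matching for oriented boxes concrete, the following is a minimal sketch and not the formulation used by AO2-DETR, which the abstract does not specify. It assumes each box is given as normalized (cx, cy, w, h) plus an angle in radians, that the angle is periodic with period pi, and that the cost is a weighted sum of an L1 box term, an angular term, and a negative class probability. The names `angle_diff`, `pair_cost`, `match` and the weights are hypothetical. The search is brute force over permutations, which is only practical for tiny sets; set prediction detectors typically use the Hungarian algorithm for this step.

```python
import math
from itertools import permutations


def angle_diff(a, b, period=math.pi):
    """Smallest absolute difference between two angles with the given period."""
    d = abs(a - b) % period
    return min(d, period - d)


def pair_cost(pred, target, w_box=1.0, w_angle=1.0, w_cls=1.0):
    """Hypothetical cost of assigning one prediction to one ground-truth box."""
    box_cost = sum(abs(p - t) for p, t in zip(pred["box"], target["box"]))
    ang_cost = angle_diff(pred["angle"], target["angle"])
    cls_cost = -pred["probs"][target["label"]]  # higher probability, lower cost
    return w_box * box_cost + w_angle * ang_cost + w_cls * cls_cost


def match(preds, targets):
    """Return (pred_index, target_index) pairs with minimal total cost.

    Each target receives exactly one prediction and no prediction is reused,
    so duplicates cannot both be matched to the same object.
    """
    if len(targets) > len(preds):
        raise ValueError("need at least as many predictions as targets")
    best, best_cost = (), math.inf
    # Brute force: try every ordered choice of distinct predictions.
    for perm in permutations(range(len(preds)), len(targets)):
        cost = sum(pair_cost(preds[i], targets[j]) for j, i in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return [(i, j) for j, i in enumerate(best)]


# Toy data (made up for illustration): three predictions, two ground-truth boxes.
preds = [
    {"box": (0.5, 0.5, 0.2, 0.1), "angle": 0.1, "probs": [0.1, 0.9]},
    {"box": (0.2, 0.2, 0.1, 0.1), "angle": 3.0, "probs": [0.8, 0.2]},
    {"box": (0.8, 0.8, 0.3, 0.3), "angle": 1.0, "probs": [0.5, 0.5]},
]
targets = [
    {"box": (0.2, 0.2, 0.1, 0.1), "angle": 0.0, "label": 0},
    {"box": (0.5, 0.5, 0.2, 0.1), "angle": 0.1, "label": 1},
]

print(match(preds, targets))
```

In this toy case the second prediction has an angle of 3.0 rad, which is close to the target angle of 0 once the period of pi is taken into account, so it is matched to the first target; the third prediction stays unmatched and would be treated as background.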



Learning RoI Transformer for Detecting Oriented Objects in Aerial Images

Object detection in aerial images is an active yet challenging task in c...

ARS-DETR: Aspect Ratio Sensitive Oriented Object Detection with Transformer

Existing oriented object detection methods commonly use metric AP_50 to ...

End-to-End Object Detection with Transformers

We present a new method that views object detection as a direct set pred...

Automatic Detection of Rail Components via A Deep Convolutional Transformer Network

Automatic detection of rail track and its fasteners via using continuous...

RHINO: Rotated DETR with Dynamic Denoising via Hungarian Matching for Oriented Object Detection

With the publication of DINO, a variant of the Detection Transformer (DE...

Efficient Decoder-free Object Detection with Transformers

Vision transformers (ViTs) are changing the landscape of object detectio...

Rethinking Transformer-based Set Prediction for Object Detection

DETR is a recently proposed Transformer-based method which views object ...
