Cross-domain Detection Transformer based on Spatial-aware and Semantic-aware Token Alignment

06/01/2022
by Jinhong Deng, et al.

Detection transformers such as DETR have recently shown promising performance on many object detection tasks, but their generalization ability remains limited in cross-domain adaptation scenarios. A straightforward way to address the cross-domain issue is to perform token alignment with adversarial training in transformers. However, this often yields unsatisfactory performance because the tokens in detection transformers are highly diverse and carry different spatial and semantic information. In this paper, we propose a new method, Spatial-aware and Semantic-aware Token Alignment (SSTA), for cross-domain detection transformers. In particular, we take advantage of the characteristics of cross-attention in detection transformers and propose spatial-aware token alignment (SpaTA) and semantic-aware token alignment (SemTA) strategies to guide token alignment across domains. For spatial-aware token alignment, we extract information from the cross-attention map (CAM) to align the distribution of tokens according to their attention to object queries. For semantic-aware token alignment, we inject category information into the cross-attention map and construct a domain embedding to guide the learning of a multi-class discriminator, so as to model category relationships and achieve category-level token alignment throughout the adaptation process. We conduct extensive experiments on several widely used benchmarks, and the results clearly show the effectiveness of the proposed method over existing state-of-the-art baselines.
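To make the spatial-aware idea concrete, the abstract can be read as weighting a token-level adversarial loss by how much attention the decoder's object queries pay to each encoder token. The PyTorch-style sketch below is a minimal illustration under that assumption only; the module and function names (TokenDomainDiscriminator, spatial_aware_alignment_loss), the tensor shapes, and the gradient-reversal discriminator are hypothetical choices for illustration and are not taken from the paper.

# Minimal sketch of spatial-aware token alignment (SpaTA) as described in the
# abstract. All names, shapes, and the gradient-reversal setup are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Gradient reversal layer commonly used for adversarial feature alignment."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse gradients flowing back to the feature extractor.
        return -ctx.lambd * grad_output, None


class TokenDomainDiscriminator(nn.Module):
    """Per-token binary domain classifier (source vs. target)."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(inplace=True), nn.Linear(dim, 1)
        )

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_tokens, dim) -> (batch, num_tokens) domain logits
        return self.net(tokens).squeeze(-1)


def spatial_aware_alignment_loss(
    encoder_tokens: torch.Tensor,   # (B, N, C) encoder memory tokens
    cross_attn_map: torch.Tensor,   # (B, num_queries, N) decoder cross-attention
    domain_label: float,            # 1.0 for source, 0.0 for target
    discriminator: TokenDomainDiscriminator,
    lambd: float = 1.0,
) -> torch.Tensor:
    """Weight the per-token adversarial loss by how strongly object queries
    attend to each token, so alignment focuses on object-relevant regions
    instead of treating all tokens equally."""
    # Aggregate attention over queries and normalize to per-token weights.
    token_weight = cross_attn_map.mean(dim=1)                               # (B, N)
    token_weight = token_weight / (token_weight.sum(dim=1, keepdim=True) + 1e-6)

    reversed_tokens = GradReverse.apply(encoder_tokens, lambd)
    logits = discriminator(reversed_tokens)                                 # (B, N)
    targets = torch.full_like(logits, domain_label)

    per_token_loss = F.binary_cross_entropy_with_logits(
        logits, targets, reduction="none"
    )
    # Attention-weighted sum emphasizes tokens attended by object queries.
    return (token_weight * per_token_loss).sum(dim=1).mean()

In this reading, tokens that the object queries attend to strongly dominate the alignment objective, which matches the stated goal of aligning token distributions according to their attention to object queries; the semantic-aware branch (category information injected into the CAM and a multi-class discriminator) would extend this with class-wise domain labels and is not sketched here.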

Related research

07/27/2021  Exploring Sequence Feature Alignment for Domain Adaptive Detection Transformers
Detection transformers have recently shown promising object detection re...

12/06/2022  Semantic-aware Message Broadcasting for Efficient Unsupervised Domain Adaptation
Vision transformer has demonstrated great potential in abundant vision t...

11/27/2022  Exploring Consistency in Cross-Domain Transformer for Domain Adaptive Semantic Segmentation
While transformers have greatly boosted performance in semantic segmenta...

03/07/2021  MeGA-CDA: Memory Guided Attention for Category-Aware Unsupervised Domain Adaptive Object Detection
Existing approaches for unsupervised domain adaptive object detection pe...

06/04/2022  Video-based Human-Object Interaction Detection from Tubelet Tokens
We present a novel vision Transformer, named TUTOR, which is able to lea...

10/12/2022  Token-Label Alignment for Vision Transformers
Data mixing strategies (e.g., CutMix) have shown the ability to greatly ...

11/29/2022  Soft Alignment Objectives for Robust Adaptation in Machine Translation
Domain adaptation allows generative language models to address specific ...
