Language-aware Multiple Datasets Detection Pretraining for DETRs

04/07/2023
by Jing Hao, et al.

Pretraining on large-scale datasets can boost the performance of object detectors, but annotated object detection datasets are hard to scale up because of the high labor cost. What we do have are numerous isolated field-specific datasets, so it is appealing to jointly pretrain models on an aggregation of datasets to increase data volume and diversity. In this paper, we propose a strong framework for utilizing Multiple datasets to pretrain DETR-like detectors, termed METR, without the need for manual label-space integration. It converts the typical multi-class classification in object detection into binary classification by introducing a pre-trained language model. Specifically, we design a category extraction module that extracts the categories potentially present in an image and assigns these categories to different queries via language embeddings. Each query is responsible only for predicting objects of its assigned class. In addition, to fit this novel detection paradigm, we propose a group bipartite matching strategy that restricts each ground truth to match only queries assigned to the same category. Extensive experiments demonstrate that METR achieves strong results in both multi-task joint training and the pretrain-finetune paradigm. Notably, our pre-trained models transfer flexibly and improve the performance of various DETR-like detectors on the COCO val2017 benchmark. Code will be available after this paper is published.
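
The group bipartite matching idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the matching cost or any implementation detail, so everything below is an assumption for illustration: the function names `l1_cost` and `group_bipartite_match` are hypothetical, the cost is a plain L1 box distance, and the optimal assignment is found by brute force over permutations rather than by the Hungarian algorithm a real detector would likely use.

```python
from itertools import permutations


def l1_cost(box_a, box_b):
    """Placeholder matching cost (an assumption): L1 distance between boxes."""
    return sum(abs(a - b) for a, b in zip(box_a, box_b))


def group_bipartite_match(query_cats, query_boxes, gt_cats, gt_boxes):
    """Match each ground truth to a distinct query of the SAME category.

    Returns a dict {gt_index: query_index}. Uses brute force over
    permutations, so it is only suitable for tiny illustrative inputs.
    """
    matches = {}
    for cat in sorted(set(gt_cats)):
        # Restrict the matching problem to one category group at a time.
        q_idx = [i for i, c in enumerate(query_cats) if c == cat]
        g_idx = [j for j, c in enumerate(gt_cats) if c == cat]
        if len(q_idx) < len(g_idx):
            raise ValueError(f"not enough queries for category {cat!r}")
        best_cost, best_perm = float("inf"), None
        # Try every one-to-one assignment of this group's queries to its GTs.
        for perm in permutations(q_idx, len(g_idx)):
            cost = sum(
                l1_cost(query_boxes[q], gt_boxes[g]) for q, g in zip(perm, g_idx)
            )
            if cost < best_cost:
                best_cost, best_perm = cost, perm
        matches.update(zip(g_idx, best_perm))
    return matches


# Toy example: two queries assigned to "cat", two assigned to "dog".
query_cats = ["cat", "cat", "dog", "dog"]
query_boxes = [(0, 0, 1, 1), (5, 5, 6, 6), (0, 0, 1, 1), (9, 9, 10, 10)]
gt_cats = ["dog", "cat"]
gt_boxes = [(0, 0, 1, 1), (5, 5, 6, 6)]
result = group_bipartite_match(query_cats, query_boxes, gt_cats, gt_boxes)
```

In this toy input, the "dog" ground truth at `(0, 0, 1, 1)` overlaps query 0 exactly, but query 0 is assigned to "cat", so the group restriction forces the match to query 2 instead. An unrestricted bipartite matching would not enforce that constraint.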


research
04/08/2022

Semantic Representation and Dependency Learning for Multi-Label Image Recognition

Recently many multi-label image recognition (MLR) works have made signif...
research
09/25/2018

Object Detection from Scratch with Deep Supervision

We propose Deeply Supervised Object Detectors (DSOD), an object detectio...
research
06/07/2022

Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding

Leveraging large-scale data can introduce performance gains on many comp...
research
08/03/2017

DSOD: Learning Deeply Supervised Object Detectors from Scratch

We present Deeply Supervised Object Detector (DSOD), a framework that ca...
research
05/28/2019

An Analysis of Object Embeddings for Image Retrieval

We present an analysis of embeddings extracted from different pre-traine...
research
03/23/2023

CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching

Open-vocabulary detection (OVD) is an object detection task aiming at de...
research
08/04/2015

Online Domain Adaptation for Multi-Object Tracking

Automatically detecting, labeling, and tracking objects in videos depend...
