Million-scale Object Detection with Large Vision Model

12/19/2022
by   Feng Lin, et al.
0

Over the past few years, developing a broad, universal, and general-purpose computer vision system has become a hot topic. A powerful universal system would be capable of solving diverse vision tasks simultaneously without being restricted to a specific problem or a specific data domain, which is of great importance in practical real-world computer vision applications. This study pushes the direction forward by concentrating on the million-scale multi-domain universal object detection problem. The problem is not trivial due to its complicated nature in terms of cross-dataset category label duplication, label conflicts, and the hierarchical taxonomy handling. Moreover, what is the resource-efficient way to utilize emerging large pre-trained vision models for million-scale cross-dataset object detection remains an open challenge. This paper tries to address these challenges by introducing our practices in label handling, hierarchy-aware loss design and resource-efficient model training with a pre-trained large model. Our method is ranked second in the object detection track of Robust Vision Challenge 2022 (RVC 2022). We hope our detailed study would serve as an alternative practice paradigm for similar problems in the community. The code is available at https://github.com/linfeng93/Large-UniDet.

READ FULL TEXT

page 15

page 16

page 17

research
06/20/2022

KOLOMVERSE: KRISO open large-scale image dataset for object detection in the maritime universe

Over the years, datasets have been developed for various object detectio...
research
08/02/2023

MDT3D: Multi-Dataset Training for LiDAR 3D Object Detection Generalization

Supervised 3D Object Detection models have been displaying increasingly ...
research
05/29/2023

Explicit Visual Prompting for Universal Foreground Segmentations

Foreground segmentation is a fundamental problem in computer vision, whi...
research
06/01/2021

You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection

Can Transformer perform 2D object-level recognition from a pure sequence...
research
03/15/2022

Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy

Large-scale datasets play a vital role in computer vision. Existing data...
research
01/13/2023

CLIP the Gap: A Single Domain Generalization Approach for Object Detection

Single Domain Generalization (SDG) tackles the problem of training a mod...
research
02/25/2021

Simple multi-dataset detection

How do we build a general and broad object detection system? We use all ...

Please sign up or login with your details

Forgot password? Click here to reset