We present a conceptually simple, efficient, and general framework for
l...
Instruction tuning large language model (LLM) on image-text pairs has
ac...
Object detection has been expanded from a limited number of categories t...
Multi-object tracking (MOT) aims at estimating bounding boxes and identi...
Existing object detection methods are bounded in a fixed-set vocabulary ...
We propose DiffusionDet, a new framework that formulates object detectio...
We present a unified method, termed Unicorn, that can simultaneously sol...
Referring video object segmentation (R-VOS) is an emerging cross-modal t...
A typical pipeline for multi-object tracking (MOT) is to use a detector ...
Multi-object tracking (MOT) aims at estimating bounding boxes and identi...
A more realistic object detection paradigm, Open-World Object Detection,...
Temporal Action Detection (TAD) is an essential and challenging topic in...
This work presents a new fine-grained transparent object segmentation
da...
Multiple-object tracking(MOT) is mostly dominated by complex and multi-s...
End-to-end one-stage object detection trailed thus far. This paper disco...
We present Sparse R-CNN, a purely sparse method for object detection in
...
In this paper, we introduce an anchor-box free and single shot instance
...
Detecting human in a crowd is a challenging problem due to the uncertain...
Scene text recognition has witnessed rapid development with the advance ...