We present the All-Seeing (AS) project: a large-scale data and model for...
Image recognition and generation have long been developed independently ...
The captivating realm of Minecraft has attracted substantial research
in...
Large language models (LLMs) have notably accelerated progress towards
a...
We present an interactive visual framework named InternGPT, or iGPT for
...
Modern autonomous driving system is characterized as modular tasks in
se...
We present a novel bird's-eye-view (BEV) detector with perspective
super...
Despite the remarkable success of foundation models, their task-specific...
To effectively exploit the potential of large-scale models, various
pre-...
Recent success of vision transformers has inspired a series of vision
ba...
Compared to the great progress of large-scale vision transformers (ViTs)...
To build an artificial neural network like the biological intelligence
s...
Self-supervised learning (SSL) has delivered superior performance on a
v...
This paper proposes a simple baseline framework for video-based 2D/3D hu...
Self-supervised learning has shown its great potential to extract powerf...
Loss functions play an important role in training deep-network-based obj...
Biological intelligence systems of animals perceive the world by integra...
Deep learning-based models encounter challenges when processing long-tai...
As a fundamental problem for Artificial Intelligence, multi-agent system...
Significant progress has been achieved in automating the design of vario...
Despite the importance of unsupervised object detection, to the best of ...
We propose a general framework for searching surrogate losses for mainst...
DETR has been recently proposed to eliminate the need for many hand-desi...
In the feature maps of CNNs, there commonly exists considerable spatial
...
Convolutional networks are not aware of an object's geometric variations...
We introduce a new pre-trainable generic representation for visual-lingu...
Attention mechanisms have become a popular component in deep neural netw...
The superior performance of Deformable Convolutional Networks arises fro...
Accurate detection and tracking of objects is vital for effective video
...
Despite the recent success of video object detection on Desktop GPUs, it...
There has been significant progresses for image object detection in rece...
Extending state-of-the-art object detectors from image to video is
chall...
Deep convolutional neutral networks have achieved great success on image...
Although there has been a great deal of interest in analyzing customer
o...