Image recognition and generation have long been developed independently ...
Human drivers can easily describe complex traffic scenes by visual sys...
The captivating realm of Minecraft has attracted substantial research in...
Driving scenes are so diverse and complicated that it is impossib...
The task of 3D single object tracking (SOT) with LiDAR point clouds is c...
Modern autonomous driving systems are characterized as modular tasks in se...
We present a novel bird's-eye-view (BEV) detector with perspective super...
To effectively exploit the potential of large-scale models, various pre-...
Recent success of vision transformers has inspired a series of vision ba...
Compared to the great progress of large-scale vision transformers (ViTs)...
Transformer, as a strong and flexible architecture for modelling long-ra...
Video inpainting aims to fill the given spatiotemporal holes with realis...
DETR has been recently proposed to eliminate the need for many hand-desi...
This article introduces the solutions of the team lvisTraveler for LVIS ...
We introduce a new pre-trainable generic representation for visual-lingu...