As the size of transformer-based models continues to grow, fine-tuning t...
We present CLUSTSEG, a general, transformer-based framework that tackles...
Multi-sensor fusion (MSF) is widely adopted for perception in autonomous...
Optical flow is an indispensable building block for various important
co...
Logic locking has been proposed to safeguard intellectual property (IP)
...
Monocular Depth Estimation (MDE) is a critical component in applications...
Prevalent state-of-the-art instance segmentation methods fall into a
que...
We devise deep nearest centroids (DNC), a conceptually elegant yet
surpr...
Facial pose estimation refers to the task of predicting face orientation...
Deep learning has substantially boosted the performance of Monocular Dep...
Video captioning is a challenging task as it needs to accurately transfo...
Network embedding is an effective technique to learn the low-dimensional...
Structure information extraction refers to the task of extracting struct...
Multi-object tracking and segmentation (MOTS) is a critical task for
aut...
Video object detection is a challenging task because isolated video f...
Video instance segmentation (VIS) is a new and critical task in computer...
Geo-localization is a critical task in computer vision. In this work, we...
In this work, we introduce a Denser Feature Network (DenserNet) for visu...
This paper proposes to use the three vectors in a rotation matrix as the...
Vision and voice are two vital keys for agents' interaction and learning...
Accurate localization is a foundational capacity, required for autonomou...