In this paper, we present a novel method for mobile manipulators to perf...
Visual pre-training with large-scale real-world data has made great prog...
3D instance segmentation methods often require fully-annotated dense lab...
Recent advancements in Large Language Models (LLMs) such as GPT4 have
di...
We are witnessing significant progress on perception models, specificall...
Reference Expression Generation (REG) and Comprehension (REC) are two hi...
Empowering autonomous agents with 3D understanding for daily objects is ...
Masked autoencoders have become popular training paradigms for
self-supe...
It is desirable to enable robots capable of automatic assembly. Structur...
Can a robot autonomously learn to design and construct a bridge from
var...
Object Goal Navigation (ObjectNav) task is to navigate an agent to an ob...
The success of language Transformers is primarily attributed to the pret...
We present TWIST, a novel self-supervised representation learning method...
Separating 3D point clouds into individual instances is an important tas...
Autonomous assembly has been a desired functionality of many intelligent...
Grasping in cluttered scenes has always been a great challenge for robot...
Compared to many other dense prediction tasks, e.g., semantic segmentati...
It has been a challenge to learning skills for an agent from long-horizo...
We propose Scale-aware AutoAug to learn data augmentation policies for o...
Referring image segmentation aims to segment the objects referred by a
n...
Multiple-object tracking(MOT) is mostly dominated by complex and multi-s...
We present Sparse R-CNN, a purely sparse method for object detection in
...
To date, most existing self-supervised learning methods are designed and...
In this work, we aim at building a simple, direct, and fast instance
seg...
We present a new, embarrassingly simple approach to instance segmentatio...
Detecting actions in videos is an important yet challenging task. Previo...
Monocular depth estimation enables 3D perception from a single 2D image,...
Different functional areas of the human brain play different roles in br...
We present FoveaBox, an accurate, flexible and completely anchor-free
fr...
We present consistent optimization for single stage object detection.
Pr...
We propose a light-weight video frame interpolation algorithm. Our key
i...
State-of-the-art object detectors usually learn multi-scale representati...
As a new classification platform, deep learning has recently received
in...
We present RON, an efficient and effective framework for generic object
...
Almost all of the current top-performing object detection networks emplo...