Small on-device models have been successfully trained with user-level di...
The rise of transformers in vision tasks not only advances network backb...
We propose Clustering Mask Transformer (CMT-DeepLab), a transformer-base...
In this paper, we tackle video panoptic segmentation, a task that requir...
Human keypoints are a well-studied representation of people. We explore h...
Matching one set of objects to another is a ubiquitous task in machine l...