Computer vision models have known performance disparities across attribu...
Generating a video given the first several static frames is challenging ...
This paper deals with the problem of localizing objects in image and vid...
We introduce Token Merging (ToMe), a simple method to increase the throu...
While transformers have begun to dominate many tasks in vision, applying...
The recently released Ego4D dataset and benchmark significantly scales a...
In this work, we present a new operator, called Instance Mask Projection...
Recently two-stage detectors have surged ahead of single-shot detectors ...
While state-of-the-art general object detectors are getting better and
b...
Sports channel video portals offer an exciting domain for research on
mu...
The main contribution of this paper is an approach for introducing addit...
For applications in navigation and robotics, estimating the 3D pose of
o...
We present a method for detecting objects in images using a single deep
...
This submission has been withdrawn by arXiv administrators because it is...