A central goal of visual recognition is to understand objects and scenes...
While today's video recognition systems parse snapshots or short clips
a...
The "Roaring 20s" of visual recognition began with the introduction of V...
We present Masked Feature Prediction (MaskFeat) for self-supervised
pre-...
In this paper, we study Multiscale Vision Transformers (MViT) as a unifi...
Our world offers a never-ending stream of visual stimuli, yet today's vi...
Deep learning is slowly, but steadily, hitting a memory bottleneck. Whil...
We introduce a simple and efficient lossless image compression algorithm...
Training competitive deep video models is an order of magnitude slower t...
Given an outfit, what small changes would most improve its fashionabilit...
To understand the world, we humans constantly need to relate the present...
An ever increasing amount of our digital communication, media consumptio...
Training robust deep video representations has proven to be much more
ch...
Deep embeddings answer one simple question: How similar are two images?
...
Nonparametric models are versatile, albeit computationally expensive, to...
Understanding a user's motivations provides valuable information beyond ...