
-
TrackFormer: Multi-Object Tracking with Transformers
We present TrackFormer, an end-to-end multi-object tracking and segmenta...
read it
-
X3D: Expanding Architectures for Efficient Video Recognition
This paper presents X3D, a family of efficient video networks that progr...
read it
-
Feature Pyramid Grids
Feature pyramid networks have been widely adopted in the object detectio...
read it
-
Audiovisual SlowFast Networks for Video Recognition
We present Audiovisual SlowFast Networks, an architecture for integrated...
read it
-
EGO-TOPO: Environment Affordances from Egocentric Video
First-person video naturally brings the use of a physical environment to...
read it
-
A Multigrid Method for Efficiently Training Video Models
Training competitive deep video models is an order of magnitude slower t...
read it
-
Learning Temporal Pose Estimation from Sparsely-Labeled Videos
Modern approaches for multi-person pose estimation in video require larg...
read it
-
Grounded Human-Object Interaction Hotspots from Video (Extended Abstract)
Learning how to interact with objects is an important step towards embod...
read it
-
Modeling Human Motion with Quaternion-based Neural Networks
Previous work on predicting or generating 3D human pose sequences regres...
read it
-
Long-Term Feature Banks for Detailed Video Understanding
To understand the world, we humans constantly need to relate the present...
read it
-
Grounded Human-Object Interaction Hotspots from Video
Learning how to interact with objects is an important step towards embod...
read it
-
Learning Discriminative Motion Features Through Detection
Despite huge success in the image domain, modern detection models such a...
read it
-
SlowFast Networks for Video Recognition
We present SlowFast networks for video recognition. Our model involves (...
read it
-
3D human pose estimation in video with temporal convolutions and semi-supervised training
In this work, we demonstrate that 3D poses in video can be effectively e...
read it
-
Camera-based vehicle velocity estimation from monocular video
This paper documents the winning entry at the CVPR2017 vehicle velocity ...
read it
-
What have we learned from deep representations for action recognition?
As the success of deep models has led to their deployment in all areas o...
read it
-
Detect to Track and Track to Detect
Recent approaches for high accuracy detection and tracking of object cat...
read it
-
Spatiotemporal Residual Networks for Video Action Recognition
Two-stream Convolutional Networks (ConvNets) have shown strong performan...
read it
-
Convolutional Two-Stream Network Fusion for Video Action Recognition
Recent applications of Convolutional Neural Networks (ConvNets) for huma...
read it