
-
FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation
A majority of approaches solve the problem of video frame interpolation ...
read it
-
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
The visual and audio modalities are highly correlated yet they contain d...
read it
-
FASTER Recurrent Networks for Video Classification
Video classification methods often divide the video into short clips, do...
read it
-
UniDual: A Unified Model for Image and Video Understanding
Although a video is effectively a sequence of images, visual perception ...
read it
-
Video Modeling with Correlation Networks
Motion is a salient cue to recognize actions in video. Modern action rec...
read it
-
Learning Temporal Pose Estimation from Sparsely-Labeled Videos
Modern approaches for multi-person pose estimation in video require larg...
read it
-
What Makes Training Multi-Modal Networks Hard?
Consider end-to-end training of a multi-modal vs. a single-modal network...
read it
-
Large-scale weakly-supervised pre-training for video action recognition
Current fully-supervised video datasets consist of only a few hundred th...
read it
-
SCSampler: Sampling Salient Clips from Video for Efficient Action Recognition
While many action recognition datasets consist of collections of brief, ...
read it
-
Video Classification with Channel-Separated Convolutional Networks
Group convolution has been shown to offer great computational savings in...
read it
-
DistInit: Learning Video Representations without a Single Labeled Video
Video recognition models have progressed significantly over the past few...
read it
-
Learning Discriminative Motion Features Through Detection
Despite huge success in the image domain, modern detection models such a...
read it
-
Co-Training of Audio and Video Representations from Self-Supervised Temporal Synchronization
There is a natural correlation between the visual and auditive elements ...
read it
-
Detect-and-Track: Efficient Pose Estimation in Videos
This paper addresses the problem of estimating and tracking human body k...
read it
-
A Closer Look at Spatiotemporal Convolutions for Action Recognition
In this paper we discuss several forms of spatiotemporal convolutions fo...
read it
-
Transformation-Based Models of Video Sequences
In this work we propose a simple unsupervised approach for next frame pr...
read it
-
VideoMCC: a New Benchmark for Video Comprehension
While there is overall agreement that future technology for organizing, ...
read it
-
Learning Spatiotemporal Features with 3D Convolutional Networks
We propose a simple, yet effective approach for spatiotemporal feature l...
read it
-
EXMOVES: Classifier-based Features for Scalable Action Recognition
This paper introduces EXMOVES, learned exemplar-based features for effic...
read it