
-
RareAct: A video dataset of unusual interactions
This paper introduces a manually annotated video dataset of unusual acti...
read it
-
Self-Supervised MultiModal Versatile Networks
Videos are a rich source of multi-modal supervision. In this work, we le...
read it
-
Learning to Segment Actions from Observation and Narration
We apply a generative segmental model of task structure, guided by narra...
read it
-
Visual Grounding in Video for Unsupervised Word Translation
There are thousands of actively spoken languages on Earth, but a single ...
read it
-
End-to-End Learning of Visual Representations from Uncurated Instructional Videos
Annotating videos is cumbersome, expensive and not scalable. Yet, many s...
read it
-
Controllable Attention for Structured Layered Video Decomposition
The objective of this paper is to be able to separate a video into its n...
read it
-
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
Learning text-video embeddings usually requires a dataset of video clips...
read it
-
Are Labels Required for Improving Adversarial Robustness?
Recent work has uncovered the interesting (and somewhat surprising) find...
read it
-
Cross-task weakly supervised learning from instructional videos
In this paper we investigate learning visual models for the steps of ord...
read it
-
The Visual Centrifuge: Model-Free Layered Video Representations
True video understanding requires making sense of non-lambertian scenes ...
read it
-
Learning to Localize and Align Fine-Grained Actions to Sparse Instructions
Automatic generation of textual video descriptions that are time-aligned...
read it
-
A flexible model for training action localization with varying levels of supervision
Spatio-temporal action detection in videos is typically addressed in a f...
read it
-
Learning from Video and Text via Large-Scale Discriminative Clustering
Discriminative clustering has been successfully applied to a number of w...
read it
-
SEARNN: Training RNNs with Global-Local Losses
We propose SEARNN, a novel training algorithm for recurrent neural netwo...
read it
-
Joint Discovery of Object States and Manipulation Actions
Many human activities involve object manipulations aiming to modify the ...
read it
-
Minding the Gaps for Block Frank-Wolfe Optimization of Structured SVMs
In this paper, we propose several improvements on the block-coordinate F...
read it
-
Unsupervised Learning from Narrated Instruction Videos
We address the problem of automatically learning the main steps to compl...
read it