Unsupervised Semantic Action Discovery from Video Collections

05/11/2016
by   Ozan Sener, et al.

Human communication takes many forms, including speech, text and instructional videos. It typically has an underlying structure, with a starting point, an ending, and certain objective steps in between. In this paper, we consider instructional videos, of which there are tens of millions on the Internet. We propose a method for parsing a video into such semantic steps in an unsupervised way. Our method is capable of providing a semantic "storyline" of the video composed of its objective steps. We accomplish this using both visual and language cues in a joint generative model. Our method can also provide a textual description for each of the identified semantic steps and video segments. We evaluate our method on a large number of complex YouTube videos and show that it discovers semantically correct instructions for a variety of tasks.

research
06/28/2015

Unsupervised Semantic Parsing of Video Collections

Human communication typically has an underlying structure. This is refle...
research
06/30/2015

Unsupervised Learning from Narrated Instruction Videos

We address the problem of automatically learning the main steps to compl...
research
10/16/2022

Motion-Based Weak Supervision for Video Parsing with Application to Colonoscopy

We propose a two-stage unsupervised approach for parsing videos into pha...
research
09/20/2023

Revisiting Kernel Temporal Segmentation as an Adaptive Tokenizer for Long-form Video Understanding

While most modern video understanding models operate on short-range clip...
research
03/07/2017

Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos

We propose an unsupervised method for reference resolution in instructio...
research
03/05/2015

What's Cookin'? Interpreting Cooking Videos using Text, Speech and Vision

We present a novel method for aligning a sequence of instructions to a v...
research
10/29/2022

Unsupervised Audio-Visual Lecture Segmentation

Over the last decade, online lecture videos have become increasingly pop...
