Compositional Structure Learning for Sequential Video Data

07/03/2019
by   Kyoung-Woon On, et al.
0

Conventional sequential learning methods such as Recurrent Neural Networks (RNNs) focus on interactions between consecutive inputs, i.e. first-order Markovian dependency. However, most of sequential data, as seen with videos, have complex temporal dependencies that imply variable-length semantic flows and their compositions, and those are hard to be captured by conventional methods. Here, we propose Temporal Dependency Networks (TDNs) for learning video data by discovering these complex structures of the videos. The TDNs represent video as a graph whose nodes and edges correspond to frames of the video and their dependencies respectively. Via a parameterized kernel with graph-cut and graph convolutions, the TDNs find compositional temporal dependencies of the data in multilevel graph forms. We evaluate the proposed method on the large-scale video dataset Youtube-8M. The experimental results show that our model efficiently learns the complex semantic structure of video data.

READ FULL TEXT
research
01/17/2020

Cut-Based Graph Learning Networks to Discover Compositional Structure of Sequential Video Data

Conventional sequential learning methods such as Recurrent Neural Networ...
research
01/20/2019

Visualizing Semantic Structures of Sequential Data by Learning Temporal Dependencies

While conventional methods for sequential learning focus on interaction ...
research
04/28/2019

Hierarchical Recurrent Neural Network for Video Summarization

Exploiting the temporal dependency among video frames or subshots is ver...
research
10/19/2022

Dense but Efficient VideoQA for Intricate Compositional Reasoning

It is well known that most of the conventional video question answering ...
research
09/05/2012

Video Data Visualization System: Semantic Classification And Personalization

We present in this paper an intelligent video data visualization tool, b...
research
11/17/2014

Long-term Recurrent Convolutional Networks for Visual Recognition and Description

Models based on deep convolutional networks have dominated recent image ...
research
07/02/2019

Learnable Gated Temporal Shift Module for Deep Video Inpainting

How to efficiently utilize temporal information to recover videos in a c...

Please sign up or login with your details

Forgot password? Click here to reset