Cut-Based Graph Learning Networks to Discover Compositional Structure of Sequential Video Data

01/17/2020
by   Kyoung-Woon On, et al.
0

Conventional sequential learning methods such as Recurrent Neural Networks (RNNs) focus on interactions between consecutive inputs, i.e. first-order Markovian dependency. However, most of sequential data, as seen with videos, have complex dependency structures that imply variable-length semantic flows and their compositions, and those are hard to be captured by conventional methods. Here, we propose Cut-Based Graph Learning Networks (CB-GLNs) for learning video data by discovering these complex structures of the video. The CB-GLNs represent video data as a graph, with nodes and edges corresponding to frames of the video and their dependencies respectively. The CB-GLNs find compositional dependencies of the data in multilevel graph forms via a parameterized kernel with graph-cut and a message passing framework. We evaluate the proposed method on the two different tasks for video understanding: Video theme classification (Youtube-8M dataset) and Video Question and Answering (TVQA dataset). The experimental results show that our model efficiently learns the semantic compositional structure of video data. Furthermore, our model achieves the highest performance in comparison to other baseline methods.

READ FULL TEXT

page 6

page 7

research
07/03/2019

Compositional Structure Learning for Sequential Video Data

Conventional sequential learning methods such as Recurrent Neural Networ...
research
01/20/2019

Visualizing Semantic Structures of Sequential Data by Learning Temporal Dependencies

While conventional methods for sequential learning focus on interaction ...
research
10/19/2022

Dense but Efficient VideoQA for Intricate Compositional Reasoning

It is well known that most of the conventional video question answering ...
research
04/29/2021

Bridge to Answer: Structure-aware Graph Interaction Network for Video Question Answering

This paper presents a novel method, termed Bridge to Answer, to infer co...
research
01/19/2020

Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks

This work proposes a novel attentive graph neural network (AGNN) for zer...
research
03/06/2023

Confidence-based Event-centric Online Video Question Answering on a Newly Constructed ATBS Dataset

Deep neural networks facilitate video question answering (VideoQA), but ...
research
07/26/2021

Adaptive Hierarchical Graph Reasoning with Semantic Coherence for Video-and-Language Inference

Video-and-Language Inference is a recently proposed task for joint video...

Please sign up or login with your details

Forgot password? Click here to reset