Motion-Appearance Co-Memory Networks for Video Question Answering

03/29/2018
by Jiyang Gao, et al.

Video Question Answering (QA) is an important task in understanding video temporal structure. We observe that video QA has three unique attributes compared with image QA: (1) it deals with long sequences of images that contain richer information, not only in quantity but also in variety; (2) motion and appearance information are usually correlated, and each can provide useful attention cues to the other; (3) different questions require different numbers of frames to infer the answer. Based on these observations, we propose a motion-appearance co-memory network for video QA. Our networks build on concepts from the Dynamic Memory Network (DMN) and introduce new mechanisms for video QA. Specifically, there are three salient aspects: (1) a co-memory attention mechanism that uses cues from both motion and appearance to generate attention; (2) a temporal conv-deconv network that generates multi-level contextual facts; (3) a dynamic fact ensemble method that constructs temporal representations dynamically for different questions. We evaluate our method on the TGIF-QA dataset, where it significantly outperforms the state of the art on all four TGIF-QA tasks.
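
As a rough illustration of the co-memory idea, where each stream's attention is cued by the question and by the other stream's memory, the following is a minimal sketch in plain Python. It is not the paper's formulation: the additive cue, the dot-product scoring, the toy two-dimensional vectors, and the names `co_attention` and `attend` are all assumptions made for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def co_attention(facts, question, other_memory):
    # Hypothetical scoring: each fact is compared against a cue formed by
    # adding the question vector and the memory of the *other* stream.
    cue = [q + m for q, m in zip(question, other_memory)]
    return softmax([dot(f, cue) for f in facts])

def attend(facts, weights):
    # Attention-weighted sum of the fact vectors.
    dim = len(facts[0])
    return [sum(w * f[i] for w, f in zip(weights, facts)) for i in range(dim)]

# Toy per-frame features (assumed values, three frames, two dimensions).
appearance_facts = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
motion_facts = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
question = [1.0, 0.0]
appearance_memory = [0.0, 0.0]
motion_memory = [0.0, 0.0]

# One update step: appearance attention is cued by the motion memory,
# and motion attention is cued by the appearance memory.
app_weights = co_attention(appearance_facts, question, motion_memory)
mot_weights = co_attention(motion_facts, question, appearance_memory)
appearance_memory = attend(appearance_facts, app_weights)
motion_memory = attend(motion_facts, mot_weights)
```

Repeating the update step would let each memory influence the other stream's attention on the next pass, which is the intuition behind using one stream as a cue for the other.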


