Hierarchical Conditional Relation Networks for Video Question Answering

02/25/2020
by   Thao Minh Le, et al.
6

Video question answering (VideoQA) is challenging as it requires modeling capacity to distill dynamic visual artifacts and distant relations and to associate them with linguistic concepts. We introduce a general-purpose reusable neural unit called Conditional Relation Network (CRN) that serves as a building block to construct more sophisticated structures for representation and reasoning over video. CRN takes as input an array of tensorial objects and a conditioning feature, and computes an array of encoded output objects. Model building becomes a simple exercise of replication, rearrangement and stacking of these reusable units for diverse modalities and contextual information. This design thus supports high-order relational and multi-step reasoning. The resulting architecture for VideoQA is a CRN hierarchy whose branches represent sub-videos or clips, all sharing the same question as the contextual condition. Our evaluations on well-known datasets achieved new SoTA results, demonstrating the impact of building a general-purpose reasoning unit on complex domains such as VideoQA.

READ FULL TEXT

page 2

page 4

research
10/18/2020

Hierarchical Conditional Relation Networks for Multimodal Video Question Answering

Video QA challenges modelers in multiple fronts. Modeling video necessit...
research
07/10/2019

Learning to Reason with Relational Video Representation for Question Answering

How does machine learn to reason about the content of a video in answeri...
research
04/30/2020

Dynamic Language Binding in Relational Visual Reasoning

We present Language-binding Object Graph Network, the first neural reaso...
research
06/25/2021

Hierarchical Object-oriented Spatio-Temporal Reasoning for Video Question Answering

Video Question Answering (Video QA) is a powerful testbed to develop new...
research
12/12/2021

Video as Conditional Graph Hierarchy for Multi-Granular Question Answering

Video question answering requires the models to understand and reason ab...
research
09/22/2017

FiLM: Visual Reasoning with a General Conditioning Layer

We introduce a general-purpose conditioning method for neural networks c...
research
10/31/2022

Lila: A Unified Benchmark for Mathematical Reasoning

Mathematical reasoning skills are essential for general-purpose intellig...

Please sign up or login with your details

Forgot password? Click here to reset