Object-based attention for spatio-temporal reasoning: Outperforming neuro-symbolic models with flexible distributed architectures

12/15/2020
by David Ding, et al.

Neural networks have achieved success in a wide array of perceptual tasks, but it is often stated that they are incapable of solving tasks that require higher-level reasoning. Two new task domains, CLEVRER and CATER, have recently been developed to focus on reasoning, as opposed to perception, in the context of spatio-temporal interactions between objects. Initial experiments on these domains found that neuro-symbolic approaches, which couple a logic engine and language parser with a neural perceptual front-end, substantially outperform fully-learned distributed networks, a finding that was taken to support the above thesis. Here, we show on the contrary that a fully-learned neural network with the right inductive biases can perform substantially better than all previous neuro-symbolic models on both of these tasks, particularly on questions that most emphasize reasoning over perception. Our model makes critical use of both self-attention and learned "soft" object-centric representations, as well as BERT-style semi-supervised predictive losses. These flexible biases allow our model to surpass the previous neuro-symbolic state-of-the-art using less than 60% of available labelled data. Together, these results refute the neuro-symbolic thesis laid out by previous work involving these datasets, and they provide evidence that neural networks can indeed learn to reason effectively about the causal, dynamic structure of physical events.
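The abstract names two ingredients, self-attention over object-centric representations and a BERT-style predictive loss, without giving detail. The following is a minimal sketch of the general idea only, not the authors' implementation. The function names `self_attention` and `masked_prediction_loss`, the identity query/key/value projections, the zero-vector masking, and the mean-squared-error loss are all assumptions chosen to keep the example small.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    tokens: list of equal-length float vectors, e.g. one vector per
    object slot. Identity projections are used (an assumption), so
    queries, keys and values are the tokens themselves.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out


def masked_prediction_loss(tokens, mask_prob=0.15, seed=0):
    """BERT-style objective: hide some tokens and predict them from context.

    Masked tokens are replaced by zero vectors (an assumption); the loss is
    the mean squared error between predictions and originals, computed on
    the masked positions only.
    """
    rng = random.Random(seed)
    masked = [rng.random() < mask_prob for _ in tokens]
    if not any(masked):
        masked[0] = True  # guarantee at least one prediction target
    d = len(tokens[0])
    corrupted = [[0.0] * d if m else t for t, m in zip(tokens, masked)]
    predicted = self_attention(corrupted)
    errors = [
        sum((p - t) ** 2 for p, t in zip(pred, tok)) / d
        for pred, tok, m in zip(predicted, tokens, masked)
        if m
    ]
    return sum(errors) / len(errors), masked


# Hypothetical toy input: three object slots with two features each.
slots = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
loss, masked = masked_prediction_loss(slots, mask_prob=0.5, seed=1)
```

In a trained model the projections and the prediction head would be learned, and the masked loss would be combined with the supervised question-answering loss. Here the attention output is used directly as the prediction, purely to show where the masking enters.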

