Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs

10/15/2020
by Ana Marasović, et al.

Natural language rationales could provide intuitive, higher-level explanations that are easily understandable by humans, complementing the more broadly studied lower-level explanations based on gradients or attention weights. We present the first study focused on generating natural language rationales across several complex visual reasoning tasks: visual commonsense reasoning, visual-textual entailment, and visual question answering. The key challenge of accurate rationalization is comprehensive understanding of images at all levels: not just their explicit content at the pixel level, but also their contextual content at the semantic and pragmatic levels. We present Rationale^VT Transformer, an integrated model that learns to generate free-text rationales by combining pretrained language models with object recognition, grounded visual semantic frames, and visual commonsense graphs. Our experiments show that the base pretrained language model benefits from visual adaptation and that free-text rationalization is a promising research direction to complement model interpretability for complex visual-textual reasoning tasks.
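The abstract does not say how the three visual signals reach the language model. One simple option, sketched here purely as an illustration, is to convert each source (object labels, semantic-frame roles, commonsense inferences) into short text and concatenate it with the question and answer as a prompt from which a generative language model continues with a rationale. The `VisualContext` class, its field names, the `<sep>` token, and the trailing `because` cue are hypothetical choices for this sketch, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical separator token; a real model would use whatever its tokenizer defines.
SEP = " <sep> "


@dataclass
class VisualContext:
    # Hypothetical container: every field name here is an illustrative assumption.
    objects: List[str] = field(default_factory=list)            # e.g. object detector labels
    frames: List[Dict[str, str]] = field(default_factory=list)  # "verb" plus role -> filler
    commonsense: List[str] = field(default_factory=list)        # inferred facts about the scene


def linearize(ctx: VisualContext) -> str:
    """Turn each visual source into a short text segment and join them."""
    parts = []
    if ctx.objects:
        parts.append("objects: " + ", ".join(ctx.objects))
    for frame in ctx.frames:
        verb = frame.get("verb", "")
        roles = "; ".join(f"{k}={v}" for k, v in frame.items() if k != "verb")
        parts.append(f"frame: {verb} ({roles})" if roles else f"frame: {verb}")
    if ctx.commonsense:
        parts.append("commonsense: " + "; ".join(ctx.commonsense))
    return SEP.join(parts)


def build_prompt(ctx: VisualContext, question: str, answer: str) -> str:
    """Prefix the task input with the linearized visual context.

    A generative language model would then continue the text after the
    (hypothetical) "because" cue to produce a free-text rationale.
    """
    parts = []
    visual = linearize(ctx)
    if visual:
        parts.append(visual)
    parts += [f"question: {question}", f"answer: {answer}", "because"]
    return SEP.join(parts)


if __name__ == "__main__":
    ctx = VisualContext(
        objects=["person", "umbrella"],
        frames=[{"verb": "hold", "agent": "person", "item": "umbrella"}],
        commonsense=["it is raining"],
    )
    print(build_prompt(ctx, "Why is the person holding an umbrella?", "To stay dry."))
```

Keeping every signal as text means an unmodified pretrained language model can consume it directly; feeding visual features as embeddings is another possibility that this sketch does not cover.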
