REX: Reasoning-aware and Grounded Explanation

03/11/2022
by Shi Chen, et al.

Effectiveness and interpretability are two essential properties of trustworthy AI systems. Most recent studies in visual reasoning are dedicated to improving the accuracy of predicted answers, with less attention paid to explaining the rationales behind the decisions. As a result, such models commonly exploit spurious biases instead of actually reasoning over the visual-textual data, and have yet to develop the capability to explain their decision making by considering key information from both modalities. This paper aims to close the gap from three distinct perspectives. First, we define a new type of multi-modal explanation that explains decisions by progressively traversing the reasoning process and grounding keywords in the images. We develop a functional program to sequentially execute different reasoning steps and construct a new dataset with 1,040,830 multi-modal explanations. Second, we identify the critical need to tightly couple important components across the visual and textual modalities for explaining decisions, and propose a novel explanation generation method that explicitly models the pairwise correspondence between words and regions of interest. It improves visual grounding capability by a considerable margin, resulting in enhanced interpretability and reasoning performance. Finally, with our new data and method, we perform extensive analyses to study the effectiveness of our explanations under different settings, including multi-task learning and transfer learning. Our code and data are available at https://github.com/szzexpoi/rex.
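The abstract does not spell out how the pairwise word-region correspondence is computed; the exact architecture is in the paper and repository. As a rough illustration of the general idea of grounding explanation words in image regions, the sketch below computes a cosine-similarity matrix between word embeddings and region features and normalizes it over regions. The function name, dimensions, and the use of plain NumPy are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def ground_words_to_regions(word_emb, region_feat, temperature=1.0):
    """Illustrative sketch (not the REX model): soft word-to-region grounding.

    word_emb:    (num_words, dim)   embeddings of explanation/answer tokens
    region_feat: (num_regions, dim) visual features of detected regions
    Returns:     (num_words, num_regions) weights; each row sums to 1.
    """
    # Normalize so the dot product becomes cosine similarity.
    w = word_emb / (np.linalg.norm(word_emb, axis=1, keepdims=True) + 1e-8)
    r = region_feat / (np.linalg.norm(region_feat, axis=1, keepdims=True) + 1e-8)

    # Pairwise word-region similarity matrix.
    sim = (w @ r.T) / temperature

    # Softmax over regions: a grounding distribution for every word.
    sim = sim - sim.max(axis=1, keepdims=True)
    attn = np.exp(sim)
    return attn / attn.sum(axis=1, keepdims=True)

# Toy usage: 4 explanation words grounded over 6 image regions.
words = np.random.randn(4, 256)
regions = np.random.randn(6, 256)
grounding = ground_words_to_regions(words, regions)
print(grounding.shape)  # (4, 6)
```

In the paper, this kind of word-region coupling is learned jointly with explanation generation; the fixed cosine-similarity scoring above is only a stand-in for that learned correspondence.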


