MultiViz: An Analysis Benchmark for Visualizing and Understanding Multimodal Models

06/30/2022
by Paul Pu Liang, et al.

The promise of multimodal models for real-world applications has inspired research into visualizing and understanding their internal mechanics, with the end goal of empowering stakeholders to visualize model behavior, perform model debugging, and promote trust in machine learning models. However, modern multimodal models are typically black-box neural networks, which makes their internal mechanics difficult to understand. How can we visualize the internal modeling of multimodal interactions in these models? Our paper aims to fill this gap by proposing MultiViz, a method for analyzing the behavior of multimodal models by scaffolding the problem of interpretability into 4 stages: (1) unimodal importance: how each modality contributes to downstream modeling and prediction; (2) cross-modal interactions: how different modalities relate to each other; (3) multimodal representations: how unimodal and cross-modal interactions are represented in decision-level features; and (4) multimodal prediction: how decision-level features are composed to make a prediction. MultiViz is designed to operate across diverse modalities, models, tasks, and research areas. Through experiments on 8 trained models across 6 real-world tasks, we show that the complementary stages in MultiViz together enable users to (1) simulate model predictions, (2) assign interpretable concepts to features, (3) perform error analysis on model misclassifications, and (4) use insights from error analysis to debug models. MultiViz is publicly available, will be regularly updated with new interpretation tools and metrics, and welcomes input from the community.
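To make the first two stages of the scaffold concrete, here is a minimal, hypothetical sketch (not the MultiViz implementation): a toy two-modality model where unimodal importance is estimated by gradient-times-input attribution and cross-modal interactions are read off the mixed second derivatives of the prediction with respect to both inputs. The bilinear toy model and the finite-difference gradients are illustrative assumptions only; a real analysis would run these probes against a trained neural network.

```python
import numpy as np

# Toy two-modality "model": f(xa, xb) = wa.xa + wb.xb + xa^T W xb.
# The bilinear term W is the model's only cross-modal interaction.
rng = np.random.default_rng(0)
d = 3
wa, wb = rng.normal(size=d), rng.normal(size=d)
W = rng.normal(size=(d, d))

def f(xa, xb):
    return wa @ xa + wb @ xb + xa @ W @ xb

def unimodal_importance(f, xa, xb, eps=1e-5):
    """Stage 1 sketch: gradient-times-input attribution per modality,
    with gradients taken by central finite differences."""
    def grad(g, x):
        out = np.zeros_like(x)
        for i in range(x.size):
            e = np.zeros_like(x)
            e[i] = eps
            out[i] = (g(x + e) - g(x - e)) / (2 * eps)
        return out
    ga = grad(lambda v: f(v, xb), xa)   # d f / d xa
    gb = grad(lambda v: f(xa, v), xb)   # d f / d xb
    return ga @ xa, gb @ xb             # scalar importance per modality

def cross_modal_interaction(f, xa, xb, eps=1e-4):
    """Stage 2 sketch: mixed partials d^2 f / (d xa_i d xb_j). A nonzero
    entry means the prediction multiplies the two modalities together;
    an all-zero matrix means the model is additive across modalities."""
    H = np.zeros((xa.size, xb.size))
    for i in range(xa.size):
        for j in range(xb.size):
            ea = np.zeros_like(xa)
            eb = np.zeros_like(xb)
            ea[i], eb[j] = eps, eps
            H[i, j] = (f(xa + ea, xb + eb) - f(xa + ea, xb - eb)
                       - f(xa - ea, xb + eb) + f(xa - ea, xb - eb)) / (4 * eps * eps)
    return H
```

For this toy model the recovered interaction matrix equals W itself, which is the sanity check one would want before pointing such probes at a black-box network.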


Related research

08/12/2018 · Multimodal Language Analysis with Recurrent Multistage Fusion
Computational modeling of human multimodal language is an emerging resea...

03/03/2022 · DIME: Fine-grained Interpretations of Multimodal Models via Disentangled Local Explanations
The ability for a human to understand an Artificial Intelligence (AI) mo...

10/13/2020 · Does my multimodal model learn cross-modal interactions? It's harder to tell than you might think!
Modeling expressive cross-modal interactions seems crucial in multimodal...

06/28/2023 · MultiZoo & MultiBench: A Standardized Toolkit for Multimodal Deep Learning
Learning multimodal representations involves integrating information fro...

07/01/2023 · SHARCS: Shared Concept Space for Explainable Multimodal Learning
Multimodal learning is an essential paradigm for addressing complex real...

07/11/2023 · One-Versus-Others Attention: Scalable Multimodal Integration
Multimodal learning models have become increasingly important as they su...

05/21/2023 · HIINT: Historical, Intra- and Inter-personal Dynamics Modeling with Cross-person Memory Transformer
Accurately modeling affect dynamics, which refers to the changes and flu...
