VLSlice: Interactive Vision-and-Language Slice Discovery

09/13/2023
by Eric Slyman, et al.

Recent work in vision-and-language demonstrates that large-scale pretraining can learn generalizable models that transfer efficiently to downstream tasks. While this may improve dataset-scale aggregate metrics, analyzing performance on hand-crafted subgroups that target specific bias dimensions reveals systemic undesirable behaviors. However, this subgroup analysis is frequently stalled by annotation efforts, which require extensive time and resources to collect the necessary data. Prior work attempts to discover subgroups automatically to circumvent these constraints, but it typically relies on model behavior over existing task-specific annotations and degrades rapidly on inputs more complex than "tabular" data; none of these methods study vision-and-language models. This paper presents VLSlice, an interactive system that enables user-guided discovery of coherent representation-level subgroups with consistent visiolinguistic behavior, denoted as vision-and-language slices, from unlabeled image sets. In a user study (n=22), we show that VLSlice enables users to quickly generate diverse high-coherency slices, and we release the tool publicly.


research
10/04/2020

PTUM: Pre-training User Model from Unlabeled User Behaviors via Self-supervision

User modeling is critical for many personalized web services. Many exist...
research
06/13/2023

Where Does My Model Underperform? A Human Evaluation of Slice Discovery Algorithms

Machine learning (ML) models that achieve high average accuracy can stil...
research
08/24/2021

SimVLM: Simple Visual Language Model Pretraining with Weak Supervision

With recent progress in joint modeling of visual and textual representat...
research
03/24/2022

Domino: Discovering Systematic Errors with Cross-Modal Embeddings

Machine learning models that achieve high overall accuracy often make sy...
research
02/14/2023

ScatterShot: Interactive In-context Example Curation for Text Transformation

The in-context learning capabilities of LLMs like GPT-3 allow annotators...
research
10/11/2022

SEAL : Interactive Tool for Systematic Error Analysis and Labeling

With the advent of Transformers, large language models (LLMs) have satur...
research
08/13/2023

MDB: Interactively Querying Datasets and Models

As models are trained and deployed, developers need to be able to system...