LANCE: Stress-testing Visual Models by Generating Language-guided Counterfactual Images

05/30/2023
by Viraj Prabhu, et al.

We propose LANCE, an automated algorithm that stress-tests a trained visual model by generating language-guided counterfactual test images. Our method leverages recent progress in large language modeling and text-based image editing to augment an IID test set with a suite of diverse, realistic, and challenging test images, without altering model weights. We benchmark a diverse set of pretrained models on our generated data and observe significant and consistent performance drops. We further analyze model sensitivity across different types of edits, and demonstrate that LANCE can surface previously unknown class-level model biases in ImageNet.
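As a rough illustration of the benchmarking step described above, the sketch below compares a fixed model's accuracy on an original test set with its accuracy on an edited (counterfactual) counterpart. It is not the authors' implementation: `accuracy`, `accuracy_drop`, and `toy_predict` are hypothetical names, the "images" are plain description strings standing in for real images, and the language-model and image-editing stages are assumed to have already produced the edited set.

```python
from typing import Callable, Sequence, Tuple

Sample = Tuple[str, str]  # (image stand-in, ground-truth label)


def accuracy(predict: Callable[[str], str], samples: Sequence[Sample]) -> float:
    """Fraction of (image, label) pairs that the model labels correctly."""
    if not samples:
        raise ValueError("samples must be non-empty")
    correct = sum(1 for image, label in samples if predict(image) == label)
    return correct / len(samples)


def accuracy_drop(
    predict: Callable[[str], str],
    original: Sequence[Sample],
    counterfactual: Sequence[Sample],
) -> float:
    """Accuracy on the original set minus accuracy on the edited set.

    The model itself is never modified; only the test inputs change.
    """
    return accuracy(predict, original) - accuracy(predict, counterfactual)


def toy_predict(image: str) -> str:
    # Hypothetical, deliberately biased "model": it keys on the background
    # ("snow") rather than on the object in the scene.
    return "dog sled" if "snow" in image else "dog"


# Hypothetical toy data: each edit changes the background but keeps the label.
original = [("dog on grass", "dog"), ("sled dogs in snow", "dog sled")]
counterfactual = [("dog in snow", "dog"), ("sled dogs on grass", "dog sled")]

drop = accuracy_drop(toy_predict, original, counterfactual)
print(f"accuracy drop: {drop:.2f}")
```

In this toy setup the edits preserve each ground-truth label while changing one visual attribute, so any drop in accuracy reflects the model's sensitivity to that attribute rather than a change in the task.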


