Dataset Interfaces: Diagnosing Model Failures Using Controllable Counterfactual Generation

02/15/2023
by   Joshua Vendrow, et al.
0

Distribution shifts are a major source of failure of deployed machine learning models. However, evaluating a model's reliability under distribution shifts can be challenging, especially since it may be difficult to acquire counterfactual examples that exhibit a specified shift. In this work, we introduce dataset interfaces: a framework which allows users to scalably synthesize such counterfactual examples from a given dataset. Specifically, we represent each class from the input dataset as a custom token within the text space of a text-to-image diffusion model. By incorporating these tokens into natural language prompts, we can then generate instantiations of objects in that dataset under desired distribution shifts. We demonstrate how applying our framework to the ImageNet dataset enables us to study model behavior across a diverse array of shifts, including variations in background, lighting, and attributes of the objects themselves. Code available at https://github.com/MadryLab/dataset-interfaces.

READ FULL TEXT

page 3

page 5

page 8

page 15

page 17

page 18

page 19

page 20

research
08/11/2020

BREEDS: Benchmarks for Subpopulation Shift

We develop a methodology for assessing the robustness of models to subpo...
research
06/30/2022

GSCLIP : A Framework for Explaining Distribution Shifts in Natural Language

Helping end users comprehend the abstract distribution shifts can greatl...
research
04/12/2021

Double Perturbation: On the Robustness of Robustness and Counterfactual Bias Evaluation

Robustness and counterfactual bias are usually evaluated on a test datas...
research
08/02/2020

SemEval-2020 Task 5: Counterfactual Recognition

We present a counterfactual recognition (CR) task, the shared Task 5 of ...
research
09/25/2021

A Principled Approach to Failure Analysis and Model Repairment: Demonstration in Medical Imaging

Machine learning models commonly exhibit unexpected failures post-deploy...
research
07/27/2023

Understanding Silent Failures in Medical Image Classification

To ensure the reliable use of classification systems in medical applicat...
research
10/20/2022

Counterfactual Recipe Generation: Exploring Compositional Generalization in a Realistic Scenario

People can acquire knowledge in an unsupervised manner by reading, and c...

Please sign up or login with your details

Forgot password? Click here to reset