Discovering and Validating AI Errors With Crowdsourced Failure Reports

09/23/2021
by Ángel Alexander Cabrera, et al.

AI systems can fail to learn important behaviors, leading to real-world issues like safety concerns and biases. Discovering these systematic failures often requires significant developer attention, from hypothesizing potential edge cases to collecting evidence and validating patterns. To scale and streamline this process, we introduce crowdsourced failure reports, end-user descriptions of how or why a model failed, and show how developers can use them to detect AI errors. We also design and implement Deblinder, a visual analytics system for synthesizing failure reports that developers can use to discover and validate systematic failures. In semi-structured interviews and think-aloud studies with 10 AI practitioners, we explore the affordances of the Deblinder system and the applicability of failure reports in real-world settings. Lastly, we show how collecting additional data from the groups identified by developers can improve model performance.
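The abstract does not spell out what a failure report contains, but its definition, an end-user description of how or why a model failed, suggests a simple record pairing a model input with the user's explanation. The sketch below is purely illustrative: the FailureReport fields and the grouping helper are assumptions for exposition, not the schema or synthesis method used by Deblinder.

```python
from dataclasses import dataclass, field
from collections import defaultdict

@dataclass
class FailureReport:
    """Hypothetical end-user failure report: an input the model got wrong
    plus the user's free-text description of how or why it failed."""
    input_id: str       # identifier of the example the user was shown
    model_output: str   # what the model predicted or generated
    description: str    # user's explanation of the failure
    tags: list = field(default_factory=list)  # optional user-chosen labels

def group_reports(reports):
    """Group reports by tag so a developer can look for systematic
    failures (recurring descriptions) rather than one-off mistakes."""
    groups = defaultdict(list)
    for report in reports:
        for tag in report.tags or ["untagged"]:
            groups[tag].append(report)
    return groups

# Example: three reports, two of which point at the same systematic issue.
reports = [
    FailureReport("img_014", "cat", "mislabels dark photos", tags=["low light"]),
    FailureReport("img_207", "cat", "fails on night-time images", tags=["low light"]),
    FailureReport("img_330", "dog", "confused by partial occlusion", tags=["occlusion"]),
]
for tag, grouped in group_reports(reports).items():
    print(tag, len(grouped))
```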

