GSCLIP: A Framework for Explaining Distribution Shifts in Natural Language

06/30/2022
by Zhiying Zhu, et al.

Helping end users comprehend abstract distribution shifts can greatly facilitate AI deployment. Motivated by this, we propose a novel task, dataset explanation: given two image datasets, the goal is to automatically describe their dataset-level distribution shifts in natural language. Current techniques for monitoring distribution shifts provide inadequate information for understanding datasets with the goal of improving data quality. We therefore introduce GSCLIP, a training-free framework for solving the dataset explanation task. In GSCLIP, we propose the selector as the first quantitative evaluation method for identifying explanations that properly summarize dataset shifts. We further use this selector to demonstrate the superiority of a generator based on language model generation. Systematic evaluation on natural data shifts verifies that GSCLIP, a combined system of a hybrid generator group and an efficient selector, is not only easy to use but also powerful for dataset explanation at scale.
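The abstract does not specify how the selector scores candidate explanations. The sketch below is an illustration only. It assumes that every image and every candidate sentence has already been embedded into a shared space by a CLIP-style model. Under that assumption, it scores each sentence by how much more similar it is, on average, to the second dataset than to the first. The function names (`cosine`, `mean_similarity`, `rank_explanations`), the source/target framing and the mean-difference score are hypothetical choices made for this example. They are not the GSCLIP selector itself.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_similarity(text_emb: Vector, image_embs: List[Vector]) -> float:
    """Average cosine similarity between one sentence embedding and a set of image embeddings."""
    if not image_embs:
        raise ValueError("image set must not be empty")
    return sum(cosine(text_emb, img) for img in image_embs) / len(image_embs)


def rank_explanations(
    candidates: Dict[str, Vector],
    source_images: List[Vector],
    target_images: List[Vector],
) -> List[Tuple[str, float]]:
    """Hypothetical selector: score each sentence by (mean similarity to the target set)
    minus (mean similarity to the source set), and return candidates sorted by score,
    highest first. Embeddings are assumed to come from a CLIP-style model."""
    scored = [
        (
            sentence,
            mean_similarity(emb, target_images) - mean_similarity(emb, source_images),
        )
        for sentence, emb in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy 2-D "embeddings": axis 0 stands for daytime content, axis 1 for night content.
    source_images = [[1.0, 0.0], [0.9, 0.1]]
    target_images = [[0.1, 0.9], [0.0, 1.0]]
    candidates = {
        "photos taken at night": [0.0, 1.0],
        "photos taken during the day": [1.0, 0.0],
    }
    for sentence, score in rank_explanations(candidates, source_images, target_images):
        print(f"{score:+.3f}  {sentence}")
```

In this toy run, "photos taken at night" ranks first with a positive score, and the daytime sentence receives the mirror-image negative score. A positive score means the sentence describes the target dataset better than the source dataset. A real system would replace the hand-written vectors with embeddings from an actual image-text model. It would also likely need a more careful score than a raw difference of means.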
