DeepLens: Interactive Out-of-distribution Data Detection in NLP Models

03/02/2023
by Da Song, et al.

Machine Learning (ML) has been widely used in Natural Language Processing (NLP) applications. A fundamental assumption in ML is that training data and real-world data follow a similar distribution. However, a deployed ML model may suffer from out-of-distribution (OOD) issues due to distribution shifts in real-world data. Although many algorithms have been proposed to detect OOD data in text corpora, there is still a lack of interactive tool support for ML developers. In this work, we propose DeepLens, an interactive system that helps users detect and explore OOD issues in massive text corpora. Users can efficiently explore different OOD types in DeepLens with the help of a text clustering method. Users can also dig into a specific text by inspecting salient words highlighted through neuron activation analysis. In a within-subjects user study with 24 participants, participants using DeepLens accurately found nearly twice as many types of OOD issues, with 22% more confidence, compared with a variant of DeepLens that has no interaction or visualization support.


