Data+Shift: Supporting visual investigation of data distribution shifts by data scientists

04/29/2022
by João Palmeiro, et al.

Machine learning on data streams is increasingly present in multiple domains. However, data distribution shift often occurs and can lead machine learning models to make incorrect decisions. Although automatic methods can detect when drift is happening, human analysis, often by data scientists, is essential to diagnose the causes of the problem and adjust the system. We propose Data+Shift, a visual analytics tool that supports data scientists in investigating the underlying factors of shift in data features in the context of fraud detection. Design requirements were derived from interviews with data scientists. Data+Shift is integrated with JupyterLab and can be used alongside other data science tools. We validated our approach with a think-aloud experiment in which a data scientist used the tool for a fraud detection use case.

research
07/03/2022

Identifying the Context Shift between Test Benchmarks and Production Data

Across a wide variety of domains, there exists a performance gap between...
research
11/01/2018

Bias Reduction via End-to-End Shift Learning: Application to Citizen Science

Citizen science projects are successful at gathering rich datasets for v...
research
01/07/2022

Similarities and Differences between Machine Learning and Traditional Advanced Statistical Modeling in Healthcare Analytics

Data scientists and statisticians are often at odds when determining the...
research
06/13/2018

Towards Semantically Enhanced Data Understanding

In the field of machine learning, data understanding is the practice of ...
research
04/18/2021

Failing Conceptually: Concept-Based Explanations of Dataset Shift

Despite their remarkable performance on a wide range of visual tasks, ma...
research
04/13/2022

Aspirations and Practice of Model Documentation: Moving the Needle with Nudging and Traceability

Machine learning models have been widely developed, released, and adopte...
research
12/05/2018

A distributed data warehouse system for astroparticle physics

A distributed data warehouse system is one of the actual issues in the f...
