A Visual Technique to Analyze Flow of Information in a Machine Learning System

08/02/2019
by Abon Chaudhuri, et al.

Machine learning (ML) algorithms and ML-based software systems implicitly or explicitly involve complex flows of information between entities such as training data, feature space, validation set, and results. The statistical distribution of this information, and how it flows from one entity to another, influences the operation and correctness of such systems, especially in large-scale applications that perform classification or prediction in real time. In this paper, we propose a visual approach to understanding and analyzing the flow of information during the model training and serving phases. We build the visualizations using the Sankey diagram, a technique conventionally used to show data flow among sets, to address various use cases in a machine learning system. We demonstrate how the proposed technique, adapted to suit a classification problem, can play a critical role in better understanding the training data, the features, and the classifier performance. We also discuss how this technique enables diagnostic analysis of model predictions and comparative analysis of predictions from multiple classifiers. The proposed concept is illustrated with the example of categorizing millions of products in the e-commerce domain, a multi-class hierarchical classification problem.
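To make the idea of a Sankey diagram over classifier output concrete, the sketch below aggregates (true label, predicted label) pairs into the source/target/value links that a Sankey plot is drawn from. It is only an illustration under stated assumptions, not the authors' implementation: the function name `sankey_links`, the `true:`/`pred:` node prefixes, and the toy product categories are all hypothetical.

```python
from collections import Counter


def sankey_links(true_labels, predicted_labels):
    """Aggregate (true, predicted) label pairs into Sankey links.

    Each link carries the number of items flowing from a true class
    (left side of the diagram) to a predicted class (right side).
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have equal length")

    # Count how many items follow each true -> predicted path.
    counts = Counter(zip(true_labels, predicted_labels))

    # Prefix node names so a class that appears on both sides
    # becomes two distinct nodes rather than a self-loop.
    return [
        {"source": f"true:{t}", "target": f"pred:{p}", "value": n}
        for (t, p), n in sorted(counts.items())
    ]


# Hypothetical toy data: six products with true and predicted categories.
true_labels = ["shoes", "shoes", "shirts", "shirts", "shirts", "toys"]
predicted_labels = ["shoes", "shirts", "shirts", "shirts", "toys", "toys"]

links = sankey_links(true_labels, predicted_labels)
for link in links:
    print(link)
```

Links whose source and target name the same class correspond to correct predictions; the remaining links show where misclassified items flow, which is the kind of diagnostic reading the abstract describes. The resulting list can be passed to any plotting library that accepts source/target/value triples.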


research
06/05/2023

Information Flow Control in Machine Learning through Modular Model Architecture

In today's machine learning (ML) models, any part of the training data c...
research
07/18/2019

Comparing Multi-class, Binary and Hierarchical Machine Learning Classification schemes for variable stars

Upcoming synoptic surveys are set to generate an unprecedented amount of...
research
09/11/2019

Towards Safe Machine Learning for CPS: Infer Uncertainty from Training Data

Machine learning (ML) techniques are increasingly applied to decision-ma...
research
07/22/2020

InstanceFlow: Visualizing the Evolution of Classifier Confusion on the Instance Level

Classification is one of the most important supervised machine learning ...
research
03/04/2021

Calibrated Simplex Mapping Classification

We propose a novel supervised multi-class/single-label classifier that m...
research
09/28/2018

Reuse and Adaptation for Entity Resolution through Transfer Learning

Entity resolution (ER) is one of the fundamental problems in data integr...
research
06/08/2023

Sequential mediation of parasocial relationships for purchase intention: PLS-SEM and machine learning approach

Companies employ social media influencers SMIs due to the compelling evi...
