Debugging Machine Learning Tasks

03/23/2016
by Aleksandar Chakarov, et al.

Unlike traditional programs (such as operating systems or word processors), which have large amounts of code, machine learning tasks use programs with relatively small amounts of code (written using machine learning libraries) but voluminous amounts of data. Just as developers of traditional programs debug errors in their code, developers of machine learning tasks debug and fix errors in their data. However, algorithms and tools for debugging and fixing errors in data are less common than their counterparts for detecting and fixing errors in code. In this paper, we consider classification tasks where errors in training data lead to misclassifications of test points, and propose an automated method to find the root causes of such misclassifications. Our root cause analysis is based on Pearl's theory of causation and uses Pearl's PS (Probability of Sufficiency) as a scoring metric. Our implementation, Psi, encodes the computation of PS as a probabilistic program, and uses recent work on probabilistic programs and transformations on probabilistic programs (along with gray-box models of machine learning algorithms) to compute PS efficiently. Psi is able to identify root causes of data errors in interesting data sets.
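
For readers unfamiliar with the metric: Pearl defines the probability of sufficiency as PS = P(Y_x = y | X = x', Y = y'), the probability that the effect y would have occurred had the cause x been present, given that in fact neither x nor y occurred. The Python sketch below is a toy illustration of that definition, not Psi. Everything in it is an assumption made for illustration: the hypothetical 1-D data set `TRAIN`, the test point `TEST_X` with assumed true label `TEST_Y`, a 3-nearest-neighbour classifier `knn_predict`, and a noise model in which every other training label flips independently with probability `noise`. The candidate cause is "this training label is wrong (so flip it)" and the effect is "the test point is classified correctly". PS is computed exactly by enumerating every noise assignment, which is only feasible for tiny data sets; the abstract says Psi instead relies on probabilistic-program transformations and gray-box models of the learner for efficiency, none of which is reproduced here.

```python
from itertools import product

# Hypothetical toy data (not from the paper): 1-D features with binary labels.
TRAIN = [(0.0, 0), (0.5, 0), (1.0, 0), (2.0, 0), (2.5, 0), (3.0, 1), (3.5, 1)]
TEST_X, TEST_Y = 2.4, 1  # assumed ground truth; the classifier below predicts 0


def knn_predict(train, x, k=3):
    """Majority vote among the k training points closest to x (k odd, labels 0/1)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return 1 if 2 * sum(label for _, label in nearest) > k else 0


def prob_sufficiency(candidate, noise=0.1):
    """PS that flipping TRAIN[candidate]'s label makes the test prediction correct.

    Exogenous variables U: independent flips of every *other* training label,
    each with probability `noise` (an assumed noise model). Pearl's
    PS = P(Y_x = 1 | X = x', Y = 0); because the intervention X is chosen
    independently of U, this equals P(Y_x(U) = 1 | Y_x'(U) = 0), computed
    here exactly by enumerating every assignment of U.
    """
    others = [i for i in range(len(TRAIN)) if i != candidate]
    joint = 0.0     # P(Y_x = 1 and Y_x' = 0)
    evidence = 0.0  # P(Y_x' = 0)
    for flips in product([False, True], repeat=len(others)):
        weight = 1.0
        world = list(TRAIN)
        for i, flipped in zip(others, flips):
            weight *= noise if flipped else 1 - noise
            if flipped:
                feat, label = world[i]
                world[i] = (feat, 1 - label)
        # Evaluate both sides of the counterfactual in the same world U.
        correct_without_fix = knn_predict(world, TEST_X) == TEST_Y
        feat, label = world[candidate]
        world[candidate] = (feat, 1 - label)
        correct_with_fix = knn_predict(world, TEST_X) == TEST_Y
        if not correct_without_fix:
            evidence += weight
            if correct_with_fix:
                joint += weight
    return joint / evidence if evidence > 0 else None


if __name__ == "__main__":
    # Rank every training point as a candidate root cause of the misclassification.
    scores = {i: prob_sufficiency(i) for i in range(len(TRAIN))}
    for i, ps in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(TRAIN[i], round(ps, 3))
```

With `noise=0.1`, only the two class-0 neighbours of the test point (at 2.0 and 2.5) get a nonzero score, 0.82/0.91 (about 0.90) each; points outside the neighbourhood, and the class-1 neighbour at 3.0 whose flip can only make things worse, score 0. Enumeration keeps the example exact and easy to check, at the cost of time exponential in the number of training points.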

Related research

Finding Root Causes of Floating Point Error with Herbgrind (05/29/2017)
Floating point arithmetic plays a central role in science, engineering, ...

Kantorovich Continuity of Probabilistic Programs (01/19/2019)
The Kantorovich metric is a canonical lifting of a distance from sets to...

Static Prediction of Runtime Errors by Learning to Execute Programs with External Resource Descriptions (03/07/2022)
The execution behavior of a program often depends on external resources,...

Ariadne: Analysis for Machine Learning Program (05/10/2018)
Machine learning has transformed domains like vision and translation, an...

Debugging Machine Learning Pipelines (02/11/2020)
Machine learning tasks entail the use of complex computational pipelines...

BugDoc: Algorithms to Debug Computational Processes (04/12/2020)
Data analysis for scientific experiments and enterprises, large-scale si...

C to Checked C by 3C (03/25/2022)
Owing to the continued use of C (and C++), spatial safety violations (e....