Tribuo: Machine Learning with Provenance in Java

10/06/2021
by Adam Pocock, et al.

Machine learning models are deployed across many industries, performing a wide range of tasks. Tracking these models and ensuring they behave appropriately becomes increasingly difficult as the number of deployed models grows. There are also new regulatory burdens for ML systems that affect human lives, requiring a link between a model and its training data in high-risk situations. Current ML monitoring systems often provide provenance and experiment tracking as a layer on top of an ML library, which leaves room for imperfect tracking and for skew between the tracked object and its metadata. In this paper we introduce Tribuo, a Java ML library that integrates model training, inference, strong type-safety, runtime checking, and automatic provenance recording into a single framework. All of Tribuo's models and evaluations automatically record the full processing pipeline for input data, along with the training algorithms, hyperparameters and data transformation steps. The provenance lives inside the model object and can be persisted separately using common markup formats. Tribuo implements many popular ML algorithms for classification, regression, clustering, multi-label classification and anomaly detection, along with interfaces to XGBoost, TensorFlow and ONNX Runtime. Tribuo's source code is available at https://github.com/oracle/tribuo under an Apache 2.0 license, with documentation and tutorials available at https://tribuo.org.
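The abstract's central design claim is that provenance is recorded by the library itself and stored inside the model object, rather than by a separate tracking layer that can drift out of sync with the model. The Java sketch below illustrates that idea in miniature. It is not Tribuo's API: the classes `RidgeTrainer`, `LinearModel`, `ModelProvenance` and `DataProvenance`, the `toJson` method, and the `toy.csv` source name are all hypothetical, and JSON stands in for "a common markup format" only as an example. It assumes Java 16 or later (for records).

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (NOT Tribuo's API): a model object that carries its own provenance.
final class ProvenanceSketch {
    public static void main(String[] args) {
        double[] xs = {1.0, 2.0, 3.0};
        double[] ys = {2.0, 4.0, 6.0};
        // Illustrative record of where the training data came from and how it was transformed.
        DataProvenance data = new DataProvenance("toy.csv", List.of("dropColumn(id)"));
        LinearModel model = new RidgeTrainer(0.0).train(xs, ys, data);
        System.out.println(model.predict(5.0));          // prints 10.0
        System.out.println(model.provenance().toJson()); // provenance persisted as JSON text
    }
}

// Describes the input data: its source and the ordered transformation steps applied to it.
record DataProvenance(String source, List<String> transformations) {
    DataProvenance {
        transformations = List.copyOf(transformations); // immutable, order-preserving copy
    }
}

// Describes how a model was produced: trainer name, hyperparameters and training data.
record ModelProvenance(String trainer, Map<String, String> hyperparameters, DataProvenance data) {
    ModelProvenance {
        // Keep insertion order so serialization is deterministic; forbid later mutation.
        hyperparameters = Collections.unmodifiableMap(new LinkedHashMap<>(hyperparameters));
    }

    // Minimal hand-written JSON serializer standing in for a "common markup format".
    String toJson() {
        StringBuilder sb = new StringBuilder("{\"trainer\":").append(q(trainer));
        sb.append(",\"hyperparameters\":{");
        int i = 0;
        for (Map.Entry<String, String> e : hyperparameters.entrySet()) {
            if (i++ > 0) sb.append(',');
            sb.append(q(e.getKey())).append(':').append(q(e.getValue()));
        }
        sb.append("},\"data\":{\"source\":").append(q(data.source()));
        sb.append(",\"transformations\":[");
        for (int j = 0; j < data.transformations().size(); j++) {
            if (j > 0) sb.append(',');
            sb.append(q(data.transformations().get(j)));
        }
        return sb.append("]}}").toString();
    }

    // Quotes a string, escaping backslashes and double quotes (enough for this sketch).
    private static String q(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}

// A one-feature linear model; its provenance is a final field of the model itself.
final class LinearModel {
    private final double weight;
    private final ModelProvenance provenance;

    LinearModel(double weight, ModelProvenance provenance) {
        this.weight = weight;
        this.provenance = provenance;
    }

    double predict(double x) { return weight * x; }

    ModelProvenance provenance() { return provenance; }
}

// Fits y ~ w * x by closed-form ridge regression (no intercept) and records provenance.
final class RidgeTrainer {
    private final double l2; // ridge penalty hyperparameter

    RidgeTrainer(double l2) { this.l2 = l2; }

    LinearModel train(double[] xs, double[] ys, DataProvenance data) {
        double sxy = 0.0, sxx = 0.0;
        for (int i = 0; i < xs.length; i++) {
            sxy += xs[i] * ys[i];
            sxx += xs[i] * xs[i];
        }
        double w = sxy / (sxx + l2); // w = sum(x*y) / (sum(x^2) + l2)
        // Provenance is built by the same code path that fits the weight,
        // so the recorded hyperparameters cannot drift from those actually used.
        Map<String, String> hp = new LinkedHashMap<>();
        hp.put("l2", Double.toString(l2));
        return new LinearModel(w, new ModelProvenance(getClass().getSimpleName(), hp, data));
    }
}
```

Because the trainer constructs the `ModelProvenance` at the moment it fits the weight, the recorded `l2` value is by construction the one used in training; a tracking layer added afterwards would have to be kept consistent by hand. A fuller sketch would also attach provenance to evaluations, as the abstract describes for Tribuo.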

