Git-Theta: A Git Extension for Collaborative Development of Machine Learning Models

06/07/2023
by Nikhil Kandpal, et al.

Currently, most machine learning models are trained by centralized teams and are rarely updated. In contrast, open-source software development involves the iterative development of a shared artifact through distributed collaboration using a version control system. In the interest of enabling collaborative and continual improvement of machine learning models, we introduce Git-Theta, a version control system for machine learning models. Git-Theta is an extension to Git, the most widely used version control software, that allows fine-grained tracking of changes to model parameters alongside code and other artifacts. Unlike existing version control systems that treat a model checkpoint as a blob of data, Git-Theta leverages the structure of checkpoints to support communication-efficient updates, automatic model merges, and meaningful reporting about the difference between two versions of a model. In addition, Git-Theta includes a plug-in system that enables users to easily add support for new functionality. In this paper, we introduce Git-Theta's design and features and include an example use-case of Git-Theta where a pre-trained model is continually adapted and modified. We publicly release Git-Theta in hopes of kickstarting a new era of collaborative model development.
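To make the contrast with blob-based versioning concrete, the following is a minimal sketch of what parameter-group-level diffing and three-way merging could look like. It is not Git-Theta's implementation or API: the `Checkpoint` type, the function names, and the choice to resolve conflicts by averaging are all assumptions made purely for illustration, and a checkpoint is modeled as a plain dictionary from parameter-group name to a flat list of values.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for a checkpoint: parameter-group name -> flat values.
Checkpoint = Dict[str, List[float]]


def diff_checkpoints(old: Checkpoint, new: Checkpoint) -> Dict[str, str]:
    """Report which parameter groups changed, rather than 'the blob changed'."""
    report: Dict[str, str] = {}
    for name in sorted(set(old) | set(new)):
        if name not in old:
            report[name] = "added"
        elif name not in new:
            report[name] = "removed"
        elif old[name] != new[name]:
            report[name] = "modified"
    return report


def merge_checkpoints(base: Checkpoint, ours: Checkpoint, theirs: Checkpoint) -> Checkpoint:
    """Three-way merge per parameter group.

    Groups changed on only one side take that side's value. Groups changed on
    both sides are averaged element-wise (an illustrative assumption; other
    strategies are possible).
    """
    merged: Checkpoint = {}
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b: Optional[List[float]] = base.get(name)
        o: Optional[List[float]] = ours.get(name)
        t: Optional[List[float]] = theirs.get(name)
        if o == t:
            result = o            # both sides agree (or neither changed)
        elif o == b:
            result = t            # only "theirs" changed this group
        elif t == b:
            result = o            # only "ours" changed this group
        elif o is not None and t is not None and len(o) == len(t):
            result = [(x + y) / 2 for x, y in zip(o, t)]  # conflicting edits
        else:
            raise ValueError(f"cannot merge parameter group {name!r} automatically")
        if result is not None:    # None means the group was deleted
            merged[name] = result
    return merged


base = {"w": [1.0, 2.0], "b": [0.0]}
ours = {"w": [1.0, 2.0], "b": [1.0]}
theirs = {"w": [3.0, 2.0], "b": [0.0], "head": [5.0]}

changes = diff_checkpoints(base, theirs)
merged = merge_checkpoints(base, ours, theirs)
```

Because only the groups listed in `changes` differ from the base, only those groups would need to be stored or transmitted, which is the intuition behind communication-efficient updates.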


research
01/11/2016

Git4Voc: Git-based Versioning for Collaborative Vocabulary Development

Collaborative vocabulary development in the context of data integration ...
research
10/13/2017

Knowledge is at the Edge! How to Search in Distributed Machine Learning Models

With the advent of the Internet of Things and Industry 4.0 an enormous a...
research
08/06/2020

nPrint: A Standard Data Representation for Network Traffic Analysis

Conventional detection and classification ("fingerprinting") problems in...
research
04/13/2022

Aspirations and Practice of Model Documentation: Moving the Needle with Nudging and Traceability

Machine learning models have been widely developed, released, and adopte...
research
01/12/2022

The openCARP CDE – Concept for and implementation of a sustainable collaborative development environment for research software

This work describes the setup of an advanced technical infrastructure fo...
research
03/15/2023

PTMTorrent: A Dataset for Mining Open-source Pre-trained Model Packages

Due to the cost of developing and training deep learning models from scr...
research
10/08/2018

NSML: Meet the MLaaS platform with a real-world case study

The boom of deep learning induced many industries and academies to intro...
