Aspirations and Practice of Model Documentation: Moving the Needle with Nudging and Traceability

04/13/2022
by Avinash Bhat, et al.

Machine learning models have been widely developed, released, and adopted in numerous applications. Meanwhile, the documentation practice for machine learning models often falls short of established practices for traditional software components, which impedes model accountability, inadvertently abets inappropriate use or misuse of models, and may trigger negative social impact. Recently, model cards, a template for documenting machine learning models, have attracted notable attention, but their impact on the practice of model documentation is unclear. In this work, we examine publicly available model cards and other similar documentation. Our analysis reveals a substantial gap between the suggestions made in the original model card work and the content of actual documentation. Motivated by this observation and by literature in fields such as software documentation, interaction design, and traceability, we further propose a set of design guidelines that aim to support the documentation practice for machine learning models: (1) collocating the documentation environment with the coding environment, (2) nudging the consideration of model card sections during model development, and (3) deriving documentation from the source and tracing it back to the source. Following those guidelines, we designed a prototype tool named DocML to support model development in computational notebooks. A lab study reveals that our tool helps shift the behavior of data scientists towards documentation quality and accountability.


