MLTEing Models: Negotiating, Evaluating, and Documenting Model and System Qualities

03/03/2023
by Katherine R. Maffey, et al.

Many organizations seek to ensure that machine learning (ML) and artificial intelligence (AI) systems work as intended in production but currently do not have a cohesive methodology in place to do so. To fill this gap, we propose MLTE (Machine Learning Test and Evaluation, colloquially referred to as "melt"), a framework and implementation to evaluate ML models and systems. The framework compiles state-of-the-art evaluation techniques into an organizational process for interdisciplinary teams, including model developers, software engineers, system owners, and other stakeholders. MLTE tooling supports this process by providing a domain-specific language that teams can use to express model requirements, an infrastructure to define, generate, and collect ML evaluation metrics, and the means to communicate results.
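To make the idea of expressing model requirements and validating collected metrics concrete, here is a minimal sketch in plain Python. The names (`Requirement`, `check`, the metrics used) are illustrative assumptions for this example only, not MLTE's actual API.

```python
from dataclasses import dataclass

# Hypothetical requirement spec: each entry names a metric, a threshold,
# and whether the measured value must be at least ("ge") or at most ("le")
# that threshold. Names are illustrative, not MLTE's real interface.
@dataclass
class Requirement:
    metric: str
    threshold: float
    comparator: str  # "ge" or "le"

    def check(self, value: float) -> bool:
        # Compare the measured value against the threshold.
        if self.comparator == "ge":
            return value >= self.threshold
        return value <= self.threshold

# A team's negotiated spec: accuracy must meet a floor,
# tail latency must stay under a ceiling.
spec = [
    Requirement("accuracy", 0.90, "ge"),
    Requirement("p99_latency_ms", 200.0, "le"),
]

# Metrics collected during evaluation (made-up values).
measured = {"accuracy": 0.93, "p99_latency_ms": 180.0}

# Validate each requirement and report pass/fail per metric.
results = {r.metric: r.check(measured[r.metric]) for r in spec}
```

A result dictionary like this is the kind of artifact that could then be surfaced to system owners and other stakeholders when communicating evaluation outcomes.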

Related research

- Measuring AI Systems Beyond Accuracy (04/07/2022): Current test and evaluation (T&E) methods for assessing machine learni...
- Towards Evaluating Exploratory Model Building Process with AutoML Systems (09/01/2020): The use of Automated Machine Learning (AutoML) systems is highly open-e...
- Prescriptive and Descriptive Approaches to Machine-Learning Transparency (04/27/2022): Specialized documentation techniques have been developed to communicate ...
- Themis-ml: A Fairness-aware Machine Learning Interface for End-to-end Discrimination Discovery and Mitigation (10/18/2017): As more industries integrate machine learning into socially sensitive de...
- MLOps Challenges in Multi-Organization Setup: Experiences from Two Real-World Cases (03/16/2021): The emerging age of a connected, digital world means that there are tons o...
- Why Should I Choose You? AutoXAI: A Framework for Selecting and Tuning eXplainable AI Solutions (10/06/2022): In recent years, a large number of XAI (eXplainable Artificial Intellige...
- Firenze: Model Evaluation Using Weak Signals (07/02/2022): Data labels in the security field are frequently noisy, limited, or bias...