Amazon SageMaker Model Monitor: A System for Real-Time Insights into Deployed Machine Learning Models

by David Nigenda, et al.

With the increasing adoption of machine learning (ML) models and systems in high-stakes settings across different industries, guaranteeing a model's performance after deployment has become crucial. Monitoring models in production is a critical aspect of ensuring their continued performance and reliability. We present Amazon SageMaker Model Monitor, a fully managed service that continuously monitors the quality of machine learning models hosted on Amazon SageMaker. Our system automatically detects data, concept, bias, and feature attribution drift in models in real time and provides alerts so that model owners can take corrective actions and thereby maintain high-quality models. We describe the key requirements obtained from customers, the system design and architecture, and the methodology for detecting different types of drift. Further, we provide quantitative evaluations followed by use cases, insights, and lessons learned from more than 1.5 years of production deployment.



Monitoring and explainability of models in production

The machine learning lifecycle extends beyond the deployment stage. Moni...

Amazon SageMaker Clarify: Machine Learning Bias Detection and Explainability in the Cloud

Understanding the predictions made by machine learning (ML) models and t...

Concept for a Technical Infrastructure for Management of Predictive Models in Industrial Applications

With the increasing number of created and deployed prediction models and...

Adaptive Learning for Service Monitoring Data

Service monitoring applications continuously produce data to monitor the...

MLDemon: Deployment Monitoring for Machine Learning Systems

Post-deployment monitoring of the performance of ML systems is critical ...

A monitoring framework for deployed machine learning models with supply chain examples

Actively monitoring machine learning models during production operations...

ML Health: Fitness Tracking for Production Models

Deployment of machine learning (ML) algorithms in production for extende...