Evaluating Machine Learning Models with NERO: Non-Equivariance Revealed on Orbits

05/31/2023
by Zhuokai Zhao, et al.

Proper evaluation is crucial for understanding, troubleshooting, and interpreting model behaviors, and for further improving model performance. While scalar-based error metrics provide a fast way to summarize model performance, they are often too abstract to expose specific weak spots and lack information about important model properties such as robustness. This not only keeps machine learning models from being more interpretable and trusted, but can also mislead both model developers and users. Additionally, conventional evaluation procedures often leave researchers unclear about where and how a model fails, which complicates model comparisons and further development. To address these issues, we propose a novel evaluation workflow, named Non-Equivariance Revealed on Orbits (NERO) Evaluation. The goal of NERO evaluation is to shift the focus from traditional scalar-based metrics to evaluating and visualizing model equivariance, which closely captures model robustness, and to let researchers quickly investigate interesting or unexpected model behaviors. NERO evaluation consists of a task-agnostic interactive interface and a set of visualizations, called NERO plots, which reveal a model's equivariance properties. Case studies applying NERO evaluation to multiple research areas, including 2D digit recognition, object detection, particle image velocimetry (PIV), and 3D point cloud classification, demonstrate that it can quickly illustrate differences in model equivariance and effectively explain model behaviors through interactive visualizations of model outputs. In addition, we propose consensus, an alternative to ground truth, for use in NERO evaluation so that model equivariance can still be assessed on new, unlabeled datasets.
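
For intuition, here is a minimal sketch of what a NERO-style evaluation over a rotation orbit might look like for a 2D digit classifier. This is not the authors' implementation: the `model` and `image` inputs, the use of classifier confidence as the per-sample quantity, and the uniform 5-degree sampling of the orbit are all assumptions made for illustration.

```python
# Minimal sketch: sweep a classifier over the rotation orbit of one input
# and record the confidence assigned to the true label at each angle.
import numpy as np
import torch
import torchvision.transforms.functional as TF


def nero_rotation_orbit(model, image, true_label, angles=range(0, 360, 5)):
    """Evaluate `model` at every rotation in the orbit of `image`.

    Args:
        model: a torch classifier mapping (N, C, H, W) -> (N, num_classes) logits.
        image: a single input tensor of shape (C, H, W).
        true_label: integer class index of the ground-truth label.
        angles: the sampled rotation orbit, in degrees.

    Returns:
        (angles, confidences) as NumPy arrays, ready to plot.
    """
    model.eval()
    confidences = []
    with torch.no_grad():
        for angle in angles:
            rotated = TF.rotate(image, float(angle))        # rotate the (C, H, W) input
            logits = model(rotated.unsqueeze(0))            # add batch dim -> (1, num_classes)
            probs = torch.softmax(logits, dim=1)
            confidences.append(probs[0, true_label].item()) # confidence on the true class
    return np.array(list(angles)), np.array(confidences)
```

Plotting `confidences` against `angles` yields a one-dimensional NERO-style plot for this transformation: a flat curve suggests the classifier is (approximately) invariant over the rotation orbit, while dips reveal the angles at which it breaks down.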

Related research

08/09/2023: A Unified Interactive Model Evaluation for Classification, Object Detection, and Instance Segmentation in Computer Vision
Existing model evaluation tools mainly focus on evaluating classificatio...

07/09/2021: A Topological-Framework to Improve Analysis of Machine Learning Model Performance
As both machine learning models and the datasets on which they are evalu...

03/31/2023: Evaluation Challenges for Geospatial ML
As geospatial machine learning models and maps derived from their predic...

06/28/2019: Programming with Timespans in Interactive Visualizations
Modern interactive visualizations are akin to distributed systems, where...

05/19/2023: Where does a computer vision model make mistakes? Using interactive visualizations to find where and how CV models can improve
Creating Computer Vision (CV) models remains a complex and taxing practi...

10/13/2021: AI Total: Analyzing Security ML Models with Imperfect Data in Production
Development of new machine learning models is typically done on manually...

08/21/2023: Foundation Model-oriented Robustness: Robust Image Model Evaluation with Pretrained Models
Machine learning has demonstrated remarkable performance over finite dat...
