Foundation Model-oriented Robustness: Robust Image Model Evaluation with Pretrained Models

08/21/2023
by   Peiyan Zhang, et al.

Machine learning has demonstrated remarkable performance on finite datasets, yet whether scores on fixed benchmarks sufficiently indicate a model's performance in the real world remains under discussion. In practice, an ideally robust model should behave similarly to an oracle (e.g., human users), so a good evaluation protocol is arguably one that compares a model's behavior against such an oracle. In this paper, we introduce a new robustness measurement that directly compares an image classification model's performance with that of a surrogate oracle (i.e., a foundation model). In addition, we design a simple method that carries out this evaluation beyond the scope of fixed benchmarks. Our method extends image datasets with new samples that are sufficiently perturbed to be distinct from those in the original sets, yet remain bounded within the same image-label structure as the original test images, as constrained by a foundation model pretrained on a large number of samples. As a result, our method offers a new way to evaluate a model's robustness, free of the limitations of fixed benchmarks or constrained perturbations, although it is scoped by the power of the oracle. Beyond the evaluation results, we also leverage the generated data to analyze model behavior and to study our new evaluation strategies.
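As described above, the protocol perturbs test images, uses a pretrained foundation model as a surrogate oracle to check that each perturbed sample still carries the original image-label structure, and then measures how closely the evaluated classifier matches the oracle on the accepted samples. Below is a minimal PyTorch sketch of such a loop; the function names (`perturb`, `oracle`, `model`) and the exact scoring rule are illustrative assumptions, not the authors' implementation.

```python
import torch

@torch.no_grad()
def oracle_constrained_robustness(model, oracle, perturb, loader, device="cpu"):
    """Sketch: evaluate `model` only on perturbed images that the surrogate
    oracle still maps to the original label, and report its agreement with
    the oracle on those samples. All names here are illustrative placeholders."""
    kept, agree = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        x_new = perturb(images)                        # arbitrary strong perturbation (assumed)
        oracle_pred = oracle(x_new).argmax(dim=1)      # surrogate oracle's predicted label
        valid = oracle_pred == labels                  # image-label structure preserved?
        if valid.any():
            model_pred = model(x_new[valid]).argmax(dim=1)
            kept += int(valid.sum())
            agree += int((model_pred == oracle_pred[valid]).sum())
    return agree / max(kept, 1)                        # agreement rate with the oracle
```

In this reading, the oracle plays two roles: it filters out perturbations that break the image-label structure, and it serves as the reference behavior against which the evaluated model is scored.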


