Robustness of Machine Learning Models Beyond Adversarial Attacks

by   Sebastian Scher, et al.

Correctly quantifying the robustness of machine learning models is a central aspect in judging their suitability for specific tasks, and thus, ultimately, for generating trust in the models. We show that the widely used concept of adversarial robustness and closely related metrics based on counterfactuals are not necessarily valid metrics for determining the robustness of ML models against perturbations that occur "naturally", outside specific adversarial attack scenarios. Additionally, we argue that generic robustness metrics in principle are insufficient for determining real-world-robustness. Instead we propose a flexible approach that models possible perturbations in input data individually for each application. This is then combined with a probabilistic approach that computes the likelihood that a real-world perturbation will change a prediction, thus giving quantitative information of the robustness of the trained machine learning model. The method does not require access to the internals of the classifier and thus in principle works for any black-box model. It is, however, based on Monte-Carlo sampling and thus only suited for input spaces with small dimensions. We illustrate our approach on two dataset, as well as on analytically solvable cases. Finally, we discuss ideas on how real-world robustness could be computed or estimated in high-dimensional input spaces.


page 19

page 21


Decision-Based Adversarial Attacks: Reliable Attacks Against Black-Box Machine Learning Models

Many machine learning algorithms are vulnerable to almost imperceptible ...

Real-world-robustness of tree-based classifiers

The concept of trustworthy AI has gained widespread attention lately. On...

Efficient Estimation of the Local Robustness of Machine Learning Models

Machine learning models often need to be robust to noisy input data. The...

Achieving Robustness in the Wild via Adversarial Mixing with Disentangled Representations

Recent research has made the surprising finding that state-of-the-art de...

Quantile-constrained Wasserstein projections for robust interpretability of numerical and machine learning models

Robustness studies of black-box models is recognized as a necessary task...

CC-Cert: A Probabilistic Approach to Certify General Robustness of Neural Networks

In safety-critical machine learning applications, it is crucial to defen...

False perfection in machine prediction: Detecting and assessing circularity problems in machine learning

Machine learning algorithms train models from patterns of input data and...

Please sign up or login with your details

Forgot password? Click here to reset