Patch Shortcuts: Interpretable Proxy Models Efficiently Find Black-Box Vulnerabilities

04/22/2021
by Julia Rosenzweig, et al.

An important pillar for safe machine learning (ML) is the systematic mitigation of weaknesses in neural networks to afford their deployment in critical applications. A ubiquitous class of safety risks are learned shortcuts, i.e., spurious correlations a network exploits for its decisions that have no semantic connection to the actual task. Networks relying on such shortcuts bear the risk of not generalizing well to unseen inputs. Explainability methods help to uncover such network vulnerabilities. However, many of these techniques are not directly applicable if access to the network is constrained, in so-called black-box setups. These setups are prevalent when using third-party ML components. To address this constraint, we present an approach to detect learned shortcuts using an interpretable-by-design network as a proxy to the black-box model of interest. Leveraging the proxy's guarantees on introspection, we automatically extract candidates for learned shortcuts. Their transferability to the black box is validated in a systematic fashion. Concretely, as proxy model we choose a BagNet, which bases its decisions purely on local image patches. We demonstrate on the autonomous driving dataset A2D2 that extracted patch shortcuts significantly influence the black-box model. By efficiently identifying such patch-based vulnerabilities, we contribute to safer ML models.
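The workflow the abstract describes can be sketched in a few lines. The following is a minimal, illustrative PyTorch sketch, not the authors' implementation: `bagnet_proxy` and `black_box` are hypothetical placeholders for a BagNet trained on the same task and for the third-party model under test, and the assumption that the proxy returns a spatial map of patch-wise class logits follows BagNet's design. The gray-canvas probe used for validation is one plausible transfer check, not necessarily the paper's procedure.

```python
import torch
import torch.nn.functional as F

def extract_patch_candidates(bagnet_proxy, image, target_class, k=5):
    """Rank local patches by their class evidence in the proxy's spatial
    logit map and return the k strongest as shortcut candidates."""
    # Assumption: the proxy maps a (1, C, H, W) image to patch-wise logits
    # of shape (1, H', W', num_classes), as a BagNet does.
    patch_logits = bagnet_proxy(image.unsqueeze(0))
    evidence = patch_logits[0, :, :, target_class]          # (H', W')
    top = torch.topk(evidence.flatten(), k).indices
    w = evidence.shape[1]
    return [(int(i) // w, int(i) % w) for i in top]         # (row, col) in the patch grid

def validate_on_black_box(black_box, image, candidates, patch_size=33, stride=8):
    """Transferability check: paste each candidate patch onto a neutral
    gray canvas and measure how strongly the black box reacts to it."""
    base = torch.full_like(image, 0.5)                      # neutral canvas, assumes [0, 1] inputs
    with torch.no_grad():
        p_base = F.softmax(black_box(base.unsqueeze(0)), dim=1)
        shifts = []
        for row, col in candidates:
            y, x = row * stride, col * stride
            probe = base.clone()
            probe[:, y:y + patch_size, x:x + patch_size] = \
                image[:, y:y + patch_size, x:x + patch_size]
            p = F.softmax(black_box(probe.unsqueeze(0)), dim=1)
            shifts.append((p - p_base).abs().max().item())  # large shift => patch transfers
    return shifts
```

A candidate whose pasted patch alone moves the black box's output distribution is a plausible learned shortcut; the defaults of a 33-pixel patch and stride 8 match the receptive-field geometry of a BagNet-33.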


Related research

11/26/2018
Please Stop Explaining Black Box Models for High Stakes Decisions
There are black box models now being used for high stakes decision-makin...

06/24/2022
On the Importance of Application-Grounded Experimental Design for Evaluating Explainable ML Methods
Machine Learning (ML) models now inform a wide range of human decisions,...

06/22/2018
xGEMs: Generating Examplars to Explain Black-Box Models
This work proposes xGEMs or manifold guided exemplars, a framework to un...

03/27/2022
How to Robustify Black-Box ML Models? A Zeroth-Order Optimization Perspective
The lack of adversarial robustness has been recognized as an important i...

01/13/2021
White-Box Analysis over Machine Learning: Modeling Performance of Configurable Systems
Performance-influence models can help stakeholders understand how and wh...

08/04/2020
Making Sense of CNNs: Interpreting Deep Representations & Their Invariances with INNs
To tackle increasingly complex tasks, it has become an essential ability...

09/09/2021
Detecting and Mitigating Test-time Failure Risks via Model-agnostic Uncertainty Learning
Reliably predicting potential failure risks of machine learning (ML) sys...
