Better sampling in explanation methods can prevent dieselgate-like deception

01/26/2021
by Domen Vreš, et al.

Machine learning models are used in many sensitive areas where, besides predictive accuracy, their comprehensibility is also important. Interpretability of prediction models is necessary to determine their biases and causes of errors, and is a prerequisite for users' confidence. For complex state-of-the-art black-box models, post-hoc model-independent explanation techniques are an established solution. Popular and effective techniques, such as IME, LIME, and SHAP, use perturbation of instance features to explain individual predictions. Recently, Slack et al. (2020) put their robustness into question by showing that their outcomes can be manipulated due to the poor perturbation sampling they employ. This weakness would allow dieselgate-style cheating: owners of sensitive models could deceive inspection and hide potentially unethical or illegal biases in their predictive models. This could undermine public trust in machine learning models and give rise to legal restrictions on their use. We show that better sampling in these explanation methods prevents malicious manipulations. The proposed sampling uses data generators that learn the training set distribution and generate new perturbation instances much more similar to the training set. We show that the improved sampling increases the robustness of LIME and SHAP, while the previously untested IME is already the most robust of all.
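
To illustrate the core idea, the following is a minimal, hypothetical sketch of distribution-aware perturbation sampling for a LIME-style local surrogate. A plain Gaussian mixture stands in for the stronger data generators used in the paper, and the function names (fit_generator, explain_instance) and the localisation step are assumptions made for this sketch, not the authors' implementation.

    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.linear_model import Ridge

    def fit_generator(X_train, n_components=10, seed=0):
        # Learn a simple density model of the training data; a stand-in for
        # the stronger data generators discussed in the paper.
        return GaussianMixture(n_components=n_components, random_state=seed).fit(X_train)

    def explain_instance(model_predict, generator, x, n_samples=500, kernel_width=1.0, seed=0):
        # LIME-style local surrogate, but perturbations are drawn from the
        # learned generator rather than from independent per-feature noise,
        # so they stay close to the training distribution and are harder for
        # an adversarial model to flag as off-manifold.
        rng = np.random.default_rng(seed)
        Z, _ = generator.sample(n_samples)              # on-distribution perturbations
        alpha = rng.uniform(0.3, 1.0, size=(n_samples, 1))
        Z = alpha * x + (1.0 - alpha) * Z               # keep samples local to x
        y = model_predict(Z)                            # black-box predictions
        d = np.linalg.norm(Z - x, axis=1)
        weights = np.exp(-(d ** 2) / kernel_width ** 2) # exponential proximity kernel, as in LIME
        surrogate = Ridge(alpha=1.0).fit(Z, y, sample_weight=weights)
        return surrogate.coef_                          # local feature contributions

In use, fit_generator(X_train) is trained once per dataset; because the perturbations then resemble real training instances, an adversarial model that behaves benignly only on out-of-distribution samples has far fewer opportunities to detect that it is being audited.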

