Explainable Artificial Intelligence: How Subsets of the Training Data Affect a Prediction
There is increasing interest in and demand for interpretations and explanations of machine learning models and predictions in various application areas. In this paper, we consider data-driven models that have already been developed, implemented, and trained. Our goal is to interpret these models and to explain and understand their predictions. Since the predictions made by data-driven models rely heavily on the data used for training, we believe explanations should convey information about how the training data affects the predictions. To this end, we propose a novel methodology which we call Shapley values for training data subset importance. The Shapley value concept originates in coalitional game theory, where it was developed to fairly distribute the payout among a set of cooperating players. We extend this concept to subset importance, explaining a prediction by treating subsets of the training data as players in a game in which the prediction is the payout. We describe and illustrate how the proposed method can be useful and demonstrate its capabilities in several examples. We show how the proposed explanations can be used to reveal bias in models and errors in the training data. Furthermore, we demonstrate that in known situations the explanations of predictions from simple models correspond to the intuitive explanations. We argue that the explanations give insight into the inner workings of the algorithms, and illustrate how models producing similar predictions can be based on very different parts of the training data. Finally, we show how Shapley values for subset importance can be used to guide the acquisition of additional training data, thereby reducing prediction error.
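To make the idea concrete, the following is a minimal sketch, not the authors' implementation, of Shapley values for training data subset importance. Each subset of the training data is a player, and the value v(S) of a coalition S is the prediction that a model retrained on the union of the subsets in S makes for a single test point; the Shapley value of subset i is the weighted average of its marginal contributions v(S ∪ {i}) − v(S) over all coalitions S not containing i. The model choice, the partition into subsets, and the baseline used for the empty coalition are illustrative assumptions.

from itertools import combinations
from math import factorial

import numpy as np
from sklearn.linear_model import LinearRegression


def coalition_value(subsets, coalition, x, baseline):
    # Prediction for x from a model retrained on the union of the subsets in the coalition.
    if not coalition:
        return baseline  # assumed baseline when no training data is available
    X = np.vstack([subsets[i][0] for i in coalition])
    y = np.concatenate([subsets[i][1] for i in coalition])
    model = LinearRegression().fit(X, y)
    return model.predict(x.reshape(1, -1))[0]


def shapley_subset_importance(subsets, x, baseline):
    # Exact Shapley value of each training-data subset for the prediction at x.
    n = len(subsets)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for S in combinations(others, size):
                # Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                marginal = (coalition_value(subsets, S + (i,), x, baseline)
                            - coalition_value(subsets, S, x, baseline))
                phi[i] += weight * marginal
    return phi


# Toy usage: three training-data subsets (hypothetical data) and one test point.
rng = np.random.default_rng(0)
subsets = [(rng.normal(size=(20, 2)), rng.normal(size=20)) for _ in range(3)]
x_test = np.array([0.5, -1.0])
baseline = np.mean(np.concatenate([y for _, y in subsets]))  # mean target as v(empty set)
print(shapley_subset_importance(subsets, x_test, baseline))

The exact computation retrains the model for every coalition and therefore scales exponentially in the number of subsets; in practice the number of subsets is kept small or the Shapley values are approximated by sampling coalitions.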