Machine learning has become increasingly present in everyday life since the arrival of the first Convolutional Neural Networks (CNN). The performance achieved by such networks is impressive and has led to their deployment in many applications, such as smart vehicles. Despite this performance, errors still occur and can have dramatic consequences, especially in applications where lives are at stake. Furthermore, in fields such as medicine, it is desirable not only to obtain a final classification result but also to know the causes of the decision. For all these reasons, more and more research is being conducted on Deep Neural Network (DNN) explanation, as noted in recent literature reviews.
To our knowledge, all these methods aim to explain DNNs trained for classification tasks: the goal is to find out which elements of the input led to the decision of the network. Unfortunately, no ground truth exists for this problem. Network-explanation results are therefore evaluated only by inspecting the produced maps and comparing them with what a human operator believes to be correct. Without an objective tool to quantify results, it is difficult to compare different methods.
In this article, we propose an experimental setup, associated with a ground truth, to quantify the explanation results of networks. The setup addresses signal quality estimation: we created a database of ideal signals to which errors were added at random positions. A note (score) is associated with each signal, depending on the distance between the signal and its ideal version. A CNN is trained in regression to predict this note. Network explanation then aims to determine which parts of the input (temporal positions and dimensions) caused the score provided by the network. Because this setup provides a ground truth, it enables a quantitative comparison of different DNN-interpretability algorithms.
To determine the time-steps and dimensions of the input signal where errors occurred, we perform a gradient descent that transforms the input into a signal with the best possible note. This gradient descent yields a gradient with respect to the input signal. Such a strategy is not new, and these gradients are known to be very noisy. During our experiments, we also found that they vary a lot depending on training and weight initialisation. Indeed, training the same model several times on the same database leads, for a given input example, to gradients that differ strongly from one model to another: some find certain errors but miss others, some are very noisy while others are not, and so on. We chose to take advantage of these variations to estimate an "accurate gradient" from all the models. The proposed method, named Accurate GRAdient (AGRA), consists of averaging, for the same input signal, the gradients produced by models obtained from different trainings and weight initialisations.
Using our experimental setup, we quantitatively compare AGRA with several gradient-based methods and show that it outperforms them. Moreover, AGRA can be combined with other gradient-based methods to improve their performance.
Thus, this article makes two main contributions. First, we develop an experimental database that allows DNN explanation methods to be compared both qualitatively and quantitatively. Second, we introduce AGRA, a new gradient-based DNN explanation technique that outperforms the state-of-the-art methods evaluated on this database.
II Related work
II-A Methods for explaining Deep Neural Networks
Several methods exist in the literature to explain DNNs. Their goal is to find the contribution of each input feature to the output and thus to produce attribution maps. These methods can be grouped into three main categories: class-activation-based approaches, perturbation-based approaches and gradient-based approaches.
II-A1 Class activation based approaches
Methods such as Class Activation Map (CAM) , Gradient-weighted Class Activation Mapping (Grad-CAM) , or Uncertainty based Class Activation Maps (U-CAM)  propose to generate Class Activation Maps that highlight pixels of the image the model used to make the classification decision. The goal is thus to produce maps similar to human attention regions. These maps are estimated in a multi-class classification context and are class-discriminative.
II-A2 Perturbation based approaches
The idea of these approaches is to perturb some portions of the input image and observe their influence on the output. One such work systematically occludes different portions of the input image with a grey square while monitoring the output of the classifier. As the probability of the correct class drops significantly when the object is occluded, this technique localizes objects in the scene. Another perturbation-based approach, proposed by Ribeiro et al., is Local Interpretable Model-Agnostic Explanations (LIME). A model is explained by perturbing the input and fitting a local, interpretable linear model. LIME thus makes local approximations of the complex decision surface.
II-A3 Gradient based approaches
Simonyan et al. proposed to compute sensitivity maps as the gradient of the output with respect to the input pixels in a classification task. If $S_c$ is the score function of the classification network for class $c$ and $x$ is the input image, sensitivity maps are defined as:
$$M_c(x) = \frac{\partial S_c(x)}{\partial x}.$$
Intuitively, large gradient magnitudes correspond to locations in the image that have a strong influence on the output.
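As a minimal illustration of this definition, the sketch below approximates $M_c(x)$ with central finite differences instead of back-propagation, so that it runs without a deep-learning framework. The toy score function `toy_score` and the step size `h` are hypothetical stand-ins for a trained network, not part of the cited method.

```python
from typing import Callable, List

Vector = List[float]


def sensitivity_map(score: Callable[[Vector], float], x: Vector,
                    h: float = 1e-5) -> Vector:
    """Approximate M_c(x) = dS_c(x)/dx with central finite differences.

    A trained network would obtain this gradient by back-propagation;
    finite differences only keep this sketch dependency-free.
    """
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((score(x_plus) - score(x_minus)) / (2.0 * h))
    return grad


def toy_score(x: Vector) -> float:
    """Hypothetical score S_c(x) = 3*x0 + x1**2, standing in for a network."""
    return 3.0 * x[0] + x[1] ** 2


saliency = sensitivity_map(toy_score, [1.0, 2.0])
print([round(g, 4) for g in saliency])  # [3.0, 4.0]
```

Large entries of `saliency` mark the input dimensions to which the score is most sensitive at this particular point.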
In practice, these sensitivity maps are very noisy. A first solution to improve them is to modify the back-propagation algorithm. Deconvolution networks and Guided Backpropagation thus discard negative gradient values during the back-propagation step, the idea being to keep only entries that have a positive influence on the score.
Another problem with gradient-based techniques is that the score function may saturate for important input characteristics. The function may then be flat around these inputs, even though they are important, and therefore have a small gradient. Some methods address this problem by computing the global importance of each pixel. DeepLIFT (Deep Learning Important FeaTures), for example, decomposes the output prediction by back-propagating the contributions of all neurons in the network to every feature of the input.
Layer-wise relevance propagation (LRP) uses a pixel-wise decomposition to understand the contribution of each single pixel of the input image to the score function $S_c$. A propagation rule, applied from the output back to the input, distributes the class relevance found at a given layer onto the previous layer. It leads to a heatmap that highlights the pixels responsible for the predicted class.
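Instead of computing the gradients of the output $S_c$ with respect to the input pixels $x$, Sundararajan et al. integrate the gradients along a path from a baseline $x'$ to the input $x$. The integrated gradient for the $i$-th dimension of the input is defined as:
$$\mathrm{IG}_i(x) = (x_i - x'_i) \times \int_{\alpha=0}^{1} \frac{\partial S_c\big(x' + \alpha (x - x')\big)}{\partial x_i} \, d\alpha,$$
where $\frac{\partial S_c(x)}{\partial x_i}$ is the gradient of $S_c(x)$ with respect to $x$ along the $i$-th dimension.
In practice, the integral is approximated by a summation: the gradients at $N$ points lying on the straight line from the baseline $x'$ to the input $x$ are averaged and multiplied by $(x_i - x'_i)$. Integrated gradients add up to the difference between the outputs at $x$ and at the baseline $x'$, i.e. $\sum_i \mathrm{IG}_i(x) = S_c(x) - S_c(x')$. Thus, if the baseline has a near-zero score, integrated gradients form an attribution map of the prediction output $S_c(x)$.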
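The summation described above can be sketched as follows. This is a minimal, framework-free illustration under stated assumptions: `grad_fn` is a hypothetical callable returning $\partial S_c/\partial x$, the toy quadratic score is not a network, and sampling the right-endpoint points $\alpha = k/N$ is one common choice. The final print checks the completeness property numerically.

```python
from typing import Callable, List

Vector = List[float]


def integrated_gradients(grad_fn: Callable[[Vector], Vector], x: Vector,
                         baseline: Vector, n_steps: int = 100) -> Vector:
    """Riemann-sum approximation of integrated gradients.

    The gradients at the points x' + (k / n_steps) * (x - x'),
    k = 1..n_steps, are averaged and multiplied by (x_i - x'_i).
    """
    total = [0.0] * len(x)
    for k in range(1, n_steps + 1):
        alpha = k / n_steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_fn(point)):
            total[i] += g
    return [(xi - b) * t / n_steps for xi, b, t in zip(x, baseline, total)]


def toy_score(z: Vector) -> float:
    """Hypothetical score S_c(z) = z0**2 + 3*z1."""
    return z[0] ** 2 + 3.0 * z[1]


def toy_grad(z: Vector) -> Vector:
    """Analytic gradient of toy_score."""
    return [2.0 * z[0], 3.0]


x, zero = [1.0, 2.0], [0.0, 0.0]
ig = integrated_gradients(toy_grad, x, zero, n_steps=1000)
# Completeness: sum(ig) should be close to S_c(x) - S_c(x') = 7.
print(round(sum(ig), 3), toy_score(x) - toy_score(zero))
```

With a zero baseline the factor $(x_i - x'_i)$ reduces to $x_i$, which is why integrated gradients behave like a gradient multiplied by the input.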
Since the gradient $M_c(x)$ fluctuates rapidly around an input image $x$, it is less meaningful than a local average of gradient values. SmoothGrad therefore creates an improved sensitivity map by smoothing $M_c$ with a Gaussian kernel. As directly computing such a local average in a high-dimensional input space is intractable, Smilkov et al. compute a stochastic approximation by taking random samples in a neighborhood of the input $x$ and averaging the resulting sensitivity maps:
$$\hat{M}_c(x) = \frac{1}{N} \sum_{n=1}^{N} M_c\big(x + \mathcal{N}(0, \sigma^2)\big),$$
where $N$ is the number of noisy inputs and $\mathcal{N}(0, \sigma^2)$ is Gaussian noise with zero mean and standard deviation $\sigma$.
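A minimal sketch of the SmoothGrad estimator, assuming a hypothetical `grad_fn` that returns $M_c$ at a given input. The toy gradient `wiggly_grad` adds a rapidly oscillating term to mimic the local fluctuations mentioned above; averaging over noisy copies largely cancels it.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def smooth_grad(grad_fn: Callable[[Vector], Vector], x: Vector,
                sigma: float = 0.1, n_samples: int = 50,
                seed: int = 0) -> Vector:
    """Average the sensitivity maps of n_samples noisy copies of x.

    Each copy is x plus independent N(0, sigma^2) noise on every entry;
    grad_fn(z) must return M_c(z) = dS_c/dz.
    """
    rng = random.Random(seed)
    acc = [0.0] * len(x)
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        for i, g in enumerate(grad_fn(noisy)):
            acc[i] += g
    return [a / n_samples for a in acc]


def wiggly_grad(z: Vector) -> Vector:
    """Hypothetical gradient 2z plus a fast oscillation (local fluctuations)."""
    return [2.0 * zi + 0.5 * math.sin(200.0 * zi) for zi in z]


x = [0.3, -0.7]
raw = wiggly_grad(x)
smoothed = smooth_grad(wiggly_grad, x, sigma=0.1, n_samples=2000)
print("raw:", [round(v, 3) for v in raw])
print("smoothed:", [round(v, 3) for v in smoothed])  # should be close to [0.6, -1.4]
```

The fixed `seed` only makes the sketch reproducible; the estimator itself is stochastic and its variance shrinks as `n_samples` grows.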
In this article, we also use a gradient-based approach and propose to denoise the resulting gradient. The proposed approach, based on several trainings, can be combined with other gradient-based methods to improve their performance.
III AGRA method to obtain accurate gradients
In this work, we first design an experimental setup to evaluate DNN explanation. Then, we introduce a new method that denoises the gradient of the output with respect to the input by using several trainings of the same DNN.
III-A Designing an experimental setup
A problem often encountered with DNN explanation algorithms is the lack of ground truth, which makes it difficult to estimate their performance quantitatively. To address this issue, we design a setup in which this ground truth is available. The setup is composed of two-dimensional temporal signals. Both dimensions are generated using sinusoids with different lengths, to which a small Gaussian noise is added. These signals are the ideal signals of the database. We then artificially create perturbations in both dimensions by adding high-frequency Gaussians. The number of perturbations is drawn uniformly between 0 and 8, and their positions and the dimensions where they appear are also drawn uniformly.
Each of these signals is then given a score, rescaled to a fixed range and based on the Mean Square Error (MSE) between the signal without perturbation and the perturbed signal. Ideal signals receive the maximal score, while the score approaches the minimal value when many perturbations are present. In total, 1000 signals are generated and randomly split into 750 training signals and 250 test signals.
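The generation procedure above can be sketched as below. This is only an illustration under assumptions: the signal length, sinusoid periods, noise level, shape of the perturbations (modelled here as narrow Gaussian bumps) and especially the final score rescaling, `1 / (1 + MSE)` on a [0, 1] range, are hypothetical choices, since the exact values are not given here. The sketch keeps the (dimension, position) of each perturbation as the explanation ground truth.

```python
import math
import random
from typing import List, Tuple

Signal = List[List[float]]  # signal[dimension][time_step]


def make_example(rng: random.Random, length: int = 200,
                 max_perturbations: int = 8):
    """Return (ideal, perturbed, errors, score) for one synthetic example.

    Periods, noise level, bump amplitude/width and the score rescaling are
    hypothetical choices made for illustration only.
    """
    ideal: Signal = []
    for _ in range(2):  # two dimensions, each a noisy sinusoid
        period = rng.uniform(20.0, 60.0)
        ideal.append([math.sin(2.0 * math.pi * t / period) + rng.gauss(0.0, 0.01)
                      for t in range(length)])
    perturbed = [row[:] for row in ideal]
    errors: List[Tuple[int, int]] = []  # ground truth: (dimension, centre)
    for _ in range(rng.randint(0, max_perturbations)):  # 0..8 inclusive
        dim, centre = rng.randrange(2), rng.randrange(length)
        amplitude, width = rng.uniform(0.5, 1.0), 2.0  # narrow Gaussian bump
        for t in range(length):
            perturbed[dim][t] += amplitude * math.exp(
                -((t - centre) ** 2) / (2.0 * width ** 2))
        errors.append((dim, centre))
    mse = sum((p - q) ** 2 for rp, rq in zip(perturbed, ideal)
              for p, q in zip(rp, rq)) / (2 * length)
    score = 1.0 / (1.0 + mse)  # hypothetical rescaling: 1.0 for ideal signals
    return ideal, perturbed, errors, score


def make_dataset(n: int = 1000, n_train: int = 750, seed: int = 0):
    """Generate n examples and split them uniformly at random."""
    rng = random.Random(seed)
    data = [make_example(rng) for _ in range(n)]
    rng.shuffle(data)
    return data[:n_train], data[n_train:]


train, test = make_dataset()
print(len(train), len(test))  # 750 250
```

Keeping the ideal signal next to the perturbed one is what makes the ideal gradient (perturbed minus ideal) available for the quantitative evaluation.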
The goal of the network will then be to regress the score of each input signal while the goal of the DNN explanation will be to find time-steps and dimensions of the errors. Three examples of signals extracted from the database are presented in Figure 1.
Even though we are working on synthetic examples with a ground truth for DNN explanation, this setup corresponds to real applications that aim to assess the quality of gestures, for instance in sports or surgical contexts. In addition to assigning a score, DNN explanation then makes it possible to determine where gestures are poorly performed.
III-B The AGRA method
First, a CNN is trained to regress the scores with an MSE loss between the predicted scores $\hat{y}$ and the ground-truth scores $y$: $\mathcal{L}_{\text{train}} = \mathrm{MSE}(\hat{y}, y)$. Then, for DNN explanation, the gradient of the output with respect to the input example $x$ is computed, as proposed by Simonyan et al., without changing the weights of the network. This gradient is used to modify the input so that its note increases. As the goal is to find the differences between ideal and perturbed signals, the loss used for gradient back-propagation is the squared error between the predicted score $\hat{y}(x)$ and the optimal note $s^{*}$ (the score of ideal signals in our case): $\mathcal{L}_{\text{expl}}(x) = (\hat{y}(x) - s^{*})^2$. Several iterations are performed until the ideal note is reached, as described in Algorithm 1, where $\lambda$ is the learning rate and $\epsilon$ is the tolerance: the loop stops when the loss falls below $\epsilon$.
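A minimal sketch of the input-space gradient descent of Algorithm 1, assuming access to the network score and its input gradient (here a hypothetical bias-free linear model `w · x` named `lin_score`). The weights stay fixed, only the input moves, and the loop stops once the loss drops below the tolerance. Returning the accumulated update `x_0 - x_T`, so that `x_0 - G` is the reconstructed signal, is our assumption about how the gradient is collected.

```python
from typing import Callable, List

Vector = List[float]


def input_gradient(score: Callable[[Vector], float],
                   score_grad: Callable[[Vector], Vector],
                   x: Vector, target: float, lr: float = 0.05,
                   tol: float = 1e-8, max_iter: int = 10_000) -> Vector:
    """Gradient descent on the input with frozen weights (sketch of Algorithm 1).

    The loss is (score(x) - target)**2. Returns G = x_0 - x_T, so that
    x_0 - G is the reconstructed signal (an assumed convention).
    """
    x_t = list(x)
    for _ in range(max_iter):
        residual = score(x_t) - target
        if residual ** 2 < tol:  # stop once the loss is below the tolerance
            break
        # d/dx (score(x) - target)^2 = 2 * residual * d score / dx
        x_t = [xi - lr * 2.0 * residual * gi
               for xi, gi in zip(x_t, score_grad(x_t))]
    return [x0 - xt for x0, xt in zip(x, x_t)]


# Hypothetical bias-free linear "network": score(x) = w . x
W = [1.0, 2.0]


def lin_score(x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(W, x))


def lin_grad(x: Vector) -> Vector:
    return list(W)


g = input_gradient(lin_score, lin_grad, [0.0, 0.0], target=1.0)
print([round(v, 3) for v in g])  # [-0.2, -0.4]
```

In a real setting `score_grad` would come from back-propagation through the trained CNN rather than from an analytic formula.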
Unfortunately, as stated before, this gradient is very noisy. Moreover, during our experiments, we observed that it depends significantly on the weight initialisation and training of the network. Thus, even when two different trainings lead to similar regression scores, their gradients are highly variable. Two examples of gradients can be found in Figure 2.
We decided to take advantage of these variations and to average the gradients of models obtained from different trainings, in order to obtain a less noisy and more accurate gradient. We therefore trained the same network $K$ times to obtain $K$ models. Let $G_k(x)$ be the gradient of the output with respect to the input $x$ obtained with model $k$, as described in Algorithm 1. AGRA is then obtained as described in Algorithm 2, by averaging these gradients: $\mathrm{AGRA}(x) = \frac{1}{K} \sum_{k=1}^{K} G_k(x)$. The AGRA method requires several trainings of the same model, which is computationally expensive. However, as shown in Figure 2, the resulting gradients are more accurate. Moreover, they no longer depend on a particular training and initialisation, whereas a single model could yield either a good or a bad gradient.
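The averaging step of AGRA can be sketched as follows, under the assumption that each trained model is wrapped in a hypothetical callable returning its gradient $G_k(x)$ (for instance, the output of Algorithm 1). The simulation is purely illustrative: each simulated model returns the ideal gradient plus its own fixed noise, a stand-in for the training-dependent variations described above, not data from our experiments.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]


def agra(gradient_fns: Sequence[Callable[[Vector], Vector]], x: Vector) -> Vector:
    """Average the input gradients G_k(x) of K independently trained models."""
    if not gradient_fns:
        raise ValueError("AGRA needs at least one trained model")
    grads = [g(x) for g in gradient_fns]
    return [sum(column) / len(grads) for column in zip(*grads)]


def mse(a: Vector, b: Vector) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


# Hypothetical ground-truth gradient with two perturbations to recover.
ideal = [0.0] * 100
ideal[30], ideal[70] = 1.0, -0.8


def make_model(noise_seed: int) -> Callable[[Vector], Vector]:
    """Simulated model: ideal gradient plus model-specific, fixed noise."""
    r = random.Random(noise_seed)
    noise = [r.gauss(0.0, 0.3) for _ in ideal]
    return lambda x: [gi + ni for gi, ni in zip(ideal, noise)]


models = [make_model(k) for k in range(50)]
x = [0.0] * 100  # input signal (ignored by the simulated models)
single = mse(models[0](x), ideal)
averaged = mse(agra(models, x), ideal)
print(f"single model: {single:.4f}, AGRA over 50 models: {averaged:.4f}")
```

Averaging independent, zero-mean model-specific errors divides their variance by the number of models, which is the intuition behind the noise reduction; real training-induced errors need not be independent, so the gain in practice may be smaller.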
IV Experimental results
For all methods in this section, we use the loss function defined previously to compute gradients.
IV-A Training procedure
The regression network consists of four temporal convolutional layers without bias, each followed by a pooling layer. Two fully connected layers, also without bias and separated by a dropout layer, end the network. The network is trained with the Adam optimizer. It regresses the score of each signal and is trained 50 times to obtain 50 models. The mean test-set MSE across these models is 0.619, with a standard deviation of 0.089, so the models behave similarly at prediction time.
IV-B Qualitative results
Firstly, we present qualitative results for the following five methods:
GRAD: the classical gradient of the output with respect to the input.
GRAD×Input: the classical gradient multiplied element-wise by the input.
Smooth Grad.
Integrated gradient. As the proposed network has no bias, the baseline is fixed to a zero signal of the same length as $x$. Under these conditions, the score of the baseline is zero, and integrated gradients can be interpreted as an attribution map of the prediction output. Integrated gradients are intrinsically multiplied by the input, through the factor $(x_i - x'_i) = x_i$ of their definition.
The AGRA method, applied to the trained models.
As shown in Figure 3, classical gradients (GRAD) are noisy and do not lead to clear, easy-to-interpret results, since the peaks at perturbation locations are sometimes too thin and small and can be mistaken for noise. Furthermore, multiplying these noisy gradients by the input only makes the results worse: the interesting peaks are enhanced, but the global results appear noisier than before. Moreover, the sign of the gradient, which indicates the direction of the error, is lost through this multiplication. Using Smooth Grad instead of the classical gradient gives better qualitative results, with considerably less noise. However, some noise remains, making the results again difficult to interpret, and the magnitude of the gradient is often under-estimated. Integrated gradients are very noisy and have peaks at undisturbed positions, which makes them very difficult to interpret; as they are multiplied by the input signal, the sign of the gradient is also lost. As shown in Figure 3, AGRA achieves less noisy and more accurate results: its gradients highlight the locations corresponding to perturbations and point in the correct direction to reconstruct the ideal signal.
IV-C Quantitative results
To compare methods more thoroughly, giving quantitative results is crucial. Since ground truth is available for each example, it is possible to compute ideal gradients (the difference between perturbed signals and ideal ones) and compare them with results obtained with the different methods. Two metrics are used to make this comparison:
Mean Squared Error (MSE) between the signal without errors and the signal reconstructed from the gradients. This metric cannot be used for methods such as GRAD×Input or Integrated Gradient, since their goal is only to highlight important time-steps, not to reconstruct a perfect signal.
Pearson correlation coefficient between the ideal gradient and the gradient obtained with each method. To avoid penalising methods that do not preserve signs (GRAD×Input and Integrated Gradient), this coefficient is computed between the norms of the ideal gradient and of the gradient produced by the method.
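The two metrics can be sketched as follows. The helper names are hypothetical, and taking the Euclidean norm across the two dimensions at each time step is our assumption about how the norms are computed; the Pearson coefficient is written out explicitly as the covariance divided by both standard deviations.

```python
import math
from typing import List

Signal = List[List[float]]  # signal[dimension][time_step]


def mse(a: Signal, b: Signal) -> float:
    """Mean squared error between two signals of identical shape."""
    diffs = [(p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def pearson(u: List[float], v: List[float]) -> float:
    """Pearson coefficient: covariance divided by both standard deviations."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def timestep_norms(g: Signal) -> List[float]:
    """Euclidean norm across dimensions at each time step (an assumption)."""
    return [math.sqrt(sum(row[t] ** 2 for row in g)) for t in range(len(g[0]))]


def norm_pearson(ideal_grad: Signal, estimated_grad: Signal) -> float:
    """Sign-insensitive correlation used to compare all methods."""
    return pearson(timestep_norms(ideal_grad), timestep_norms(estimated_grad))


ideal_grad = [[0.0, 1.0, 0.0, 0.5], [0.0, 0.0, 2.0, 0.0]]
flipped = [[-v for v in row] for row in ideal_grad]  # e.g. signs lost
print(norm_pearson(ideal_grad, flipped))  # a sign flip is invisible here
```

Because only magnitudes enter `norm_pearson`, methods that scramble signs are not penalised, which is the intent stated above.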
These metrics are averaged over the training examples. Moreover, for GRAD, GRAD×Input, Smooth Grad and Integrated Gradient, the metrics are computed for each of the 50 trained models and then averaged.
TABLE I: MSE between the ideal and reconstructed signals for the different methods.
|Methods|Mean Squared Error|
|---|---|
|Smooth Grad|7.85|
Table I presents the MSE obtained with the different methods. As a reminder, an estimated gradient perfectly fitting the ground truth would yield an MSE of 0. The GRAD and Smooth Grad estimates are both noisy, and Smooth Grad additionally does not preserve the gradient magnitude. AGRA therefore obtains a lower MSE than both methods and is the most suitable method for signal reconstruction.
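TABLE II: Pearson correlation between the norms of the ideal and estimated gradients.
|Methods|Pearson correlation|
|---|---|
|GRAD×Input|0.82|
|Smooth Grad|0.79|
|Integrated Gradient|0.55|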
As shown in Table II, Pearson correlation coefficients vary between 0.55 and 0.94. As Pearson coefficients are standardised (the covariance is divided by the standard deviations of both gradients), they can be compared meaningfully across methods, even when the gradient is multiplied by the input. The best results are obtained with our proposed method, which confirms the previous qualitative study and shows that this method outperforms the other state-of-the-art methods on this setup.
Table III gives the Pearson coefficients obtained when the sign of the gradients is kept in the correlation: the correlation is estimated for each of the two dimensions and then averaged. With this metric, only the GRAD and Smooth Grad methods can be evaluated, since for the other two, multiplying by the input changes the signs of the gradient and the results are not exploitable. AGRA is again the most efficient method, even though the Pearson coefficient does not take the gradient magnitude into account and therefore does not penalise Smooth Grad as the MSE did.
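TABLE III: Pearson correlation computed on signed gradients, averaged over the two dimensions.
|Methods|Pearson correlation (signed)|
|---|---|
|Smooth Grad|0.66|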
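The signed variant used for Table III can be sketched as follows, assuming gradients stored as one list per dimension; `signed_pearson` is a hypothetical helper name. Unlike the norm-based coefficient, flipping the sign of the estimate turns a perfect correlation into $-1$, which is why GRAD×Input and Integrated Gradient cannot be evaluated this way.

```python
import math
from typing import List

Signal = List[List[float]]  # signal[dimension][time_step]


def pearson(u: List[float], v: List[float]) -> float:
    """Pearson coefficient: covariance divided by both standard deviations."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def signed_pearson(ideal_grad: Signal, estimated_grad: Signal) -> float:
    """Per-dimension Pearson on signed gradients, averaged over dimensions."""
    coeffs = [pearson(a, b) for a, b in zip(ideal_grad, estimated_grad)]
    return sum(coeffs) / len(coeffs)


ideal_grad = [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0]]
flipped = [[-v for v in row] for row in ideal_grad]
print(signed_pearson(ideal_grad, ideal_grad), signed_pearson(ideal_grad, flipped))
```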
To study AGRA's behaviour, it is interesting to show the evolution of both the MSE and the Pearson correlation according to the number of averaged models (Figure 4). As stated before, gradients are model-dependent, so the MSE, the Pearson coefficient and thus the explanation of the network change a lot from one model to another. More specifically, Figure 4 shows that the first two trainings lead to bad results, while the following ones, up to the tenth, give good explanations. Recall that the models differ only by the initialisation of their weights: they all have nearly the same regression scores, but their gradients vary strongly. It is therefore impossible to identify a priori the models that will produce a good-quality gradient. Consequently, in Figure 4, the MSE is high at the beginning and then decreases before stabilising. Averaging the gradients of 20 or more models produces good explanation results, independently of the individual trainings. The same reasoning applies to the Pearson correlation coefficient.
IV-D AGRA combined with other methods
As stated before, our approach can be combined with different state-of-the-art methods, such as GRAD×Input, Smooth Grad and Integrated Gradient, to improve both qualitative and quantitative results.
As shown in Figure 5, averaging the models for all methods greatly improves their performance and, in particular, denoises the results of every method. As shown in Table IV, all quantitative results, for both Pearson correlation and MSE, are improved by AGRA. Even though the approach is computationally intensive, it therefore substantially improves the results obtained with state-of-the-art methods.
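TABLE IV: Pearson correlation and MSE when combining state-of-the-art methods with AGRA.
|Methods|Pearson correlation|Mean Squared Error|
|---|---|---|
|Integrated Gradient|0.55|NA|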
V Conclusion
In this paper, a new approach to explain neural network decisions has been presented, together with an experimental setup dedicated to neural network explanation. The lack of ground truth for network explanation often allows only a qualitative comparison of different approaches; the design of a synthetic setup devoted to this task enables quantitative comparisons.
In addition to this new database and experimental setup, a novel approach for explaining network decisions has been proposed. Having observed that the explanation strongly depends on the training of the model, we proposed to carry out several trainings and to average the explanations provided by each of them. This technique has been shown to improve both qualitative results, with less noisy explanations, and quantitative results, with better Pearson correlations and lower MSE of the reconstructed signals. The main drawback of this method, however, is its high computational cost, since many models need to be trained.
In the future, we plan to extend this approach to models trained for classification, to see whether the same conclusions can be drawn.
References
- (2015) On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE 10 (7).
- (2018) Explainable artificial intelligence: a survey. In 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), pp. 0210–0215.
- (2018) Evaluating surgical skills from kinematic data using convolutional neural networks. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 214–221.
- (2019) Why are saliency maps noisy? Cause of and solution to noisy saliency maps. In 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pp. 4149–4157.
- (2015) Adam: a method for stochastic optimization. CoRR abs/1412.6980.
- (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324.
- (2020) Fine-tuning siamese networks to assess sport gestures quality. In Proceedings of the 15th International Conference on Computer Vision Theory and Applications.
- (2019) U-CAM: visual explanation using uncertainty based class activation maps. In Proceedings of the IEEE International Conference on Computer Vision, pp. 7444–7453.
- (2016) "Why should I trust you?": explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–1144.
- (2017) Explainable artificial intelligence: understanding, visualizing and interpreting deep learning models. arXiv preprint arXiv:1708.08296.
- (2017) Grad-CAM: visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pp. 618–626.
- (2017) Not just a black box: learning important features through propagating activation differences. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pp. 3145–3153.
- (2013) Deep inside convolutional networks: visualising image classification models and saliency maps. CoRR abs/1312.6034.
- (2017) SmoothGrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825.
- (2016) Gradients of counterfactuals. arXiv preprint arXiv:1611.02639.
- (2017) Axiomatic attribution for deep networks. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pp. 3319–3328.
- (2014) Visualizing and understanding convolutional networks. In European Conference on Computer Vision, pp. 818–833.
- (2018) Visual interpretability for deep learning: a survey. Frontiers of Information Technology & Electronic Engineering 19 (1), pp. 27–39.
- (2016) Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2921–2929.