## 1 Introduction

Due to the rise of deep learning [LeCun-Nature-2015] in recent years, scientists and engineers have developed solutions based on deep learning to solve almost every machine learning task in production-ready systems. While deep learning models obtain impressive accuracy levels [Devlin-NAACL-2019, He-CVPR-2016, Krizhevsky-NIPS-2012, Ren-NIPS-2015], surpassing even human-level performance for many tasks [Cozma-ACL-2018, Georgescu-Access-2019], their inherent complexity transforms them into opaque decision systems. Critical processes that deal with potentially sensitive information in areas such as finance, medicine, security and justice have become essentially black boxes, with the underlying logic being too complex even for data scientists and inaccessible to the end-users. Deep learning models require training data usually generated and annotated by humans, thus containing various biases, including discriminatory views on race and gender. Hence, models trained on biased data will inherit these biases and, in turn, will make decisions that are unfair or socially unacceptable [Caliskan-Science-2017]. Another potential problem of such highly complex decision systems is the chance of inadvertently making correct decisions, but for the wrong reasons. A popular example here is an image of a wolf being classified correctly, but only because of the snowy background [Ribeiro-KDD-2016]. While this is a harmless example, the same kind of decisions resulting from spurious correlations in large amounts of data could potentially have a great negative impact on human lives.

In this context, explainable AI, a field that studies how artificial intelligence (AI) methods and techniques can be understood by human experts, has gained a lot of attention recently. While there are many types of explanations that an explanatory method could provide [Adadi-Access-2018, Guidotti-CS-2018], including rule extraction and outcome prediction, we chose to focus on explanation by exemplar generation. Prototypical examples (exemplars), which can describe a complex underlying data distribution, can offer meaningful insights about the behavior of a model when a simple explanation is hard to extract. Prototype selection methods [Bien-AAS-2011, Chen-NeurIPS-2019, Gurumoorthy-ICDM-2019, Mahendran-CVPR-2015, Yeh-NIPS-2018] return examples that are representative of a set of similar instances. An exemplar can be one of the instances observed in the training data set [Bien-AAS-2011, Gurumoorthy-ICDM-2019, Yeh-NIPS-2018], or it can be an artificially-generated example in the data space [Chen-NeurIPS-2019, Mahendran-CVPR-2015].

Current explainable AI approaches mainly consider the glass-box scenario of highly-complex deep learning models, with no restriction on accessing the models’ weights. For example, Nguyen et al. [Nguyen-NIPS-2016] applied gradient descent to back-propagate through the model in order to synthesize the preferred inputs for neurons, which would not be possible without knowing the weights. We consider the more realistic case in which we have no information about the weights or other internal components of the model. Our framework is only allowed to inspect the input format and the output predictions. This strict definition of black-box models enables us to explain just about any machine learning model, not only deep learning models trained with gradient descent. In other words, our framework is *model-agnostic*. While related methods obtain exemplars for image classification models [Guidotti-ECML-2019, Nguyen-NIPS-2016], we propose a more generic framework that is not tied to a particular data modality, be it image, text, structured or tabular. Our framework is also agnostic with respect to the underlying generative model used to synthesize exemplars, be it a Variational Auto-Encoder (VAE) [Kingma-ICLR-2014] or a Generative Adversarial Network (GAN) [Goodfellow-NIPS-2014]. To this end, we consider our framework as *generic*.

To the best of our knowledge, we are the first to propose a generic and model-agnostic explainable AI framework to synthesize exemplars that exhibit a high response with respect to the output of a black-box model. We exploit the structured latent space of the underlying generative model to progressively search for latent codes that can accurately explain a particular class or combination of classes, as learned by the model. We employ a novel evolutionary strategy with momentum updates as our search policy, as evolutionary algorithms [Salimans-ArXiv-2017] have been shown to be an efficient approach to black-box optimization, relieving us of the need to propagate gradients. Our framework is illustrated in Figure 1.

We conduct experiments with two generative models, VAEs and GANs, to synthesize exemplars for various data formats, namely image, text and tabular, demonstrating that our framework is generic. We also employ our prototype synthetization framework on various black-box models, e.g. Random Forest or neural networks, for which we only know the input and the output formats, showing that it is model-agnostic. We present experiments showing that our framework can also generalize to classes unseen by the generator. Moreover, we compare our framework with a model-dependent approach [Nguyen-NIPS-2016] based on gradient descent optimization, demonstrating that our framework converges to equally-good exemplars in a shorter amount of time.

## 2 Related Work

Explainable AI methods [Adadi-Access-2018, Guidotti-CS-2018] for black-box models have gained significant attention in recent years, as bias in data and model training [Bolukbasi-NIPS-2016, Caliskan-Science-2017] has resulted in regulatory actions from the European Union [Goodman-AIM-2017], restricting the use of decision-making machine learning models without an explanatory component. Therefore, explainable AI is a research area of utmost importance, with open issues ranging from thorough testing and regulatory compliance [Mittelstadt-FAccT-2019] to finding what kind of explanations are best suited to answer questions of fairness.

Explainable AI methods can be classified into different taxonomies based on various criteria, e.g. as global or local methods, as model-specific or model-agnostic methods and so on [Adadi-Access-2018, Guidotti-CS-2018]. We hereby focus on methods that are closer to our own, i.e. on model-agnostic or exemplar-based methods. We note that deep neural networks are commonly regarded as complex models, their decisions being hard to understand due to the hierarchical non-linear structure.

Local explainable AI methods [Karpathy-ICLR-2016, Li-NAACL-2016, Lundberg-NIPS-2017, Mullenbach-NAACL-2018, Ribeiro-KDD-2016, Selvaraju-ICCV-2017, Wiegreffe-EMNLP-2019, Zhou-CVPR-2016] deal with explaining a particular decision, i.e. the decision provided for a certain input example. Some of these methods [Lundberg-NIPS-2017, Ribeiro-KDD-2016] employ a directly-interpretable surrogate model that is trained on the vicinity of an input sample, by modifying or occluding input features. Access to the network’s weights enables the back-propagation of gradients, leading to saliency-based methods such as CAM [Zhou-CVPR-2016], Grad-CAM [Selvaraju-ICCV-2017] and Grad-CAM++ [Chattopadhay-WACV-2018]. Unlike these local explainable AI methods, we propose a framework that does not look inside the models, i.e. the models are complete black boxes.
Furthermore, our approach provides generic (non-local) explanations by synthesizing exemplars not tied to a certain data sample.

Exemplar-based methods [Bien-AAS-2011, Chen-NeurIPS-2019, Gurumoorthy-ICDM-2019, Mahendran-CVPR-2015, Yeh-NIPS-2018], which offer a convenient way to communicate meaningful insights about the behavior of a model in situations where a direct explanation is hard to extract, are more closely related to our framework. Some exemplar-based methods select prototypical examples from the training data set [Bien-AAS-2011, Gurumoorthy-ICDM-2019, Yeh-NIPS-2018]. For example, Gurumoorthy et al. [Gurumoorthy-ICDM-2019] proposed a method aimed at describing the data distribution through case-based reasoning, while Chen et al. [Chen-NeurIPS-2019] presented an approach that selects relevant samples from the data set that have contributed to a decision of the model. Different from methods selecting exemplars from the data set, we introduce a framework that generates realistic examples without having access to the data set used to train the black box.

Synthesizing artificial (unrealistic) examples that provide maximal responses for a particular network component can shed light on the preferences and biases of a trained model. Indeed, methods [Mahendran-IJCV-2016, Simonyan-ICLR-2014] of visualizing a convolutional neural network represent a popular way of understanding its behavior. However, such methods are based on different variations of gradient ascent, requiring access to the internal weights of the model. Unlike such methods [Mahendran-IJCV-2016, Simonyan-ICLR-2014], we can produce realistic examples while treating the model as a black box.

We identified two works [Guidotti-ECML-2019, Nguyen-NIPS-2016] that are very closely related to our approach. Our method is similar to that of Nguyen et al. [Nguyen-NIPS-2016] because our method, like theirs, requires a deep generative model to synthesize realistic images. Without a deep generative model to act as a realistic image prior, there is a high chance that preferred inputs could end up being unrealistic. In [Nguyen-CVPR-2015], the authors already showed that neural networks output high responses to texture-like images that were generated using genetic algorithms, with little to no resemblance to natural images. Nguyen et al. [Nguyen-NIPS-2016] applied gradient descent to back-propagate through the model in order to synthesize preferred inputs for neurons. Being based on gradient descent, their method requires access to the model’s weights. Different from Nguyen et al. [Nguyen-NIPS-2016], we consider the more realistic case in which we have no information about the weights or other internal components of the model. Our framework is only allowed to inspect the input format and the output predictions. This strict definition of black-box models enables us to explain just about any machine learning model, not only deep learning models trained with gradient descent. Without access to the gradients, our method generates exemplars through a novel evolutionary strategy with momentum updates.

Focusing on image classification, Guidotti et al. [Guidotti-ECML-2019] presented an approach to explain the decisions of black-box models for a given input sample. Different from Guidotti et al. [Guidotti-ECML-2019], we show that our method is applicable to different data types, namely to images, text samples and tabular data. We also show that our method works with various generators, namely VAEs and GANs. While Guidotti et al. [Guidotti-ECML-2019] focus on explaining single instances, we focus on explaining output class probabilities, i.e. our exemplars are not tied to input data samples. All in all, we consider that there are significant differences between our method and that of Guidotti et al. [Guidotti-ECML-2019].

## 3 Method

Given a black-box classification model $C: \mathcal{D} \rightarrow \mathbb{R}^k$ and a generative model $G: \mathbb{R}^m \rightarrow \mathcal{D}$ able to sample from a data set included in $\mathcal{D}$, we aim to traverse the latent space of $G$ using a gradient-free optimization method, namely an evolutionary strategy with momentum updates, in order to synthesize exemplars $x \in \mathcal{D}$ for which $C$ provides a certain desired output $y \in \mathbb{R}^k$. Here, $k$ is the number of classes, $m$ is the size of the embedding space of the generator, and $\mathcal{D}$ is the data space, which depends on the input data type, e.g. for images $\mathcal{D} = \mathbb{R}^{h \times w}$, where $h$ and $w$ are the height and the width of an input image, respectively. In our framework, we impose no restrictions upon the prediction model $C$. We require no access or knowledge of the internal structure of $C$, i.e. $C$ is a black box. As generator, we can use any model that takes as input a noise vector $z \in \mathbb{R}^m$ and outputs a corresponding data sample, including Variational Auto-Encoders, Generative Adversarial Networks and Auto-Regressive models. Given the target prediction $y$, we optimize an encoding $z$ such that $C(G(z))$ is optimally close to $y$, i.e. $C(G(z)) \approx y$. The objective for our optimization problem can be formally expressed as follows:

$$\min_{z \in \mathbb{R}^m} \lVert y - C(G(z)) \rVert_2^2 \qquad (1)$$

Nguyen et al. [Nguyen-NIPS-2016] employed gradient descent to optimize the objective defined in Eq. (1), by back-propagating through the classification model $C$. However, we assume that access to the internal structure or the weights of the model is not granted, i.e. the model is a black box. Furthermore, we do not impose any architectural restrictions over the model, i.e. $C$ need not be a neural network. Even if access to the analytical gradients is not provided due to the black-box nature of the model $C$, one can still compute the numerical gradients and search for a $z$ that minimizes the objective defined in Eq. (1), using gradient descent. However, since $z$ belongs to an $m$-dimensional space, computing the numerical gradients for each component in $z$ requires on the order of $m$ forward passes through the model $C$ for every update, which is inefficient in comparison to our evolutionary approach. We show in our experiments that we can synthesize exemplars with a confidence greater than 95% using 263 model calls (forward passes) on average (Table 2). Additionally, we show that our evolutionary approach provides better exemplars and converges faster than gradient descent, even when analytical gradients are available for the classification model, i.e. $C$ becomes a glass-box model as in [Nguyen-NIPS-2016].
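To illustrate why numerical gradients are costly in the black-box setting, the following minimal sketch counts objective evaluations for one forward-difference gradient estimate. The names `numerical_gradient` and `toy_objective` are hypothetical, and the toy objective merely stands in for the squared error between the black-box prediction and the target; in the real setting, each evaluation would be one forward pass through the generator and the classifier. The dimension of 128 matches the latent size of the text generator used later in the experiments.

```python
def numerical_gradient(objective, z, eps=1e-4):
    """Forward-difference estimate of the gradient of `objective` at `z`.

    Returns the gradient and the number of objective evaluations, each of
    which would be one forward pass through the generator and the black box.
    """
    base = objective(z)          # 1 call for the unperturbed latent code
    calls = 1
    grad = []
    for i in range(len(z)):      # 1 extra call per latent component
        z_step = list(z)
        z_step[i] += eps
        grad.append((objective(z_step) - base) / eps)
        calls += 1
    return grad, calls


def toy_objective(z):
    """Hypothetical stand-in for the squared error to the target output."""
    return sum((v - 1.0) ** 2 for v in z)


grad, calls = numerical_gradient(toy_objective, [0.0] * 128)
print(calls)  # 129 evaluations for a single gradient step in 128 dimensions
```

A single gradient step therefore already costs about half of the average number of model calls that the evolutionary strategy with momentum needs for a complete run in Table 2.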

We hereby propose a novel evolutionary strategy based on optimization with momentum updates. We note that momentum is incorporated into a standard evolutionary strategy, i.e. the novelty consists in adding momentum updates. Our strategy is formally described in Algorithm 1. In steps 17-22, our algorithm starts by sampling $n$ initial exemplars to form the initial population $P$, such that $|P| = n$. In step 20, each component of an exemplar $z \in P$ is sampled from a uniform distribution over a bounded interval delimited by the latent space boundary $u$. At the same time, we generate the set $M$ of momentum vectors associated with the exemplars in $P$. The initial momentum vectors are zero vectors of $m$ components, thus having the same size as the exemplars in $P$. After initializing the population, we perform a selection in step 23 by keeping the top $e$ (elite) exemplars (and associated momentum vectors) that minimize our fitness function. The selection is performed inside the *select* function defined in steps 7-15. The fitness of each exemplar in the current population $P$ is computed in steps 8-11. Our fitness function is the sum squared error between the target output $y$ and the predicted output $\hat{y}$ for an exemplar $z$:

$$f(z) = \sum_{i=1}^{k} (y_i - \hat{y}_i)^2 \qquad (2)$$

where $\hat{y} = C(G(z))$ is the output of the black-box model $C$ for the sample generated by $G$ from $z$.

Until the fitness score of our least fit exemplar in $P$ becomes smaller than a threshold $t$, we repeat steps 25-30. Inside the loop, each exemplar from the current population is duplicated and mutated $r$ times. The mutation, performed inside the *mutate* function defined in steps 1-6, consists in adding a zero-centered Gaussian distributed velocity vector $v$ to the exemplar $z$. The mutation applied to the exemplar in step 5 can shift the new exemplar in the direction of the gradient. We note that, during exemplar selection, we will choose exemplars that minimize our fitness function defined in Eq. (2). Since the kept exemplars were likely shifted in the right direction during the previous mutation, we added a momentum component to the mutation operation, which leads to faster convergence. The momentum vector is added to the velocity in a weighted sum computed in step 4, inside the *mutate* function. The momentum is the previous perturbation (velocity) applied on the exemplar $z$. The magnitude of the momentum with respect to the generated velocity vector is controlled through the momentum rate $\alpha$. The current population together with the mutated duplicates form a new population that passes through the selection process in step 30 of Algorithm 1. By selecting only the top $e$ exemplars from every generation, we ensure that only the mutations that brought improvements are kept. Hence, only relevant perturbations are accumulated inside the momentum associated with an exemplar. As we will show in the experiments, the momentum component brings an increase of roughly 19% in convergence speed (263 versus 323 model calls on average in Table 2) compared to the plain version (that does not use momentum), subject to using the same hyperparameters. When the fitness of the least fit exemplar in $P$ goes under the threshold $t$, we store the best exemplar in $z^*$ and return it as the output of our evolutionary algorithm.

## 4 Experiments

### 4.1 Data Sets

Adult Data Set. For tabular data, we present experiments on the Adult Data Set [Kohavi-KDD-1996]. This is a binary classification data set for predicting the income of adults based on census information such as race, gender, marital status and level of education. It is composed of 48,842 samples with 14 features.

FER 2013. For image synthetization, we used the Facial Expression Recognition (FER) 2013 [Goodfellow-ICONIP-2013] data set, which comprises grayscale images of faces representing 7 different classes of emotion. FER 2013 contains 28,709 training images, 3,589 validation images and 3,589 test images. The samples have a wide range of attributes, as they vary in illumination conditions, pose, gender, race and age.

Large Movie Review Dataset. We performed text synthetization experiments on the Large Movie Review Dataset [Maas-ACL-2011]. The training set contains 25,000 movie reviews for binary sentiment classification: positive or negative. The test set is similar in size. Each review is highly polarised, neutral or close-to-neutral samples being absent.

### 4.2 Experimental Setup

For Algorithm 1, we used the same hyperparameters in all experiments: the initial population size is , the number of selected exemplars is , the latent space boundary is , the number of mutations per exemplar is , the standard deviation used for mutations is and the momentum rate is . We present results with other hyperparameters in the supplementary.

Setup for tabular data. To show that our framework is truly model-agnostic, we employed a Random Forest (RF) classifier as the black box for the Adult Data Set, which attains an accuracy of on the test set, while being trained on half of the training set. In the pre-processing step, 4 of the 6 numerical features (except for capital-gain and capital-loss) were normalized and each of the 8 categorical features was passed through a different embedding layer, generating a vector of two components for each categorical feature. The concatenation of all these features () gives us the final representation for the data samples. For data generation, we trained a VAE [Eduardo-AISTATS-2020] on the other half of the training set (not used to train the RF classifier). We synthesized prototype examples with regard to the output class probabilities of the RF classifier. The architecture of the VAE starts with two fully-connected layers having 64 and 128 neurons, respectively, with batch normalization and Rectified Linear Unit (ReLU) activations. Finally, an 8-neuron dense layer determines the means and the standard deviations for a 4-dimensional encoding. During the reconstruction phase, embeddings are passed through two dense layers with 128 and 64 neurons, respectively. For the final output, there is an additional layer for numerical columns and an individual softmax layer for each categorical column. The loss is comprised of an -distance component for numerical features and a categorical cross-entropy component for each categorical feature.

Setup for image data. For facial expression recognition, we used the VGG-16 [Simonyan-ICLR-2014b] architecture, which yields an accuracy of on the FER 2013 test set. As generators, we employed a Progressively Growing GAN [Karras-ICLR-2018] and a VAE trained with cyclical annealing [Fu-NAACL-2019]. The architectures and specifications for these networks are the ones specified in [Karras-ICLR-2018] and [Fu-NAACL-2019], respectively. We performed several experiments on this data modality. Firstly, we provide a quantitative comparison between exemplars synthesized using our framework and exemplars generated in the glass-box scenario, i.e. when access to the classifier structure and weights is available, as in [Nguyen-NIPS-2016]. Secondly, we provide a comparison of the convergence times of the two approaches, i.e. our evolutionary algorithm with momentum versus gradient descent (based on analytical gradients). In a set of preliminary trials, we noticed that the value of the gradient rapidly decreases within less than 5 iterations from values in the range of to values in the range of , which impedes the gradient descent optimization process. Therefore, we experimented using gradient descent with momentum. We measure convergence times from two perspectives: the number of model calls until the generated sample is classified with a confidence greater than 95% and the duration of the optimization process in seconds. Thirdly, we show that our method is able to generate exemplars when the generator and the model are trained on the same data samples and on different data samples. Additionally, we show that our method has the ability to generalize to previously unseen classes. To this end, we train the generator on all classes except one, e.g. *surprised*, and successfully generate *surprised* exemplars even though the generator has never seen a surprised face before.

Setup for text data. Sentence generation from latent embeddings has been proposed in the past, both through VAEs [Bowman-CoNLL-2016, Chung-NIPS-2015] and through GANs [Rajeswar-RepL4NLP-2017]. In our approach, we used an LSTM VAE [Bowman-CoNLL-2016], with a latent dimension of 128 neurons and hidden size of 512 neurons for both encoder and decoder networks. We used GloVe embeddings [Pennington-EMNLP-2014] for the tokens processed by the encoder. The generator was trained for 120 epochs, with Kullback–Leibler annealing to avoid posterior collapse. The black-box classifier is a simple bidirectional LSTM, with hidden size of 256 neurons, with word embeddings trained alongside the final layer. The model achieves accuracy on the test set. The generator and the classifier are trained on disjoint training sets. Our method for manipulating text resembles those of [Hu-ICML-2017, Yang-NIPS-2018], since our classifier model acts as a sentiment discriminator. However, the classifier and generator networks are independent. Once they are established, we manipulate the generated text only by traversing the latent space.
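To make the optimization loop shared by all these setups more concrete, here is a minimal sketch of an evolutionary strategy with momentum along the lines of Section 3. It is an illustration under stated assumptions, not the authors' Algorithm 1: the function names, the cap on generations, the form of the weighted sum in `mutate`, and all hyperparameter values are placeholders chosen for this example, and `toy_black_box` is a hypothetical two-class model standing in for a classifier applied to a generator's output.

```python
import math
import random


def fitness(z, target, black_box):
    """Sum of squared errors between the target and the black-box output for z."""
    predicted = black_box(z)
    return sum((t - p) ** 2 for t, p in zip(target, predicted))


def mutate(z, momentum, sigma, alpha, rng):
    """Perturb z with Gaussian noise blended with the previous perturbation."""
    # Assumed weighting: alpha scales the momentum, (1 - alpha) the fresh noise.
    velocity = [alpha * mo + (1.0 - alpha) * rng.gauss(0.0, sigma) for mo in momentum]
    child = [zi + vi for zi, vi in zip(z, velocity)]
    return child, velocity  # the velocity becomes the child's momentum


def synthesize(black_box, target, dim, pop_size=30, top=10, bound=3.0,
               mutations=2, sigma=0.5, alpha=0.3, threshold=0.005,
               max_generations=500, seed=0):
    """Search the latent space for a code whose prediction matches `target`."""
    rng = random.Random(seed)

    def scored(z, momentum):
        return fitness(z, target, black_box), z, momentum  # one model call

    def select(individuals):
        # Keep the `top` individuals with the lowest fitness, best first.
        return sorted(individuals, key=lambda ind: ind[0])[:top]

    # Initial population: uniform latent codes paired with zero momentum vectors.
    population = select([
        scored([rng.uniform(-bound, bound) for _ in range(dim)], [0.0] * dim)
        for _ in range(pop_size)
    ])
    for _ in range(max_generations):
        if population[-1][0] < threshold:  # least fit elite is good enough
            break
        children = [
            scored(*mutate(z, momentum, sigma, alpha, rng))
            for _, z, momentum in population
            for _ in range(mutations)
        ]
        population = select(population + children)
    best_fitness, best_z, _ = population[0]
    return best_z, best_fitness


def toy_black_box(z):
    """Hypothetical two-class model standing in for classifier(generator(z))."""
    p = 1.0 / (1.0 + math.exp(z[1] - z[0]))
    return [1.0 - p, p]


best_z, best_fitness = synthesize(toy_black_box, target=[0.0, 1.0], dim=4)
print(toy_black_box(best_z)[1], best_fitness)
```

With a two-class output and target `[0.0, 1.0]`, the fitness equals twice the squared gap between the predicted probability and 1, so the placeholder threshold of 0.005 corresponds to a confidence above 95%. Storing the fitness next to each latent code keeps the number of black-box calls at one per evaluated exemplar.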

### 4.3 Results on Tabular Data

Features | Low Income | Low Income | High Income | High Income |
---|---|---|---|---|
**Age** | 28 | 19 | 48 | 38 |
Work Class | Private | Private | Private | Private |
Final weight | 315124 | 393950 | 105785 | 45519 |
**Education** | 5th-6th | Some-college | Doctorate | Prof-School |
Educational-num | 0 | 10 | 19 | 15 |
Marital Status | Never-married | Never-married | Married | Married |
Occupation | Other-service | Other-service | Prof-speciality | Prof-speciality |
Relationship | Other-relative | Own-child | Husband | Husband |
Race | White | White | White | White |
**Gender** | Male | Female | Male | Male |
Capital Gain | 0 | 0 | 0 | 0 |
Capital Loss | 0 | 0 | 0 | 0 |
**Hours per Week** | 19 | 24 | 64 | 84 |
**Native Country** | Mexico | United-States | United-States | United-States |

Table 1: Synthesized exemplars for the Adult Data Set: two exemplars with *low income* and two exemplars with *high income*. Important features are highlighted in bold.

Synthesized tabular exemplars are not only meaningful by themselves, but they additionally provide a clear picture of the model’s decision process. Even though the RF model is treated as a complete black box, with no knowledge of its type or internal structure, we are able to deduce the model’s reasoning by observing the synthesized prototypes in Table 1. On the Adult Data Set, the features that influence the decision of the RF classifier seem to be the age, the level of education and the number of working hours per week. All *high-income* exemplars are older people, with significant academic achievements (typically PhD) and more than 40 working hours per week. On the other end, *low-income* exemplars have poor education, a young age and typically work part-time. Another type of *low-income* exemplars (not included in Table 1) features very old, retired and widowed people with poor education. Additionally, our analysis can reveal data set insights and model biases. In this scenario, the entirety of *high-income* exemplars are males born in the United States, while *low-income* exemplars are people born in Mexico. Hence, it seems that there is a bias towards classifying Mexicans in the *low income* category, which can raise ethical concerns regarding racism.

### 4.4 Results on Image Data

In the image synthetization scenario, we present the differences and strengths of our framework when compared to a gradient descent approach that works in the glass-box scenario, while keeping the black-box scenario for our framework. For the comparison, we run both exemplar synthetization algorithms 1000 times, generating 1000 exemplars with each algorithm.

Method | Converged (count) | Calls (average) | Time (seconds) |
---|---|---|---|
Gradient descent with momentum [Nguyen-NIPS-2016] | 955 | 378 | 3.77 |
Evolutionary strategy (ours) | 1000 | 323 | 0.14 |
Evolutionary strategy with momentum (ours) | 1000 | 263 | 0.12 |

Quantitative analysis. Considering the convergence results presented in Table 2, we notice that both methods are able to synthesize exemplars that are classified with almost 100% confidence. However, there is a significant difference between the two methods in terms of convergence. While our evolutionary strategy is able to converge each and every time, the gradient descent approach is highly dependent on its starting point. We observe that, for both GAN and VAE generators, the gradient descent optimization does not always converge. We found that, in 45 out of 1000 runs, the gradient descent with momentum method failed to synthesize an exemplar with over 95% confidence, i.e. the algorithm got stuck in a non-optimal solution. This statement holds true when using both GANs and VAEs as generators.

Running time. We measured the convergence times of the two exemplar generation approaches on an NVIDIA GeForce RTX 2080 GPU with 8GB of RAM. In Table 2, we present the number of model calls required to generate samples classified with more than 95% confidence by the classifier. We also present the amount of physical time required for convergence. We observe that our evolutionary strategy with momentum requires fewer model calls (forward passes) than the gradient descent with momentum (263 versus 378 on average). In terms of physical time, our evolutionary strategy is roughly 27 to 31 times faster than gradient descent (0.14 and 0.12 seconds versus 3.77 seconds). We note that the gradient descent considered here is based on analytical gradients, which is faster than using numerical gradients. The remarkable difference in favor of our method can be explained by the following two factors: our evolutionary strategy is able to make model calls in batches and it does not need to back-propagate gradients through the classifier or the generator. The experiments presented in Table 2 also show the benefit of introducing momentum in the evolutionary strategy. In terms of model calls, the speed-up brought by momentum is roughly 19% (263 versus 323 calls on average).
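The relative gains discussed above can be recomputed directly from Table 2. The short sketch below does this arithmetic; the dictionary keys and the helper `relative_reduction` are names introduced here for illustration only, while the numbers are copied from the table.

```python
# Average model calls and convergence times (seconds), copied from Table 2.
calls = {"gd_momentum": 378, "es": 323, "es_momentum": 263}
seconds = {"gd_momentum": 3.77, "es": 0.14, "es_momentum": 0.12}


def relative_reduction(baseline, improved):
    """Fraction by which `improved` is smaller than `baseline`."""
    return (baseline - improved) / baseline


# Effect of adding momentum to the evolutionary strategy, in model calls.
momentum_gain = relative_reduction(calls["es"], calls["es_momentum"])
# Evolutionary strategy with momentum versus gradient descent with momentum.
vs_gd_calls = relative_reduction(calls["gd_momentum"], calls["es_momentum"])
time_ratio = seconds["gd_momentum"] / seconds["es_momentum"]

print(f"{momentum_gain:.1%}")  # 18.6%
print(f"{vs_gd_calls:.1%}")    # 30.4%
print(f"{time_ratio:.1f}x")    # 31.4x
```

The gap between the modest reduction in model calls and the much larger reduction in wall-clock time is consistent with the two factors named above: batched model calls and the absence of back-propagation.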

Same training data.
Since the GAN does not seem to produce realistic examples when its training data is not the same as that of the classifier, we present results in the context of using the same training data for both the generator and the classifier. We added this scenario to show that our method can produce exemplars with both GANs and VAEs. In Figure 2, we present a subset of representative exemplars for two classes: *surprised* and *happy*. When using a GAN as generator (first two rows in Figure 2), the exemplars present high quality visual features, irrespective of the synthetization algorithm. While the GAN exemplars are realistic, gradient descent does not always converge to a representative exemplar (the third *happy* exemplar on first row in Figure 2 seems *neutral*). Moreover, gradient descent does not seem to always produce realistic exemplars for the VAE (see first and third exemplars for the *surprised* class on third row in Figure 2).
Disjoint training data.
We conducted experiments showing that our exemplar generation framework works well when the training data used for the generator is different from the training data used for the black-box classifier. The exemplars generated by our evolutionary strategy (sixth row in Figure 2) are still realistic and representative for the *surprised* and *happy* classes. The exemplars generated by gradient descent are not always realistic, and hard to interpret by humans (see third exemplar for the *surprised* class on fifth row in Figure 2).
Disjoint classes.
We also conducted experiments to show that our exemplar synthetization framework generalizes to previously unseen classes. The images presented on the eighth row in Figure 2 are generated with a VAE which was not trained on the respective classes, *surprised* and *happy*. Still, the generated images seem realistic and representative for these two classes. Some exemplars produced by the gradient descent (seventh row in Figure 2) are less realistic.
Summary.
Considering the overall results, we notice that our evolutionary strategy does not get stuck in non-optimal solutions, while converging faster than gradient descent. Non-optimal solutions are avoided because the evolutionary strategy employs multiple starting points and the velocity values (used instead of gradients) always stay within a reasonable range. Since our method relies on making small jumps in the latent space, while ignoring the gradients, it can easily escape saddle points. The benefits of our evolutionary framework are empirically demonstrated by the results on FER 2013. In summary, we conclude that our method is more robust than gradient descent, while treating the classifier as a black box. Indeed, we showed that access to the gradients or the training data distribution of the classifier is not required.

### 4.5 Results on Text Data

Positive Exemplars | Negative Exemplars |
---|---|
“this is a great film and i recommend it to anyone.” | “one could have a cheap soap opera instead” |
“this is a great movie to watch and you will be a great time to watch it” | “the film is not too long for the film to be a complete waste of time.” |
“there is a lot of fun in this film and it is very well paced.” | “the acting is not much to save the entire movie.” |
“i am a fan of the genre but this is one of the best films of all time.” | “the final scene in the movie is the worst of the year.” |
“this is a very good movie for everyone but it is not perfect.” | “the film is not a complete waste of time.” |
“he a terrific actor and he is great as the lead and the performances are absolutely perfect.” | “it is not a terrible movie but it is not a bad film.” |
“it is one of the best movies i have seen in a long time” | “the film is not that bad” |
“the film is well paced and it is not that good.” | “i mean it was not that bad” |
“it a good movie but it not worth a watch” | “it was really worthless just below par.” |

Table 3: Synthesized exemplars for the two classes: *positive* (left-hand side) and *negative* (right-hand side).

In Table 3, we provide some selected exemplars generated for a simple LSTM text classifier, revealing the preferred inputs of the model for the *positive* and *negative* classes. We note that some generated reviews are realistic and representative for their class. Other reviews, especially the *negative* ones, indicate that the classification model outputs wrong class probabilities with high confidence when it encounters some specific words. For example, sentences containing words such as “good” or “great” are classified as positive reviews, even though they appear in negated form, e.g. “not that good”. The classifier does not seem to understand contrasting transitions when evaluating the sentiment of reviews. These results are consistent with the problem of sentiment polarity classification observed by Li et al. [Li-PACLIC-2009]. Hence, even though the classifier has a relatively high test accuracy (), our method reveals that a naive training regime leads to sub-optimal results in real-world scenarios.

## 5 Conclusion

In this paper, we proposed a novel evolutionary strategy that incorporates momentum for generating exemplars for black-box models. Our framework requires an underlying generator, but it does not back-propagate gradients through the black-box model or the generator. We conducted experiments showing that our approach can produce exemplars for three data types: image, text and tabular. Furthermore, our experiments indicate that our idea of incorporating momentum into a standard evolutionary strategy is useful, reducing the number of model calls by roughly 19% (from 323 to 263 on average). The empirical results demonstrate that our optimization algorithm converges faster than gradient descent with momentum, while providing similar or even more realistic exemplars. Given that our method does not require access to the weights or the training data of the black-box model, we believe it has a broader applicability than gradient descent methods such as [Nguyen-NIPS-2016].
