Natural Language Adversarial Attacks and Defenses in Word Level

09/15/2019 ∙ by Xiaosen Wang, et al. ∙ Huazhong University of Science & Technology

In the past two years, inspired by the large body of research on adversarial examples in computer vision, there has been growing interest in adversarial attacks for Natural Language Processing (NLP), followed by only a handful of works on adversarial defense for NLP. However, there is still no defense method against the successful synonym substitution based attacks, which aim to satisfy the lexical, grammatical and semantic constraints and are thus hard for humans to perceive. To fill this gap, we postulate that the generalization of the model leads to the existence of adversarial examples, and propose an adversarial defense method called the Synonym Encoding Method (SEM), which inserts an encoder before the input layer of the model and then trains the model to eliminate adversarial perturbations. Extensive experiments demonstrate that SEM can efficiently defend against the current best synonym substitution based adversarial attacks with almost no decay in accuracy on benign examples. Besides, to better evaluate SEM, we also propose a strong attack method called the Improved Genetic Algorithm (IGA) that adopts a genetic metaheuristic for synonym substitution based attacks. Compared with the existing genetic based adversarial attack, the proposed IGA achieves a higher attack success rate while maintaining the transferability of adversarial examples.


1 Introduction

In recent years, Deep Neural Networks (DNNs) have achieved great success in various learning tasks in the areas of Computer Vision (Krizhevsky et al., 2012; He et al., 2016), Natural Language Processing (NLP) (Kim, 2014; Lai et al., 2015; Devlin et al., 2018), etc. However, recent studies have discovered that DNNs are vulnerable to adversarial examples not only in computer vision tasks (Szegedy et al., 2014) but also in NLP tasks (Papernot et al., 2016), which poses a significant threat to the safe application of AI. For example, spammers can evade spam filtering systems with adversarial examples of spam emails while preserving the intended meaning.

In contrast to the numerous methods proposed for adversarial attacks (Goodfellow et al., 2015; Nicholas & David, 2017; Wang et al., 2019) and defenses (Goodfellow et al., 2015; Guo et al., 2018; Song et al., 2019) in computer vision, there are only a few works in the area of NLP, inspired by the works on images and emerging only in the past two years (Zhang et al., 2019). This is mainly because existing perturbation methods for images cannot be directly applied to texts due to the discrete nature of text. Furthermore, if we want the perturbation to be barely perceptible to humans, it should satisfy the lexical, grammatical and semantic constraints of the text, making it even harder to generate adversarial examples.

Current attacks in NLP fall into four categories, namely modifying the characters of a word (Liang et al., 2017; Ebrahimi et al., 2017), adding or removing words (Liang et al., 2017), replacing words arbitrarily (Papernot et al., 2016), and substituting words with synonyms (Alzantot et al., 2018; Ren et al., 2019). However, the first three categories are easy to detect and defend against with spelling or syntax checks (Rodriguez & Rojas-Galeano, 2018; Pruthi et al., 2019). As synonym substitution aims to satisfy all the lexical, grammatical and semantic constraints, it is hard to detect by automatic spelling or syntax checks as well as by human investigation. To our knowledge, there is currently no defense method against synonym substitution based attacks.

In this work, we postulate that the model generalization leads to the existence of adversarial examples: when the generalization is not strong enough, there usually exists some neighbor of a benign example on the data manifold that receives a different classification. Based on this assumption, we propose a novel defense mechanism called the Synonym Encoding Method (SEM) that encodes all the synonyms of a word to a unique code so as to force all the neighbors of an input x to have the same label as x. Specifically, we first cluster the synonyms according to the Euclidean distance in the embedding space to construct the encoder. Then we insert the encoder before the input layer of the deep model without modifying its architecture, and train the model again to defend against adversarial attacks. In this way, we can efficiently defend against synonym substitution based adversarial attacks in the context of text classification.

Extensive experiments on three popular datasets demonstrate that the proposed SEM can effectively defend against the adversarial attacks, improving the classification accuracy under attack by a large margin, while maintaining the efficiency and achieving roughly the same accuracy on benign data as the original model. To our knowledge, SEM is the first method that can effectively defend against synonym substitution based adversarial attacks.

Besides, to better evaluate SEM, we also propose a genetic based attack method, denoted as the Improved Genetic Algorithm (IGA), which is carefully designed and more effective than the first proposed genetic based attack algorithm (Alzantot et al., 2018). Experiments show that IGA can degrade the classification accuracy more significantly with a lower word substitution rate than the GA of Alzantot et al. (2018), while maintaining the transferability of adversarial examples.

2 Background and Related Work

Let W denote the word set containing all the legal words. Let x = (w_1, w_2, ..., w_n) denote an input text, X be the corpus that contains all the possible input texts, and Y be the output space. The classifier f: X → Y takes an input x and predicts its label f(x), and let f_c(x) denote the confidence value at the softmax layer for the c-th category. Let Syn(w, δ, k) represent the set of the first k synonyms of w within distance δ, namely

Syn(w, δ, k) = { w'_1, ..., w'_k | w'_i ∈ W, ‖w'_i − w‖_p ≤ δ },

where ‖·‖_p is the p-norm distance evaluated on the corresponding embedding vectors.
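As an illustration of how such a synonym set can be built in practice, the following is a minimal sketch (not the authors' code) that retrieves the k nearest words within distance δ from a pre-trained embedding table; the vocabulary, the embedding matrix, and the parameter values below are placeholder assumptions.

```python
import numpy as np

def build_synonym_set(word, vocab, embeddings, delta=0.5, k=10):
    """Return up to k nearest words to `word` whose embedding distance is <= delta.

    vocab:      list of words, aligned with the rows of `embeddings`
    embeddings: (V, d) numpy array of (assumed pre-trained) word vectors
    """
    idx = vocab.index(word)
    dists = np.linalg.norm(embeddings - embeddings[idx], axis=1)  # p = 2 (Euclidean)
    synonyms = []
    for j in np.argsort(dists):
        if j == idx:
            continue                      # skip the word itself
        if dists[j] > delta:
            break                         # all remaining words are farther away
        synonyms.append(vocab[j])
        if len(synonyms) == k:
            break
    return synonyms

# toy usage with a random embedding table
rng = np.random.default_rng(0)
vocab = ["good", "great", "fine", "bad", "terrible"]
emb = rng.normal(size=(len(vocab), 8))
print(build_synonym_set("good", vocab, emb, delta=4.0, k=3))
```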

2.1 Natural Language Adversarial Examples

Suppose we have an ideal classifier c that always outputs the correct label for any input text x. For a subset of (train or test) texts T ⊆ X and a small constant ε, we can define the set of natural language adversarial examples as follows:

A = { x_adv ∈ X | ∃ x ∈ T: d(x, x_adv) < ε ∧ c(x_adv) = c(x) ∧ f(x_adv) ≠ c(x_adv) },    (1)

where d(x, x_adv) is a distance metric that estimates the dissimilarity between the benign example x and the adversarial example x_adv. It is usually defined as the p-norm distance: d(x, x_adv) = ‖x − x_adv‖_p.
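For concreteness, Eq. (1) can be checked with a one-line predicate; the classifier f, the oracle c, the distance function, and ε below are placeholders standing in for the symbols above.

```python
def is_adversarial(f, c, x, x_adv, dist, eps):
    """Eq. (1): x_adv is adversarial w.r.t. benign x if it stays within distance eps,
    keeps the true label under the oracle c, but changes the model prediction f."""
    return dist(x, x_adv) < eps and c(x_adv) == c(x) and f(x_adv) != c(x_adv)
```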

2.2 Text Adversarial Attacks

In this subsection, we provide a brief overview of three popular synonym substitution based adversarial attack methods.

Greedy Search Algorithm (GSA).   Kuleshov et al. (2018) propose a greedy search algorithm that substitutes words with their synonyms so as to maintain the semantic and syntactic similarity. GSA first constructs a synonym set W_s for an input text x = (w_1, w_2, ..., w_n):

W_s = ∪_{w_i ∈ x} Syn(w_i, δ, k).    (2)

Initially, let x_adv = x. Then at each iteration, GSA finds a word w' ∈ W_s that satisfies the syntactic constraints and minimizes the true-label confidence f_y(x'), where x' is obtained from x_adv by substituting the word at the corresponding position with w', and updates x_adv = x'. This process iterates until x_adv becomes an adversarial example or the word replacement rate reaches a threshold.
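The greedy loop described above can be sketched as follows (a simplified rendering, not Kuleshov et al.'s implementation, and the syntactic-constraint check is omitted); `predict`, `confidence`, and `synonyms` are placeholder hooks.

```python
def greedy_search_attack(x, y, predict, confidence, synonyms, max_rate=0.25):
    """Greedy synonym substitution: at each step, apply the single replacement that
    most reduces the model's confidence in the true label y.

    x:          list of words
    predict:    callable words -> predicted label
    confidence: callable (words, label) -> softmax probability of `label`
    synonyms:   callable word -> list of candidate substitutes
    """
    x_adv = list(x)
    for _ in range(int(len(x) * max_rate)):       # cap on the word replacement rate
        if predict(x_adv) != y:
            return x_adv                          # adversarial example found
        best_conf, best_pos, best_word = confidence(x_adv, y), None, None
        for i, w in enumerate(x_adv):
            for cand in synonyms(w):
                trial = x_adv[:i] + [cand] + x_adv[i + 1:]
                conf = confidence(trial, y)
                if conf < best_conf:
                    best_conf, best_pos, best_word = conf, i, cand
        if best_pos is None:
            break                                 # no substitution lowers the confidence
        x_adv[best_pos] = best_word
    return x_adv
```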

Genetic Algorithm (GA).   Alzantot et al. (2018) propose a population-based algorithm that replaces words with their synonyms so as to generate semantically and syntactically similar adversarial examples. There are three operators in GA:

1) Mutate(x): Randomly choose a word w_i in text x that has not been updated, and substitute w_i with one of its synonyms w'_i ∈ Syn(w_i, δ, k) that does not violate the syntax constraints checked by the "Google 1 billion words language model" (Chelba et al., 2013) and minimizes the true-label confidence f_y(x'), where x' = (w_1, ..., w'_i, ..., w_n);
2) Sample(P): Randomly sample a text from the population P with probability proportional to its fitness, which increases as the true-label confidence f_y(x) decreases;
3) Crossover(a, b): Construct a new text c = (c_1, c_2, ..., c_n), where each c_i is randomly chosen from {a_i, b_i} based on the input texts a = (a_1, ..., a_n) and b = (b_1, ..., b_n).

For a text x, GA first generates the initial population P^0 of size m:

P^0 = { Mutate(x), ..., Mutate(x) }.    (3)

Then at each iteration, GA generates the next generation of population through the crossover and mutation operators:

P^{i+1} = { Mutate(Crossover(Sample(P^i), Sample(P^i))), ... }.    (4)

GA terminates when it finds an adversarial example or reaches the maximum number of iterations.
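The population loop of Eq. (3)-(4) can be sketched as follows (a simplified rendering, not Alzantot et al.'s code); `mutate`, `crossover`, and a non-negative `fitness` score are assumed to be supplied by the caller.

```python
import random

def genetic_attack(x, y, predict, fitness, mutate, crossover, pop_size=20, max_iters=50):
    """Population-based synonym-substitution attack in the spirit of Eq. (3)-(4).

    fitness:   callable words -> non-negative score (higher = closer to fooling the model)
    mutate:    callable words -> words with one synonym substitution
    crossover: callable (words, words) -> child words
    """
    population = [mutate(list(x)) for _ in range(pop_size)]           # Eq. (3)
    for _ in range(max_iters):
        scores = [fitness(p) for p in population]
        best = population[scores.index(max(scores))]
        if predict(best) != y:
            return best                                               # attack succeeded
        children = [best]                                             # keep the fittest member
        while len(children) < pop_size:
            p1, p2 = random.choices(population, weights=scores, k=2)  # fitness-proportional sampling
            children.append(mutate(crossover(p1, p2)))                # Eq. (4)
        population = children
    return None
```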

Probability Weighted Word Saliency (PWWS).   Ren et al. (2019) propose a new synonym substitution method called Probability Weighted Word Saliency (PWWS), which considers the word saliency as well as the classification probability. Given a text x = (w_1, w_2, ..., w_n), PWWS first calculates the saliency of each word w_i:

S(x, w_i) = f_y(x) − f_y(x_i^unk),    (5)

where x_i^unk = (w_1, ..., w_{i−1}, "unk", w_{i+1}, ..., w_n), i.e., the word w_i is removed (replaced by the unknown token). Then PWWS calculates the maximum possible change in the classification probability resulting from substituting word w_i with one of its synonyms:

ΔP_i* = max_{w'_i ∈ Syn(w_i, δ, k)} [ f_y(x) − f_y(x_i') ],    (6)

where x_i' = (w_1, ..., w'_i, ..., w_n). Then, PWWS sequentially checks the words in descending order of the score φ(S(x, w_i)) · ΔP_i*, where φ(·) denotes the softmax over the word saliencies, and substitutes the current word w_i with its optimal synonym:

w_i* = argmax_{w'_i ∈ Syn(w_i, δ, k)} [ f_y(x) − f_y(x_i') ].    (7)

PWWS terminates when it finds an adversarial example or it has replaced all the words in x.
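A compact sketch of the PWWS scoring and substitution order described by Eq. (5)-(7) (an illustrative reimplementation, not Ren et al.'s released code); `predict`, `confidence`, `synonyms`, and the "unk" token are placeholder hooks.

```python
import numpy as np

def pwws_attack(x, y, predict, confidence, synonyms, unk="unk"):
    """Probability Weighted Word Saliency: rank words by softmax(saliency) * best
    confidence drop, then substitute each word with its best synonym in that order."""
    x = list(x)
    base = confidence(x, y)
    saliency, best_sub, best_gain = [], [], []
    for i, w in enumerate(x):
        removed = x[:i] + [unk] + x[i + 1:]
        saliency.append(base - confidence(removed, y))                # Eq. (5)
        gains = [(base - confidence(x[:i] + [s] + x[i + 1:], y), s)
                 for s in synonyms(w)] or [(0.0, w)]
        gain, sub = max(gains)
        best_gain.append(gain)                                        # Eq. (6)
        best_sub.append(sub)                                          # Eq. (7)
    sal = np.array(saliency)
    phi = np.exp(sal - sal.max()) / np.exp(sal - sal.max()).sum()     # softmax over saliencies
    order = np.argsort(-phi * np.array(best_gain))
    x_adv = list(x)
    for i in order:
        x_adv[int(i)] = best_sub[int(i)]
        if predict(x_adv) != y:
            break                                                     # adversarial example found
    return x_adv
```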

2.3 Text Adversarial Defenses

There exist very few works on text adversarial defense. Pruthi et al. (2019) propose placing a word recognition model in front of the downstream classifier to defend against character-level adversarial attacks by combating adversarial spelling mistakes. For defenses against synonym substitution based attacks, only Alzantot et al. (2018) and Ren et al. (2019) incorporate the adversarial training strategy proposed in the image domain (Goodfellow et al., 2015) into their text attack methods and demonstrate that adversarial training can improve the model's robustness. However, there is no defense method specifically designed against synonym substitution based adversarial attacks.

3 The Proposed Text Defense Method

In this section, we first introduce our motivation, and then present the proposed text defense method, Synonym Encoding Method (SEM).

3.1 Motivation

Figure 1: The neighborhood of a data point in the input space. (a) Traditional training: there exist data points that the model has never seen before and classifies incorrectly; such data points are adversarial examples. (b) Adding infinite labeled data: an ideal case in which the model has seen all the data points and thus resists adversarial examples. (c) Using unlabeled data in the neighborhood: unlabeled data points share the label of their neighbor so as to improve adversarial robustness. (d) Mapping neighborhood data points: map all neighbors to the center point so as to eliminate adversarial examples.

Let X denote the input space and N_ε(x) denote the ε-neighborhood of a data point x ∈ X, where N_ε(x) = { x' ∈ X | ‖x' − x‖ < ε }. As Figure 1 (a) shows, we postulate that the generalization of the model leads to the existence of adversarial examples: for a data point x ∈ X, there often exists x' ∈ N_ε(x) such that f(x') ≠ f(x), in which case x' is an adversarial example of x.

Ideally, to defend against adversarial attacks, we need to train a classifier f that not only guarantees f(x) = c(x), but also assures ∀ x' ∈ N_ε(x), f(x') = c(x). Thus, the most effective way is to add more labeled data to improve the adversarial robustness (Schmidt et al., 2018). As illustrated in Figure 1 (b), if we had infinite labeled data, we could with high probability train a model that is robust to adversarial examples. Practically, however, labeling data is very expensive and it is impossible to obtain infinite labeled data.

As it is expensive to label data, recent works (Zhai et al., 2019; Carmon et al., 2019; Uesato et al., 2019) propose to use semi-supervised learning, which adopts unlabeled data to improve the adversarial robustness for image classification. As illustrated in Figure 1 (c), this approach samples unlabeled data points in the neighborhood of x, each sharing the predicted label of x, with the goal of training a model whose classification is smoother. Although semi-supervised learning does not need the labels of the data in the neighborhood, it still needs to sample as many data points as possible in the neighborhood of x, which requires a large amount of unlabeled data.

In this work, as illustrated in Figure 1 (d), we propose a novel way: find a mapping m: X → X such that ∀ x' ∈ N_ε(x), m(x') = x. In this way, we force the classification to be smoother, and we need neither extra data to train the model nor any modification of the model architecture. All we need to do is to insert the mapping m before the input layer and train the model on the original training set. The problem then turns into how to locate the neighbors of a data point x. For image tasks, it is hard to find all images in the neighborhood of x in the input space, as there could be an infinite number of neighbors. For NLP tasks, however, utilizing the property that words in a sentence are discrete tokens, we can easily find almost all neighbors of an input text. Thus, we propose a new method called the Synonym Encoding Method to locate the neighbors of an input text x.

3.2 Synonym Encoding

We assume that the closer the meanings of two sentences are, the closer their distance in the input space. Therefore, we can regard the neighbors of x as its synonymous sentences. To find the synonymous sentences, we can substitute words in the sentence with their synonyms. To construct the mapping m, all we need to do is to cluster the synonyms and allocate a unique token to each cluster, which we call the Synonym Encoding Method (SEM). The procedure is described in Algorithm 1.

0:  W: dictionary of words, n: size of W, δ: distance for synonyms, k: number of synonyms for each word
0:  E: encoding result
1:  E[w] = NONE for each w ∈ W
2:  for each word w ∈ W do
3:     if E[w] = NONE then
4:        if ∃ ŵ ∈ Syn(w, δ, k), E[ŵ] ≠ NONE then
5:           ŵ* = the closest ŵ ∈ Syn(w, δ, k) where E[ŵ] ≠ NONE
6:           E[w] = E[ŵ*]
7:        else
8:           E[w] = w
9:        end if
10:        for each word ŵ in Syn(w, δ, k) do
11:           if E[ŵ] = NONE then
12:              E[ŵ] = E[w]
13:           end if
14:        end for
15:     end if
16:  end for
17:  return E
Algorithm 1 Synonym Encoding Method
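To make Algorithm 1 concrete, below is a minimal Python sketch of the clustering step as we read the pseudocode (an illustrative reimplementation, not the authors' code); `syn(w)` stands for the Syn(w, δ, k) lookup (ordered by increasing embedding distance), and the encoder is a plain dictionary mapping every word to its cluster's representative token.

```python
def build_synonym_encoder(vocab, syn):
    """Assign every word a representative token so that all synonyms share one code.

    vocab: iterable of words (e.g., ordered by descending frequency)
    syn:   callable word -> list of synonyms ordered by increasing embedding distance
    """
    enc = {w: None for w in vocab}
    for w in vocab:
        if enc[w] is not None:
            continue
        # if some synonym is already encoded, join the closest such cluster
        joined = next((enc[s] for s in syn(w) if enc.get(s) is not None), None)
        enc[w] = joined if joined is not None else w   # otherwise start a new cluster
        for s in syn(w):                               # propagate the code to synonyms
            if enc.get(s) is None:                     # that are still unencoded
                enc[s] = enc[w]
    return enc

def encode_text(words, enc):
    """The mapping m inserted before the model's input layer."""
    return [enc.get(w, w) for w in words]
```

At both training and inference time, every input, benign or perturbed, is passed through encode_text first, so synonym substitutions collapse back to the same encoded sequence.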

4 The Improved Genetic based Text Attack

The current synonym substitution based text adversarial attacks (Alzantot et al., 2018; Kuleshov et al., 2018; Ren et al., 2019) share a constraint: they only substitute the word at each position once, or only replace words with the first k synonyms of the word in the original input x. This constraint can trap the search for adversarial examples in a local minimum. Moreover, it is hard to choose a suitable k, as different words may have different numbers of synonyms.

To address this issue, we propose an Improved Genetic Algorithm (IGA), which allows the word at the same position to be substituted more than once based on the current text x'. In this way, IGA can traverse all the synonyms of a word no matter what value k takes. Meanwhile, we can avoid local minima to some extent, as we allow a substituted word to be replaced back by the original word at that position. In order to guarantee that the substituted word is still a synonym of the original word, each position can only be replaced a limited number of times.

Different from the first genetic based text attack algorithm of Alzantot et al. (2018), we also change the structure of the algorithm, such as the way crossover and mutation are performed. See the details in Appendix 7.1.

5 Experiments

We evaluate SEM with four attacks, GSA (Kuleshov et al., 2018), GA (Alzantot et al., 2018), PWWS (Ren et al., 2019) and our IGA, on three popular datasets involving three neural network classification models. The results demonstrate that SEM can significantly improve the robustness of neural networks and that IGA achieves better attack results than the other attacks.

5.1 Experimental Setup

We first provide an overview of the datasets and classification models used in the experiments.

Datasets.  In order to evaluate the efficacy of SEM, we choose three popular datasets: IMDB, AG's News, and Yahoo! Answers. IMDB (Potts, 2011) is a large dataset for binary sentiment classification, containing 25,000 highly polarized movie reviews for training and 25,000 for testing. AG's News (Zhang et al., 2015) consists of news articles pertaining to four classes: World, Sports, Business and Sci/Tech. Each class contains 30,000 training examples and 1,900 testing examples. Yahoo! Answers (Zhang et al., 2015) is a topic classification dataset from the "Yahoo! Answers Comprehensive Questions and Answers" version 1.0 dataset with 10 categories, such as Society & Culture, Science & Mathematics, etc. Each class contains 140,000 training samples and 5,000 testing samples.

Models.  To better evaluate our method, we adopt several state-of-the-art models for text classification, including Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). The embedding dimension for all models is 300 (Mikolov et al., 2013). We replicate the CNN architecture from Kim (2014), which contains three convolutional layers with filter sizes of 3, 4 and 5 respectively, a max-pooling layer and a fully-connected layer. LSTM consists of three LSTM layers and a fully-connected layer (Liu et al., 2016). Bi-LSTM contains a bi-directional LSTM layer, whose forward and backward directions have the same number of LSTM units, and a fully-connected layer.
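For reference, here is a compact PyTorch sketch of a Kim-style word CNN of the kind described above (an illustrative configuration; the number of filters and other hyper-parameters are assumptions, not the paper's exact setup):

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    """Word-CNN classifier: embedding -> parallel 1-D convolutions -> global
    max-pooling -> fully-connected output layer."""
    def __init__(self, vocab_size, num_classes, emb_dim=300,
                 filter_sizes=(3, 4, 5), num_filters=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, num_filters, kernel_size=fs) for fs in filter_sizes])
        self.fc = nn.Linear(num_filters * len(filter_sizes), num_classes)

    def forward(self, token_ids):                     # token_ids: (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)     # (batch, emb_dim, seq_len)
        pooled = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.fc(torch.cat(pooled, dim=1))      # (batch, num_classes)

model = TextCNN(vocab_size=50000, num_classes=2)
logits = model(torch.randint(0, 50000, (8, 200)))     # toy forward pass
```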

Baselines.  We take adversarial training (Goodfellow et al., 2015) as our baseline. However, due to the low efficiency of text adversarial attacks, we cannot implement adversarial training exactly as it is done in the image domain. In the experiments, we adopt PWWS, which is quicker than GA and IGA, to generate adversarial examples for the training set, and re-train the model on the training data augmented with these adversarial examples.
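The adversarial-training baseline can be sketched as the following augmentation loop (an illustrative procedure, not the authors' exact training script); `pwws_attack` refers to the earlier sketch, and `predict`, `confidence`, `synonyms`, and `train` are placeholder hooks.

```python
def adversarial_training(model, train_set, predict, confidence, synonyms, train):
    """Augment the training set with PWWS adversarial examples and retrain.

    train_set: list of (words, label) pairs
    train:     callable (model, dataset) -> retrained model
    """
    augmented = list(train_set)
    for words, label in train_set:
        if predict(words) != label:
            continue                              # only attack correctly classified texts
        adv = pwws_attack(words, label, predict, confidence, synonyms)
        if predict(adv) != label:                 # keep only successful adversarial examples
            augmented.append((adv, label))
    return train(model, augmented)
```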

5.2 Evaluation on Defense Methods

To evaluate the efficacy of our SEM method, we randomly sample a set of correctly classified examples from each dataset for each model and use the above attack methods to generate adversarial examples with or without defense. The more effective the defense method is, the less the classification accuracy of the model drops. Table 1 shows the efficacy of various attack and defense methods.

Word-CNN (%) LSTM (%) Bi-LSTM (%)
NT AT SEM NT AT SEM NT AT SEM
IMDB No Attack 88.7 89.1 86.8 87.3 89.6 86.8 88.2 90.3 87.6
GSA 13.3 16.9 66.4   8.3 21.1 72.2   7.9 20.8 73.1
PWWS   4.4   5.3 71.1   2.2   3.6 77.3   1.8   3.2 76.1
GA   7.1 10.7 71.8   2.6   9.0 77.0   1.8   7.2 71.6
IGA   0.9   2.7 65.1   0.9   1.8 71.2   0.9   2.7 69.3
AG’s News No Attack 91.7 92.2 88.7 91.8 92.1 90.9 91.7 92.2 88.7
GSA 33.0 37.8 63.9 45.9 56.7 83.2 33.0 37.8 63.9
PWWS 30.7 41.5 67.6 50.0 55.7 85.0 30.7 41.5 67.6
GA 24.1 40.6 77.9 43.8 57.3 86.4 24.1 40.6 77.9
IGA 21.5 35.5 70.3 40.0 55.3 81.8 21.5 35.5 70.3
Yahoo! Answers No Attack 68.4 69.3 65.8 71.6 71.7 69.0 72.3 72.8 70.2
GSA 19.6 20.8 49.4 27.6 30.5 48.6 24.6 30.9 53.4
PWWS 10.3 12.5 52.6 21.1 22.9 54.9 17.3 20.0 57.2
GA 13.7 16.6 59.2 15.8 17.9 66.2 13.0 16.0 63.2
IGA   8.9 10.0 51.4 10.5 15.1 53.3 12.4 13.5 55.7
Table 1: Accuracy (%) of various classification models on the three datasets, with and without defenses, under adversarial attacks. For each model, looking at each row, the highest classification accuracy among the defense methods is highlighted in bold to indicate the best defense efficacy; looking at each column, the lowest classification accuracy among the adversarial attacks is underlined to indicate the best attack efficacy. NT: Normal Training, AT: Adversarial Training.

We look at each row to find the best defense result for each network model.

With no attack, adversarial training (AT) can improve the classification accuracy of the models on all datasets, as AT also augments the training set. Our defense method SEM reaches an accuracy close to that of normal training (NT).

Under the four attacks (GSA, PWWS, GA and IGA), however, the classification accuracy with normal training (NT) and adversarial training (AT) drops significantly. Under normal training (NT), the accuracy degrades dramatically on all three datasets. Adversarial training (AT) cannot defend against these attacks effectively either, especially for PWWS and IGA on IMDB and Yahoo! Answers, where adversarial training only improves the accuracy slightly. By contrast, SEM can remarkably improve the robustness of the deep models under all four attacks.

5.3 Defense for Transferability

In the image domain, the transferability of an adversarial attack refers to its ability to decrease the accuracy of other models using adversarial examples generated on one model (Szegedy et al., 2014; Goodfellow et al., 2015). Papernot et al. (2016) find that adversarial examples in NLP also exhibit good transferability. Therefore, a good defense method should not only defend against direct adversarial attacks but also resist the transferability of adversarial examples.

To evaluate the ability to prevent the transferability of adversarial examples, we generate adversarial examples on each model under normal training and test them on the other models with or without defense. The results are shown in Table 2. On almost all models, SEM yields the highest classification accuracy against adversarial examples generated on other models.

Word-CNN (%) LSTM (%) Bi-LSTM (%)
NT AT SEM NT AT SEM NT AT SEM
GSA 19.6* 52.7 57.5 52.6 58.4 61.8 52.7 57.5 59.6
PWWS 10.3* 54.4 57.2 46.5 57.7 61.4 53.1 57.9 56.8
GA 13.7* 49.9 55.2 43.0 66.0 65.6 56.4 65.5 67.4
IGA   8.9* 53.7 56.5 46.5 56.6 58.6 53.1 56.8 57.2
GSA 47.2 52.7 55.9 27.6* 60.9 62.1 53.8 61.9 62.8
PWWS 43.7 54.7 56.5 21.1* 59.9 61.8 50.6 59.2 61.4
GA 41.0 48.5 54.8 15.8* 57.4 57.5 43.4 58.2 60.9
IGA 47.8 53.0 55.2 10.5* 53.4 58.3 53.5 59.0 61.1
GSA 43.7 53.4 54.3 52.9 57.7 58.7 24.6* 60.4 61.1
PWWS 41.7 48.5 49.7 41.2 58.0 60.4 17.3* 60.1 58.6
GA 36.9 45.7 49.0 43.0 58.8 60.7 13.0* 58.2 60.0
IGA 44.8 50.6 52.3 45.1 57.4 58.7 12.4* 60.1 61.1
Table 2: Accuracy (%) of various classification models on adversarial examples generated on other models on Yahoo! Answers, for evaluating transferability. * indicates the model on which the adversarial examples were generated.

5.4 Evaluation on the Attack Methods (IGA vs. GA)

For text attacks, we compare the proposed IGA with GA from various aspects, including attack efficacy, transferability and human evaluation.

Attack Efficacy. As shown in Table 1, looking at each column, we see that under normal training (NT) and adversarial training (AT), IGA always achieves the lowest classification accuracy on all models and datasets among the four attacks. Under the SEM defense (the third column of each model), IGA always outperforms GA, although IGA may not be the best among all attacks.

Besides, as depicted in Table 3, IGA yields a lower word substitution rate than GA on most models. Note that under SEM, GA can yield a lower word substitution rate, because GA may not replace a word at all when the first replacement brings no benefit, which is the case for most words under SEM.

Word-CNN (%) LSTM (%) Bi-LSTM (%)
NT AT SEM NT AT SEM NT AT SEM
IMDB GA   9.3   9.3   4.3 10.0 10.6   4.0   6.3   8.0   3.1
IGA   6.4   7.8   6.3   5.3   6.9   5.7   5.6   6.6   7.5
AG’s News GA 10.3 17.7   6.6 18.3 21.2   9.8 12.0 21.4   8.5
IGA 12.1 11.4   6.3 14.4 12.0   9.6 13.5 11.6   4.6
Yahoo! Answers GA 12.4   9.5   4.7 12.5 15.8   8.1 13.9 15.3   4.7
IGA   6.4   7.1   3.7   7.9   8.2   4.5   3.6   8.2   5.1
Table 3: The word substitution rate (%) of GA and IGA on different models.

Transferability. As shown in Table 2, the adversarial examples generated by IGA maintain overall the same transferability as those generated by GA. For instance, for adversarial examples generated on Word-CNN (see column 2), GA achieves better transferability on LSTM with normal training (see column 5), while IGA achieves better transferability on LSTM with adversarial training and SEM (see columns 6 and 7).

Figure 2: Classification accuracy by human evaluation.

Human Evaluation. To further verify that the perturbations in the adversarial examples generated by IGA are hard for humans to perceive, we also perform a human evaluation on IMDB with 35 volunteers. We first randomly choose a set of benign examples that are classified correctly and generate adversarial examples with GA and IGA on the three models. Then we randomly split all the examples into groups of equal size and ask five volunteers to classify each group independently. The human classification accuracy on benign examples serves as the reference.

As shown in Figure 2, the human classification accuracy on adversarial examples generated by IGA is slightly higher than on those generated by GA, and slightly closer to the human accuracy on benign examples.

In conclusion, IGA achieves the highest attack success rate compared with previous synonym substitution based adversarial attacks and a lower word replacement rate than GA. Besides, the adversarial examples generated by IGA maintain the same transferability as those of GA and are a little harder for humans to distinguish from benign examples. Some adversarial examples generated by IGA are shown in Appendix 7.2.

5.5 Further Discussion on SEM's Hyper-parameter

To explore how the synonym distance δ in SEM influences its efficacy, we try a range of values of δ for the three models on IMDB, with and without adversarial attacks. The results are illustrated in Figure 3.

On benign data, as shown in Figure 3(a), the classification accuracy of the models decreases a little as δ increases, because a bigger δ means fewer distinct tokens are used to train the model, which could degrade the performance of the models. Nevertheless, the classification accuracy does not decrease much, as the SEM encoding maintains the semantics of the original text.

We then show the defense efficacy of SEM on the three models as the value of δ changes, as shown in Figure 3(b) to Figure 3(d). When δ = 0, SEM has no effect and the accuracy is the lowest under all attacks. As δ increases, SEM starts to defend against the attacks: the accuracy increases rapidly, reaches a peak, and then decays slowly if we continue to increase δ. Thus, we choose an intermediate value of δ to balance the accuracy on benign examples and on adversarial examples.

(a) Models under no attack
(b) Word-CNN under attacks
(c) LSTM under attacks
(d) Bi-LSTM under attacks
Figure 3: The classification accuracy for various values of δ for the three models on IMDB, with and without attacks.

6 Conclusion

Synonym substitution based adversarial attacks are currently the best text attack methods, as they are hard to detect by automatic spelling or syntax checks as well as by human investigation. In this work, we propose a novel defense method called the Synonym Encoding Method (SEM), which encodes the synonyms of each word to defend against adversarial attacks for the text classification task. Extensive experiments show that SEM can defend against adversarial attacks efficiently and degrade the transferability of adversarial examples, while maintaining the classification accuracy on benign data. To our knowledge, this is the first efficient word-level text defense method against state-of-the-art synonym substitution based attacks.

In addition, we propose a new text attack method called the Improved Genetic Algorithm (IGA), which in most cases achieves a much higher attack success rate than existing attacks, while maintaining the transferability of adversarial examples.

References

7 Appendix

7.1 Details of IGA

In this subsection, we introduce our Improved Genetic Algorithm (IGA) in detail and show how IGA differs from the first proposed genetic attack method, GA (Alzantot et al., 2018). Regarding a text as a chromosome, there are two operators in IGA:

1) Crossover(a, b): For two texts a = (a_1, ..., a_n) and b = (b_1, ..., b_n), randomly choose a crossover point p from 1 to n, and generate a new text c = (a_1, ..., a_p, b_{p+1}, ..., b_n).

2) Mutate(x, i): For a text x and a position i, replace the word at position i with a word w'_i ∈ Syn(w_i, δ, k), where w_i is the word at position i in the original text, so as to obtain a new text x' = (x_1, ..., w'_i, ..., x_n) that minimizes the true-label confidence f_y(x').

The details of IGA are described in Algorithm 2.

0:  x: input text, y: true label of x, M: maximum number of iterations
0:  x_adv: output adversarial example
1:  for each word w_i ∈ x do
2:     P^0_i = Mutate(x, i)
3:  end for
4:  for g = 1, ..., M do
5:     x_best = argmin_{x' ∈ P^{g−1}} f_y(x')
6:     if f(x_best) ≠ y then
7:        return x_best
8:     end if
9:     P^g = { x_best }
10:     for i = 2, ..., |P^{g−1}| do
11:        Randomly sample parent_1, parent_2 from P^{g−1}
12:        child = Crossover(parent_1, parent_2)
13:        Randomly choose a position j in child
14:        P^g = P^g ∪ { Mutate(child, j) }
15:     end for
16:  end for
17:  return x_best
Algorithm 2 The Improved Genetic Algorithm
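Below is a minimal Python sketch of the two operators and the main loop as we read Algorithm 2 (an illustrative rendering, not the authors' code); `predict`, `confidence`, and `synonyms` follow the earlier sketches, and the synonym lookup is performed on the word of the original text so that re-substitution, including restoring the original word, is allowed.

```python
import random

def iga_crossover(a, b):
    """Single-point crossover: splice a prefix of one parent with a suffix of the other."""
    cut = random.randint(1, len(a) - 1)
    return a[:cut] + b[cut:]

def iga_mutate(x, x_orig, y, confidence, synonyms, i=None):
    """Replace position i (random if not given) with the candidate among the ORIGINAL
    word and its synonyms that minimizes the model's confidence in the true label y."""
    if i is None:
        i = random.randrange(len(x))
    candidates = [x_orig[i]] + list(synonyms(x_orig[i]))
    scored = [(confidence(x[:i] + [c] + x[i + 1:], y), c) for c in candidates]
    return x[:i] + [min(scored)[1]] + x[i + 1:]

def iga_attack(x, y, predict, confidence, synonyms, max_iters=50):
    x = list(x)
    # initial population: one member per position, each mutated at that position
    population = [iga_mutate(x, x, y, confidence, synonyms, i) for i in range(len(x))]
    for _ in range(max_iters):
        ranked = sorted(population, key=lambda p: confidence(p, y))
        best = ranked[0]
        if predict(best) != y:
            return best                                # adversarial example found
        children = [best]                              # keep the fittest member
        while len(children) < len(population):
            p1, p2 = random.sample(population, 2)
            children.append(iga_mutate(iga_crossover(p1, p2), x, y, confidence, synonyms))
        population = children
    return None
```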

Compared with GA, IGA has the following differences:

1) Initialization: GA initializes the first population randomly, while IGA initializes the first population by replacing each word randomly with one of its synonyms, so our population is more diversified.

2) Mutation: Different from GA, IGA allows a word that has been replaced before to be replaced again, so that we can avoid local minima, as described in Section 4.

3) Crossover: To better simulate reproduction and biological crossover, we randomly cut the two parent texts at a crossover point and concatenate the two fragments into a new text, rather than randomly choosing the word at each position from one of the two parents.

The selection of the next generation is similar to GA: we greedily keep the best offspring and then generate the other offspring by applying Mutate after Crossover on two randomly chosen parents. But as Mutate and Crossover are different, IGA produces very different offspring.

7.2 Adversarial Examples Generated by GA and IGA

To show the generated adversarial examples, we randomly pick some benign examples from IMDB and generate adversarial examples with GA and IGA respectively on the LSTM model. The examples are shown in Table 4; we see that IGA substitutes fewer words than GA on LSTM with normal training, which is consistent with the statistics in Table 3.

Confidence (%) Prediction Text
Original
99.9 1 I enjoyed this film which I thought was well written and acted , there was plenty of humour and a provoking storyline, a warm and enjoyable experience with an emotional ending.
97.2 0 I am sorry but this is the worst film I have ever seen in my life. I cannot believe that after making the first one in the series, they were able to get a budget to make another. This is the least scary film I have ever watched and laughed all the way through to the end.
99.7 1 This is a unique masterpiece made by the best director ever lived in the ussr. He knows the art of film making and can use it very well. If you find this movie, buy or copy it!
GA
88.2 0 I enjoyed this film which I thought was well written and proceeded, there was plenty of humorous and a igniting storyline, a tepid and enjoyable experience with an emotional terminate.
99.9 1 I am sorry but this is the hardest film I have ever seen in my life. I cannot believe that after making the first one in the series they were able to get a budget to make another. This is the least terrifying film I have ever watched and laughed all the way through to the end.
68.9 0 This is a unique masterpiece made by the best superintendent ever lived in the ussr. He knows the art of film making and can use it supremely alright. If you find this movie, buy or copy it!
IGA
72.1 0 I enjoyed this film which I thought was well written and acted, there was plenty of humour and a provoking storyline, a lukewarm and agreeable experience with an emotional ending.
99.8 1 I am sorry but this is the hardest film I have ever seen in my life. I cannot believe that after making the first one in the series, they were able to get a budget to make another. This is the least scary film I have ever watched and laughed all the way through to the end.
86.2 0 This is a sole masterpiece made by the best director ever lived in the ussr. He knows the art of film making and can use it very well. If you find this movie, buy or copy it!
Table 4: Adversarial examples generated by GA and IGA on IMDB using the LSTM model.