Machine learning classifiers are now widely used to provide key services such as information filtering and sentiment analysis. However, researchers have recently found that these ML classifiers, even deep learning classifiers, are vulnerable to adversarial attacks. They demonstrate that image classifiers, and now even text classifiers, can be fooled easily by adversarial examples that are deliberately crafted by attacking algorithms. These algorithms generate adversarial examples in a pair-wise way. That is, given one input, they aim to generate one corresponding adversarial example by adding small, imperceptible perturbations to that input. The adversarial examples must maintain the semantics of the original inputs; that is, humans must still assign them the same class as the original input. On the other hand, adversarial training has been shown to be a useful defense against adversarial examples [31, 10]. Classifiers trained on a mixture of adversarial and clean examples can become resistant to adversarial examples.
In the area of natural language processing (NLP), existing methods are pair-wise and thus depend heavily on input data. If attackers want to use pair-wise methods to generate adversarial texts that should be classified as a chosen class, they must first collect texts labeled as the chosen class, then transform these labeled texts into the corresponding adversarial examples by replacing a few words. As the amount of labeled data is usually small, the number of generated adversarial examples is limited. These adversarial examples are often not diverse enough and sometimes hard to read, so they can be easily detected by humans. Moreover, in practice, if attackers aim to attack a public opinion monitoring system, they must collect a large number of high-quality labeled samples to generate a vast number of adversarial examples; otherwise, they can hardly have an impact on the targeted system. Therefore, pair-wise methods only demonstrate the feasibility of the attack but cannot create chaos on a large scale.
In this paper, we propose an unrestricted end-to-end solution to efficiently generate adversarial texts, where adversarial examples can be generated from scratch without real-world texts and are still meaningful for humans. We argue that adversarial examples do not need to be generated by perturbing existing inputs. For example, we can generate a movie review that does not stem from any example in the dataset at hand. If the movie review is judged to be positive by humans but classified as negative by the targeted model, it is also an adversarial example. Adversarial examples generated in this way are not limited by the number of available inputs, so we can obtain adversarial examples on a large scale. On the other hand, the proposed method can also be used to create more adversarial examples for defense. Training with more adversarial examples often makes these key services more robust.
The proposed method uses a conditional variational autoencoder (VAE) as the generator, which can generate texts of a desired class. To guide the generator to generate texts that mislead the targeted model, we access the targeted model in a white-box setting and use an adversarial loss to make the targeted model make a wrong prediction. To make the generated texts consistent with human cognition, we use discriminators and the training framework of generative adversarial networks (GANs) to make the generated texts similar to real data of the desired class. After the whole model is trained, we can sample from the latent space of the VAE and generate an unlimited number of adversarial examples without accessing the targeted model. The model can also transform a given input into an adversarial one.
We evaluate the performance of our attack method on a sentiment analysis task. Experiments show the scalability of generation. The adversarial examples generated from scratch achieve a high attack success rate and have acceptable quality. As the model generates texts with feed-forward passes only, and in parallel, generation is fast compared with other methods. Additional ablation studies verify the effectiveness of the discriminators, and data augmentation experiments demonstrate that our method can generate large-scale adversarial examples of higher quality than other methods. When the existing data at hand is limited, our method is superior to pair-wise generation.
In summary, the major contributions of this paper are as follows:
Unlike the existing literature on text attacks, we do not construct adversarial examples by transforming given texts. Instead, we train a model to generate adversarial texts from scratch. In this way, adversarial examples are not restricted to the existing inputs at hand but can be generated on a large scale.
We propose a novel method based on the vanilla conditional VAE. To generate adversarial examples, we incorporate an adversarial loss to guide the vanilla VAE’s generation process.
We adopt one discriminator for each class of data. We train the discriminators and the conditional VAE in a min-max game as in GANs, which makes the generated texts more consistent with real data of the desired class.
We conduct attack experiments on a sentiment analysis task. Experimental results show that our method is scalable and achieves a higher attack success rate at a higher speed than recent baselines. The quality of generated texts is also acceptable. Further ablation studies and data augmentation experiments verify our intuitions and demonstrate the superiority of scalable text adversarial example generation.
2 Related Work
There have been extensive studies on adversarial machine learning, especially on deep neural models [31, 10, 16, 28, 1]. Much work focuses on image classification tasks [31, 10, 5, 11, 33]. Szegedy et al. solve the attack problem as an optimization problem with a box-constrained L-BFGS. Goodfellow et al. [10] propose the fast gradient sign method (FGSM), which perturbs images with noise computed from the sign of the gradient of the loss with respect to the input.
In NLP, perturbing texts is more difficult than perturbing images, because the words in a sentence are discrete, so gradient-based attacks cannot be applied to them directly as in the continuous image space. Most methods adapt the pair-wise methods of image attacks to text attacks and perturb texts by replacing a few words. The methods in [24, 9, 6] calculate gradients with respect to the word vectors and perturb the word embedding vectors with these gradients. They then find the word vector nearest to the perturbed vector. In this way, the perturbed vector can be mapped to a discrete word that replaces the original one. These methods are gradient-based replacement methods.
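The gradient-based replacement step described above can be sketched as follows. This is a minimal illustration rather than the implementation of the cited methods: the toy embedding table, the gradient values and the step size `epsilon` are assumptions, and in practice the gradient would come from backpropagation through the attacked classifier.

```python
def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def fgsm_perturb(word_vec, grad, epsilon):
    """Move a word vector by epsilon along the sign of the loss gradient."""
    return [x + epsilon * sign(g) for x, g in zip(word_vec, grad)]


def nearest_word(vec, embedding_table):
    """Map a continuous vector back to the closest word (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(embedding_table, key=lambda w: sq_dist(vec, embedding_table[w]))


# Hypothetical 2-dimensional embedding table, for illustration only.
table = {"good": [1.0, 0.0], "bad": [-1.0, 0.0], "fine": [0.6, 0.2]}
grad = [-2.0, 0.5]  # assumed gradient of the loss w.r.t. the vector of "good"
perturbed = fgsm_perturb(table["good"], grad, epsilon=0.5)
replacement = nearest_word(perturbed, table)
```

The nearest-neighbour lookup is what turns a continuous perturbation back into a discrete word, which is why such methods remain pair-wise: every adversarial text starts from one given text.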
Other attacks on texts can be summarized as gradient-free replacement methods, which replace words in texts with typos or synonyms. One method edits words with tricks like insertion, deletion and replacement, and chooses appropriate words to replace by calculating the word frequency and the highest gradient magnitude. Another proposes five automatic word replacement methods and uses the magnitude of the gradients of the word embedding vectors to choose the most important words to replace. A third is based on a synonym substitution strategy; its authors introduce a new word replacement order determined by both the word saliency and the classification probability. However, these replacement methods still generate adversarial texts in a pair-wise way, which restricts the adversarial texts to variants of given real-world texts. Besides, the substitute words sometimes change the meaning of the text. Thus existing adversarial text generation methods only demonstrate the feasibility of the attack but cannot create chaos on a large scale.
To tackle the above problems, we propose an unrestricted end-to-end solution to generate diverse adversarial texts on a large scale without requiring given texts.
In this section, we propose a novel method to generate adversarial texts for a text classification model on a large scale. Although it is trained with labeled data in a pair-wise way, the trained model can generate an unlimited number of adversarial examples without any input data. Moreover, like traditional pair-wise generation methods, our model can also transform a given text into an adversarial one. Unlike existing methods, our model generates adversarial texts without querying the attacked model, so the generation procedure is quite fast.
Figure 2 illustrates the overall architecture of our model. The model has three components: a generator, discriminators, and a targeted model. The generator and the discriminators form a generative adversarial network (GAN). When training, we feed an original input to the generator, which transforms it into an adversarial output. The procedure can be defined as follows:
The generator aims to generate an output that reconstructs the input. Then, we feed the generated output to the targeted model, which classifies it as a certain class; we hope this is a wrong label. Thus we have the following equation:
where the predicted label belongs to the label space of the targeted classification model.
To keep the generated output in the same class as the input when judged by humans, we add one discriminator for each class. With the min-max training strategy of the GAN framework, the discriminator of each class draws the generated output close to the distribution of real data of that class, so the output is made compatible with human cognition.
We now introduce these components in further detail.
In this subsection, we describe the generator for text generation. We use the variational autoencoder (VAE) [14, 27] as the generator. The VAE is a generative model based on a regularized version of the standard autoencoder. This model supposes the latent variable is sampled from a prior distribution.
As shown in Figure 2, the VAE is composed of an encoder and a decoder, each with its own parameters.
The encoder is a neural network. Its input is a text and its output is a latent code. The encoder maps the text into a latent representation space, which has a lower dimension than the input space. The decoder is another neural network. Its input is the latent code, and it outputs an adversarial text following the probability distribution of the input data.
In our model, we adopt the gated recurrent unit (GRU) [7] as the encoder and the decoder. As shown in Figure 3, the input is a sentence of words, and we formulate the input for the neural networks as follows: for the word at each position in the sentence, we first transform it into a word vector by looking it up in a word embedding table. The word embedding table is randomly initialized and is updated during model training. Then the word embedding vectors are fed into the GRU encoder. Each GRU cell emits a hidden state.
We take the hidden state of the last GRU cell, whose index N is the length of the encoder input. To obtain the latent code, we feed this hidden state into two linear layers to get the mean and the variance parameters of a Gaussian, respectively. Following the Gaussian reparameterization trick, we draw a random sample from a standard Gaussian (zero mean, unit variance) and compute the latent code as:
Computed in this way, the latent code is guaranteed to be sampled from a Gaussian distribution.
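A minimal sketch of this sampling step follows. It assumes that the second linear layer outputs a log-variance, which is a common parameterization but is not confirmed by the text; the names `mu`, `log_var` and `reparameterize` are illustrative.

```python
import math
import random


def reparameterize(mu, log_var, rng=random):
    """Sample z = mu + sigma * eps element-wise, with eps drawn from N(0, 1)."""
    z = []
    for m, lv in zip(mu, log_var):
        sigma = math.exp(0.5 * lv)   # standard deviation recovered from log-variance
        eps = rng.gauss(0.0, 1.0)    # noise drawn independently of the encoder
        z.append(m + sigma * eps)
    return z


# Toy encoder outputs for a 3-dimensional latent space (assumed values).
mu = [0.5, -1.0, 2.0]
log_var = [0.0, -2.0, 1.0]
z = reparameterize(mu, log_var, random.Random(0))
```

Because the randomness enters only through `eps`, the latent code stays a differentiable function of the encoder outputs, which is what allows the encoder to be trained by backpropagation.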
Then, we can decode the latent code to generate an adversarial text. Before feeding the latent code to the decoder, we adopt a condition embedding to guide the decoder to generate text of a certain class, which can be chosen arbitrarily. Specifically, for a text classification task with a given number of classes, we randomly initialize a class embedding table as a matrix and look up the embedding of the desired class. Then, we feed the latent code together with the class embedding into a linear layer to get another vector representation. This vector encodes the information of the input text and the desired class.
The decoder GRU uses this vector as its initial state to generate the output text. Each GRU cell generates one word. The computation is similar to that of the GRU encoder, except for the output layer of each cell. The output of each GRU cell is computed as:
Here the hidden state of the cell is multiplied by transformation weights and normalized over the word vocabulary, so that each entry of the output is the probability of that GRU cell emitting the corresponding word in the vocabulary.
In the training phase, the GRU cell chooses the word index with the highest probability to emit:
When training, the loss function of the VAE is calculated as:
The first term is the reconstruction loss, or expected negative log-likelihood. This term encourages the decoder to learn to reconstruct the data, so the output text is made similar to the input text. The second term is the Kullback-Leibler divergence between the posterior distribution of the latent vector and its prior. If the VAE were trained with only the reconstruction objective, it would learn to encode its inputs deterministically by making the variances of the posterior vanishingly small. Instead, the VAE uses the second term to encourage the model to keep its posterior distributions close to a prior, which is generally set as a standard Gaussian.
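For a diagonal Gaussian posterior and a standard Gaussian prior, the KL term has a well-known closed form. The sketch below combines it with a token-level negative log-likelihood. The function names, the log-variance parameterization and the `kl_weight` argument are assumptions made for illustration, not the exact formulation used in our implementation.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) )."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def reconstruction_nll(step_probs, target_ids):
    """Negative log-likelihood of the target words under the decoder outputs.

    step_probs[t] is the vocabulary distribution emitted at step t.
    """
    return -sum(math.log(probs[t]) for probs, t in zip(step_probs, target_ids))


def vae_loss(step_probs, target_ids, mu, log_var, kl_weight=1.0):
    """Reconstruction term plus weighted KL term."""
    return reconstruction_nll(step_probs, target_ids) + kl_weight * kl_to_standard_normal(mu, log_var)


# Toy example: a 2-word target over a 3-word vocabulary (assumed values).
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
loss = vae_loss(probs, [0, 1], mu=[0.3, -0.2], log_var=[0.0, -0.5])
```

When the posterior equals the prior (zero mean, unit variance) the KL term is exactly zero, so the term only penalizes departures from the prior.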
In the training phase, the input to the GRU decoder is the input text, preceded by a special GO token as the start word. We append a special EOS token to the input text to form the ground truth of the output text. The EOS token represents the end of the sentence. When the GRU decoder is trained to generate texts, it tends to ignore the latent code and rely only on its input to emit the output text. It then degenerates into a language model. This situation is called KL-vanishing. To tackle the KL-vanishing problem when training the GRU decoder, we adopt the KL-annealing mechanism [2], which gradually increases the KL weight from 0 to 1. This can be thought of as annealing from a vanilla autoencoder to a VAE. We also randomly drop the input words fed to the decoder with a fixed keep rate, to make the decoder depend on the latent code to generate the output text.
Notably, if we randomly sample a latent code from a standard Gaussian, the decoder can also generate output text based on it. The difference is that there is no input text for the GRU decoder, so we feed the word generated by each GRU cell to the next GRU cell as its input word. Specifically, in the inference phase, we use beam search to generate words. The initial input word to the first GRU cell is the GO token. When the decoder emits the EOS token, it stops generating new words, and the generation of one complete sentence is finished.
In this way, after the generator is trained, we can in theory sample an unlimited number of latent codes from the latent space and generate an unlimited number of output texts from them. This is part of the superiority of our method.
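The two remedies for KL-vanishing can be sketched as below. The linear schedule, the `warmup_steps` parameter and the replacement of dropped words by an unknown-word token are assumptions chosen for illustration; other schedules (for example, a sigmoid) are equally possible.

```python
import random


def kl_weight(step, warmup_steps):
    """Linearly anneal the KL weight from 0 to 1 over warmup_steps updates."""
    return min(1.0, step / warmup_steps)


def drop_words(token_ids, keep_rate, unk_id, go_id, rng):
    """Keep each decoder input word with probability keep_rate.

    Dropped words are replaced by unk_id; the GO token is always kept.
    """
    return [
        t if (t == go_id or rng.random() < keep_rate) else unk_id
        for t in token_ids
    ]


# Hypothetical token ids: 1 = GO, 0 = UNK, the rest are ordinary words.
decoder_input = [1, 17, 42, 8, 23]
noisy_input = drop_words(decoder_input, keep_rate=0.7, unk_id=0, go_id=1, rng=random.Random(0))
weights = [kl_weight(s, warmup_steps=1000) for s in (0, 250, 1000, 5000)]
```

With a small KL weight early in training the model behaves like a plain autoencoder and learns to store information in the latent code; dropping decoder inputs then keeps the decoder from ignoring that code later.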
3.3 Targeted Model
Since the TextCNN model performs well and is quite fast, it is one of the most widely used methods for text classification tasks in industrial applications. As we aim to attack models used in practice, we take the TextCNN model [13] as our targeted model.
Suppose we set the condition of the VAE to a desired class and the decoder generates an output text. We then feed the text into the targeted model, which predicts a probability for each candidate class. We conduct a targeted attack that aims to make the targeted model classify the text as a target class different from the desired class, which gives the following adversarial loss function:
This is a cross-entropy loss that maximizes the probability of the target class.
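As a minimal sketch, the adversarial loss can be written as the negative log-probability that the targeted model assigns to the target class. The logits below are assumed values standing in for the output of the targeted model; the function names are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adversarial_loss(class_probs, target_class):
    """Cross-entropy against the target class: small when the model predicts it."""
    return -math.log(class_probs[target_class])


# Binary sentiment example: index 0 = negative, 1 = positive (assumed convention).
probs = softmax([2.0, -1.0])          # the targeted model leans towards "negative"
loss_if_target_negative = adversarial_loss(probs, 0)
loss_if_target_positive = adversarial_loss(probs, 1)
```

Minimizing this loss with respect to the generator parameters pushes the generated text towards regions where the targeted model outputs the target class.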
Recall that the words in the adversarial text are computed in Equation 6, in which the argmax function is not differentiable. So we cannot directly feed the word index computed in Equation 6 into the targeted model. In this paper, we utilize the Gumbel-Softmax [12] to make continuous values approximate the discrete word index. The embedding matrix fed to TextCNN is calculated as:
3.4 Discriminator Model
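The Gumbel-Softmax relaxation can be sketched as follows. The temperature value, the toy embedding table and the helper names are assumptions for illustration; the sketch shows the general technique of Jang et al. rather than our exact equations.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Return a relaxed one-hot vector over the vocabulary.

    Adds Gumbel(0, 1) noise to the logits and applies a temperature softmax.
    """
    noisy = []
    for logit in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)   # avoid log(0)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def soft_embedding(weights, embedding_table):
    """Weighted average of word embeddings: a differentiable stand-in for a lookup."""
    dim = len(embedding_table[0])
    return [sum(w * row[d] for w, row in zip(weights, embedding_table)) for d in range(dim)]


# Toy vocabulary of 3 words with 2-dimensional embeddings (assumed values).
table = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
weights = gumbel_softmax([0.1, 2.0, -1.0], temperature=0.5, rng=random.Random(0))
vector = soft_embedding(weights, table)
```

As the temperature decreases, the relaxed vector approaches a one-hot vector and the soft embedding approaches the embedding of a single word, while gradients can still flow from the targeted model back to the decoder.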
Up to this point, we have supposed that, ideally, the generated text shares many words with the input text of the desired class (and is thus assigned that class by humans) while being classified as the target class by the targeted model. But this assumption is not rigorous. Most of the time, humans do not assign the generated text to the desired class. In natural language texts, even a single word change may change the whole meaning of a sentence. A valid adversarial example must be imperceptible to humans; that is, humans must still classify the generated text as the desired class.
Suppose we have the distribution of real data of a given class and the distribution of generated adversarial data transformed from inputs of that class. We utilize the idea of the GAN framework to make the generated data similar to the real data of that class. Thus the generated text will be classified as the desired class by humans and as the target class by the targeted model at the same time.
Specifically, we adopt one discriminator for each class. Each discriminator aims to distinguish the real labeled data of its class from the adversarial data generated by the generator with that class as the desired class:
The overall training objective is a min-max game played between the generator and the discriminators, one for each class:
Each discriminator tries to distinguish real data from generated data, while the generator tries to fool the discriminator into classifying the generated data as real. Trained in this adversarial way, the distribution of the generated adversarial texts is drawn close to the distribution of real data of the desired class. Thus the generated text is most likely to be similar to real data of that class and is classified as that class by humans as a result.
We implement the discriminators with multi-layer perceptrons (MLPs). Because the argmax function is not differentiable, similar to Equations 9 and 10 in Section 3.3, we first use Gumbel-Softmax to transform the decoder output from Equation 4 into a fixed-size matrix. Then, we calculate the probability of a text being true data of a given class as:
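The two sides of this min-max game can be sketched with the standard GAN losses for a single class. The probabilities below are assumed discriminator outputs, and the function names are illustrative; the sketch uses the textbook min-max form and is not a transcription of our objective.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Loss minimized by one class discriminator.

    d_real: probabilities of "real" assigned to real texts of the class.
    d_fake: probabilities of "real" assigned to generated texts of the class.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return -(real_term + fake_term)


def generator_gan_loss(d_fake):
    """Min-max form: the generator minimizes log(1 - D(generated))."""
    return sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)


# Assumed discriminator outputs for a batch of two real and two generated texts.
d_loss = discriminator_loss(d_real=[0.9, 0.8], d_fake=[0.2, 0.1])
g_loss_poor = generator_gan_loss([0.1, 0.1])   # discriminator easily spots fakes
g_loss_good = generator_gan_loss([0.9, 0.9])   # generated texts look real
```

The generator loss decreases as the discriminator assigns higher "real" probability to generated texts, which is the mechanism that draws generated texts towards the real data of the desired class.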
3.5 Model Training
We first train the VAE and the targeted model with the training data. Then we freeze the weights of the targeted model and initialize the generator's weights with the pretrained VAE's weights. Finally, the generator and all the discriminators are trained in a min-max game with the joint loss. The whole training process is summarized in Algorithm 1.
We report the performance of our method when attacking TextCNN on the sentiment analysis task, an important text classification task. Sentiment analysis is widely applied to help businesses understand the social sentiment around their products or services by monitoring online user reviews and comments [23, 4, 21]. In several experiments, we evaluate the quality of the adversarial texts for sentiment analysis generated by the proposed method.
Experiments are conducted from two aspects. Specifically, we first follow the popular settings and evaluate our model's performance in transforming an existing input text into an adversarial one. We observe that our method achieves a higher attack success rate, generates fluent texts and is efficient. We also evaluate our method on generating adversarial texts from scratch unrestrictedly. Experimental results show that we can generate large-scale diverse examples. The generated adversarial texts are mostly valid, and can be used to substantially improve the robustness of text classification models.
We further report ablation studies, which verify the effectiveness of the discriminators. Defense experiment results demonstrate that generating adversarial examples on a large scale can help make the model more robust.
4.1 Experiment Setup and Details
Rotten Tomatoes Movie Reviews (RT). This dataset consists of positive and negative processed movie reviews. We divide the dataset into a training set, a development set and a test set.
4.2 Comparing With Pair-wise Methods
In most of the existing work [26, 18, 1], text adversarial examples are generated in a pair-wise way. That is, we first take a text example and then transform it into an adversarial instance.
To compare with the current methods fairly, we limit our method to pair-wise generation. In this experiment, we set . Specifically, we first feed an input text into the GRU encoder, and set the condition as the ground-truth class of the text. After that, the decoder can decode to get the adversarial output text.
We choose four representative methods as baselines:
Random: Select words randomly and modify them.
Fast Gradient Sign Method (FGSM) [10]: First, the perturbation is computed as the sign of the gradient of the loss function with respect to the word vectors, scaled by a small step size. Then, the word embedding table is searched for the word vector nearest to the perturbed word vector. FGSM is the fastest among gradient-based replacement methods.
DeepFool: This is also a gradient-based replacement method. It aims to find the direction in which the distance to the decision boundary is shortest. The perturbation is also applied to the word vectors. After that, nearest neighbor search is used to generate adversarial texts.
TextBugger [15]: TextBugger is a gradient-free replacement method. It proposes strategies such as changing a word's spelling and replacing a word with its synonym, which change a word slightly to create adversarial texts. Gradients are only computed to find the most important words to change.
We summarize the performance of our method and all baselines in Table 1. From Table 1, we can observe that randomly changing words is not enough to fool the classifier. This implies the difficulty of the attack. TextBugger and our method both achieve quite high attack success rates, and our method performs even better than TextBugger, which is the state-of-the-art method.
We show some adversarial examples generated by our method and TextBugger to demonstrate the differences in Figure 4.
We can observe that TextBugger mainly changes the spelling of words. The generated text is no longer fluent and is easily detected by grammar checking systems. Also, though humans may guess the original meanings, the changed words are treated as out-of-vocabulary words by models. For example, TextBugger changes the spelling of ‘awful’, ‘cliches’ and ‘foolish’ in Figure 4. These are important negative sentiment words in a negative sentence. It is natural that changing these words to unknown words can change the prediction of models. Unlike TextBugger, our method generates meaningful and fluent content. For example, in the first example of Figure 4, we replace ‘read the novel’ with ‘love the book’; the substitution is still fluent and makes sense to both humans and models.
Generation Speed. It takes about one hour and about three hours to train our model on the RT and IMDB datasets, respectively. We also evaluate the time cost of generating one adversarial example. We take the FGSM method as the representative of gradient-based methods, as FGSM is the fastest among them. We measure the time cost of generating adversarial examples and calculate the average time of generating one. Results are shown in Table 2.
We can observe that our method is much faster than others. That is mainly because our generative model is trained beforehand. After the model is trained, the generation of one batch just requires one feed-forward.
4.3 Unrestricted Adversarial Text Generation
As mentioned in Section 3.2, after our model is trained, we can randomly sample a latent code from the latent space, choose a desired class, get the embedding vector of that class, and then feed both to the decoder to generate adversarial texts unrestrictedly, with no need for labeled text.
Attack Success Rate. When training, we can tune the weight of the adversarial loss in Equation 14 to affect the model. After training with different weights, we observe that the generated texts differ. We randomly generate 50,000 examples and compute the proportion of adversarial examples for each weight. The results are shown in Figure 5(a). Note that if we set the weight to zero, the model is a vanilla VAE and is not trained further after pretraining.
From Figure 5(a), we can observe that the attack success rate of the vanilla VAE is low on both datasets, which implies that randomly generating texts can hardly fool the targeted model. For sufficiently large values of the adversarial loss weight, the attack success rate is consistently better than that of the vanilla VAE. This reflects the importance of the adversarial loss.
Also, the attack success rate increases as the weight becomes larger. This is because the larger the weight is, the more important the role the adversarial loss plays in the final joint loss. So, the text generator is more easily guided by the adversarial loss to generate an adversarial example.
To evaluate the quality of the adversarial texts generated with different adversarial loss weights, we adopt three metrics: perplexity, validity and diversity.
Perplexity. Perplexity [3] is a measurement of how well a probability model predicts a sample. A low perplexity indicates the language model is good at predicting the sample. Given a pretrained language model, it can also be used to evaluate the quality of texts. Similarly, a low perplexity indicates the text is more fluent for the language model. We compute perplexity as:
where the sentence length is the number of words in one sentence, and the probability of each word in the sentence is computed by the language model.
We train a language model with the training data of IMDB and RT, and use it as the language model in Equation 15. We measure and compare the perplexity of the 50,000 generated texts and the data of the original training set. Results are shown in Figure 5(b). We can observe that the perplexity is only a bit higher than that of the original data, which means that the quality of the generated texts is acceptable. Also, as the adversarial loss weight gets larger, the perplexity increases. This is perhaps because the adversarial loss can distort the generated texts.
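A minimal sketch of the perplexity computation follows, assuming the standard per-word definition (the exponential of the average negative log-probability). The word probabilities are assumed values standing in for the output of a pretrained language model.

```python
import math


def perplexity(word_probs):
    """Perplexity of one sentence from the per-word probabilities of a language model."""
    n = len(word_probs)
    avg_neg_log_prob = -sum(math.log(p) for p in word_probs) / n
    return math.exp(avg_neg_log_prob)


# Assumed language-model probabilities for two 4-word sentences.
fluent = perplexity([0.4, 0.5, 0.3, 0.6])
awkward = perplexity([0.05, 0.1, 0.02, 0.2])
```

A sentence whose words are all assigned probability 1/k has perplexity exactly k, so the metric can be read as the effective number of equally likely choices per word.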
Validity. If we feed a desired class to the decoder, then a valid generated adversarial text is supposed to be classified as that class by humans but as a different class by the targeted model. We randomly select 100 generated texts for each adversarial loss weight and manually evaluate their validity. The results are shown in Figure 5(c).
From Figure 5(c), we can observe that the validity rates of our method on both datasets are much higher than that of the vanilla VAE. This implies that our method can generate high-quality and high-validity texts with a high attack success rate.
Diversity. We first generate one million adversarial texts. To compare generated texts with train data, we extract all 4-grams of train data and generated texts. On average, for each generated text, less than of 4-grams can be found in all 4-grams of train data on all datasets. This shows that there exists some similarity and our model can also generate texts with different words combinations. To compare generated texts with each other, we suppose that if over of 4-grams of one generated text don’t exist at the same time in any one of the other generated texts, the text is one unique text. We observe more than of generated texts are unique. This proved that the generated texts are diverse.
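The two 4-gram checks can be sketched as below. The `threshold` argument is a placeholder for the uniqueness cutoff, and the whitespace tokenization and example sentences are assumptions. The pairwise comparison is quadratic in the number of texts, so a large-scale run would need an index over 4-grams instead.

```python
def ngrams(tokens, n=4):
    """Set of all n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(tokens, reference_ngrams, n=4):
    """Fraction of a text's n-grams that also occur in the reference set."""
    grams = ngrams(tokens, n)
    if not grams:
        return 0.0
    return len(grams & reference_ngrams) / len(grams)


def is_unique(index, texts, threshold, n=4):
    """A text is unique if, for every other text, more than `threshold`
    of its n-grams are missing from that other text."""
    grams = ngrams(texts[index], n)
    if not grams:
        return False
    for j, other in enumerate(texts):
        if j == index:
            continue
        missing = 1.0 - len(grams & ngrams(other, n)) / len(grams)
        if missing <= threshold:
            return False
    return True


train = ngrams("the movie was really bad".split())
generated = [
    "the movie was really very good indeed".split(),
    "the movie was really very good indeed".split(),
    "completely different words appear in here now".split(),
]
ratio = overlap_ratio(generated[0], train)
```

In this toy example the first generated text shares one of its four 4-grams with the training data, and the duplicated text is not counted as unique while the third one is.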
Adversarial Examples. We show some valid adversarial examples generated by our method in Figure 6. We can see that the adversarial examples generated by the vanilla VAE are more likely to be neutral, and the confidence of the targeted model on them is not high. In contrast, the targeted model has high confidence on the examples generated by our method. This shows that the adversarial loss is important to the attack success rate. Besides, the fluency and validity of the texts generated by our method are acceptable.
4.4 Ablation Study
In this section, we report an ablation study that further demonstrates the effectiveness of the discriminators.
We first remove the discriminators and then train our model. We compare it with the model trained with the discriminators in a min-max game. We evaluate their attack success rate, perplexity and validity. Results are shown in Table 3.
The attack success rates of the models trained with and without the discriminators are close. But the validity of the model trained without them is much lower than that of the model trained with them. The reason is as follows. When the generator is trained with only the VAE loss and the adversarial loss, and we want to generate positive adversarial texts that the targeted model classifies as negative, the easiest way to achieve this goal is to change a few words in the generated text to negative words, such as "bad". But texts generated this way cannot fool humans. If we add discriminators to draw the distribution of adversarial texts close to the distribution of real data, this phenomenon can be controlled. This shows that the discriminators and the min-max game can improve the validity greatly.
4.5 Defense With Adversarial Training
Using adversarial examples to augment the training data can make models more robust; this is called adversarial training.
On the RT dataset, we randomly generate 4k adversarial texts to augment the training data and 1k to test the model. On the IMDB dataset, we randomly generate 10k, of which 8k are used for training and 2k for testing. Results are shown in Figure 7(a) and Figure 7(b).
With adversarial data augmentation, the test accuracy on the original test data is stable, while the accuracy on the adversarial data improves greatly. This implies that adversarial training can make models more robust without hurting their effectiveness.
Then, on the RT dataset, we first augment the training data with adversarial examples generated by pair-wise generation. The adversarial examples are generated by transforming the training data. Note that we have 8k training examples in the RT dataset. When we set a bigger adversarial loss weight, the attack success rate is higher, so we can generate more adversarial examples in the pair-wise way. But with any weight, unrestricted generation from scratch can produce an unlimited amount of adversarial data. We compare the adversarial data augmentation performance of pair-wise generation and unrestricted generation from scratch. We use the same number of adversarial examples generated by the two modes, and hold out part of the generated data for testing. Results are shown in Figure 7(c).
We can see that with pair-wise generation, if the training data is limited, we need to generate more adversarial examples to improve the adversarial test accuracy. Higher adversarial test accuracy requires a higher adversarial loss weight. But a higher weight results in a bigger perplexity, which means lower text quality. In contrast, with unrestricted generation from scratch, we can generate an unlimited number of adversarial texts using a very small weight, with high fluency and similar adversarial test accuracy. Thus, under similar adversarial test accuracy, the text fluency of pair-wise generation is worse than that of unrestricted generation from scratch. This indicates the advantage of our proposed method.
In this paper, we have proposed a scalable method to generate adversarial texts from scratch for attacking a text classification model. We add an adversarial loss to force the generated text to mislead the targeted model. Besides, we use discriminators and a GAN-like training strategy to make adversarial texts mimic real data of the desired class. After the generator is trained, it can generate diverse adversarial examples of a desired class on a large scale without real-world texts. Experiments show that the proposed method is scalable and can achieve a higher attack success rate at a higher speed compared with recent methods. In addition, the generated texts are shown to be of good quality and mostly valid. We further conduct ablation experiments to verify the effects of the discriminators. Data augmentation experiments indicate that our method generates more diverse adversarial texts of higher quality than pair-wise generation, which can make the targeted model more robust.
-  Moustafa Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani Srivastava, and Kai-Wei Chang, ‘Generating natural language adversarial examples’, arXiv preprint arXiv:1804.07998, (2018).
-  Samuel Bowman, Luke Vilnis, Oriol Vinyals, Andrew M Dai, Rafal Jozefowicz, and Samy Bengio, ‘Generating sentences from a continuous space.’, in Proceedings of the Twentieth Conference on Computational Natural Language Learning (CoNLL)., (2016).
-  Peter F Brown, Vincent J Della Pietra, Robert L Mercer, Stephen A Della Pietra, and Jennifer C Lai, ‘An estimate of an upper bound for the entropy of english’, Computational Linguistics, 18(1), 31–40, (1992).
-  Erik Cambria, ‘Affective computing and sentiment analysis’, IEEE Intelligent Systems, 31(2), 102–107, (2016).
-  Nicholas Carlini and David Wagner, ‘Towards evaluating the robustness of neural networks’, in 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57. IEEE, (2017).
-  Minhao Cheng, Jinfeng Yi, Huan Zhang, Pin-Yu Chen, and Cho-Jui Hsieh, ‘Seq2sick: Evaluating the robustness of sequence-to-sequence models with adversarial examples’, arXiv preprint arXiv:1803.01128, (2018).
-  Kyunghyun Cho, Bart Van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio, ‘On the properties of neural machine translation: Encoder-decoder approaches’, arXiv preprint arXiv:1409.1259, (2014).
-  Ari Firmanto, Riyanarto Sarno, et al., ‘Prediction of movie sentiment based on reviews and score on rotten tomatoes using sentiwordnet’, in 2018 International Seminar on Application for Technology of Information and Communication, pp. 202–206. IEEE, (2018).
-  Zhitao Gong, Wenlu Wang, Bo Li, Dawn Song, and Wei-Shinn Ku, ‘Adversarial texts with gradient methods’, arXiv preprint arXiv:1801.07175, (2018).
-  Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy, ‘Explaining and harnessing adversarial examples’, arXiv preprint arXiv:1412.6572, (2014).
-  Weiwei Hu and Ying Tan, ‘Generating adversarial malware examples for black-box attacks based on gan’, arXiv preprint arXiv:1702.05983, (2017).
-  Eric Jang, Shixiang Gu, and Ben Poole, ‘Categorical reparameterization with gumbel-softmax’, arXiv preprint arXiv:1611.01144, (2016).
-  Yoon Kim, ‘Convolutional neural networks for sentence classification’, in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746–1751, (2014).
-  Diederik P Kingma and Max Welling, ‘Auto-encoding variational bayes’, arXiv preprint arXiv:1312.6114, (2013).
-  J Li, S Ji, T Du, B Li, and T Wang, ‘Textbugger: Generating adversarial text against real-world applications’, in 26th Annual Network and Distributed System Security Symposium, (2019).
-  Bin Liang, Hongcheng Li, Miaoqiang Su, Pan Bian, Xirong Li, and Wenchang Shi, ‘Deep text classification can be fooled’, in Proceedings of the 27th International Joint Conference on Artificial Intelligence, pp. 4208–4215. AAAI Press, (2018).
-  Andrew L Maas, Raymond E Daly, Peter T Pham, Dan Huang, Andrew Y Ng, and Christopher Potts, ‘Learning word vectors for sentiment analysis’, in Proceedings of the 49th annual meeting of the association for computational linguistics: Human language technologies-volume 1, pp. 142–150. Association for Computational Linguistics, (2011).
-  Paul Michel, Xian Li, Graham Neubig, and Juan Miguel Pino, ‘On evaluation of adversarial perturbations for sequence-to-sequence models’, arXiv preprint arXiv:1903.06620, (2019).
-  Melody Moh, Abhiteja Gajjala, Siva Charan Reddy Gangireddy, and Teng-Sheng Moh, ‘On multi-tier sentiment analysis using supervised machine learning’, in 2015 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), volume 1, pp. 341–344. IEEE, (2015).
-  Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard, ‘Deepfool: a simple and accurate method to fool deep neural networks’, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2574–2582, (2016).
-  Juan Antonio Morente-Molinera, Gang Kou, Konstantin Samuylov, Raquel Ureña, and Enrique Herrera-Viedma, ‘Carrying out consensual group decision making processes under social networks using sentiment analysis over comparative expressions’, Knowledge-Based Systems, 165, 335–345, (2019).
-  Bo Pang and Lillian Lee, ‘Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales’, in Proceedings of the 43rd annual meeting on association for computational linguistics, pp. 115–124. Association for Computational Linguistics, (2005).
-  Bo Pang, Lillian Lee, et al., ‘Opinion mining and sentiment analysis’, Foundations and Trends® in Information Retrieval, 2(1–2), 1–135, (2008).
-  Nicolas Papernot, Patrick McDaniel, Ananthram Swami, and Richard Harang, ‘Crafting adversarial input sequences for recurrent neural networks’, in MILCOM 2016-2016 IEEE Military Communications Conference, pp. 49–54. IEEE, (2016).
-  Tapani Raiko, Mathias Berglund, Guillaume Alain, and Laurent Dinh, ‘Techniques for learning binary stochastic feedforward neural networks’, arXiv preprint arXiv:1406.2989, (2014).
-  Shuhuai Ren, Yihe Deng, Kun He, and Wanxiang Che, ‘Generating natural language adversarial examples through probability weighted word saliency’, in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 1085–1097, (2019).
-  Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra, ‘Stochastic backpropagation and approximate inference in deep generative models’, in International Conference on Machine Learning, pp. 1278–1286, (2014).
-  Suranjana Samanta and Sameep Mehta, ‘Towards crafting text adversarial samples’, arXiv preprint arXiv:1707.02812, (2017).
-  Motoki Sato, Jun Suzuki, Hiroyuki Shindo, and Yuji Matsumoto, ‘Interpretable adversarial perturbation in input embedding space for text’, arXiv preprint arXiv:1805.02917, (2018).
-  Congzheng Song and Vitaly Shmatikov, ‘Fooling ocr systems with adversarial text images’, arXiv preprint arXiv:1802.05385, (2018).
-  Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus, ‘Intriguing properties of neural networks’, arXiv preprint arXiv:1312.6199, (2013).
-  Prayag Tiwari, Brojo Kishore Mishra, Sachin Kumar, and Vivek Kumar, ‘Implementation of n-gram methodology for rotten tomatoes review dataset sentiment analysis’, International Journal of Knowledge Discovery in Bioinformatics (IJKDB), 7(1), 30–41, (2017).
-  Chaowei Xiao, Jun-Yan Zhu, Bo Li, Warren He, Mingyan Liu, and Dawn Song, ‘Spatially transformed adversarial examples’, arXiv preprint arXiv:1801.02612, (2018).
-  Wei Emma Zhang, Quan Z Sheng, A Alhazmi, and C Li, ‘Adversarial attacks on deep learning models in natural language processing: A survey’, arXiv preprint arXiv:1901.06796, (2019).