In recent years, the rapid development of deep learning (DL) has brought great changes to techniques in many fields. Deep Neural Networks (DNNs) have been extensively applied to tasks such as image recognition, computer vision, and natural language analysis. In fact, DNNs achieve human-competitive performance in many fields. Meanwhile, the security of DNNs has become a critical topic.
Several recent studies [10, 11] demonstrate that small artificial perturbations can easily make DNN-based image or audio classifiers misclassify. Szegedy et al. first revealed this sensitivity to artificial perturbations. Specifically, the state-of-the-art GoogLeNet misclassifies adversarial images generated by the "fast gradient sign" algorithm, while a human observer can still classify them correctly, almost without noticing the artificial perturbations. These studies revealed that adversaries can potentially fool state-of-the-art DNNs by crafting perturbations. Additionally, some researchers have investigated adversarial images created under extremely general or limited scenarios, such as universal adversarial perturbations and the one-pixel attack for DNNs.
In the domain of text processing, DNN-based Natural Language Processing (NLP) can learn non-linear models, overcoming the linear models of traditional NLP. Moreover, deep learning learns language features automatically, without manual feature extraction, achieving higher precision. However, a recent study has revealed that artificial perturbations can also make DNN-based text classifiers misclassify. Unfortunately, DNNs for natural language processing have not received the attention they deserve, and until recently previous attacks did not propose an effective algorithm for generating adversarial texts. Bin Liang et al. demonstrated that text classification with DNNs can be attacked in a way similar to image or audio classification DNNs. They successfully crafted adversarial samples for DNN-based natural language text classifiers.
Here, we propose a technique that aims to overcome some of the limitations of previous approaches. First, locating Hot Training Phrases (HTPs) or Hot Sample Phrases (HSPs) requires the training dataset, data features, model dimensions, classification labels, and other information when crafting adversaries. In practice, these conditions and data are mostly unavailable. Second, there is a strong dependence on gradients, which are not always available: if the cost gradients cannot be computed, it is hard, if not impossible, to obtain the HSPs.
In this paper, we propose a technique that can craft adversarial samples in a general black-box scenario. In fact, our proposed method creates a universal rule that can craft adversarial samples automatically, i.e., no search is necessary (Fig. 1 shows an overview of adversarial text samples crafted by a universal rule). Since the perturbation only changes a few letters of a phrase, the attack is almost imperceptible. The novelty of this work lies in proposing:
Universal Rule - The current approach is the first to create an automatic rule pattern that can process samples rapidly and output adversarial samples. This goes beyond universal perturbations, creating yet another layer of abstraction that allows perturbations to change depending on the sample.
Compared with previous works, our proposal has the following main advantages.
Automatic (Universal Perturbation) - No need for searching for adversarial samples. The creation of an adversarial sample is done by a rule that changes some letters according to a learned pattern.
Non-Gradient Method - The proposed method does not require cost gradients. To create a universal rule, a novel black-box search algorithm is employed.
Imperceptible - Every sample phrase is perturbed at most five times. Thus the resulting sample is imperceptible to human observers.
II Related Work
With the development of DL, DNNs have been widely applied in many fields, and therefore DNNs' security has become a problem of utmost importance. Many works have investigated the security of DNNs and identified their vulnerabilities by proposing attack methods, including black-box attacks and white-box attacks. Various methods and algorithms have been proposed to generate adversarial samples, including gradient-based methods (e.g., the "fast gradient sign" algorithm proposed by I. J. Goodfellow et al.) [10, 11, 19], greedy approaches (e.g., the greedy perturbation searching method proposed by S.-M. Moosavi-Dezfooli et al.) [21, 17], and evolution-based methods (e.g., the one-pixel attack proposed by Su Jiawei et al.) [19, 28].
Some researchers consider state-of-the-art deep neural networks to be highly vulnerable to gradient-based methods, which are also easy to use. For instance, Moosavi-Dezfooli et al. proposed a systematic algorithm for computing universal perturbations that cause natural images to be misclassified with high probability. In addition, Su Jiawei et al. proposed the one-pixel attack, which uses differential evolution to search under an extremely limited scenario. Thus, many types of optimization methods achieve high misclassification rates against DNN-based image classifiers.
Unfortunately, there are no studies paying attention to methods or algorithms for generating universal perturbations against DNN-based text classification. Text, as discrete data, is also sensitive to perturbations; however, the geometric correlations among the high-dimensional decision boundaries of classifiers cannot be exploited in text data, so the existing algorithms for generating adversarial images are not directly applicable to text. Recently, Bin Liang et al. proposed the first attack on deep text classification. They demonstrated that, since text is discrete data, directly adopting existing image or audio perturbation algorithms may cause the resulting text sample to lose its original meaning or even become meaningless to human observers. Thus, in order to craft imperceptible adversarial text samples without losing their original meaning, they presented three perturbation strategies: insertion, modification, and removal. To craft them, they used the cost gradients of the original text and training samples to generate adversarial samples.
In order to maintain the meaning of a text sample, they perturbed the sample by directly modifying its words, inserting new items (words or sentences), or removing some original ones. First, for all training samples, the cost gradients of every dimension in all character vectors are calculated. Phrases with significant cost gradients for the current classification are termed HSPs, while the most frequent phrases in all training samples of the target classification are termed HTPs. In the insertion strategy, HTPs from the target classification are inserted into the text sample near phrases with a significant contribution to the original class, which increases the confidence of the target class and decreases the confidence of the original classification. In the modification and removal perturbations, HSPs for the original classification are modified or removed, which causes a drop in the original confidence. Fig. 2 presents an example of the three proposed perturbation strategies.
III Target Model and Settings
Convolutional Neural Networks (CNNs), which are typically used in computer vision, can also be applied to problems in Natural Language Processing and perform quite well. Location invariance and local compositionality make intuitive sense for images but less so for NLP, so at first glance CNNs would not seem to be a good fit for NLP tasks. However, CNNs are fast and efficient for NLP tasks as well, because they can extract relationships from words and sequences.
A recent study investigated the use of CNNs to learn directly from characters without the need for any pre-trained embeddings. The results show that learning directly from character-level input works very well on large datasets (millions of samples) but underperforms simpler models on smaller datasets (hundreds of thousands of samples). Therefore, we apply a common CNN for text classification. For model training and evaluation, we use the same setup as in, which employs the Movie Review data from Rotten Tomatoes. This dataset contains movie review sentences, half positive and half negative. Positive and negative sentences are loaded from the raw data files and cleaned before being fed into the network. Additionally, instead of using the pre-trained word2vec vectors for word embeddings, this model learns embeddings directly from scratch. In other words, in the first layer, words are embedded into low-dimensional vectors; the next layer then performs convolutions over the embedded word vectors using multiple filter sizes. This is followed by max-pooling over time and a final fully-connected layer with dropout regularization, whose output feeds a softmax layer (Figure 3 shows the two types of DNN models used).
IV Universal Rule
We propose both a random search and a coevolutionary optimization algorithm for generating universal rules that can craft adversarial samples for DNN-based natural language text classifiers. Before introducing the optimization algorithms, the three perturbation procedures of which the universal rule consists are explained in detail.
The universal rule is made of a sequence of perturbation procedures, each of which is a swapping, deletion, or insertion procedure (the whole process is illustrated and explained in Fig. 7). Each of these procedures is described in detail below. Although the universal rule is made of many perturbation procedures in sequence, this does not mean a text will be perturbed many times, because the number of perturbations is limited to five. In fact, even with this limit, a given sample will most of the time receive three or fewer perturbations, because it is hard to find matching letters for each rule. We also found that the three perturbation methods did not perform well in practice when used separately: using only one kind of perturbation for attacking achieves only a limited fooling rate, because many of the rules cannot find a matching pattern and therefore fail to modify the sample, let alone make it misclassify. Thus, we combine multiple different perturbations to make up a universal rule.
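The application of a universal rule with the five-perturbation cap described above can be sketched as follows. This is a minimal illustration, assuming each perturbation procedure is a callable that returns the text unchanged when its letter pattern fails to match; the representation is an assumption, not the paper's exact implementation.

```python
MAX_PERTURBATIONS = 5  # cap that keeps the perturbed sample imperceptible


def apply_universal_rule(text, procedures):
    """Apply a sequence of perturbation procedures to a sample, stopping
    after at most MAX_PERTURBATIONS successful matches. A procedure that
    finds no matching letters leaves the text unchanged."""
    count = 0
    for proc in procedures:
        perturbed = proc(text)
        if perturbed != text:  # the procedure matched and perturbed
            text = perturbed
            count += 1
            if count >= MAX_PERTURBATIONS:
                break
    return text
```

Note that a rule may contain many more than five procedures; the cap only limits how many of them are allowed to take effect on a given sample.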
IV-A Swapping Perturbation Procedure
A swapping perturbation procedure is defined by two letters, which are searched for among the words of the original text sample. Once the pair is located in a word, the two letters are swapped and the word becomes slightly misspelled. The results of our tests show that even such an imperceptible change can make the DNN text classifier misclassify.
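A minimal sketch of the swapping procedure, assuming a simple whitespace-tokenized text representation and first-match semantics (both are illustrative assumptions):

```python
def swap_perturbation(text, a, b):
    """Search the words of `text` for the adjacent letter pair (a, b);
    on the first match, swap the two letters to produce a slightly
    misspelled word. Returns the text unchanged when nothing matches."""
    words = text.split()
    pair = a + b
    for i, w in enumerate(words):
        j = w.find(pair)
        if j != -1:
            words[i] = w[:j] + b + a + w[j + 2:]  # swap the matched pair
            return " ".join(words)
    return text  # no match: sample left unperturbed
```

For example, with the pair (o, c) the word "rock" would become "rcok" (a hypothetical example, not one taken from the paper's figures).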
For example, Fig. 4 shows a text sample classified as the positive review class. After swapping the letters of one word in the sample just once, the perturbed sample is classified as the negative review class. However, a human observer can still recognize the text as positive and even identify that the misspelled word is "rock".
IV-B Deletion Perturbation Procedure
The deletion perturbation procedure deletes a letter from a matched word in a sample. This procedure is defined by two letters: the procedure searches for them over the text and, once it finds a match, the second letter is deleted from the word. Fig. 5 shows an example: the deletion perturbation is [o, o], and after deleting the second o, the text's classification changes from negative review to positive review.
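Under the same illustrative word-based representation as above, the deletion procedure can be sketched as:

```python
def delete_perturbation(text, a, b):
    """Search the words of `text` for the adjacent letter pair (a, b);
    on the first match, delete the second letter of the pair.
    Returns the text unchanged when nothing matches."""
    words = text.split()
    pair = a + b
    for i, w in enumerate(words):
        j = w.find(pair)
        if j != -1:
            words[i] = w[:j + 1] + w[j + 2:]  # drop the second letter
            return " ".join(words)
    return text
```

For instance, with the pair [o, o], a word like "too" would become "to" (an illustrative example; the paper's Fig. 5 sample is not reproduced here).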
IV-C Insertion Perturbation Procedure
The insertion perturbation procedure perturbs the classification probability by inserting a letter into a word. The misspelled word may decrease the original class confidence or increase the confidence of another class. This procedure consists of three letters: it searches for the first two letters in a word over the text, and when a match is found, the third letter is inserted after them. In Fig. 6, an example is presented in which the two letters i and l are located in the word film, which is then perturbed into the misspelled word filam, resulting in misclassification.
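A sketch of the insertion procedure, again assuming the simple word-based representation used in the sketches above:

```python
def insert_perturbation(text, a, b, c):
    """Search the words of `text` for the adjacent letter pair (a, b);
    on the first match, insert letter c right after the pair.
    Returns the text unchanged when nothing matches."""
    words = text.split()
    pair = a + b
    for i, w in enumerate(words):
        j = w.find(pair)
        if j != -1:
            words[i] = w[:j + 2] + c + w[j + 2:]  # insert c after the pair
            return " ".join(words)
    return text
```

With the letters (i, l, a), the word "film" is perturbed into "filam", matching the example of Fig. 6.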
V Universal Rule Evaluation
To develop universal rules, it is necessary to have ways to evaluate them. Here, we propose two types of fitness function: one based on accuracy and the other based on utility, hence the names accuracy fitness and utility fitness.
Regarding the accuracy fitness, it is defined as the success rate of the attack over all universal rules created with the individual. Notice that this measure also depends on the sample and on the other individuals that take part in the universal rule. Thus, given the attack success s_i on a given sample (s_i = 1 when the class changed and s_i = 0 otherwise), the accuracy fitness can be obtained by:

f_acc = (1/n) * sum_{i=1}^{n} s_i,

where n is the number of times a given individual is evaluated. Notice that n may vary from individual to individual, since individuals are randomly picked each time a universal rule is created.
Regarding the utility fitness, universal rules should perturb the sample a given number of times. Although the maximum number of perturbations is fixed, there is no guarantee that this number will be reached; indeed, many individuals encoding perturbation procedures fail to perturb most samples. To avoid such inactive individuals, the utility fitness is defined as the number of times an individual perturbs divided by the number of times it was chosen to participate in a universal rule. Therefore, considering a variable p_j that is 1 when sample j is perturbed and 0 otherwise, the utility fitness is defined explicitly as:

f_util = (1/m) * sum_{j=1}^{m} p_j,

where m is the number of times the individual was chosen to participate in a universal rule.
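The two fitness functions above can be sketched directly from their definitions; the list/counter representation of an individual's evaluation history is an assumption for illustration.

```python
def accuracy_fitness(successes):
    """Mean attack success over every universal rule the individual joined:
    each entry is 1 when the sample's class changed and 0 otherwise."""
    return sum(successes) / len(successes) if successes else 0.0


def utility_fitness(activations, participations):
    """Fraction of participations in which the individual actually matched
    and perturbed the sample; low values flag inactive procedures."""
    return activations / participations if participations else 0.0
```

For example, an individual whose rules succeeded on three of four evaluated samples has accuracy fitness 0.75; one that perturbed in three of four participations has utility fitness 0.75.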
VI Random Search for Universal Rule Optimization
In this paper, we propose two methods to develop universal rules. The first is a simple method called random search for universal rule optimization (RS). It is a variation of a random search procedure in which the best universal rule found so far is stored and returned as the output. Specifically, the individuals of the population are perturbation procedures of the types described in Section IV. In each generation, a new set of individuals is first generated. Afterwards, universal rules are created by combining the individuals into sequences. Lastly, new individuals are created by mixing the individuals of the population with a differential-operator-like rule. Since the initial individuals are randomly spread inside the hypercube of possible perturbation procedures, the differential evolution operator creates random walks in this space.
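A minimal sketch of the RS loop follows. The three callables (`evaluate`, which scores a universal rule on the dataset; `new_individual`, which samples a random perturbation procedure; and `mix`, the differential-operator-like combination) are assumptions for illustration, not the paper's exact operators.

```python
import random


def random_search(evaluate, new_individual, mix,
                  pop_size=20, rule_len=8, generations=100):
    """Random search (RS) for universal rules: keep composing rules from
    the population, remember the best one, and take random walks through
    the space of perturbation procedures via a differential-like mix."""
    population = [new_individual() for _ in range(pop_size)]
    best_rule, best_score = None, float("-inf")
    for _ in range(generations):
        # compose a universal rule from randomly picked individuals
        rule = [random.choice(population) for _ in range(rule_len)]
        score = evaluate(rule)
        if score > best_score:
            best_rule, best_score = rule, score
        # mix three individuals to move through procedure space
        a, b, c = random.sample(population, 3)
        population[random.randrange(pop_size)] = mix(a, b, c)
    return best_rule, best_score
```

In the actual method, `evaluate` would apply the rule to the test samples and return the misclassification rate.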
VII Coevolutionary Algorithm for Universal Rule Optimization
Here we propose the Coevolutionary Algorithm for Universal Rule Optimization (CAURO). The aim of the method is to find universal rules efficiently by combining useful and accurate small perturbation rules. Since the number of perturbation procedures is not fixed, and the order and permutation of these procedures are as important as the procedures themselves, coevolution seems a good match. The objective here is to focus on combining good rules rather than on creating a single one. Moreover, we hypothesize that the optimization landscape for universal rules is not a well-behaved search space, since good universal rules may be far from each other with many inferior solutions around them.
The algorithm first generates perturbation procedures to compose a population and then randomly picks individuals to compose universal rules. Each time a universal rule is composed, it is evaluated on the dataset, and its constituent individuals update both their accuracy and utility fitness. This evaluation process is illustrated in Figure 8. After universal rules have been created many times, the algorithm ranks individuals by accuracy and utility fitness. This is followed by a simple selection process in which a fraction of the individuals with the lowest accuracy fitness and a fraction of those with the lowest utility fitness are replaced by newly generated random individuals (Figure 9). This composes the new population for the next generation.
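One generation of CAURO can be sketched as below. The interfaces are assumptions for illustration: `evaluate_rule` is taken to return the attack success (0/1) together with the set of individuals that actually perturbed the sample, and the replaced fraction is a free parameter.

```python
import random


def cauro_generation(population, evaluate_rule, new_individual,
                     trials=50, rule_len=8, replace_frac=0.2):
    """One CAURO generation (sketch): individuals accumulate accuracy and
    utility statistics over many randomly composed universal rules, then
    the worst performers under each fitness are replaced by fresh random
    individuals."""
    stats = {id(ind): {"succ": 0, "eval": 0, "act": 0, "part": 0}
             for ind in population}
    for _ in range(trials):
        rule = random.sample(population, rule_len)  # compose a rule
        success, activated = evaluate_rule(rule)
        for ind in rule:
            s = stats[id(ind)]
            s["eval"] += 1          # accuracy bookkeeping
            s["succ"] += success
            s["part"] += 1          # utility bookkeeping
            s["act"] += 1 if ind in activated else 0

    def acc(ind):
        s = stats[id(ind)]
        return s["succ"] / s["eval"] if s["eval"] else 0.0

    def util(ind):
        s = stats[id(ind)]
        return s["act"] / s["part"] if s["part"] else 0.0

    k = max(1, int(replace_frac * len(population)))
    worst_ids = {id(i) for i in sorted(population, key=acc)[:k]}
    worst_ids |= {id(i) for i in sorted(population, key=util)[:k]}
    return [new_individual() if id(ind) in worst_ids else ind
            for ind in population]
```

The double ranking mirrors the paper's selection: individuals that rarely perturb (low utility) are culled even if their few perturbations are accurate, and vice versa.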
VIII Evaluation and Results
In this section, we evaluate the effectiveness of the proposed attack method by comparing the misclassification rate of the best universal rule generated per generation. The experiments are conducted on the Movie Review dataset from Rotten Tomatoes with two different DNN models. Moreover, if most of the perturbation procedures in a universal rule are able to perturb a text sample, they may modify the sample too many times, which might turn it into a meaningless sentence even for human observers. Since such an attack alters the content of the text sample, it is considered a failed attack. To avoid this problem, a sample can be modified at most five times by a universal rule.
In Figure 10, the misclassification rates of the best universal rules generated by RS and CAURO are shown. After the final generation, CAURO achieved a higher best misclassification rate than RS. In other words, CAURO creates universal rules that can fool DNNs with higher accuracy than RS. Moreover, as the number of perturbation procedures increases, the misclassification rate continues to increase. Notice that although the number of perturbation procedures in one universal rule increases, the maximum number of perturbations is fixed at five per sample for all tests. The performance of the proposed CAURO is clearly superior to RS, showing that selection for better perturbation procedures, together with the utility and accuracy fitness, is important for finding better universal rules. In Figure 11, additional results are shown for the same dataset but with a different DNN (DNN-2; for a description of the model, please refer to Figure 3). The performance decreases in this case, but the method is still capable of achieving a substantial success rate after 100 generations.
Regarding the number of perturbations per rule, the low misclassification rates when universal rules are small (few perturbation procedures in one rule) can only be explained by the fact that matching rules is a difficult process. In fact, by inspecting the data, we verified that CAURO's crafted universal rules modify only part of the test samples, with only a fraction of the text samples being modified five times on average. Therefore, even when more perturbations are encoded in one universal rule, most text samples are still perturbed fewer than five times. In other words, the experiments show that rules often fail to match the text sample, leaving it unperturbed. Moreover, with few perturbation procedures in one rule, the matching rate possibly becomes more important than the misclassification rate, because every single perturbation procedure that matches and perturbs increases the misclassification rate. Consequently, individuals encoding rare but accurate perturbations should not survive in such populations.
Thus, it is possible to create universal rules that generate adversarial samples without the need to search for them. The success rates achieved might not seem high at first glance; however, this is not a per-sample adversarial attack success rate but rather a universal rule success rate. Once a universal rule is found, no search is necessary for a given sample to become an adversarial sample, because adversarial samples are generated by simply applying a universal rule made of a sequence of perturbation procedures. We also point out that the strong representation power of state-of-the-art neuroevolution methods using unified neural models, the adaptiveness of self-organizing classifiers (which can adapt to changing mazes much as rats do), and MAP-Elites (which can adapt to malfunctions) suggest promising directions for extending this line of work.
IX Conclusion and Discussion
Previous research has shown that DNN-based text classification is also vulnerable to the gradient-based adversarial samples. In this paper, we show the existence of universal rules (perturbations created from rules which are sample agnostic) that can fool state-of-the-art text classifiers.
In summary, this paper has the following main achievements:
Universal Rules - We have shown that it is possible to generate universal rules that are sample agnostic, i.e., rules that can create adversarial samples without any search and independent of the sample given.
CAURO - The proposition of a coevolutionary algorithm (CAURO) for generating universal rules efficiently. In fact, it is the first time that a coevolutionary algorithm is applied to adversarial machine learning.
The results achieved here should impact new adversarial attacks as well as their defenses. CAURO can be extended to other types of input, such as images, and can incorporate other types of perturbation procedures, e.g., repeating words in the text or even more complex forms of perturbation. The adversarial samples used here can be used to investigate the causes of such attacks and their respective defenses. Moreover, we expect our work to motivate new methods that could, without any specifically designed defenses, overcome the current limitations.
This work was supported by JST, ACT-I Grant Number JP-50166, Japan.
-  Barreno, M.; Nelson, B.; Sears, R.; Joseph, A.; and Tygar, J. Can machine learning be secure?. 2006. In ASIACCS 2006.
-  Bin, L.; Hongcheng, L.; Miaoqiang, S.; Pan, B.; Xirong, L.; Wenchang, S. 2017. ”Deep Text Classification Can be Fooled.” arXiv:1704.08006 (2017).
-  Britz, Denny. 2015. Implementing a CNN for text classification in TensorFlow. http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow/
-  C. dos Santos and M. Gatti. Deep convolutional neural networks for sentiment analysis of short texts. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 69-78, Dublin, Ireland, August 2014. Dublin City University and Association for Computational Linguistics.
-  Carlini, N.; and Wagner, D. 2016. Towards evaluating the robust- ness of neural networks. arXiv preprint arXiv: 1608.04644.
-  Collobert, R.; and Weston, J. 2008. A unified architecture for natural language processing: Deep neural networks with multitask learning. In ICML 2008.
-  Dahl, G. E.; Yu, D.; Deng, L.; and Acero, A. 2012. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE TASLP, 20(1): 30-42.
-  de Melo, V. V., Vargas, D. V., Crocomo, M. K., Delbem, A. C. B. (2014). Phylogenetic differential evolution. In Natural Computing for Simulation and Knowledge Discovery (pp. 22-40). IGI Global.
-  Gao, Ji, et al. ”Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers.” arXiv preprint arXiv:1801.04354 (2018).
-  Goodfellow, I. J.; Shlens, J.; and Szegedy, C. 2015. Explaining and harnessing adversarial examples. In ICLR 2015.
-  Kereliuk, C.; Sturm, B.; and Larsen, J. 2015. Deep learning and music adversaries. IEEE TMM, 17(11): 2059-2071.
-  Krizhevsky, A.; Sutskever, I.; and Hinton, G. E. 2012. Imagenet classification with deep convolutional neural networks. In NIPS 2012.
-  Kuleshov, Volodymyr, et al. ”Adversarial Examples for Natural Language Classification Problems.” (2018).
-  Lehmann, J.; Isele, R.; Jakob, M.; Jentzsch, A.; Kontokostas, D.; Mendes, P. N.; Hellmann, S.; Morsey, M.; Kleef, P.; Auer, S.; and Bizer, C. 2014. DBpedia - a large-scale
-  Liang, B.; Su, M.; You, W.; Shi, W.; and Yang, G. 2016. Cracking classifiers for evasion: a case study on Google's phishing pages filter. In WWW 2016.
-  Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.; and Dean, J. 2013. Distributed Representations of Words and Phrases and their Compositionality. In NIPS 2013.
-  Moosavi-Dezfooli,S.-M.; Fawzi, A.; and Frossard, P. 2016. DeepFool: a simple and accurate method to fool deep neural net- works. In CVPR 2016.
-  Moosavi-Dezfooli, S.-M.; Fawzi, A.; Fawzi, O.; and Frossard, P. 2017. Universal adversarial perturbations. In CVPR 2017.
-  Nguyen, A.; Yosinski, J.; and Clune, J. 2015. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In CVPR 2015.
-  Papernot, N.; McDaniel, P.; Goodfellow, I.; Jha, S.; Celik, Z. B.; and Swami, A. 2016a. Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples. arXiv preprint arXiv:1602.02697.
-  Papernot, N.; McDaniel, P.; Jha, S.; Fredrikson, M.; Celik, Z. B.; and Swami, A. 2016b. The limitations of deep learning in adversarial settings. In IEEE EuroSP 2016.
-  Papernot, N.; McDaniel, P.; Wu, X.; Jha, S.; and Swami, A. 2016c. Distillation as a defense to adversarial perturbations against deep neural networks. In IEEE SP 2016.
-  R. Johnson and T. Zhang. Effective use of word order for text categorization with convolutional neural networks. CoRR, abs/1412.1058, 2014.
-  R. Storn and K. Price. Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11(4):341-359, 1997.
-  Shaham, U.; Yamada, Y.; and Negahban, S. 2015. Understanding adversarial training: increasing local stability of neural nets through robust optimization. arXiv preprint arXiv:1511.05432.
-  Laskov, Pavel. ”Practical evasion of a learning-based classifier: A case study.” Security and Privacy (SP), 2014 IEEE Symposium on. IEEE, 2014.
-  S. Das and P. N. Suganthan. Differential evolution: a survey of the state-of-the-art. IEEE Transactions on Evolutionary Computation, 15(1):4-31, 2011.
-  Su, Jiawei; Vargas, Danilo Vasconcellos; and Sakurai, Kouichi. 2017. One pixel attack for fooling deep neural networks. arXiv preprint arXiv:1710.08864.
-  Suchanek, F.; Kasneci, G; and Weikum, G. 2007. YAGO: a core of semantic knowledge unifying wordnet and Wikipedia. In WWW 2007.
-  Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; and Rabinovich, A. 2015. Going deeper with convolutions. In CVPR 2015.
-  Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; and Fergus, R. 2014. Intriguing properties of neural networks. In ICLR 2014.
-  Vargas, D.V. and Murata, J., 2017. Spectrum-diverse neuroevolution with unified neural models. IEEE transactions on neural networks and learning systems, 28(8), pp.1759-1773
-  Vargas, D.V., Takano, H. and Murata, J., 2013, July. Self organizing classifiers and niched fitness. In Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation (pp. 1109-1116). ACM.
-  Whittaker, C.; Ryner, B.; and Nazif, M. 2010. Large-scale auto- matic classification of phishing pages. In NDSS 2010.
-  Zhang, X.; Zhao, J.; and LeCun, Y. 2015. Character-level convolutional networks for text classification. In NIPS 2015.