Adversarial attacks on deep neural networks (DNNs) aim to slightly modify the inputs of DNNs and mislead them into making wrong predictions [1, 2]. This task has become a common approach to evaluating the robustness of DNNs [3, 4, 5] – generally speaking, the easier an adversarial example can be generated, the less robust the DNN model is. However, models designed for different tasks are not born equal: some tasks are strictly harder to attack than others. For example, attacking an image is much easier than attacking a text string, since image space is continuous and the adversary can make arbitrarily small changes to the input. Therefore, even if most of the pixels of an image have been modified, the perturbations can still be imperceptible to humans as long as the accumulated distortion is not large. In contrast, text strings live in a discrete space, and word-level manipulations may significantly change the meaning of the text. In this scenario, an adversary should change as few words as possible, and this limitation induces a sparsity constraint on word-level changes. Likewise, attacking a classifier is also much easier than attacking a model with sequence outputs. Unlike the classification problem, which has a finite set of discrete class labels, the output space of sequences has an almost infinite number of possibilities. If we treat each sequence as a label, a targeted attack needs to find a specific one among an enormous number of possible labels, leading to a nearly zero-volume search space. This may explain why most existing works on adversarial attacks focus on the image classification task, whose input space is continuous and whose output space is finite.
In this paper, we study the harder problem of crafting adversarial examples for sequence-to-sequence (seq2seq) models [6]. This problem is challenging since it combines both aforementioned difficulties, i.e., discrete inputs and sequence outputs with an almost infinite number of possibilities. We choose this problem not only because it is challenging, but also because seq2seq models are widely used in many safety- and security-sensitive applications, e.g., machine translation [7, 8], text summarization [9], and speech recognition [10], so measuring their robustness is critical. Specifically, we aim to examine the following questions in this study:
Is it possible to slightly modify the inputs of seq2seq models while significantly changing their outputs?
Are seq2seq models more robust than the well-evaluated CNN-based image classifiers?
We provide an affirmative answer to the first question by developing an effective adversarial attack framework called Seq2Sick. It is an optimization-based framework that aims to learn an input sequence that is close to the original sequence while leading to the desired outputs with high confidence. To address the challenge of the discrete input space, we propose to use a projected gradient descent method combined with group lasso and gradient regularization. To address the challenge of the almost infinite output space, we design novel loss functions for the non-overlapping attack and the targeted keywords attack. Our experimental results show that the proposed framework yields high success rates in both tasks. However, even though the proposed approach can successfully attack seq2seq models, our answer to the second question is still "Yes". Compared with CNN-based classifiers, which are highly sensitive to adversarial examples, seq2seq models are intrinsically more robust since they have a discrete input space and an exponentially large output space. As a result, adversarial examples against seq2seq models usually have larger distortions and are more perceptible than the adversarial examples crafted for CNN-based image classifiers.
In summary, our main contributions are as follows:
We propose Seq2Sick, a novel optimization-based approach to craft adversarial sequences that fool word-level seq2seq models. Seq2Sick provides two types of attacks: the non-overlapping attack, which changes all the words in the output sentence, and the targeted keywords attack, which inserts malicious keywords into the output.
To handle the discrete input space, we propose a modified projected gradient-based method with group lasso regularization to enforce sparsity in the number of changed words. We also define a corresponding loss function on the output sequence for each type of attack.
We conduct experiments on text summarization and machine translation tasks. By changing fewer than 3 words in the input sequence, we can fool seq2seq models into producing malicious outputs with high success rates. On the other hand, we observe that seq2seq models are more robust to adversarial examples than CNN-based image classifiers.
2 Related Work
Most existing work on adversarial attacks focuses on CNN-based models, while only a few studies discuss how to attack RNN-based models. In this section, we briefly review CNN-based attacks, and then highlight the differences between our work and previous attack algorithms designed for RNN-based models.
2.1 Adversarial Attacks on CNN-based Models
Szegedy et al. [1] first discovered adversarial examples for CNNs in the context of image classification. Since then, a number of approaches to attacking CNNs have surfaced, including gradient-based methods [2, 11, 12, 13, 14, 15], score-based methods [16, 17, 18], transfer-based methods, and decision-based methods. In addition to image classification, attacks on other CNN-related tasks have also been actively investigated. Typical examples include semantic segmentation and object detection [21, 22, 23, 24, 25], image captioning, and visual QA. Moreover, recent studies have conducted adversarial attacks in real-world scenarios, including attacking road signs [28, 29, 24, 25], cell-phone cameras, and robots, and generating 3D-printed adversarial objects.
2.2 Adversarial Attacks on RNN-based Models
Although many works have been devoted to fooling CNN-based models, only a few studies have examined RNN-based models, and the majority of them focus on simple text classification problems. Papernot et al. first used the Fast Gradient Sign Method (FGSM) to attack RNN/LSTM-based classification models. To generate text adversarial examples, one line of work uses reinforcement learning to locate important words that can be deleted in sentiment classification. Other works generate adversarial sequences by inserting words or replacing existing words with typos and synonyms. Another approach attacks sentiment classification models in a black-box setting, developing scoring functions to find the most important words to modify. These approaches differ from our work in that they study simple text classification problems, while we focus on the more challenging seq2seq model with sequential outputs. Beyond attacking text classifiers, one study fools reading comprehension systems by adding misleading sentences, which has a different focus than ours. Another uses a generative adversarial network (GAN) to craft natural adversarial examples; however, it can only perform untargeted attacks and suffers from high computational cost.
Our work shares some similarities with a concurrent study that also attacks seq2seq models. However, that work is less appealing than ours since (i) it only considers the untargeted attack, while our work can handle the more challenging targeted attack, and (ii) its word-level attack does not work well, yielding only a 22% success rate for the untargeted attack. Indeed, its authors recognize that attacking a word-level seq2seq model is a challenging problem – the problem addressed by this paper.
| Methods | Gradient Based? | Word-level RNN? | Sequential Output? | Targeted Attack? |
| --- | --- | --- | --- | --- |
| Ebrahimi et al. | | | | Class |
| Li et al. | | | | Class |
| Papernot et al. | | | | |
| Gao et al. | | | | Binary |
| Zhao et al. | | | | / |
| Liang et al. | | | | Class |
Notably, almost all the previous methods are based on greedy search, i.e., at each step, they search for the best word and the best position to replace. As a result, their search space grows rapidly as the length of the input sequence increases. To address this issue, we propose a novel approach that uses group lasso regularization and a projected gradient descent method with gradient regularization to search all the replacement positions simultaneously. Table 1 summarizes the key differences between the proposed framework Seq2Sick and the existing attack methods on RNN-based models.
3.1 Revisiting the Sequence-to-Sequence Model
Let $x_i \in \mathbb{R}^d$ be the embedding vector of the $i$-th input word, $N$ be the input sequence length, and $M$ be the output sequence length. Let $\mathcal{X}$ be the input vocabulary, and let each output word $y_j \in \mathcal{Y}$, where $\mathcal{Y}$ is the output vocabulary. The seq2seq model has an encoder-decoder framework that aims at mapping an input sequence of vectors $X = (x_1, \dots, x_N)$ to the output sequence $Y = (y_1, \dots, y_M)$. Its encoder first reads the input sequence; each RNN/LSTM cell computes $h_t = f(x_t, h_{t-1})$, where $x_t$ is the current input, and $h_{t-1}$ and $h_t$ represent the previous and current cells' hidden states, respectively. The next step computes the context vector $c$ using all the hidden states, i.e., $c = q(\{h_1, \dots, h_N\})$, where $q$ could be a linear or non-linear function. In this paper, we follow the common setting $q(\{h_1, \dots, h_N\}) = h_N$.
Given the context vector $c$ and all the previously generated words $\{y_1, \dots, y_{t-1}\}$, the decoder is trained to predict the next word $y_t$. Specifically, the $t$-th cell in the decoder receives its previous cell's output $s_{t-1}$ and the context vector $c$, and then outputs

$$s_t = g(s_{t-1}, c),$$

where $g$ is another RNN/LSTM cell function, and $z_t \in \mathbb{R}^{|\mathcal{Y}|}$ denotes the resulting vector of logits for each possible word in the output vocabulary $\mathcal{Y}$.
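As a rough numerical illustration of the recurrences above, the toy sketch below traces the encoder recurrence $h_t = f(x_t, h_{t-1})$, the context $c = h_N$, and per-step decoder logits. The scalar states, weights, and the 3-word output vocabulary are all made up for illustration and are not from the paper.

```python
import math

def cell(x, h, w_x=0.5, w_h=0.3):
    # Stand-in for an RNN/LSTM cell: h_t = f(x_t, h_{t-1}).
    return math.tanh(w_x * x + w_h * h)

def encode(xs):
    h, hs = 0.0, []
    for x in xs:
        h = cell(x, h)
        hs.append(h)
    return hs  # context c = q({h_1, ..., h_N}) = h_N in this setting

def decode_logits(c, steps, vocab_w=(1.0, -1.0, 0.5)):
    # s_t = g(s_{t-1}, c); each z_t collects one logit per vocabulary word.
    s, zs = c, []
    for _ in range(steps):
        s = math.tanh(0.7 * s + 0.4 * c)
        zs.append([w * s for w in vocab_w])
    return zs

hs = encode([0.2, -0.1, 0.4])
logits = decode_logits(hs[-1], steps=2)
```

Each decoding step thus produces one logit vector over the output vocabulary; the attack losses in the next section operate directly on these logits.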
3.2 Proposed Framework
The problem of crafting adversarial examples against the seq2seq model can be formulated as the following optimization problem:

$$\min_{\delta} \; L(X + \delta) + \lambda R(\delta), \qquad (2)$$

where $R(\delta)$ is the regularization function that measures the magnitude of the distortion, and $L(\cdot)$ is the loss function that penalizes an unsuccessful attack; it may take different forms in different attack scenarios. A common choice for $R(\delta)$ is the $\ell_2$ penalty $\|\delta\|_2^2$, but it is, as we will show later, not suitable for attacking seq2seq models. $\lambda$ is the regularization parameter that balances distortion and attack success rate – a smaller $\lambda$ makes the attack more likely to succeed, but at the price of larger distortion.
In this work, we focus on two kinds of attacks: the non-overlapping attack and the targeted keywords attack. The first requires that the output of the adversarial example share no overlapping words with the original output. This task is strictly harder than the untargeted attack, which only requires the adversarial output to be somewhat different from the original output [39, 40]. We do not consider the untargeted attack since it is too trivial for the proposed framework, which easily achieves a 100% success rate on it. The targeted keywords attack is an even more challenging task: given a set of targeted keywords, the goal is to find an adversarial input sequence such that all the keywords appear in the corresponding output. In the following, we introduce the loss functions developed for each of the two attacks.
3.2.1 Non-overlapping Attack
To formally define the non-overlapping attack, let $s = \{s_1, \dots, s_M\}$ be the original output sequence, where $s_t$ denotes the location of the $t$-th output word in the output vocabulary $\mathcal{Y}$, and let $\{z_1, \dots, z_M\}$ denote the logit-layer outputs of the adversarial example. In the non-overlapping attack, the output of the adversarial example should be entirely different from the original output $s$, i.e.,

$$\arg\max_{y} z_t^{(y)} \neq s_t, \quad \forall t = 1, \dots, M,$$

which is equivalent to

$$\max_{y \neq s_t} z_t^{(y)} > z_t^{(s_t)}, \quad \forall t = 1, \dots, M.$$

Given this observation, we can define a hinge-like loss function to generate adversarial examples for the non-overlapping attack, i.e.,

$$L_{\text{non-overlap}} = \sum_{t=1}^{M} \max\left\{-\epsilon,\; z_t^{(s_t)} - \max_{y \neq s_t} z_t^{(y)} + \epsilon\right\}, \qquad (3)$$

where $\epsilon \geq 0$ denotes the confidence margin parameter. Generally speaking, a larger $\epsilon$ leads to a more confident output and a higher success rate, but at the cost of more iterations and longer running time.
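A minimal sketch of this hinge-like loss, with logits as plain nested lists (the variable names and example numbers are ours, not the paper's): each position contributes $-\epsilon$ once the original word is out-ranked by the margin, and a positive penalty otherwise.

```python
def non_overlap_loss(logits, orig_ids, eps=1.0):
    # L = sum_t max(-eps, z_t[s_t] - max_{y != s_t} z_t[y] + eps)
    total = 0.0
    for z_t, s_t in zip(logits, orig_ids):
        best_other = max(v for y, v in enumerate(z_t) if y != s_t)
        total += max(-eps, z_t[s_t] - best_other + eps)
    return total

# Position 0 is already "flipped" (word 1 beats the original word 0 by a wide
# margin), so it contributes -eps; position 1 still predicts the original
# word, so it contributes a positive penalty of 2.5 - 0.3 + 1.0 = 3.2.
z = [[0.2, 3.0, 0.1], [2.5, 0.0, 0.3]]
loss = non_overlap_loss(z, orig_ids=[0, 0], eps=1.0)
```

Minimizing this loss drives every position's original word below some competitor, which is exactly the non-overlapping condition.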
We note again that the non-overlapping attack is much more challenging than the untargeted attack, for which a single-word difference from the original output suffices [39, 40]. We do not consider the untargeted attack since it is straightforward, and the replaced word could simply be an unimportant one such as "the" or "a".
3.2.2 Targeted Keywords Attack
Given a set of targeted keywords, the goal of the targeted keywords attack is to generate an adversarial input sequence such that all the targeted keywords appear in the output sequence. This task is important since it shows that adding a few malicious keywords can completely change the meaning of the output sequence. For example, in a machine translation task from English to German, the input sentence "policeman helps protesters to keep the assembly in order" should produce the output "Polizist hilft Demonstranten, die Versammlung in Ordnung zu halten". However, changing only one word in the output, from "hilft" to "verhaftet", significantly changes its meaning: the new sentence means "police officer arrested protesters to keep the assembly in order". Similarly, if we change the word "warm" to "cold" in the output sentence "people gives a warm welcome to soldiers", its meaning is also dramatically changed.
In our method, we do not specify the positions of the targeted keywords in the output sentence. Instead, it is more natural to design a loss function that allows the targeted keywords to become the top-1 prediction at any position. The attack is considered successful only when ALL the targeted keywords appear in the output sequence; therefore, the more targeted keywords there are, the harder the attack is. To illustrate our method, we start from the simpler case with only one targeted keyword $k$. To ensure that the targeted keyword's logit $z_t^{(k)}$ becomes the largest among all words at some position $t$, we design the following loss function:

$$L_{\text{keyword}} = \min_{t} \max\left\{-\epsilon,\; \max_{y \neq k} z_t^{(y)} - z_t^{(k)} + \epsilon\right\}, \qquad (4)$$

which essentially searches for the minimum of the hinge-like loss terms over all possible locations $t$. When there is more than one targeted keyword $\{k_1, \dots, k_T\}$, where $k_i$ denotes the location of the $i$-th keyword in the output vocabulary $\mathcal{Y}$, we follow the same idea and define the loss function as:

$$L_{\text{keywords}} = \sum_{i=1}^{T} \min_{t} \max\left\{-\epsilon,\; \max_{y \neq k_i} z_t^{(y)} - z_t^{(k_i)} + \epsilon\right\}. \qquad (5)$$
However, the loss defined in (5) suffers from a "keyword collision" problem: when there is more than one keyword, multiple keywords may compete to attack the same position. To address this issue, we define a mask function that masks off a position once it is already occupied by one of the targeted keywords:

$$m(t) = \begin{cases} +\infty, & \text{if } \arg\max_{y} z_t^{(y)} \in \{k_1, \dots, k_T\}, \\ 0, & \text{otherwise}. \end{cases}$$

In other words, if any of the keywords appears at position $t$ as the top-1 word, we ignore that position and only consider the remaining positions for the placement of the other keywords. Incorporating the mask function, the final loss for the targeted keywords attack becomes:

$$L_{\text{keywords}} = \sum_{i=1}^{T} \min_{t} \left[ m(t) + \max\left\{-\epsilon,\; \max_{y \neq k_i} z_t^{(y)} - z_t^{(k_i)} + \epsilon\right\} \right].$$
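The masking idea can be sketched as follows (our own data structures and example numbers, not the paper's implementation): for each keyword, the minimum hinge is taken only over positions whose current top-1 word is not some *other* targeted keyword.

```python
def keyword_loss(logits, keywords, eps=1.0):
    # For each keyword k: minimum hinge over positions t, skipping positions
    # already "won" by a different targeted keyword (the mask).
    total = 0.0
    for k in keywords:
        terms = []
        for z_t in logits:
            top = max(range(len(z_t)), key=z_t.__getitem__)
            if top in keywords and top != k:
                continue  # masked off: occupied by another keyword
            best_other = max(v for y, v in enumerate(z_t) if y != k)
            terms.append(max(-eps, best_other - z_t[k] + eps))
        total += min(terms) if terms else float("inf")
    return total

# Keywords 0 and 1 are each top-1 at distinct positions, so every keyword's
# hinge bottoms out at -eps and no two keywords compete for one position.
z = [[5.0, 0.0, 0.0], [0.0, 4.0, 0.0]]
loss = keyword_loss(z, keywords={0, 1}, eps=1.0)
```

Without the mask, both keywords could chase the single easiest position; the skip ensures each occupied position is credited to exactly one keyword.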
3.3 Handling Discrete Input Space
As mentioned before, the discrete input space is one of the major challenges in attacking seq2seq models. Let $\mathbb{W}$ be the set of word embeddings of all words in the input vocabulary. A naive approach is to first learn the perturbation $\delta$ in the continuous space by solving problem (2), and then search for the nearest word embedding of each perturbed word in $\mathbb{W}$. This idea has been used for attacking sequence classification models. Unfortunately, when applying it to the targeted keywords attack, we found that all of the 100 attacked sequences on the Gigaword dataset failed to generate the targeted keywords. The main reason is that the solution obtained by directly solving problem (2) is unlikely to be a feasible word embedding in $\mathbb{W}$, and its nearest neighbor could be far away from it due to the curse of dimensionality.
To address this issue, we propose to add a constraint that enforces each perturbed word to belong to the input vocabulary $\mathcal{X}$. The optimization problem then becomes

$$\min_{\delta} \; L(X + \delta) + \lambda R(\delta) \quad \text{s.t.} \quad x_i + \delta_i \in \mathbb{W}, \;\; i = 1, \dots, N.$$

We then apply a projected gradient descent method to solve this constrained problem. At each iteration, we project the current solution $x_i + \delta_i$, where $\delta_i$ denotes the $i$-th column of $\delta$, back into $\mathbb{W}$ so that each perturbed embedding maps to a specific input word.
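A toy sketch of this projection step (the 2-D "embeddings" below are made up for illustration): after each gradient update, every perturbed embedding is snapped to its nearest vocabulary embedding.

```python
def nearest(v, vocab):
    # Nearest vocabulary embedding under squared Euclidean distance.
    return min(vocab, key=lambda w: sum((a - b) ** 2 for a, b in zip(v, w)))

def project(perturbed, vocab):
    # Project each x_i + delta_i back onto the embedding set W.
    return [nearest(v, vocab) for v in perturbed]

W = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
projected = project([(0.9, 0.2), (0.1, 0.1)], W)
```

In a real attack this brute-force scan would be replaced by an (approximate) nearest-neighbor index over the vocabulary, but the semantics are the same: the iterate always corresponds to an actual word sequence.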
Group lasso Regularization:
The $\ell_2$ norm has been widely used in the adversarial machine learning literature to measure distortions. However, it is not suitable for our task, since almost all entries of the $\delta$ learned under $\ell_2$ regularization will be nonzero. As a result, most of the input words would be perturbed into other words, leading to an adversarial sequence that is significantly different from the input sequence. To solve this problem, we treat each $\delta_i \in \mathbb{R}^d$ as a group and use the group lasso regularization

$$R(\delta) = \sum_{i=1}^{N} \|\delta_i\|_2$$

to enforce group sparsity: only a few groups (words) in the optimal solution are allowed to be nonzero.
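Group lasso is typically optimized with its proximal operator, group soft-thresholding; the sketch below (our illustrative implementation, with a made-up threshold) shows how each word's perturbation is shrunk as a group and zeroed out entirely when its norm is small.

```python
import math

def prox_group_lasso(delta, thresh):
    # Group soft-thresholding: shrink each per-word group delta_i toward
    # zero; groups with norm <= thresh are zeroed out (word left unchanged).
    out = []
    for d in delta:  # one group per input word
        norm = math.sqrt(sum(v * v for v in d))
        if norm <= thresh:
            out.append([0.0] * len(d))
        else:
            scale = 1.0 - thresh / norm
            out.append([scale * v for v in d])
    return out

# The first word's large perturbation survives (shrunk by 20%); the second
# word's tiny perturbation is zeroed out, so that word stays untouched.
sparse = prox_group_lasso([[3.0, 4.0], [0.1, 0.1]], thresh=1.0)
```

This is exactly the mechanism that keeps the number of changed words small.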
3.4 Gradient Regularization
When attacking the seq2seq model, we often find that the current iterate lies in a region containing very few or even no embedding vectors. This negatively affects our projected gradient method, since even the closest embedding to such a region can be far away.
To address this issue, we propose a gradient regularization that keeps $x_i + \delta_i$ close to the word embedding space. Our final objective function becomes:

$$\min_{\delta} \; L(X + \delta) + \lambda_1 \sum_{i=1}^{N} \|\delta_i\|_2 + \lambda_2 \sum_{i=1}^{N} \min_{w \in \mathbb{W}} \|x_i + \delta_i - w\|_2 \quad \text{s.t.} \quad x_i + \delta_i \in \mathbb{W}, \;\; \forall i,$$

where the third term is our gradient regularization, which penalizes a large distance to the nearest point in $\mathbb{W}$. The gradient of this term can be computed efficiently, since it only involves the single $w \in \mathbb{W}$ that has minimum distance from $x_i + \delta_i$. For the other terms, we use the proximal operator to optimize the group lasso regularization, and the gradient of the loss function is computed through back-propagation. The detailed steps of our approach, Seq2Sick, are presented in Algorithm 1. Our source code is publicly available at https://github.com/cmhcbb/Seq2Sick.
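The gradient-regularization term can be sketched as follows. This is our simplified squared-distance variant on toy 2-D embeddings, not the paper's exact formulation: for each perturbed embedding, we add its squared distance to the closest vocabulary embedding $w^*$, whose (sub)gradient with respect to $\delta_i$ is simply $2(x_i + \delta_i - w^*)$, so a single nearest-neighbor lookup per word suffices.

```python
def grad_reg(perturbed, vocab):
    # Squared distance to the nearest vocabulary embedding, plus its
    # (sub)gradient 2 * (x_i + delta_i - w*) for each perturbed word.
    total, grads = 0.0, []
    for v in perturbed:
        w = min(vocab, key=lambda u: sum((a - b) ** 2 for a, b in zip(v, u)))
        total += sum((a - b) ** 2 for a, b in zip(v, w))
        grads.append([2.0 * (a - b) for a, b in zip(v, w)])
    return total, grads

W = [(0.0, 0.0), (1.0, 0.0)]
val, g = grad_reg([(0.8, 0.0)], W)
```

Following this gradient pulls the iterate toward (1.0, 0.0), i.e., into a populated region of the embedding space, which is what makes the subsequent projection step reliable.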
Our algorithm needs only one back-propagation per iteration to compute the gradient of the loss. The bottleneck is projecting the solution back onto the word embedding space, whose cost depends on the number of words in the model's input dictionary. Some prior attacks perform nearest-neighbor search over pre-trained word embeddings containing millions of words. Fortunately, our method does not rely on any pre-trained word embedding, making it a more generic attack. Moreover, approximate nearest neighbor (ANN) approaches can be employed to further speed up the projection step.
In this section, we conduct experiments on two widely used applications of seq2seq models: text summarization and machine translation.
We use three datasets, DUC2003 (http://duc.nist.gov/duc2003/tasks.html), DUC2004 (http://duc.nist.gov/duc2004/), and Gigaword (https://catalog.ldc.upenn.edu/LDC2003T05), for the text summarization task. DUC2003 and DUC2004 are widely used document summarization datasets. We also include a subset of randomly chosen samples from Gigaword to further evaluate the performance of our algorithm. For the machine translation task, we use 500 samples from the WMT'16 Multimodal Translation task (http://www.statmt.org/wmt16/translation-task.html). Statistics of the datasets are shown in Table 2. All our experiments are performed on an Nvidia GTX 1080 Ti GPU.
| Datasets | # samples | Average input length |
| --- | --- | --- |
4.2 Seq2seq models
We implement both the text summarization and machine translation models with OpenNMT-py (https://github.com/OpenNMT/OpenNMT-py). Specifically, we use a word-level LSTM encoder and a word-based attention decoder for both applications. For the text summarization task, we train a seq2seq model on 380k training pairs from the Gigaword dataset. The architecture is a 2-layer stacked LSTM with 500 hidden units. We conduct experiments on two types of models: one uses pre-trained 300-dimensional GloVe word embeddings, and the other is trained from scratch. For the machine translation task, we train our model on 453k pairs from the Europarl corpus of German-English WMT 15 (http://www.statmt.org/wmt15/translation-task.html), together with the common-crawl and news-commentary data. We use the hyper-parameters suggested by OpenNMT for both models, and have reproduced the performance reported in the original works.
4.3 Empirical Results
4.3.1 Text Summarization
For the non-overlapping attack, we use the proposed loss (3) in our objective function. A non-overlapping attack is treated as successful only if there is no common word at any position between the output sequence and the original sequence. We use the same margin parameter $\epsilon$ in all non-overlapping experiments. Table 3 summarizes the experimental results: our algorithm needs to change only 2 or 3 words on average, and generates entirely different outputs for the vast majority of sentences. We also include some adversarial examples in Table 9, with more non-overlapping adversarial examples available in the appendix. These examples show that changing even a single word can make the output sequence look completely different from the original one and entirely change the sentence's meaning.
For the targeted keywords attack, we randomly choose targeted keywords from the output vocabulary after removing stop words such as "a" and "the". A targeted keywords attack is treated as successful only if the output sequence contains all the targeted keywords. We use the same margin parameter $\epsilon$ in our objective function in all experiments. Table 4 summarizes the performance, including the overall success rate, the average BLEU score, and the average number of changed words in the input sentences. The average BLEU score is defined as the exponential average over BLEU-1, 2, 3, and 4, and is commonly used to evaluate the quality of machine-translated text. We also include some adversarial examples crafted by our method in Table 10, using 3 sets of keywords, where "##" stands for a two-digit number after standard preprocessing in text summarization. These examples show that our method can generate totally irrelevant subjects, verbs, numerals, and objects that form a complete sentence with only a few word changes. More adversarial examples can be found in the appendix of this paper.
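For concreteness, the "exponential average" over BLEU-1..4 mentioned above is the geometric mean of the four n-gram scores; a minimal illustration (the four scores below are made-up numbers, not results from our experiments):

```python
import math

def avg_bleu(bleu_scores):
    # Geometric mean of BLEU-1..4, computed as exp of the mean log score.
    return math.exp(sum(math.log(b) for b in bleu_scores) / len(bleu_scores))

score = avg_bleu([0.8, 0.6, 0.4, 0.3])
```

Working in log space avoids underflow when individual n-gram precisions are very small.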
Note that our algorithm relies on three important techniques: the projected gradient method, group lasso, and gradient regularization. In the following, we conduct experiments to verify the importance of each. First, without the projected gradient method, the success rate immediately drops to close to zero; it is thus essential to project the solution back onto the input vocabulary's word embeddings after each iteration. Table 5 shows the results when group lasso or gradient regularization is removed, where "W/O GL" denotes results without group lasso regularization and "W/O GR" denotes results without gradient regularization; the algorithms are evaluated on the targeted 2-keyword attack. As shown in Table 5, without the group lasso regularization that enforces group sparsity of the distortion $\delta$, the attack success rate remains similar, but both the average BLEU score and the number of changed words degrade substantially. Without the gradient regularization that helps find a solution close to the input embedding space, not only does the success rate drop, but the average BLEU score and the number of changed words also become worse. These empirical results verify that all three techniques are important for generating adversarial examples against seq2seq models.
| Dataset | Success rate | BLEU | # changed |
| --- | --- | --- | --- |
| Dataset | Setting | Success rate | BLEU | # changed |
| --- | --- | --- | --- | --- |
| Gigaword | W/o GL | 91.4% | 0.166 | 16.53 |
| Gigaword | W/o GR | 92.8% | 0.707 | 4.96 |
| | W/o GR | 87.0% | 0.421 | 5.14 |
4.3.2 Machine Translation
We then conduct both the non-overlapping and targeted keywords attacks on the English-German machine translation model. We first filter out stop words such as "Ein" (a) and "und" (and) from the German vocabulary, and randomly choose several nouns, verbs, adjectives, or adverbs in German as targeted keywords. As in the text summarization experiments, we use the same margin parameter $\epsilon$ in the objective function. The success rates, BLEU scores, and average numbers of changed words are reported in Table 6, with some adversarial examples shown in Table 8. Furthermore, the effects of the group lasso and gradient regularizations are reported in Table 7. These results are consistent with the findings in the text summarization experiments, and verify the effectiveness of the proposed attack framework and the importance of the proposed techniques.
| Method | Success rate | BLEU | # changed |
| --- | --- | --- | --- |
| Source input seq | A toddler is cooking with another person. |
| Adv input seq | A dog is sit with another UNK. |
| Source output seq | Ein kleines Kind kocht mit einer anderen Person. |
| Adv output seq | Ein Hund sitzt mit einem anderen UNK. |
4.4 Robustness of Seq2Seq Model
Finally, we summarize our results and make some remarks about the robustness of seq2seq models. Our algorithm achieves high success rates in both the non-overlapping attack and the targeted keywords attack with 1 or 2 keywords, which clearly verifies the effectiveness of our attack algorithm. On the other hand, we also recognize some strengths of the seq2seq model: (i) unlike CNN models, where a targeted attack can be conducted easily with almost 100% success rate and a distortion so small that it cannot be perceived by human eyes, it is hard to turn the entire seq2seq output into a particular sentence – some sentences are even impossible for seq2seq models to generate; and (ii) since the input space of seq2seq models is discrete, it is easier for humans to detect the differences between the adversarial sequence and the original one, even if we change only one or a few words. Therefore, we conclude that, compared with DNN models designed for other tasks such as image classification, seq2seq models are more robust to adversarial attacks. The main reason, as pointed out in the introduction, is that the seq2seq model has a finite and discrete input space and an almost infinite output space, so it is more robust than visual classification models, which have an infinite and continuous input space and a very small output space (e.g., 10 categories in MNIST and 1,000 categories in ImageNet).
| Source input seq | among asia 's leaders , prime minister mahathir mohamad was notable as a man with a bold vision : a physical and social transformation that would push this nation into the forefront of world affairs . |
| Adv input seq | among lynn 's leaders , prime minister mahathir mohamad was notable as a man with a bold vision : a physical and social transformation that would push this nation into the forefront of world affairs . |
| Source output seq | asia 's leaders are a man of the world |
| Adv output seq | a vision for the world |
| Source input seq | under nato threat to end his punishing offensive against ethnic albanian separatists in kosovo , president slobodan milosevic of yugoslavia has ordered most units of his army back to their barracks and may well avoid an attack by the alliance , military observers and diplomats say |
| Adv input seq | under nato threat to end his punishing offensive against ethnic albanian separatists in kosovo , president slobodan milosevic of yugoslavia has jean-sebastien most units of his army back to their barracks and may well avoid an attack by the alliance , military observers and diplomats say . |
| Source output seq | milosevic orders army back to barracks |
| Adv output seq | nato may not attack kosovo |
| Source input seq | north korea is entering its fourth winter of chronic food shortages with its people malnourished and at risk of dying from normally curable illnesses , senior red cross officials said tuesday . |
| Adv input seq | north detectives is apprehended its fourth winter of chronic food shortages with its people malnourished and at risk of dying from normally curable illnesses , senior red cross officials said tuesday . |
| Source output seq | north korea enters fourth winter of food shortages |
| Adv output seq | north police arrest fourth winter of food shortages . |
| Source input seq | after a day of fighting , congolese rebels said sunday they had entered kindu , the strategic town and airbase in eastern congo used by the government to halt their advances . |
| Adv input seq | after a day of fighting , nordic apprehended said sunday they had entered UNK , the strategic town and airbase in eastern congo used by the government to halt their advances . |
| Source output seq | congolese rebels say they have entered UNK . |
| Adv output seq | nordic police arrest ## in congo . |
| Source input seq | three weeks ago , when someone started stalking homeless people here , bending over them as they slept in the alleys and doorways and slashing their throats , a grim joke began making the rounds on the streets : the killer had been hired by the city . |
| Adv input seq | three weeks ago dissidents when someone started modi homeless people here thai bending over them as they slept in the alleys and doorways and slashing their throats , a grim joke began making the rounds on the streets : the killer had been hired by the city . |
| Source output seq | homeless people get a taste of life |
| Adv output seq | thai police arrest # in crackdown on protesters |
In this paper, we propose a novel framework, Seq2Sick, to generate adversarial examples for sequence-to-sequence neural network models. We propose a projected gradient method to address the discrete input space, adopt group lasso to enforce sparsity of the distortion, and develop a gradient regularization technique to further improve the success rate. Moreover, unlike most existing algorithms, which are designed for untargeted attacks on classification tasks, our algorithm can perform the more challenging targeted keywords attack. Our experimental results show that the proposed framework is powerful and effective: it achieves high success rates in both the non-overlapping and targeted keywords attacks with relatively small distortions. At the same time, we recognize that, compared with well-evaluated DNN models for image classification, seq2seq models are indeed more robust to adversarial attacks.
CJH, MC and HZ are partially supported by NSF IIS-1719097.
-  Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. CoRR, abs/1312.6199, 2013.
-  Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
-  Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
-  Matthias Hein and Maksym Andriushchenko. Formal guarantees on the robustness of a classifier against adversarial manipulation. arXiv preprint arXiv:1705.08475, 2017.
-  Tsui-Wei Weng, Huan Zhang, Pin-Yu Chen, Jinfeng Yi, Dong Su, Yupeng Gao, Cho-Jui Hsieh, and Luca Daniel. Evaluating the robustness of neural networks: An extreme value theory approach. arXiv preprint arXiv:1801.10578, 2018.
-  Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, pages 3104–3112, 2014.
-  Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
-  Minh-Thang Luong, Hieu Pham, and Christopher D Manning. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025, 2015.
-  Alexander M Rush, Sumit Chopra, and Jason Weston. A neural attention model for abstractive sentence summarization. arXiv preprint arXiv:1509.00685, 2015.
-  William Chan, Navdeep Jaitly, Quoc V. Le, and Oriol Vinyals. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, March 20-25, 2016, pages 4960–4964, 2016.
-  Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2574–2582, 2016.
-  Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pages 372–387. IEEE, 2016.
-  Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In Security and Privacy (SP), 2017 IEEE Symposium on, pages 39–57. IEEE, 2017.
-  Moustapha Cissé, Yossi Adi, Natalia Neverova, and Joseph Keshet. Houdini: Fooling deep structured visual and speech recognition models with adversarial examples. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 6980–6990, 2017.
-  Pin-Yu Chen, Yash Sharma, Huan Zhang, Jinfeng Yi, and Cho-Jui Hsieh. Ead: Elastic-net attacks to deep neural networks via adversarial examples. arXiv preprint arXiv:1709.04114, 2017.
-  Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 15–26. ACM, 2017.
-  Jamie Hayes and George Danezis. Machine learning as an adversarial service: Learning black-box adversarial examples. arXiv preprint arXiv:1708.05207, 2017.
-  Nina Narodytska and Shiva Prasad Kasiviswanathan. Simple black-box adversarial perturbations for deep networks. arXiv preprint arXiv:1612.06299, 2016.
-  Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pages 506–519. ACM, 2017.
-  Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. arXiv preprint arXiv:1712.04248, 2017.
-  Jan Hendrik Metzen, Mummadi Chaithanya Kumar, Thomas Brox, and Volker Fischer. Universal adversarial perturbations against semantic image segmentation. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pages 2774–2783, 2017.
-  Anurag Arnab, Ondrej Miksik, and Philip H. S. Torr. On the robustness of semantic segmentation models to adversarial attacks. CoRR, abs/1711.09856, 2017.
-  Cihang Xie, Jianyu Wang, Zhishuai Zhang, Yuyin Zhou, Lingxi Xie, and Alan L. Yuille. Adversarial examples for semantic segmentation and object detection. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pages 1378–1387, 2017.
-  Jiajun Lu, Hussein Sibai, and Evan Fabry. Adversarial examples that fool detectors. CoRR, abs/1712.02494, 2017.
-  Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Dawn Song, Tadayoshi Kohno, Amir Rahmati, Atul Prakash, and Florian Tramèr. Note on attacking object detectors with adversarial stickers. CoRR, abs/1712.08062, 2017.
-  Hongge Chen, Huan Zhang, Pin-Yu Chen, Jinfeng Yi, and Cho-Jui Hsieh. Show-and-fool: Crafting adversarial examples for neural image captioning. arXiv preprint arXiv:1712.02051, 2017.
-  Xiaojun Xu, Xinyun Chen, Chang Liu, Anna Rohrbach, Trevor Darell, and Dawn Song. Can you fool AI with adversarial examples on a visual turing test? CoRR, abs/1709.08693, 2017.
-  Ivan Evtimov, Kevin Eykholt, Earlence Fernandes, Tadayoshi Kohno, Bo Li, Atul Prakash, Amir Rahmati, and Dawn Song. Robust physical-world attacks on machine learning models. CoRR, abs/1707.08945, 2017.
-  Tianyu Gu, Brendan Dolan-Gavitt, and Siddharth Garg. Badnets: Identifying vulnerabilities in the machine learning model supply chain. CoRR, abs/1708.06733, 2017.
-  Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. Adversarial examples in the physical world. CoRR, abs/1607.02533, 2016.
-  Marco Melis, Ambra Demontis, Battista Biggio, Gavin Brown, Giorgio Fumera, and Fabio Roli. Is deep learning safe for robot vision? adversarial examples against the icub humanoid. In 2017 IEEE International Conference on Computer Vision Workshops, ICCV Workshops 2017, Venice, Italy, October 22-29, 2017, pages 751–759, 2017.
-  Anish Athalye, Logan Engstrom, Andrew Ilyas, and Kevin Kwok. Synthesizing robust adversarial examples. CoRR, abs/1707.07397, 2017.
-  Nicolas Papernot, Patrick McDaniel, Ananthram Swami, and Richard Harang. Crafting adversarial input sequences for recurrent neural networks. In Military Communications Conference, MILCOM 2016-2016 IEEE, pages 49–54. IEEE, 2016.
-  Jiwei Li, Will Monroe, and Dan Jurafsky. Understanding neural networks through representation erasure. arXiv preprint arXiv:1612.08220, 2016.
-  Suranjana Samanta and Sameep Mehta. Towards crafting text adversarial samples. arXiv preprint arXiv:1707.02812, 2017.
-  Bin Liang, Hongcheng Li, Miaoqiang Su, Pan Bian, Xirong Li, and Wenchang Shi. Deep text classification can be fooled. arXiv preprint arXiv:1704.08006, 2017.
-  Ji Gao, Jack Lanchantin, Mary Lou Soffa, and Yanjun Qi. Black-box generation of adversarial text sequences to evade deep learning classifiers. arXiv preprint arXiv:1801.04354, 2018.
-  Robin Jia and Percy Liang. Adversarial examples for evaluating reading comprehension systems. arXiv preprint arXiv:1707.07328, 2017.
-  Zhengli Zhao, Dheeru Dua, and Sameer Singh. Generating natural adversarial examples. arXiv preprint arXiv:1710.11342, 2017.
-  Javid Ebrahimi, Anyi Rao, Daniel Lowd, and Dejing Dou. Hotflip: White-box adversarial examples for nlp. arXiv preprint arXiv:1712.06751, 2017.
-  Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.
-  Zhitao Gong, Wenlu Wang, Bo Li, Dawn Song, and Wei-Shinn Ku. Adversarial texts with gradient methods. arXiv preprint arXiv:1801.07175, 2018.
-  Jerome H Friedman. On bias, variance, 0/1—loss, and the curse-of-dimensionality. Data Mining and Knowledge Discovery, 1(1):55–77, 1997.
-  Jeffrey Pennington, Richard Socher, and Christopher Manning. Glove: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, 2014.
-  Thanh-Le Ha, Jan Niehues, and Alexander Waibel. Toward multilingual neural machine translation with universal encoder and decoder. arXiv preprint arXiv:1611.04798, 2016.
-  Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting on association for computational linguistics, pages 311–318. Association for Computational Linguistics, 2002.
Appendix A
A.1 More adversarial examples with the targeted keyword method
In Table 11, we show more adversarial examples generated by our Seq2Sick framework using the targeted keyword method.
|Source input seq||cambodia ’s two-party opposition asked the asian development bank monday to stop providing loans to the incumbent government , which it calls illegal|
|Adv input seq||cambodia ’s heavy-handed manhunts warrant the asian development bank monday to stop providing loans to the incumbent government, rwanda apec calls police|
|Source output seq||cambodian opposition asks adb to stop loans to government|
|Adv output seq||cambodia ’s police arrest asian development bank|
|Source input seq||europe ’s top official said friday that he hoped former chilean dictator augusto pinochet would be extradited to spain to be tried for crimes committed during his 17-year rule.|
|Adv input seq||detectives sweep top official said friday that he hoped former chilean dictator augusto pinochet would be extradited to spain to be tried for crimes committed during his UNK rule.|
|Source output seq||europe ’s top official hopes pinochet will be extradited to spain.|
|Adv output seq||chilean police arrest pinochet ’s extradition.|
|Source input seq||prime minister rafik hariri has declined an informal invitation from lebanon ’s new president to form the next government , sparking a political crisis in this country as it rebuilds from its devastating civil war .|
|Adv input seq||prime minister rafik hariri apprehended declined riot-torn informal invitation from lebanon ’s new president to form the next government , sparking a political crisis ex-police this country as it UNK from its devastating civil war .|
|Source output seq||lebanese prime minister declines to form new government|
|Adv output seq||lebanese police arrest ## in new government|
|Source input seq||the trials of three outspoken dissidents over , communist party leader jiang zemin signaled wednesday that china will sustain a crackdown on dissent throughout next year .|
|Adv input seq||firemen trials wanchope consists outspoken dissidents over , communist party leader jiang zemin signaled wednesday that china will sustain a crackdown on dissent throughout next year .|
|Source output seq||china ’s jiang signals crackdown on dissent|
|Adv output seq||chinese police say they will continue crackdown on dissent|
|Source input seq||president boris yeltsin stayed home tuesday , nursing a respiratory infection that forced him to cut short a foreign trip and revived concerns about his ability to govern.|
|Adv input seq||president boris yeltsin stayed home tuesday , cops cops respiratory infection that forced him to cut short a foreign trip and revived concerns about his ability to govern.|
|Source output seq||yeltsin stays home after illness|
|Adv output seq||yeltsin stays home after police arrest|
|Source input seq||a group of high-powered u.s. investors in southeast asia on monday applauded efforts to perk up thailand ’s staggering economy , saying they had been assured by top thai officials that key economic reform packages will soon be approved|
|Adv input seq||a eds confiscate rafah arresting supporters in southeast asia on monday applauded efforts to perk up thailand ’s staggering economy , saying they had been assured by top thai officials that key economic reform packages will soon be approved .|
|Source output seq||us investors welcome thai economic reforms|
|Adv output seq||thai police arrest protesters in southeast asia|
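As an illustration of the success criterion behind the targeted keyword method, the following minimal sketch checks whether every target keyword appears in an adversarial output. The helper `keywords_hit` and the choice of "police" and "arrest" as target keywords are hypothetical assumptions for illustration, not part of the Seq2Sick implementation:

```python
def keywords_hit(output_tokens, target_keywords):
    """Return True when every target keyword occurs in the output sequence."""
    out = set(output_tokens)
    return all(keyword in out for keyword in target_keywords)

# Adversarial output from the last row of Table 11, with "police" and
# "arrest" as the assumed target keywords.
adv_output = "thai police arrest protesters in southeast asia".split()
print(keywords_hit(adv_output, ["police", "arrest"]))  # True
```

Under this criterion, an attack counts as successful only if all chosen keywords appear somewhere in the output sequence, which is exactly what the examples above exhibit.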
A.2 More adversarial examples with the non-overlapping method
In Table 12, we show more adversarial examples generated by our Seq2Sick framework using the non-overlapping method. By changing only one word in the input, we can fool the seq2seq model into generating a totally different output.
|Source input seq||a group of high-powered u.s. investors in southeast asia on monday applauded efforts to perk up thailand ’s staggering economy , saying they had been assured by top thai officials that key economic reform packages will soon be approved.|
|Adv input seq||a group of high-powered u.s. investment in southeast asia on monday applauded efforts to perk up thailand ’s staggering economy , saying they had been assured by top thai officials that key economic reform packages will soon be approved.|
|Source output seq||us investors welcome thai economic reforms.|
|Adv output seq||u.s. investment in southeast asia to be boosted.|
|Source input seq||just four years ago , it was a good bet that newt gingrich would be the pivotal figure in u.s. politics at the turn of the millennium.|
|Adv input seq||just four years ago , it was a good bet that newt lawmakers would be the pivotal figure in u.s. politics at the turn of the millennium .|
|Source output seq||gingrich ’s UNK vision is a good bet|
|Adv output seq||a new year for the millennium|
|Source input seq||russia ’s new prime minister picked an unusual way to reassure the nation thursday , after two weeks of deliberations he announced that he still had no plan to rescue the country from its economic crisis .|
|Adv input seq||russia ’s new prime minister picked an unusual way to peacocks the nation thursday , after two weeks of deliberations he announced that he still had no plan to rescue the country from its economic crisis.|
|Source output seq||russia s new prime minister says he has no plan to rescue country from economic crisis|
|Adv output seq||new russian prime minister picks peacocks|
|Source input seq||as turkey kept up warlike rhetoric against damascus , egypt on sunday began shuttle democracy between the two neighbors to avoid a military confrontation over turkish kurdish bases in syria.|
|Adv input seq||as turkey kept up warlike rhetoric against damascus of egypt on sunday began shuttle democracy between the two neighbors to avoid a military confrontation over turkish kurdish bases in syria.|
|Source output seq||egypt begins shuttle diplomacy with syria|
|Adv output seq||turkey holds up its rhetoric against syria|
|Source input seq||indian prime minister p.v. narasimha rao ’s promise of more autonomy for troubled kashmir and his plea for early state elections has sparked a violent reaction from provincial moslem and opposition parties.|
|Adv input seq||indian prime minister p.v. narasimha rao honeywell promise of more autonomy for troubled kashmir and his plea for early state elections has sparked a violent reaction from provincial moslem and opposition parties.|
|Source output seq||indian pm ’s promise of autonomy for kashmir sparks violent reaction|
|Adv output seq||new autonomy for troubled kashmir|
|Source input seq||the moment of truth for chuck knoblauch came in the bottom of the first inning when his name was announced at yankee stadium for the first time since he neglected to chase down that memorable loose ball last wednesday.|
|Adv input seq||the moment of truth for chuck yankees came in the bottom of the first inning when his name was announced at yankee stadium for the first time since he neglected to chase down that memorable loose ball last wednesday.|
|Source output seq||knoblauch ’s return is a UNK|
|Adv output seq||yankees yankees lose yankees #-#|
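To make the non-overlapping criterion concrete, the sketch below checks whether an adversarial output shares any word with the original output. The helper `is_non_overlapping` is a hypothetical illustration under the simple assumption that success means an empty word intersection; it is not the paper's code:

```python
def is_non_overlapping(orig_tokens, adv_tokens):
    """Return True when the adversarial output shares no word with the original."""
    return not (set(orig_tokens) & set(adv_tokens))

# Source/adversarial output pair from the first row of Table 12.
orig = "us investors welcome thai economic reforms".split()
adv = "u.s. investment in southeast asia to be boosted".split()
print(is_non_overlapping(orig, adv))  # True: no word is shared
```

This word-set view explains why the non-overlapping attack is harder than an untargeted one: the adversarial output must avoid the entire vocabulary of the original output, not merely differ from it at some position.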