
AdvCodeMix: Adversarial Attack on Code-Mixed Data

Research on adversarial attacks has become widely popular in recent years. One unexplored area where prior research is lacking is the effect of adversarial attacks on code-mixed data. Therefore, in the present work, we present the first generalized text-perturbation framework for attacking code-mixed classification models in a black-box setting. We rely on various perturbation techniques that preserve the semantic structure of the sentences and also obscure the attacks from the perception of a human user. The methodology leverages the importance of a token to decide where to attack, and employs various perturbation strategies at those positions. We test our strategies on sentiment classification models trained on Bengali-English and Hindi-English code-mixed datasets, and reduce their F1-scores by nearly 51% and 53%, respectively, when only a small number of tokens are perturbed in a given sentence.


1. Introduction

In the age of globalization, code-mixed text inputs are a common phenomenon, since it is quite natural for bilinguals to switch back and forth between two languages while communicating, both in verbal and textual forms (Sridhar and Sridhar, 1980). Such textual instances are challenging to process: they often combine grammatical bases of both languages in a single sentence, make extensive use of colloquial terms and short-forms, and do not follow standardized transliteration rules when the two languages have different scripts. We introduce a three-step attack strategy that can generate adversarial examples using minimal resources for any type of code-mixed data (with and without transliteration). We use our framework to evaluate the success of adversarial attacks on several sentiment classification models (Patwa et al., 2020; Zhang et al., 2015; Pires et al., 2019) that have proven effective on code-mixed data. Research on adversarial techniques has become an important aspect, especially for security-critical applications, as it helps us both analyze the weaknesses of models and make them more robust. Popular methods for building robust pipelines include adversarial training of the model (Madry et al., 2017) and rejection of adversarial inputs (Meng and Chen, 2017).
In our approach, we do not make replacements in the original sentence based on word synonyms, as in previous adversarial attack approaches (Jin et al. (2020), Li et al. (2020)), nor do we plant triggers at the word or sentence level (Sun, 2020), which ensures that semantic similarity is automatically preserved. We only make phonetic perturbations at the sub-word and word level, and replace words with their corresponding transliterated counterparts in the code-mixed sentence, which can deceive the model in several cases while keeping the overall structure intact. The main contributions of our paper are: (a) a generalized, model-agnostic framework for generating adversarial examples on code-mixed data; (b) novel language-specific perturbation techniques that preserve the semantic structure of a sentence; and (c) successful attacks within a short time span per sentence.

2. Related Work

Sun et al. (Sun, 2020) systematically studied backdoor attacks on text data, showing the impact that different trigger mechanisms, such as sentence-, character- and word-level triggers, non-natural triggers and special triggers, can have on the attack framework. Jin et al. (Jin et al., 2020) proposed the TEXTFOOLER framework, which generates adversarial text for binary text classification and text entailment tasks and achieves state-of-the-art results against powerful models such as pre-trained BERT, convolutional and recurrent neural networks. A novel and interesting approach, BERT-Attack (Li et al., 2020), uses the pre-trained BERT model to effectively attack fine-tuned BERT models as well as traditional LSTM-based deep learning models. Liu et al. (Liu et al., 2020) applied the concept of transfer learning using the ERNIE framework, along with adversarial training using a multilingual model. Tan et al. (Tan and Joty, 2021) proposed two strong black-box adversarial attack frameworks, one word-level and one phrase-level, the latter being particularly effective on XNLI. Ren et al. (Ren et al., 2019) proposed a novel greedy algorithm, probability weighted word saliency (PWWS), based on a synonym substitution strategy. Li et al. (Li et al., 2018) proposed the TEXTBUGGER framework, which outperforms state-of-the-art adversarial attack frameworks in terms of attack success rate. Gao et al. (Gao et al., 2018) proposed DeepWordBug, a framework for generating adversarial text in a black-box setting that introduces small perturbations in the most critical tokens based on novel scoring strategies.

3. Problem and Challenges

In this problem, we have a set of $N$ sentences $X = \{x_1, x_2, \ldots, x_N\}$ with an associated set of $N$ labels $Y = \{y_1, y_2, \ldots, y_N\}$, where the total number of classes is $C$. A given pre-trained model $F$ maps an input sample $x$ to a ground truth label $y$. For an input sentence $x$, a valid adversarial example $x_{adv}$ must meet the following criteria:

$$ F(x_{adv}) \neq y, \qquad \mathrm{Sim}(x_{adv}, x) \geq \epsilon \qquad (1) $$

where $\mathrm{Sim}(\cdot,\cdot)$ is a semantic and syntactic similarity function and $\epsilon$ is the minimum similarity between the input and adversarial samples. Here, $x_{adv} = x + \Delta x$, where $\Delta x$ is an imperceptible perturbation added to the input sentence $x$. The main challenge has been to come up with novel perturbation techniques for code-mixed data. The language models used to perform perturbations in previous frameworks for monolingual data cannot be used here, as such a model is currently not available for code-mixed data. In the code-mixed domain, a synonym-based replacement strategy will not work either: it is very difficult to derive synonyms for bilingual tokens, and such synonyms may have the undesired effect of changing the contextual meaning. Also, simply perturbing characters in a code-mixed token renders it meaningless in most cases, as both the meaning and the semantic structure of the word might be disturbed. Hence, we take phonetic similarity into account when making token perturbations. We also need to identify the language tag of each token correctly in order to perturb it into its complementary language.

4. Methodology

In our proposed framework, we have developed a mechanism to attack text classification models under a black-box setting. The goal of our framework is to identify the most important tokens $t_1, t_2, \ldots, t_k$ and apply a set of perturbation techniques $P = \{p_1, p_2, \ldots, p_m\}$ on them iteratively until we obtain a valid adversarial example $x_{adv}$ for the corresponding input sentence $x$, if the attack is successful. More details of these modules are given in the following sub-sections.

4.1. Token Importance Calculation

We consider a sample sentence of $n$ tokens, $x = \{t_1, t_2, \ldots, t_n\}$. In order to calculate the importance of a particular token in the sentence, we replace that token with an UNKNOWN token and obtain the prediction vector for the modified sentence. Each token $t_i$ is assigned a score $I_{t_i}$ based on its impact on the sentence using a token importance calculation algorithm, and we select the top $k$ tokens as the set of most important tokens ($T_{imp}$) that can be attacked. The scoring approach is undertaken so that the number of perturbations to the original sentence can be minimized. We calculate the token importance $I_{t_i}$ using the equation below:

$$ I_{t_i} = \begin{cases} F_y(x) - F_y(x_{\setminus t_i}), & \text{if } F(x) = F(x_{\setminus t_i}) = y \\ \big(F_y(x) - F_y(x_{\setminus t_i})\big) + \big(F_{\hat{y}}(x_{\setminus t_i}) - F_{\hat{y}}(x)\big), & \text{if } F(x_{\setminus t_i}) = \hat{y} \neq y \end{cases} $$

where $F(x)$ denotes the model prediction considering all the words in the original sentence, $F(x_{\setminus t_i})$ denotes the model prediction with token $t_i$ removed, $y$ is the original label class, $\hat{y}$ is the predicted class, and $F_l(\cdot)$ denotes the probability value of the label class index $l$ in the model prediction.
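The snippet below is a minimal sketch of this leave-one-out importance scoring, not the authors' released code; `model_predict` is a hypothetical black-box interface that returns a probability vector over classes for a tokenized sentence.

```python
import numpy as np

UNK = "<UNK>"

def token_importance(tokens, label, model_predict):
    """Score each token by how much masking it with UNKNOWN changes the model output.

    tokens: list of str, the code-mixed sentence split into tokens.
    label: int, index of the original label class y.
    model_predict: callable returning a probability vector over classes
                   for a list of tokens (hypothetical black-box interface).
    """
    orig_probs = model_predict(tokens)
    scores = []
    for i in range(len(tokens)):
        masked = tokens[:i] + [UNK] + tokens[i + 1:]    # replace token i with UNKNOWN
        probs = model_predict(masked)
        score = orig_probs[label] - probs[label]        # drop in label-class probability
        pred = int(np.argmax(probs))
        if pred != label:                               # masking flips the prediction:
            score += probs[pred] - orig_probs[pred]     # add the gain of the new class
        scores.append(score)
    return scores

def top_k_tokens(tokens, scores, k):
    """Return indices of the k most important tokens, highest score first."""
    return sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
```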

4.2. Perturbation Techniques

Once we have our set of most important tokens, $T_{imp}$, we select tokens from this list in descending order of importance in an iterative manner. We then use a variety of perturbation techniques to alter the selected word in such a manner that it still adheres to the surrounding context and there is no significant change in the semantic structure of the sentence when the word is replaced. At the same time, we need to ensure that the perturbed word closely resembles a plausible human error and has a strong potential to force the target model to make a wrong prediction. We use three perturbation techniques in our algorithm, in the order: (a) sub-word perturbation, (b) character-repetition, and (c) switch-word language. In order to identify a sub-word or character that can be perturbed within a token, we first need to identify the language id of the token: we maintain different dictionaries for different languages, so language identification is essential for loading the corresponding dictionary and replacing the sub-word or character. We use a character- and phonetic-based LSTM model (Mandal et al., 2018a) to obtain language ids for the tokens present in a sentence.

Sub-Word Perturbation: We use a pre-existing dictionary of character groups that can be replaced by phonetically similar characters (Mandal et al., 2018a). Essentially, these groups consist of character uni-, bi- and tri-grams which are phonetically similar and are used interchangeably on social media depending on user backgrounds (e.g. pha and f, au and ow). Whenever such a character group is present in a particular word of the given sentence, we replace it with its corresponding value(s) from the dictionary. For example, in Bengali, 'bhalo'→'valo' (meaning good), and in Hindi, 'gajab'→'gazb' (meaning surprising). The sub-word perturbation technique ensures that both the meaning and the semantic structure are preserved.
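A minimal sketch of this dictionary-driven replacement is shown below; the phonetic mapping is a tiny illustrative subset, not the actual dictionary of Mandal et al. (2018a).

```python
# Illustrative phonetic-equivalence dictionary (toy subset; the paper relies on
# a pre-existing dictionary from Mandal et al., 2018a).
PHONETIC_MAP = {
    "bn": {"bha": "va", "au": "ow", "pha": "fa"},
    "hi": {"ja": "za", "au": "ow", "ph": "f"},
}

def sub_word_perturb(word, lang_id):
    """Replace the first phonetically equivalent character group found in the word."""
    for group, repl in PHONETIC_MAP.get(lang_id, {}).items():
        if group in word:
            return word.replace(group, repl, 1)
    return word  # no applicable character group: return the word unchanged

# e.g. sub_word_perturb("bhalo", "bn") -> "valo"
```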

Character-Repetition Perturbation: We also observed that character repetition is popular on social media, often to emphasize something or for humour. We exploit this property and create a dictionary of the characters which are most frequently repeated. We select a character from the target word and repeat it once, based on its value in the dictionary. Repeating certain characters does not alter the meaning of a word and preserves its phonetic similarity; however, it might force a model to make a false prediction. For example, in Hindi, 'mafi'→'mafii' (meaning pardon), and in Bengali, 'paoa'→'paooa' (meaning getting).
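A sketch of the repetition step follows, with an illustrative frequency dictionary standing in for the one built from social media data in the paper.

```python
# Toy dictionary of characters that are commonly repeated on social media,
# mapped to a preference score (illustrative values, not taken from the paper).
REPEAT_FREQ = {"i": 0.9, "o": 0.8, "a": 0.7, "e": 0.6, "l": 0.5}

def char_repeat_perturb(word):
    """Duplicate the most repetition-prone character that occurs in the word."""
    candidates = [(REPEAT_FREQ[ch], idx) for idx, ch in enumerate(word) if ch in REPEAT_FREQ]
    if not candidates:
        return word
    _, idx = max(candidates)                             # pick the highest-scoring character
    return word[:idx + 1] + word[idx] + word[idx + 1:]   # repeat it once

# e.g. char_repeat_perturb("mafi") -> "mafii"
```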

Switch-Word Language Perturbation: Given an input sentence, we use the character- and phonetic-based LSTM model (Mandal et al., 2018a) to obtain the language id of each token. Once we have the language id of a token, we back-transliterate and translate the word into its complementary language using contextual information and an LSTM-based seq2seq model; for example, in Bengali, 'bacha'→'baby', and in Hindi, 'byaah'→'wedding'. This perturbation technique does not change the meaning of the word and preserves contextual similarity to a great extent; however, it can force the model to make a false prediction.
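The sketch below only wires these pieces together; `identify_language`, `back_transliterate`, and `translate` stand in for the language-id LSTM and seq2seq models described above and are hypothetical interfaces, not released components.

```python
def switch_word_perturb(word, complement_lang, identify_language, back_transliterate, translate):
    """Replace a word with its counterpart in the complementary language.

    complement_lang: the Indic language of the code-mixed pair ("bn" or "hi").
    identify_language, back_transliterate, translate: hypothetical callables standing
        in for the language-id LSTM and seq2seq transliteration/translation models.
    """
    lang = identify_language(word)
    if lang == "en":
        # English token: translate it into the Indic half of the pair.
        return translate(word, src="en", tgt=complement_lang)
    # Romanized Indic token: recover the native script, then translate to English.
    native = back_transliterate(word, lang)
    return translate(native, src=lang, tgt="en")

# e.g. switch_word_perturb("bacha", "bn", ...) -> "baby"
```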

4.3. Iterative Inference

In this step, we iteratively choose the next most important word obtained from the token importance calculation algorithm and perturb it using the attack strategies one by one, with sub-word perturbation tried first, followed by character-repetition and switch-word language perturbations. The intuition behind this order is that the sub-word and character-repetition techniques change only a part of a word, which tends to preserve its semantic structure the most. We obtain the model prediction vector $o'$ and the predicted class $y'$ by replacing each perturbed word in the original sentence. As soon as the predicted label becomes different from the original prediction of the unperturbed sentence, we declare the attack successful and terminate the process. If the system is unable to induce an attack even after trying all perturbation techniques, we declare the attack for the given sentence unsuccessful. We also track the maximum probability drop, $\Delta_{max}$, induced in the label class: for the current token $t_i$ and each perturbation function $p_j$, we compare the label-class probability of the prediction vector obtained using $p_j$ with the label-class probability of the current adversarial sentence, and keep the perturbed token $t_i^*$ that produces the maximum drop among all perturbation techniques; this token replaces $t_i$ in the adversarial sentence before we move on to the next important token. The details of our attack framework are given in Algorithm 1.

Input: Sentence $x$, set of token perturbation functions $P = \{p_1, \ldots, p_m\}$, the corresponding ground truth label $y$, target model $F$, maximum number of words to perturb $k$.
Output: Attack Success Flag, Predicted Label after attack, Probability Drop

1: Obtain model prediction vector $o$ with $F(x)$
2: Calculate predicted class $y_{pred} = \arg\max o$
3: Initialization: $x_{adv} \leftarrow x$
4: for each word $t_i$ in $x$ do
5:     Calculate the importance score $I_{t_i}$ of $t_i$ using Eq. (4.1)
6: end for
7: Obtain a final set of words $T$ sorted by word importance scores $I$
8: Select the top $k$ most important words from $T$ as $T_{imp}$
9: for each word $t_i$ in $T_{imp}$ do
10:     Initialization: $\Delta_{max} \leftarrow 0$
11:     Initialization: $t_i^* \leftarrow t_i$
12:     for each perturbation technique $p_j$ from $P$ do
13:         Generate perturbed word $t_i'$ of $t_i$ using $p_j$
14:         $x' \leftarrow x_{adv}$ with $t_i$ replaced by $t_i'$
15:         Obtain current model prediction $o'$ with $F(x')$
16:         Calculate predicted class $y' = \arg\max o'$
17:         $\delta \leftarrow o_{y_{pred}} - o'_{y_{pred}}$
18:         if $y' \neq y_{pred}$ then
19:             return True, $y'$, $\delta$
20:         else
21:             Update the best perturbation for $t_i$:
22:             if $\delta > \Delta_{max}$ then
23:                 $\Delta_{max} \leftarrow \delta$
24:                 $t_i^* \leftarrow t_i'$
25:             end if
26:         end if
27:     end for
28:     $x_{adv} \leftarrow x_{adv}$ with $t_i$ replaced by $t_i^*$
29:     $o \leftarrow F(x_{adv})$
30: end for
31: return False, $y_{pred}$, $\Delta_{max}$
Algorithm 1 CodeMixed Adversarial Attack
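For concreteness, a compact Python sketch of this loop is given below, under the same assumptions as the earlier snippets (a black-box `model_predict` probability function, the `token_importance` and `top_k_tokens` helpers, and the three perturbation callables); it is an illustration of Algorithm 1, not the authors' implementation.

```python
import numpy as np

def advcodemix_attack(tokens, label, model_predict, perturbations, k):
    """Iteratively perturb the k most important tokens until the prediction flips.

    perturbations: ordered list of callables word -> perturbed word
                   (sub-word, character-repetition, switch-word language).
    Returns (success, predicted_label, probability_drop).
    """
    orig_probs = model_predict(tokens)
    y_pred = int(np.argmax(orig_probs))

    scores = token_importance(tokens, y_pred, model_predict)  # from the earlier sketch
    adv = list(tokens)
    delta_max = 0.0

    for i in top_k_tokens(tokens, scores, k):
        best_token = adv[i]
        for perturb in perturbations:                  # try the techniques in the fixed order
            new_token = perturb(adv[i])
            candidate = adv[:i] + [new_token] + adv[i + 1:]
            probs = model_predict(candidate)
            y_new = int(np.argmax(probs))
            drop = orig_probs[y_pred] - probs[y_pred]  # drop in the label-class probability
            if y_new != y_pred:                        # prediction flipped: attack succeeds
                return True, y_new, drop
            if drop > delta_max:                       # otherwise remember the strongest perturbation
                delta_max = drop
                best_token = new_token
        adv[i] = best_token                            # commit it and move to the next token
    return False, y_pred, delta_max
```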
Model | Before Attack F1 | After Attack, Top 2 Words: F1 / Time(s) / MOS / ASR | Top 4 Words: F1 / Time(s) / MOS / ASR | Top 8 Words: F1 / Time(s) / MOS / ASR
Bi-LSTM-CNN | 0.8800 | 0.5141 / 0.6504 / 0.1250 / 0.4678 | 0.3100 / 0.6641 / 0.1250 / 0.63 | 0.2851 / 0.8267 / 0.4286 / 0.6901
Bi-GRU-CNN | 0.9046 | 0.5416 / 0.5922 / 0.2083 / 0.4436 | 0.3722 / 0.7015 / 0.2500 / 0.6129 | 0.3024 / 0.7835 / 0.3496 / 0.687
Transformer | 0.8736 | 0.5579 / 0.3168 / 0.1250 / 0.4997 | 0.4500 / 0.3605 / 0.2917 / 0.5335 | 0.3811 / 0.4096 / 0.8750 / 0.601
char-CNN | 0.8708 | 0.5441 / 0.5486 / 0.3750 / 0.4414 | 0.3948 / 0.6072 / 0.4150 / 0.5836 | 0.3338 / 0.7144 / 0.5000 / 0.6543
mBERT | 0.8921 | 0.7197 / 0.7984 / 0.2083 / 0.2673 | 0.5809 / 0.9766 / 0.9167 / 0.3974 | 0.4820 / 1.9569 / 1.0417 / 0.4954
Table 1. Adversarial Attack Results On Different Models - Hindi-English Code-Mixed Data
Model | Before Attack F1 | After Attack, Top 2 Words: F1 / Time(s) / MOS / ASR | Top 4 Words: F1 / Time(s) / MOS / ASR | Top 8 Words: F1 / Time(s) / MOS / ASR
Bi-LSTM-CNN | 0.8966 | 0.7147 / 0.5052 / 0.2083 / 0.2778 | 0.4828 / 0.5938 / 0.2917 / 0.5122 | 0.3296 / 0.6967 / 0.4096 / 0.667
Bi-GRU-CNN | 0.8927 | 0.7255 / 0.5374 / 0.0833 / 0.263 | 0.5078 / 0.6410 / 0.4150 / 0.4836 | 0.3496 / 0.7628 / 0.7067 / 0.6819
Transformer | 0.8984 | 0.6852 / 0.3643 / 0.3750 / 0.3001 | 0.4726 / 0.4473 / 0.7083 / 0.5037 | 0.3149 / 0.5240 / 0.8333 / 0.6532
char-CNN | 0.8600 | 0.6185 / 0.4217 / 0.2917 / 0.4401 | 0.4957 / 0.4664 / 0.3750 / 0.4889 | 0.4191 / 0.5335 / 0.5000 / 0.562
mBERT | 0.9132 | 0.8365 / 0.6843 / 0.1428 / 0.158 | 0.7008 / 0.8821 / 0.4286 / 0.2768 | 0.5155 / 1.1967 / 0.5714 / 0.4369
Table 2. Adversarial Attack Results On Different Models - Bengali-English Code-Mixed Data

5. Experiments and Results

We use a sentiment classification task to demonstrate the capability of our framework. We first trained deep learning models on the given code-mixed sentiment classification datasets and evaluated their performance on the validation and test sets, and we then used the same models to perform inference on the adversarial samples. For models, we considered several architectures that have been widely used for sentiment classification over the years and selected Bi-LSTM-CNN (Jamatia et al., 2020), Bi-GRU-CNN (Jamatia et al., 2020), Transformer (Palomino and Ochoa-Luna, 2020), char-CNN (Zhang et al., 2015) and mBERT (Pires et al., 2019) based architectures to demonstrate the model-agnostic nature of our adversarial attack technique. The maximum input sequence length, vocabulary size, and learning rate for these experiments were set to 25, 17k, and 0.001, respectively. We summarize our results in Tables 1 and 2 using the F1-score, mean attack time per sentence, Mean Opinion Score (MOS) (Streijl et al., 2016), and Adversarial Attack Success Rate (ASR), the proportion of test data points on which the adversarial attack has been successful.

Evaluation Metric: We used the Mean Opinion Score (MOS) (Streijl et al., 2016) to evaluate our system. Cosine similarity is not applicable in this case, as there is no pre-existing model for obtaining embeddings of code-mixed sentences. We were supported by a group of volunteers, each of whom was given a list of 100 perturbed sentences per model. The MOS for each model is calculated by averaging the MOS given by the different human participants involved in the study. Each participant was asked to enter a score for every perturbed sentence produced by that particular model, for each configuration of the number of perturbed words; a lower score indicates greater similarity to the original sentence.

Data Set Details: We have used two code-mixed sentiment classification datasets on two different language pairs, "Bn-En" (Mandal et al., 2018b) and "Hin-En" (Patra et al., 2018), to evaluate the effectiveness of our method. The "Bn-En" dataset contains 3206 and 943 samples in the training and test sets, while the "Hin-En" dataset contains 13845 and 1846 samples in the training and test sets, respectively. The average sentence lengths of "Bn-En" and "Hin-En" are 15 and 12 tokens, respectively, and the mean Code-Mixing Index (CMI) (Das and Gambäck, 2014) for the "Bn-En" and "Hin-En" datasets are 22.1 and 18.57, respectively.
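For reference, the CMI of Das and Gambäck (2014) for a single utterance is commonly computed as shown below (our paraphrase of that work, not a formula stated in this paper); higher values indicate a greater degree of mixing.

```latex
% Code-Mixing Index (Das and Gambäck, 2014):
%   n        = total number of tokens in the utterance
%   u        = number of language-independent tokens
%   max(w_i) = number of tokens belonging to the dominant language
\mathrm{CMI} =
\begin{cases}
100 \times \left(1 - \dfrac{\max(w_i)}{n - u}\right) & \text{if } n > u \\
0 & \text{if } n = u
\end{cases}
```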

Attacking Results and Ablation Study: From the results shown in Table 1, we observe that the attack framework succeeds in forcing the models to make wrong predictions. Our approach is faster than conventional approaches, as those frameworks use another deep learning model or a token similarity algorithm to produce a successful perturbation, whereas we use a hashing-based approach to partially or completely perturb tokens, which speeds up the entire operation considerably. mBERT and char-CNN prove to be slightly more robust to the adversarial attack than the other models. In the case of char-CNN, this can be attributed to the fact that there is no out-of-vocabulary issue at the character level; words are represented by capturing information at the character level, and the perturbation techniques affect only a fraction of the characters of a word, so there is less scope for information loss in the token embedding vectors. The resistance of the mBERT model can be justified on the grounds that it is pre-trained on a huge corpus of multilingual data and tokenizes at the sub-word level; hence, it is relatively robust to perturbations that do not significantly alter the semantic structure and meaning of a sentence. Table 2 shows that the attack framework was also successful on the Bengali-English data, with mBERT and char-CNN again significantly more resilient to the attacks than the other models; mBERT remains the most robust model in this case as well.

We also performed an ablation study to estimate the effectiveness of each of the three perturbation techniques by computing the corresponding perturbation success rate, i.e., the percentage of vocabulary words that could be successfully perturbed using the given technique. From Table 3, we infer that the character-repetition technique is the most successful on the Bengali-English data, likely because it makes fewer changes to the semantic structure of a word, while sub-word perturbation turns out to be the most effective on the Hindi-English data, which can be attributed to a greater sensitivity of the semantic structure of those code-mixed tokens to changes at the sub-word level.

Dataset | Perturbation Method | Perturbation Success Rate (%)
Bn-En | Sub-Word | 85.9
Bn-En | Character-Repetition | 93.08
Bn-En | Switch-Word Language | 79.94
Hin-En | Sub-Word | 88.5
Hin-En | Character-Repetition | 90.52
Hin-En | Switch-Word Language | 87
Table 3. Code-mixed Perturbation Performance Study

6. Conclusion

In this paper, we have presented a generic framework that attacks code-mixed classification models by identifying and perturbing important tokens. Our word-importance calculation algorithm ensures that an attack succeeds with a very low percentage of word perturbations in the original sentence, and the entire process completes within a very short duration. The low MOS values also indicate that the perturbed sentences are very similar to the original ones. We have been able to reduce the F1-scores of models trained on both Bengali-English and Hindi-English code-mixed datasets, which shows that our attack framework can be successful across a variety of language pairs.

References

  • A. Das and B. Gambäck (2014) Identifying languages at the word level in code-mixed Indian social media text. In Proceedings of the 11th International Conference on Natural Language Processing, Goa, India, pp. 378–387. Cited by: §5.
  • J. Gao, J. Lanchantin, M. L. Soffa, and Y. Qi (2018) Black-box generation of adversarial text sequences to evade deep learning classifiers. In 2018 IEEE Security and Privacy Workshops (SPW), pp. 50–56. Cited by: §2.
  • A. Jamatia, S. Swamy, B. Gambäck, A. Das, and S. Debbarma (2020) Deep learning based sentiment analysis in a code-mixed English-Hindi and English-Bengali social media corpus. International Journal on Artificial Intelligence Tools 29 (5). Cited by: §5.
  • D. Jin, Z. Jin, J. T. Zhou, and P. Szolovits (2020) Is bert really robust? a strong baseline for natural language attack on text classification and entailment. In Proceedings of the AAAI conference on artificial intelligence, Vol. 34, pp. 8018–8025. Cited by: §1, §2.
  • J. Li, S. Ji, T. Du, B. Li, and T. Wang (2018) Textbugger: generating adversarial text against real-world applications. arXiv preprint arXiv:1812.05271. Cited by: §2.
  • L. Li, R. Ma, Q. Guo, X. Xue, and X. Qiu (2020) Bert-attack: adversarial attack against bert using bert. arXiv preprint arXiv:2004.09984. Cited by: §1, §2.
  • J. Liu, X. Chen, S. Feng, S. Wang, X. Ouyang, Y. Sun, Z. Huang, and W. Su (2020) Kk2018 at semeval-2020 task 9: adversarial training for code-mixing sentiment classification. arXiv preprint arXiv:2009.03673. Cited by: §2.
  • A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu (2017) Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083. Cited by: §1.
  • S. Mandal, S. D. Das, and D. Das (2018a) Language identification of bengali-english code-mixed data using character & phonetic based lstm models. arXiv preprint arXiv:1803.03859. Cited by: §4.2, §4.2, §4.2.
  • S. Mandal, S. K. Mahata, and D. Das (2018b) Preparing bengali-english code-mixed corpus for sentiment analysis of indian languages. arXiv preprint arXiv:1803.04000. Cited by: §5.
  • D. Meng and H. Chen (2017) Magnet: a two-pronged defense against adversarial examples. In Proceedings of the 2017 ACM SIGSAC conference on computer and communications security, pp. 135–147. Cited by: §1.
  • D. Palomino and J. Ochoa-Luna (2020) Palomino-Ochoa at SemEval-2020 task 9: robust system based on transformer for code-mixed sentiment classification. arXiv preprint arXiv:2011.09448. Cited by: §5.
  • B. G. Patra, D. Das, and A. Das (2018) Sentiment analysis of code-mixed indian languages: an overview of sail_code-mixed shared task@ icon-2017. arXiv preprint arXiv:1803.06745. Cited by: §5.
  • P. Patwa, G. Aguilar, S. Kar, S. Pandey, S. PYKL, B. Gambäck, T. Chakraborty, T. Solorio, and A. Das (2020) Semeval-2020 task 9: overview of sentiment analysis of code-mixed tweets. arXiv e-prints, pp. arXiv–2008. Cited by: §1.
  • T. Pires, E. Schlinger, and D. Garrette (2019) How multilingual is multilingual bert?. arXiv preprint arXiv:1906.01502. Cited by: §1, §5.
  • S. Ren, Y. Deng, K. He, and W. Che (2019) Generating natural language adversarial examples through probability weighted word saliency. In Proceedings of the 57th annual meeting of the association for computational linguistics, pp. 1085–1097. Cited by: §2.
  • S. N. Sridhar and K. K. Sridhar (1980) The syntax and psycholinguistics of bilingual code mixing.. Canadian Journal of Psychology/Revue canadienne de psychologie 34 (4), pp. 407. Cited by: §1.
  • R. C. Streijl, S. Winkler, and D. S. Hands (2016) Mean opinion score (mos) revisited: methods and applications, limitations and alternatives. Multimedia Systems 22 (2), pp. 213–227. Cited by: §5, §5.
  • L. Sun (2020) Natural backdoor attack on text data. arXiv preprint arXiv:2006.16176. Cited by: §1, §2.
  • S. Tan and S. Joty (2021) Code-mixing on sesame street: dawn of the adversarial polyglots. arXiv preprint arXiv:2103.09593. Cited by: §2.
  • X. Zhang, J. Zhao, and Y. LeCun (2015) Character-level convolutional networks for text classification. arXiv preprint arXiv:1509.01626. Cited by: §1, §5.

Appendix A Appendix

Appendix B Qualitative Results

In Tables 4 and 5, we show qualitative results of the effect of our perturbation algorithm on the Hindi-English and Bengali-English code-mixed datasets, respectively. We observe that the perturbations introduced in the sentences are negligible when the number of perturbed tokens is small, and could be attributed to an unintentional human or typing error. However, as the number of perturbed tokens increases, the attack success rate increases while the quality of the sentences degrades. The perturbations neither alter the overall meaning of a sentence nor affect its semantic structure. Hence, our perturbation algorithm can attack deep learning models in a very subtle manner.

Appendix C Error Analysis

We observe that it is relatively easy to produce an adversarial attack for shorter sentences. This can be partially explained by the fact that, in many cases, a particular word has a much higher contribution to the overall context of a short sentence than of a longer one. Hence, we can easily switch the predicted label of a shorter sentence after perturbing fewer than four tokens, while we might need to perturb a larger number of tokens for a longer sentence. Also, some perturbations, like the character-repetition perturbation, might add a large number of redundant characters to a word, which can be quite conspicuous; however, because the meaning of the sentence is unaltered and phonetic similarity is preserved, such perturbations are sometimes unable to "fool" the model. We provide some unsuccessful adversarial samples for the mBERT model on both the Bengali-English and Hindi-English datasets in Tables 6 and 7.

Input Text: pick up da cal damn dumb gal he neds u dhaniya pyar krti h tu use bat kr use mar rha hoga wo andar se
Model | Perturbation Level (K) | Example | Ground Truth | Predicted Label | Perturbed Label | MOS
Bi-LSTM CNN | 2 | piiik up da cal damn dumb gal he neds u dhaniya pyar krti h tu use bat kr use mar rha hoga wo andar se | neutral | neutral | neutral | 0.125
Bi-LSTM CNN | 4 | pick up da call damn dumb gal he neds u dhaniya pyar krti h tu use bat kr use mar rha hoga wo andar se | neutral | neutral | neutral | 0
Bi-LSTM CNN | 8 | pick up da cal damn dumb gal he neds u dhaniya pyar krti h tu use bat kr use mar rha haaga wo within se | neutral | neutral | neutral | 0.125
Bi-GRU CNN | 2 | pick up da cal damn dumb gaal he neds u dhaniya pyar krti h tu use talk kr use mar rha hoga wo andar se | neutral | neutral | neutral | 0.125
Bi-GRU CNN | 4 | pick up da cal damn dumb gaal he neds u dhniya pyar krti h tu use talk kr use mar rha hoga wo andar se | neutral | neutral | negative | 0.125
Bi-GRU CNN | 8 | pick up da cal damn dumb glll he neds u dhniya pyar krti h tu use talk kr use mar rha hoga wo andar se | neutral | neutral | negative | 0.150
charCNN | 2 | pick up da cal damn dumb gal he neees u dhaniya pyar krti h tu use bat kr use mar rha hoga wo andr se | neutral | neutral | negative | 0.125
charCNN | 4 | pick up da cal dmn dumb gal he neds u dhaniya pyar krti h tu use bat kr use mar rha hoga wo andr se | neutral | neutral | neutral | 0.125
charCNN | 8 | pick up daa clll dmn dumb gal he neds u dhaniya pyar krti h tu use bat kr use mar rha hooga wo andaarrrr se | neutral | neutral | negative | 0.250
Transformer | 2 | pick up da cal damn dumb ladki he neds pyar krti tu use bat kr use mar rha hoga wo andar se | neutral | neutral | neutral | 0.125
Transformer | 4 | pick up da cal damn duumb ladki he neds pyar krti tu use bat kr use mar rha hoga wo andar se | neutral | neutral | positive | 0.125
Transformer | 8 | pick up da cal damn goonga ladki he neds u dhaniya pyar krti h tu use bat kr use mar rha hoga wo andar se | neutral | neutral | positive | 0.125
mBERT | 2 | pick up da cal damn dumb gal hee neds dhaniya pyaarr krti tu use bat kr use mar rha hoga wo andar se | neutral | neutral | neutral | 0.125
mBERT | 4 | pick up da cal damn dumb gal hee neds dhaniya pyaarr krti tu use bat kr use mar rha hoga wo andar se | neutral | neutral | neutral | 0.125
mBERT | 8 | pick up da cal damn dumb gal Hee nedee dhaniya paaarr krtii tu useeee bat kr useeee mar rha hoga wo insiderr se | neutral | neutral | negative | 0.375
Table 4. Hindi-English Adversarial Attack Samples
Input Text :  vai eita hobe na vai tmio jao vai plz vai vipode pore jabo
Model | Perturbation Level (K) | Example | Ground Truth | Predicted Label | Perturbed Label | MOS
Bi-LSTM CNN | 2 | vai eeita hobe na vai tmio jao vai plz vai vipoode pore jabo | negative | negative | neutral | 0.125
Bi-LSTM CNN | 4 | vai eeita hobe na vai tmio jao vai plz vai vipoode pore jabo | negative | negative | neutral | 0.125
Bi-LSTM CNN | 8 | vai eeeta hobe na vai tmio jao vai plz vai veepode pore jabo | negative | negative | neutral | 0.150
Bi-GRU CNN | 2 | vai eeeta hobe na vai youtoo jao vai plz vai vipode pore jabo | negative | negative | negative | 0.150
Bi-GRU CNN | 4 | vai eeeta hobe na vai tmeeo jao vai plz vai veepode pore jabo | negative | negative | neutral | 0.150
Bi-GRU CNN | 8 | vai eit hobe na vai tmeeo jao vai plz vai vipoode pore jabo | negative | negative | neutral | 0.150
charCNN | 2 | voi eita hobe na voi tmio jao voi pllz voi vipode pore jbo | negative | negative | neutral | 0.125
charCNN | 4 | voi eita hobe na voi tmio jao voi pllz voi vipode pore jbo | negative | negative | neutral | 0.125
charCNN | 8 | vaee eita hobe na vaee tmio jao vaee plzz vaee vipode pore jaaboo | negative | negative | neutral | 0.125
Transformer | 2 | vai eeeta hobe na vai tmio jao vai plz vai vipoode pore jabo | negative | negative | neutral | 0.125
Transformer | 4 | vai eit hobe na vai tmio jao vai plz vai vipoode pore jabo | negative | negative | neutral | 0.125
Transformer | 8 | vai eeita hobe na vai tmio jao vai plz vai vipoode pore jabo | negative | negative | neutral | 0.125
mBERT | 2 | vai eeittt hobe na vai tmio jao vai plz vai vipoode pore jabo | negative | negative | negative | 0.150
mBERT | 4 | vai eeeta hobe na vai tmio jao vai plz vai vipoode pore jabo | negative | negative | negative | 0.125
mBERT | 8 | vai eit hobe n vai tmio jaaao vai pzzz vai vipodeeee p0re xbo | negative | negative | positive | 0.200
Table 5. Bengali-English Adversarial Attack Samples
Input Text: Denmark er movie Festen conventional film theke ekdom alada Eta bohu festival prize peyechilo
Perturbation Level (K) | Example | Ground Truth | Predicted Label | Perturbed Label | MOS
2 | Denmark err movie Festen conventional film theke ekdom alda Eta bohu festival prize peyechilo | negative | negative | negative | 0.125
4 | Denmark err movie Festen conventional film theke ekdom alad Eta bohu festival prize peyechilo | negative | negative | negative | 0.125
8 | Denmark errr muvie Festen conventttonal film theke ekdom alad Eaaa bohw festival prije piiyechiloo | negative | negative | negative | 3.125
Table 6. Bengali-English Adversarial Attack Error Samples
Input Text : Hlo salman sir mai aur mere mami papa apke bhut bde fan h
Perturbation Level (K) | Example | Ground Truth | Predicted Label | Perturbed Label | MOS
2 | Hlo salman sir mai aur mere mami ppa apke bhut bde fan h | neutral | neutral | neutral | 0.125
4 | Hlo salman sir mai aur mere mami ppa apke bhut bigee fan h | neutral | neutral | neutral | 0.125
8 | Hlo salman sir mai aur mere mami ppa apke bhut bigee fnnn h | neutral | neutral | neutral | 0.675
Table 7. Hindi-English Adversarial Attack Error Samples