There are many ways to express our opinions. When we exchange views online, we do not always immediately measure the emotional impact of our message. Even when the opinions expressed are legitimate, well-intentioned and constructive, poor phrasing may make the conversation go awry (Zhang2018). Recently, Natural Language Processing (NLP) research has tackled the problem of abusive language detection by developing accurate classification models that flag toxic (or abusive, offensive, hateful) comments (davidson2017automated; Pavlopoulos2017b; Wulczyn2017; Gamback2017; fortuna2018survey; Zhang2018; 10.1371/journal.pone.0203794; Zampieri2019).
|Input offensive comment||you now have to defend this clown along with his russian corruption.|
|Generated civil comment||you now have to defend this guy from his russian ties……..|
|Input offensive comment||blaming trudeau and the government is just stupid.|
|Generated civil comment||blaming trudeau and the liberal government is just wrong.|
|Input offensive comment||dubya was a moron.|
|Generated civil comment||dubya was a republican.|
(“dubya” is a nickname for George W. Bush.)
The prospect of healthier conversations, nudged by Machine Learning (ML) systems, motivates the development of Natural Language Understanding and Generation (NLU and NLG) models that could later be integrated into a system suggesting alternatives to vituperative comments before they are posted. A first approach would be to train a text-to-text model (bahdanau2014neural; vaswani2017attention) on a corpus of parallel comments, where each offensive comment has a courteous and fluent rephrasing written by a human annotator. However, such a solution requires a large paired labeled dataset, which is in practice difficult and expensive to collect (see Section 4.5).
Consequently, we limit our setting to the unsupervised case where the comments are only annotated in attributes related to toxicity, such as the Civil Comments dataset (DBLP:journals/corr/abs-1903-04561). We summarize our investigations with the following research question:
RQ: Can we fine-tune, end-to-end, a pre-trained text-to-text transformer to suggest civil rephrasings of rude comments, using a dataset solely annotated in toxicity?
Answering this question might provide researchers with an engineering proof-of-concept that would enable further exploration of the many complex questions that arise from such a tool being used in conversations. The main contributions of this work are the following:
We addressed for the second time the task of unsupervised civil rephrasing of toxic texts, relying for the first time on the Civil Comments dataset, and achieved results that demonstrate the effectiveness of our model over baselines.
We developed a non-task specific approach (i.e. with no human hand-crafting in its design) that can be generalized and later applied to related and/or unexplored attribute transfer tasks.
While several of the ideas we combine in our model have been studied independently, to the best of our knowledge, no existing unsupervised models combine sequence-to-sequence bi-transformers, transfer learning from large pre-trained models, and self-supervised fine-tuning (denoising auto-encoder and cycle consistency). We discuss the related work introducing these tools and techniques in the following section.
2 Related work
Unsupervised complex text attribute transfer (like civil rephrasing of toxic comments) remains in its early stages, and our particular applied task has only a single antecedent (nogueira-dos-santos-etal-2018-fighting). A wide range of prior work is nonetheless relevant, and this section attempts to summarize it. We first describe the recent strategies (such as attention mechanisms, bahdanau2014neural) that led to significant progress in supervised NLU and NLG tasks. Then, we present the most related lines of work in unsupervised text-to-text tasks.
2.1 Transformers are state-of-the-art architectures in NLP
(To avoid confusion, we denote as bi-transformer the original encoder-decoder transformer, whereas encoder-only and decoder-only models are called uni-transformers here.)
vaswani2017attention showed that transformer architectures, based on attention mechanisms, achieve state-of-the-art results when applied to supervised Neural Machine Translation (NMT). More generally, transformers have proven capable in various NLP and speech tasks (8462506; huang2018music; le2019flaubert; li2019neural). Moreover, transformers benefit from pre-training before being fine-tuned on downstream tasks (Devlin2018; dai2019transformer; yang2019xlnet; conneau2019cross; raffel2019exploring). Subsequent research adopted uni-transformers in many supervised classification and regression tasks (Devlin2018) and in unsupervised language modeling (radford2019language; keskarCTRL2019; Dathathri2020Plug), until raffel2019exploring proposed a unified pre-trained bi-transformer applicable to any text classification, text regression or text-to-text task. Further, recent works tackle the language detoxification of unconditional language models (KrauseGeDi2020; gehman2020realtoxicityprompts).
2.2 Unsupervised losses enable training text-to-text models end-to-end
After the success of unsupervised image-to-image style transfer in computer vision (CV), some approaches have addressed unsupervised text-to-text tasks. Unsupervised Neural Machine Translation (UNMT) is perhaps the most promising of them. artetxe2018unsupervised; conneau2017word; lample2017unsupervised; lample2018phrase; conneau2019cross introduced methods based on techniques aligning the embedding spaces of monolingual datasets and tricks such as denoising auto-encoding losses (10.1145/1390156.1390294) and back-translation (sennrich2015improving; edunov2018understanding).
Abstractive summarization (or sentence compression) is also studied in unsupervised settings. baziotis2019seq3 trained a model with a compressor-reconstructor strategy similar to back-translation while liu2019summae trained a denoising auto-encoder that embeds sentences and paragraphs in a common space.
Unsupervised attribute transfer is the task most related to our work. It mainly focuses on sentiment transfer with standard review datasets (maas-etal-2011-learning; he2016ups; shen2017style; li-etal-2018-delete), but also addresses sociolinguistic datasets containing text in various registers (gan2017semantic; rao-tetreault-2018-dear) or with different identity markers (voigt-etal-2018-rtgender; prabhumoye-etal-2018-style; lample2018multipleattribute). When paraphrase generation aims at being explicitly attribute-invariant, it is referred to as obfuscation or neutralization (emmery-etal-2018-style; xu-etal-2019-privacy; pryzant2020automatically). Literary style transfer (xu-etal-2012-paraphrasing; pang2019unsupervised) has also been tackled by recent work. Here, we apply attribute transfer to a large dataset annotated in toxicity, but we also use the Yelp review dataset from shen2017style for comparison purposes (see Section 4).
Initial unsupervised attribute transfer approaches sought to build a shared and attribute-agnostic latent representation encoding for the input sentence, with adversarial training. Then, a decoder, aware of the destination attribute, generated a transferred sentence (shen2017style; hu2017toward; fu2018style; zhang2018style; xu2018unpaired; john-etal-2019-disentangled).
Unsupervised attribute transfer approaches that do not rely on a latent space are also present in the literature. li-etal-2018-delete assumed that style markers are very local and proposed to delete the tokens most conveying the attribute, before retrieving a second sentence in the destination style; they eventually combined both sentences with a neural network. lample2018multipleattribute applied UNMT techniques from conneau2019cross to several attribute transfer tasks, including social media datasets. xu2018unpaired; gong2019reinforcement; luo2019dual; wu2019hierarchical trained models with reinforcement learning. dai-etal-2019-style introduced unsupervised training of a transformer called StyleTransformer (ST) with a discriminator network. Our approach differs from these unsupervised attribute transfer models in that they neither leverage large pre-trained transformers nor train with a denoising objective.
The most similar work to ours is nogueira-dos-santos-etal-2018-fighting, who trained for the first time an encoder-decoder rewriting offensive sentences in a non-offensive register with non-parallel data from Twitter (ritter-etal-2010-unsupervised) and Reddit (serban2017deep). Our approach differs in the following aspects. First, we use transformers pre-trained on a large corpus instead of randomly initialized RNNs for encoding and decoding. Second, their approach involves collaborative classifiers to penalize generation when the attribute is not transferred, while we train end-to-end with a denoising auto-encoder. Even if their model shows high accuracy scores, it suffers from low fluency, with offensive words often being replaced by a placeholder (e.g. “big” instead of “f*cking”).
As underlined by lample2018multipleattribute, applying Generative Adversarial Networks (GANs) (zhu2017unpaired) to NLG is not straightforward, because generating text implies a sampling operation that is not differentiable. Consequently, as long as text is represented by discrete tokens, loss gradients computed with a classifier cannot be back-propagated without tricks such as the REINFORCE algorithm (he2016dual) or the Gumbel-Softmax approximation (baziotis2019seq3), which can be slow and unstable. Besides, controlled text generation (ficler-goldberg-2017-controlling; keskarCTRL2019; le2019flaubert; Dathathri2020Plug) is an NLG task that consists in conditioning a language model on attributes of the generated text, such as its style. A major difference with attribute transfer, however, is the absence of a constraint regarding the preservation of the input’s content.
3.1 Formalization of the attribute text rewriting problem
Let $X$ and $Y$ be our two non-parallel corpora of comments satisfying the respective attributes “toxic” and “civil”. Let $D = X \cup Y$. We aim at learning a parametric function $f_\theta$ mapping a pair of source sentence $x \in D$ and destination attribute $a$ to a fluent sentence $f_\theta(x, a)$ satisfying $a$ and preserving the meaning of $x$. In our case, there are two attributes, “toxic” and “civil”, which we assume to be mutually exclusive. We denote $a(x)$ the attribute of $x$ and $\bar{a}(x)$ the other attribute (for instance, when $a(x)$ is “civil”, then $\bar{a}(x)$ is “toxic”). Note that $f_\theta(x, a(x))$ can simply be $x$.
3.2 Our approach is based on bi-conditional encoder-decoder generation
Our approach is to train an autoregressive (AR) language model (LM) conditioned on both the input text $x$ and the destination attribute $a$.
We compute $p_\theta(y \mid x, a)$ with a LM. As we do not have access to ground-truth targets $y$, we propose in Section 3.3 a training function that we assume to be maximized if and only if $y$ is a fluent sentence with attribute $a$ that preserves $x$’s content. Additionally, we use an AR generating model where inference of $y$ is sequential and the token generated at step $t$ depends on the tokens generated at previous steps: $p_\theta(y \mid x, a) = \prod_t p_\theta(y_t \mid y_{<t}, x, a)$.
To condition on the input text, we follow the work of bahdanau2014neural; vaswani2017attention; nogueira-dos-santos-etal-2018-fighting; conneau2019cross; lample2018multipleattribute; dai-etal-2019-style; liu2019summae; raffel2019exploring and opt for an encoder-decoder framework. lample2018multipleattribute; dai-etal-2019-style argue that in unsupervised attribute rewriting tasks, encoders do not necessarily output disentangled representations independent of the input’s attribute. However, the t-SNE visualization of the latent space in liu2019summae allows us to assume that, with a similar training, encoders can output a latent representation attending to content rather than to an attribute.
The LM is conditioned on the destination attribute with control codes introduced by keskarCTRL2019. A control code $c_a$ is a fixed sequence of tokens prepended to the decoder’s input, supposed to steer generation toward the space of sentences with the destination attribute $a$; the decoder thus reads the concatenation of $c_a$ and the tokens generated so far.
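As a minimal sketch of this conditioning (the token ids and the attribute-to-code mapping below are hypothetical, not T5’s or CTRL’s actual vocabulary), prepending a control code to the decoder input amounts to:

```python
# Hypothetical control-code vocabulary: the ids are illustrative only.
CONTROL_CODES = {"civil": [32000], "toxic": [32001]}

def prepend_control_code(decoder_input_ids, attribute):
    """Return the decoder input with the fixed control-code token
    sequence of `attribute` prepended (CTRL-style conditioning)."""
    return CONTROL_CODES[attribute] + list(decoder_input_ids)
```

In practice the same tokenized prefix would be prepended at every decoding step, so the destination attribute conditions the whole generation.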
3.3 Training the encoder-decoder with an unsupervised objective
Denoising objectives to train transformers are an effective self-supervised strategy. Devlin2018; yang2019xlnet pre-trained a uni-transformer encoder as a masked language model (MLM) to teach the system general-purpose representations, before fine-tuning on downstream tasks. conneau2019cross; lample2018multipleattribute; song2019mass; liu2019summae; raffel2019exploring explore various deshuffling and denoising objectives to pre-train or fine-tune bi-transformers.
During training, we corrupt the encoder’s input with the noise function $\eta$ from Devlin2018:
$\eta$ masks tokens randomly with probability 15%. Then, a masked token is replaced by a random token from the vocabulary with probability 10%, or by a sentinel (a shared mask token) with probability 90%. We train the model as a denoising auto-encoder (DAE), meaning that we minimize the negative log-likelihood $L_{\mathrm{DAE}}(\theta) = -\log p_\theta(x \mid \eta(x), a(x))$, where $a(x)$ denotes $x$’s attribute.
The hypothesis is that optimizing the DAE objective teaches the controlled generation to the model.
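The corruption procedure can be sketched as follows (the toy vocabulary, the sentinel token string, and the `corrupt` name are ours; the probabilities follow the description above):

```python
import random

SENTINEL = "<extra_id_0>"  # shared mask (sentinel) token; name illustrative
TOY_VOCAB = ["the", "a", "cat", "dog", "runs"]  # stand-in vocabulary

def corrupt(tokens, rng=random):
    """Noise function: each token is selected for masking with
    probability 15%; a selected position becomes a random vocabulary
    token with probability 10%, and the shared sentinel otherwise."""
    noised = []
    for tok in tokens:
        if rng.random() < 0.15:  # select this position for corruption
            if rng.random() < 0.10:
                noised.append(rng.choice(TOY_VOCAB))  # random replacement
            else:
                noised.append(SENTINEL)
        else:
            noised.append(tok)
    return noised
```

On a long sequence, roughly 15% of positions end up corrupted, the large majority of them as sentinels.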
Inspired by an equivalent approach in unsupervised image-to-image style transfer (zhu2017unpaired), we add a cycle-consistency (CC) objective (nogueira-dos-santos-etal-2018-fighting; edunov2018understanding; prabhumoye-etal-2018-style; lample2018multipleattribute; conneau2019cross; dai-etal-2019-style): $L_{\mathrm{CC}}(\theta) = -\log p_\theta(x \mid \hat{y}, a(x))$, where $\hat{y} = f_\theta(x, \bar{a}(x))$ is the transfer of $x$ to the opposite attribute. This objective enforces content preservation in the generated prediction. As the pseudo-prediction $\hat{y}$ is produced by a non-differentiable AR procedure, gradients are not back-propagated through $\hat{y}$ during stochastic gradient descent training.
Finally, the loss function sums the DAE and the CC objectives with weighting coefficients $\lambda_{\mathrm{DAE}}$ and $\lambda_{\mathrm{CC}}$: $L(\theta) = \lambda_{\mathrm{DAE}} L_{\mathrm{DAE}}(\theta) + \lambda_{\mathrm{CC}} L_{\mathrm{CC}}(\theta)$.
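The full objective can be illustrated end to end with a stand-in model (the `ToyModel` interface, the dummy losses, and the default weights are our assumptions for the sketch, not the paper’s actual API):

```python
class ToyModel:
    """Stand-in for the encoder-decoder: returns a dummy token-count
    loss and echoes its input, just to make the control flow runnable."""
    def nll(self, target, source, attribute):
        return float(len(target))  # placeholder negative log-likelihood
    def generate(self, source, attribute):
        return list(source)        # placeholder AR pseudo-prediction

def training_step(model, x, attr, other_attr, lam_dae=1.0, lam_cc=1.0):
    # DAE term: reconstruct x from a (here uncorrupted) copy of itself,
    # conditioned on its own attribute a(x).
    l_dae = model.nll(target=x, source=x, attribute=attr)
    # CC term: generate a pseudo-prediction in the other attribute;
    # it is treated as a constant (no gradient flows through it),
    # then back-transferred toward x.
    y_hat = model.generate(source=x, attribute=other_attr)
    l_cc = model.nll(target=x, source=y_hat, attribute=attr)
    # Weighted sum of the two objectives.
    return lam_dae * l_dae + lam_cc * l_cc
```

In a real implementation the DAE term would use the corrupted input and both `nll` calls would return differentiable losses; only the pseudo-prediction is detached.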
3.4 The text-to-text bi-transformer architecture
The encoder and the decoder are uni-transformers. Contrary to vaswani2017attention; conneau2019cross; raffel2019exploring, we do not keep the decoder layers computing cross-attention between the encoder’s outputs and the decoder’s hidden variables, because generation then suffers from too much conditioning on the input sentence and we observe no significant change in the output sentence. Rather, we follow liu2019summae and compute a latent representation $z$ with an affine transformation of the encoder’s hidden state corresponding to the first token of the input text. The input sequence of tokens is embedded then encoded by the uni-transformer encoder, and $z$ serves as an aggregate sequence representation for the input. Different heuristics can be used to integrate it in the decoder. We considered summing $z$ to the embedding of each token of the uni-transformer decoder’s input, since this balances the back-propagation of the signals coming from the original input and from the output being generated in the destination attribute space, and it worked well in practice in our experiments.
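The two operations (an affine latent computed from the first hidden state, then summed onto every decoder token embedding) can be sketched with NumPy; shapes and function names are illustrative:

```python
import numpy as np

def latent(h_first, W, b):
    """Affine transformation of the encoder's first hidden state,
    giving the aggregate sequence representation z."""
    return W @ h_first + b

def condition_decoder_inputs(decoder_embeddings, z):
    """Sum z onto the embedding of each decoder input token
    (broadcast over the sequence dimension: (seq_len, d) + (d,))."""
    return decoder_embeddings + z
```

Broadcasting adds the same vector `z` at every position, so every decoding step sees the input summary alongside the token being generated.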
In addition, the encoder and the decoder uni-transformers share the same embedding layer, and the LM head is tied to the embeddings.
Except for the dense layer computing the latent variable $z$, all parameters come from the pre-trained bi-transformer published by raffel2019exploring. Thus, our DAE and CC objectives fine-tune T5’s parameters, which is why we call our model a conditional auto-encoder text-to-text transfer transformer (CAE-T5).
We employed the largest publicly available toxicity detection dataset to date, which was used in the ‘Jigsaw Unintended Bias in Toxicity Classification’ Kaggle challenge (https://www.tensorflow.org/datasets/catalog/civil_comments). The 2M comments of the Civil Comments dataset stem from a commenting plugin for independent news sites. They were created from 2015 to 2017 and appeared on approximately 50 English-language news sites across the world. Each of these comments was annotated by crowd raters (at least 3 each) for toxicity and toxicity subtypes (DBLP:journals/corr/abs-1903-04561).
Following the work of dai-etal-2019-style for the IMDB Movie Review dataset (positive/negative sentiment labels), we constructed a sentence-level version of the dataset. Initially, we fine-tuned a pre-trained BERT (Devlin2018) toxicity classifier on the Civil Comments dataset. Then, we split the comments into sentences with NLTK’s sentence tokenizer (https://www.nltk.org/api/nltk.tokenize.html). Eventually, we created the toxic set (respectively the civil set) from the sentences whose system-generated toxicity score (using our BERT classifier) is greater than a high threshold (respectively less than a low threshold), to increase the dataset’s polarity. The test ROC-AUC of the toxicity classifier is with a precision of and a recall of . Even with this low recall, the toxic set is large enough (approx. 90k, see Table 2).
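The construction of the polar sentence-level subsets can be sketched as follows; the classifier and sentence splitter are passed in as functions, and the threshold values are illustrative stand-ins since the paper’s exact figures are not shown here:

```python
def polarize(comments, toxicity_score, split_sentences,
             toxic_thr=0.9, civil_thr=0.1):
    """Split comments into sentences and keep only polar ones:
    score > toxic_thr goes to the toxic set, score < civil_thr to
    the civil set; everything in between is discarded."""
    toxic, civil = [], []
    for comment in comments:
        for sentence in split_sentences(comment):
            score = toxicity_score(sentence)
            if score > toxic_thr:
                toxic.append(sentence)
            elif score < civil_thr:
                civil.append(sentence)
    return toxic, civil
```

In the paper’s pipeline the scorer would be the fine-tuned BERT classifier and the splitter NLTK’s sentence tokenizer.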
We also conducted a comparison to other style transfer baselines on the Yelp Review Dataset (Yelp), commonly used to compare unsupervised attribute transfer systems. It consists of restaurant and business reviews annotated with a binary positive/negative label. shen2017style processed it and li-etal-2018-delete collected human references for the test set (https://github.com/lijuncen/Sentiment-and-Style-Transfer/tree/master/data/yelp). Table 2 shows statistics for these datasets.
[Table 2: dataset statistics for Yelp and the polar Civil Comments subsets.]
Evaluating a text-to-text task is challenging, especially when no gold pairs are available. Attribute transfer is successful if generated text: 1) has the destination control attribute, 2) is fluent and 3) preserves the content of the input text.
4.2.1 Automatic evaluation
We follow the current approach of the community (NIPS2018_7959; logeswaran2018content; wang2019controllable; xu2019variational; lample2018multipleattribute; dai-etal-2019-style; he2020probabilistic) and approximate the three criteria with the following metrics:
Attribute control: Accuracy (ACC) computes the rate of successful changes in attributes. It measures how well the generation is conditioned by the destination attribute. We predict toxic and civil attributes with the same fine-tuned BERT classifier that pre-processed the Civil Comments dataset (single threshold at ).
Fluency: Fluency is measured by perplexity (PPL). To measure PPL, we employed GPT2 (radford2019language) LMs fine-tuned on the corresponding datasets (Civil Comments and Yelp).
Content preservation: Content preservation is the most difficult aspect to measure. UNMT (conneau2019cross), summarization (liu2019summae) and sentiment transfer (li-etal-2018-delete) have access to a few hundred samples with at least one human reference of the transferred text and evaluate content preservation by computing metrics based on matching words (e.g., BLEU papineni-etal-2002-bleu) between the generated prediction and the reference(s) (ref-metric). However, as we do not have these paired samples, we compute a content preservation score between the input and the generated sentences (self-metric).
|Original||Human rephrasing||BLEU||SIM|
|furthermore, kissing israeli ass doesn’t help things a bit||also, supporting the israelis doesn’t help things a bit.||57.6||70.6%|
|just like the rest of the marxist idiots.||it is the same thing with people who follow Karl Marx doctrine||3.4||65.3%|
|you will go down as being the most incompetent buffoon ever elected, congrats!||you could find out more about it.||2.3||16.2%|
Table 3: Evaluation with BLEU and SIM of examples rephrased by human crowdworkers.
Table 3 shows the BLEU scores (based on exact matches) of three examples rephrased by human annotators (Section 4.5). In the top-most example, BLEU score is high. This is explained by the fact that only 4 words are different between the two texts. In contrast to the first example, the two texts in the second example have only 1 word in common. Thus, the BLEU score is low. Despite the low evaluation, however, the candidate text could have been a valid rephrase of the reference text.
The high complexity of our task motivates a more general quantitative metric between input and generated text, capturing semantic similarity rather than token overlap. fu2018style; john-etal-2019-disentangled; gong2019reinforcement; pang2019unsupervised proposed to represent sentences as a (weighted) average of their word embeddings before computing the cosine similarity between them. We adopted a similar strategy, but embedded sentences with the pre-trained universal sentence encoder (cer2018universal); we call the resulting metric the sentence similarity score (SIM). The first two sentence pairs of Table 3 have high similarity scores: the rephrasings preserve the original content while not necessarily overlapping much with the original text. However, the last rephrasing does not preserve the initial content and has a low similarity score with its source sentence. As statistical evidence, the self-SIM score comparing each of the 1,000 test Yelp reviews with their human rewriting is 80.2%, whereas the self-SIM score comparing the Yelp review test set to a random derangement of the human references is 36.8%.
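SIM reduces to a cosine similarity between sentence embeddings; a minimal sketch (any fixed-size vectors stand in here for universal sentence encoder outputs):

```python
import numpy as np

def sim(u, v):
    """Cosine similarity between two sentence embeddings."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```

self-SIM simply applies this to the embeddings of the input sentence and its generated rephrasing.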
We optimised all three metrics jointly, because improving some of them alone comes at the expense of the remaining metric(s). We aggregated the scores of the three metrics by computing the geometric mean (GM) of ACC, 1/PPL and self-SIM (the geometric mean is not sensitive to the scale of the individual metrics).
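The aggregate score is then simply (a sketch; the metric values are assumed to be strictly positive):

```python
def gm(acc, ppl, self_sim):
    """Geometric mean of ACC, 1/PPL and self-SIM (all must be > 0).
    Lower perplexity (better fluency) increases the score."""
    return (acc * (1.0 / ppl) * self_sim) ** (1.0 / 3.0)
```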
4.2.2 Human evaluation
Following li-etal-2018-delete; zhang-etal-2018-learning-sentiment; zhang2018style; wu2019hierarchical; ijcai2019-732; wang2019controllable; john-etal-2019-disentangled; liu2019revision; luo2019dual; jin2019imat and to further confirm the performance of CAE-T5, we hired human annotators on Appen to rate in a blind fashion different models’ civil rephrasings of 100 randomly selected test toxic comments, in terms of attribute transfer (Att), fluency (Flu), content preservation (Con) and overall quality (Over) on a Likert scale from 1 to 5. Each rephrasing was annotated by 5 different crowd-workers whose annotation quality is controlled by test questions. If a rephrasing is rated 4 or 5 on Att, Flu and Con then it is “successful” (Suc).
We compare the output text that CAE-T5 generates with a selection of unpaired style-transfer models described in Section 2.2 (shen2017style; li-etal-2018-delete; fu2018style; luo2019dual; dai-etal-2019-style). We also compare with Input Masking, inspired by an interpretability method called Input Erasure (IE) (Li2016), which is used to interpret the decisions of neural models: words are removed one at a time and the altered texts are re-classified (i.e., as many re-classifications as there are words); all the words whose removal decreased the classification score (based on a threshold) are returned as the ones most related to the model’s decision. Our baseline follows a similar process, but instead of deleting, it uses a pseudo-token (‘[mask]’) to mask one word at a time. Once all the masked texts have been scored by the classifier, the rephrased text is returned, with a mask at every token whose masking decreased the re-classification score by more than a threshold (set to 20% after preliminary experiments). We employed a pre-trained BERT as our toxicity classifier, fine-tuned on the Civil Comments dataset (see Section 4.1).
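A sketch of this masking baseline (the toy scoring function used in the example below is ours; the real scorer is the fine-tuned BERT classifier):

```python
MASK = "[mask]"

def input_masking(tokens, toxicity_score, drop_threshold=0.20):
    """Mask one token at a time, re-score the altered text, and keep a
    mask at every position whose masking decreased the toxicity score
    by at least `drop_threshold` (20% after preliminary experiments)."""
    base = toxicity_score(tokens)
    rephrased = []
    for i, tok in enumerate(tokens):
        trial = tokens[:i] + [MASK] + tokens[i + 1:]
        if base - toxicity_score(trial) >= drop_threshold:
            rephrased.append(MASK)  # this token drove the toxicity score
        else:
            rephrased.append(tok)
    return rephrased
```

With a toy scorer that counts offensive words, only the positions carrying toxicity end up masked.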
4.4.1 Quantitative comparison to prior work
Table 4 shows quantitative results on the Civil Comments dataset. Surprisingly, the perplexity (capturing fluency) of text generated by our model is lower than the perplexity computed on human comments. This can be explained by the authors of social media comments exhibiting great variability in their adherence to formal language rules, which is only partially replicated by CAE-T5. Other approaches such as StyleTransformer (ST) and CrossAlignment (CA) have higher accuracy, but at the cost of both higher perplexity and lower content preservation, meaning that they are better at discriminating toxic phrases but struggle to rephrase in a coherent manner.
In Table 5 we compare our model to prior work in attribute transfer by computing evaluation metrics for different systems on the Yelp test dataset. We achieve competitive results with low perplexity while obtaining good sentiment control (above human references). Our similarity, though, is lower, showing that some content is lost when decoding; hence the latent space does not fully capture the semantics. It is fairer to compare our model to other style transfer baselines on the Yelp dataset, since our model is based on sub-word tokenization while the baselines are often based on limited-size pre-trained word embeddings: many more words from the Civil Comments dataset would be mapped to the unknown token if we wanted to keep a reasonably sized vocabulary, resulting in a performance drop.
The human evaluation results shown in Table 6 correlate with the automatic evaluation results.
When considering the aggregated scores (geometric mean, success rate and overall human judgement), our model is ranked first on the Civil Comments dataset and second on the Yelp Review dataset, behind DualRL; yet our approach is more stable, and therefore easier to train, than reinforcement learning approaches.
[Table 6 excerpt: IE (BERT) — Att 2.77, Flu 2.39, Con 2.20, Suc 6%, Over 1.89.]
4.4.2 Qualitative analysis
Table 7 shows examples of rephrases of toxic comments automatically generated by our system. The first two examples emphasize the model’s ability to perform fluent controlled generation conditioned on both the input sentence and the destination attribute. We present more results showing that we can effectively suggest fluent civil rephrases of toxic comments in the Appendix Table 8. However, we observe more failures than in the sentiment transfer task (see examples in the Appendix Table 9). We identify three kinds of failure:
- Supererogation: generation does not stop early enough and produces fluent, transferred, related but unnecessary content.
- Hallucination: conditioning on the initial sentence fails and the model generates fluent but unrelated content.
- Position reversal: the author’s opinion is shifted.
In order to assess the frequency of hallucination and supererogation, we randomly selected 100 toxic comments from the test set and manually labeled the generated sentences with the non-mutually exclusive labels “contains supererogation” and “contains hallucination”. We counted on average 17% of generated sentences showing supererogation and 34% showing (often local) hallucination. We observe that the longer the input comment, the more prone the generated text is to hallucination.
While supererogation and hallucination can be explained by the probabilistic nature of generation, we assume that position reversal is due to bias in the dataset, where toxic comments are correlated with negative comments. Thus, offensive comments tend to be transferred to supportive comments even though a human being would rephrase attacks as polite disagreements.
Interestingly, our model is also able to add toxicity to civil comments, as shown by the examples in the Appendix Table 10. Even if such an application is of limited interest for online platforms, its potential misuse is worth warning about.
|stop being ignorant and lazy and try reading a bit about it.||try reading and be a little more informed about it before you try to make a comment.|
|this is absolutely the most idiotic post i have ever read on all levels.||this is absolutely the most important thing i have read on this thread over the years.|
|trump may be a moron, but clinton is a moron as well.||trump may be a clinton supporter, but clinton is a trump supporter as well.|
|shoot me in the head if you didn’t vote for trump.|||
|50% of teachers don’t have any f*cks to give.||50% of teachers don’t have|
Supervised learning is a natural approach to text-to-text tasks. In our study, we submitted the task of civil rephrasing of toxic comments to human crowd-sourcing. We randomly sampled 500 sentences from the toxic train set. For each sentence, we asked 5 annotators to rephrase it in a civil way, and to assess whether the comment was offensive and whether it could be rewritten in a less rude way while preserving the content. Of the 2,500 answers, we tallied 427 examples not flagged as impossible to rewrite and with a rephrasing different from the original sentence. This low yield has two main causes. On the one hand, unfortunately not all toxic comments can be reworded in a civil manner so as to express a constructive point of view; severely toxic comments made solely of insults, identity attacks, or threats are not “rephrasable”. On the other hand, evaluating crowd-workers with test questions and answers is complex. The fact that perplexity is higher on crowd-workers’ rephrases than on randomly sampled civil comments raises concerns about producing human references via crowd-sourcing. The nature of large datasets labeled in toxicity and the lack of incentives for crowd-sourcing civil rephrasing annotation make it expensive and difficult to train systems in a supervised framework. These limitations motivate unsupervised approaches.
Lastly, the more complex the unsupervised attribute transfer task, the more difficult its automatic evaluation. In our case, evaluating whether the attribute is actually transferred requires training an accurate toxicity classifier. Furthermore, the language model we use to assess the fluency of the generated sentences has limitations and does not generalize to all varieties of language encountered in social media. Finally, measuring the amount of relevant content preserved between the source and generated texts remains a challenging, open research topic.
5 Conclusion and future work
To our knowledge, this work is the second to tackle civil rephrasing, and the first to address it with a fully end-to-end, discriminator-free, text-to-text self-supervised training. CAE-T5 leverages the NLU/NLG power offered by large pre-trained bi-transformers. The quantitative and qualitative analysis shows that ML systems could contribute, to some extent, to pacifying online conversations, even though many generated examples still suffer from critical semantic drift.
In the future, we plan to explore whether decoding can benefit from non-autoregressive (NAR) generation (ma2019flowseq; ren2020study). We are also interested in the recent paradigm shift proposed by kumar2018vmf, where the representation of generated tokens is continuous, allowing more flexibility in plugging in attribute classifiers without sampling.
This work was completed in partial fulfillment for the PhD degree of the first author, which was supported by an unrestricted gift from Google. We are also grateful for support from the Google Cloud Platform credits program. We thank Thomas Bonald and Ion Androutsopoulos for their discussion, insight and useful comments.
Appendix A Supplemental Material
a.1 Experimental setup
a.1.1 Architecture details
We fine-tune the pre-trained “large” bi-transformer from raffel2019exploring. Both uni-transformers (encoder and decoder) have blocks each made of a 16-headed self-attention layer and a feed-forward network. The attention, dense and embedding layers have respective dimensions of , and , for a total of around 800 million parameters.
Input sentences are lowercased then tokenized with SentencePiece (kudo-richardson-2018-sentencepiece; vocabulary: gs://t5-data/vocabs/cc_all.32000/sentencepiece.model) and eventually truncated to a maximum sequence length of for the Yelp dataset and for the processed Civil Comments dataset. The control codes are for attributes in the sentiment transfer task and when we apply our model to the Civil Comments dataset.
a.1.2 Training details
During training, we apply dropout regularization at a rate of . We set . In preliminary experiments, we observed that was preserving little content from the initial sentence and that was weighting the preservation too much, at the cost of accuracy. Therefore we focused our experiments on . This is a good default setting since we do not have a priori knowledge about the balance between fluency, accuracy (enforced with the auto-encoder) and content preservation (enforced with cycle consistency). DAE and back-transfer (in the course of the CC computation) are trained with teacher forcing; we do not need AR generation since we have access to a target for the decoder's output. Each training step computes the loss on a mini-batch of 64 sentences sharing the same attribute. Mini-batches of the two attributes are interleaved. Since the Civil Comments dataset is class-imbalanced, we sample comments from the civil class of the training set at each epoch. The optimizer is AdaFactor (shazeer2018adafactor) and we train for 88,900 steps over 19 hours on a TPU v2 chip.
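The batching scheme above (single-attribute mini-batches, interleaved, with the majority class subsampled each epoch) can be sketched as follows; this is an illustrative reconstruction, not the code used in the paper.

```python
import random


def interleaved_batches(civil, toxic, batch_size, seed=0):
    """Yield mini-batches in which every sentence shares one attribute,
    alternating civil and toxic batches.

    To counter class imbalance, the larger (civil) class is subsampled
    to the size of the smaller one at each epoch.
    """
    rng = random.Random(seed)
    civil_epoch = rng.sample(list(civil), min(len(civil), len(toxic)))
    toxic_epoch = list(toxic)
    rng.shuffle(toxic_epoch)
    n_batches = min(len(civil_epoch), len(toxic_epoch)) // batch_size
    for i in range(n_batches):
        lo, hi = i * batch_size, (i + 1) * batch_size
        yield ("civil", civil_epoch[lo:hi])
        yield ("toxic", toxic_epoch[lo:hi])
```

Since each yielded batch carries a single attribute, the corresponding control code can be prepended uniformly to the whole batch before the forward pass.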
A.1.3 Evaluation details
Decoding is greedy. The parametric models used to compute ACC and PPL are pre-trained and then fine-tuned 12-layer, 12-headed uni-transformers with hidden size . The BERT classifier is an encoder followed by a sequence classification head, and the GPT-2 LM is a decoder with a LM head on top. We use the sacrebleu implementation (https://github.com/mjpost/sacrebleu/blob/master/sacrebleu/sacrebleu.py) for BLEU and the universal sentence encoder pre-trained by Google (https://tfhub.dev/google/universal-sentence-encoder/2) to compute SIM.
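The PPL metric follows the standard definition of perplexity under a language model; a minimal sketch (in practice the per-token log-probabilities would come from the fine-tuned GPT-2 LM rather than being supplied by hand):

```python
import math


def perplexity(token_logprobs):
    """Perplexity of a sentence given its per-token natural-log
    probabilities under a language model; lower means more fluent."""
    assert token_logprobs, "need at least one token"
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# A sentence whose tokens each have probability 0.5 has perplexity 2.
ppl = perplexity([math.log(0.5)] * 4)
```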
A.2 CAE-T5 learning algorithm
Figure 2 illustrates flows through the encoder-decoder model at inference.
|Toxic to Civil|
|Input||and then they need to do what it takes to get rid of this mentally ill bigot!|
|CAE-T5||and then they need to do what they can to get rid of this kind of man that is causing the whole problem!|
|Input||do you think any of those idiots vote?|
|CAE-T5||do you think any of those people vote?|
|Input||this is just so stupid.|
|CAE-T5||this is just so sad.|
|Input||are you nuts?|
|CAE-T5||are you kidding me?|
|Input||we are all entitled to our own opinions but not the facts that one can’t find in this criminally stupid opinion.|
|CAE-T5||we are all entitled to our own opinion but not to the kind of facts that can be found in this case.|
|Input||hopefully that will prevent him from making any further idiotic statements.|
|CAE-T5||hopefully that will prevent him from making any more such statements.|
|Input||the poll is garbage.|
|CAE-T5||the poll is fake news.|
|Input||it was irresponsible to publish this garbage.|
|CAE-T5||it was irresponsible to publish this information.|
|Input||just because someone else has a different opinion does not mean that person is stupid, uneducated or foolish.|
|CAE-T5||just because someone has a different opinion does not mean that someone is less intelligent, less informed, less able to understand, and less capable of making informed decisions………|
|Input||biased leftist trash article.|
|CAE-T5||biased leftist article.|
|Input||dumb people vote for trump.|
|CAE-T5||most people vote for trump.|
|Input||what hypocrites your sources are!|
|CAE-T5||what are your sources of information? ! ! ! !|
|Input||bill o’reilly is a narcissistic sociopath.|
|CAE-T5||bill o’reilly is a neo-liberal………….|
|Input||f*ck the flag.|
|Input||i guess we can kill them.|
|CAE-T5||but i think we can|
|Input||try doing a little research before you make a fool of yourself with such blatantly false drivel.|
|CAE-T5||try doing a little research before you make a blatant mistake in your|
|Input||hey old fart, you losers have always been fake news.|
|CAE-T5||hey old soul, you have never been right, the fake news has|
|Input||i bet you think you’re being rational and not just quoting another completely idiotic piece of denialist propaganda.|
|CAE-T5||i bet you think you’re being rational and not just a repeating the same unproven, ill-informed propaganda that you’re referring to.|
|Input||the ontario voters were rated the dumbest people in all the whole world.|
|Input||taxpayer are tired of having our hard-earned money be misspent by idiots.|
|CAE-T5||taxpayers are tired of having our hard-earned money|
|Input||suggesting that people change their commute times is stupid.|
|CAE-T5||suggesting that people change their|
|Input||this hateful group of cowards are successfully destroying the united states.|
|CAE-T5||the hateful group of|
|Input||whatever the idiot in chief touches turns to crap.|
|CAE-T5||whatever the president does|
|Input||either you are too ignorant or stupid to read the first dictionary definition.|
|CAE-T5||either you are too|
|Input||trump is doing right things, he is great president, a lot better than 44th one from kenya|
|lot of good people support trump, only crooked and dumb people against|
|CAE-T5||trump is doing great,|
|Input||the dumb become dumber.|
|Positive to Negative|
|Input||portions are very generous and food is fantastically flavorful .|
|DualRL||portions are very thin and food is confusing .|
|ST (Multi)||portions are very poorly and food is springs flavorless .|
|CAE-T5||portions are very small and food is awfully greasy for the price .|
|Human||portions are very small and food is not flavorful .|
|Input||staff : very cute and friendly .|
|DualRL||staff : very awful and rude .|
|ST (Multi)||staff : very nightmare and poor .|
|CAE-T5||staff : very rude and pushy .|
|Human||staff : very ugly and mean .|
|Input||friendly and welcoming with a fun atmosphere and terrific food .|
|DualRL||rude and unprofessional with a loud atmosphere and awful food .|
|ST (Multi)||poor and fake with a fun atmosphere and mushy food .|
|CAE-T5||rude and unhelpful service with a forced smile and attitude .|
|Human||unfriendly and unwelcoming with a bad atmosphere and food .|
|Input||i love their star design collection .|
|DualRL||i hate their star design disgrace .|
|ST (Multi)||i do n’t care star|
|Human||i ca n’t stand their star design collection .|
|Input||oj and jeremy did a great job !|
|DualRL||oj and jeremy did a great job !|
|ST (Multi)||oj and jeremy did a terrible job !|
|Human||oj and jeremy did a terrible job !|
|Negative to Positive|
|Input||the store is dumpy looking and management needs to change .|
|DualRL||the store is perfect looking and management speaks to change perfectly .|
|ST (Multi)||the store is dumpy looking and management moved to change .|
|CAE-T5||the store is neatly organized and clean and staff is on top of it .|
|Human||managment is top notch , the place looks great .|
|Input||i emailed to let them know but they apparently dont care .|
|DualRL||i loved them know them know but they dont care .|
|ST (Multi)||i emailed to let them know but they honestly played their .|
|CAE-T5||i emailed to let them know and they happily responded right away . a great service|
|Human||i emailed to let them know they really do care .|
|Input||this place is dirty and run down and the service stinks !|
|DualRL||this place is clean and run perfect and the service helped !|
|ST (Multi)||this place is quick and run down and the service stunning !|
|CAE-T5||this place is clean and well maintained and the service is great ! ! !|
|Human||this place is clean , not run down , and the service was great .|
|Input||do not go here if you are interested in eating good food .|
|DualRL||definitely go here if you are interested in eating good food .|
|ST (Multi)||do not go here if you are interested in eating good food .|
|CAE-T5||definitely recommend this place if you are looking for good food|
|Human||do not go here if you are interested in eating bad food .|
|Input||my husband had to walk up to the bar to place our wine order .|
|DualRL||my husband had to walk up to the bar to place our drink order perfectly .|
|ST (Multi)||my husband had to walk up to the bar to place our wine order .|
|CAE-T5||my husband loves to stand in line|
|Human||my husband did n’t have to walk up to the bar to place our wine order .|
|Civil to Toxic|
|Input||great effort and great season.|
|CAE-T5||stupid and horrible move by this team.|
|Input||are the foundation of this great country.|
|CAE-T5||canadians are the dumbest people in this country.|
|Input||why isn’t trudeau in jail for gross negligence?|
|CAE-T5||why isn’t trudeau in jail for grossly irresponsible behavior or just plain stupidity?????|
|Input||there really is no safe place on the planet to live anymore.|
|CAE-T5||there really is no safe place on the planet for the stupid to live.|
|Input||trudeau will do as he’s told.|
|CAE-T5||trudeau will do as he’s told, he’s stupid.|
|Input||all of canada is paying for that decision.|
|CAE-T5||all of canada is paying for that stupid decision.|
|Input||people like you need their talking points.|
|CAE-T5||people like you need their asses kicked.|
|Input||this is a great article that hits the nail on the head.|
|CAE-T5||this is a|
|Input||and just which money tree is going to pay for this?|
|CAE-T5||and just where is the money going to come from to pay for this crap?|
|Input||you need to get a new dictionary.|
|CAE-T5||you need to get a new f*cking dictionary.|
|Input||the real question is; will the hell’s angels forgo their major source of income?|
|CAE-T5||the real question is when will the idiocy of|
|Input||according to http://cookpolitical.com/story/10174, trump got 46.1% of the popular vote.|
|Input||so it’s okay to sacrifice the environment in the name of the almighty dollar……|
|CAE-T5||so it’s okay to destroy the world with the actions of one stupid dude|
|Input||the president dismissed the ecological findings of over 87% of scientists who have been studying the effects of global warming, largely caused by the release of carbon from fossil fuel into the atmosphere.|
|CAE-T5||the president ignored the scientific consensus|
|Input||not sure where you got your definition of a good guy.|
|CAE-T5||not sure where you got your|