Revision in Continuous Space: Fine-Grained Control of Text Style Transfer

05/29/2019 ∙ by Dayiheng Liu, et al. ∙ Sichuan University

Typical methods for unsupervised text style transfer often rely on two key ingredients: 1) seeking the explicit disentanglement of the content and the attributes, and 2) troublesome adversarial learning. In this paper, we show that neither of these components is indispensable. We propose a new framework that dispenses with both and instead consists of three key components: a variational auto-encoder (VAE), some attribute predictors (one for each attribute), and a content predictor. The VAE and the two types of predictors enable us to perform gradient-based optimization in the continuous space, which is mapped from sentences in a discrete space, to find the representation of a target sentence with the desired attributes and preserved content. Moreover, the proposed method can, for the first time, simultaneously manipulate multiple fine-grained attributes, such as sentence length and the presence of specific words, in synergy when performing text style transfer tasks. Extensive experimental studies on three popular text style transfer tasks show that the proposed method significantly outperforms five state-of-the-art methods.

1 Introduction

Text style transfer, which is an under-explored challenging task in the field of text generation, aims to convert some attributes of a sentence (e.g., negative sentiment) to other attributes (e.g., positive sentiment) while preserving attribute-independent content. In other words, text style transfer can generate sentences with desired attributes in a controlled manner. Due to the difficulty in obtaining training sentence pairs with the same content and differing styles, this task usually works in an unsupervised manner where the model can only access non-parallel, but style labeled sentences.

Most existing methods (Hu et al., 2017; Shen et al., 2017; Fu et al., 2018; Li et al., 2018; Prabhumoye et al., 2018; Yang et al., 2018; John et al., 2019) for text style transfer usually first explicitly disentangle the content and the attribute through an adversarial learning paradigm (Goodfellow et al., 2014). The attribute-independent content and the desired attribute vector are then fed into the decoder to generate the target sentence. However, some recent evidence suggests that using adversarial learning may not be able to learn representations that are disentangled (Li et al., 2018; Guillaume Lample, 2019). Moreover, vanilla adversarial learning is designed for generating real-valued and continuous data but has difficulties in directly generating sequences of discrete tokens. As a result, algorithms such as REINFORCE (Sutton et al., 2000; Yu et al., 2017; Li et al., 2017; Che et al., 2017; Lin et al., 2017; Guo et al., 2018) or those that approximate the discrete tokens with temperature-softmax probability vectors (Kusner and Hernández-Lobato, 2016; Zhang et al., 2017; Hu et al., 2017; Prabhumoye et al., 2018; Yang et al., 2018) are used. Unfortunately, these methods tend to be unstable, slow, and hard-to-tune in practice (Guillaume Lample, 2019).

Is it really a necessity to explicitly disentangle the content and the attributes? Also, do we have to use adversarial learning to achieve text style transfer? Recently, the idea of mapping the discrete input into a continuous space and then performing gradient-based optimization with a predictor to find the representation of a new discrete output with desired property has been applied for sentence revision (Mueller et al., 2017) and neural architecture search (Luo et al., 2018). Motivated by the success of these works, we propose a new solution to the task of content-preserving text style transfer. This method can be easily trained on the non-parallel dataset without adversarial training which is used in most existing methods. Furthermore, unlike most previous methods that only control a single binary attribute (e.g., positive and negative sentiments), our approach can further control multiple fine-grained attributes such as sentence length and the existence of specific words (Liu et al., 2018).

The proposed approach contains three key components: (a) A variational auto-encoder (VAE) (Kingma and Welling, 2013; Rezende et al., 2014; Fabius and van Amersfoort, 2015; Bowman et al., 2016), whose encoder maps sentences into a smooth continuous space and whose decoder maps a continuous representation back to a sentence. (b) Attribute predictors, each of which takes the continuous representation of a sentence as input and predicts one attribute of its decoder output sentence. These attribute predictors enable us to find the target sentence with the desired attributes in the continuous space. (c) A content predictor that takes the continuous representation of a sentence as input and predicts the Bag-of-Words (BoW) feature of its decoder output sentence. The purpose of component (c) is threefold: first, it enhances content preservation during style transfer; second, it enables the target sentence to contain some specific words; third, it tackles the vanishing latent variable problem of the VAE (Zhao et al., 2017). With the gradients obtained from these predictors, we can revise the continuous representation of the original sentence by gradient-based optimization to find a target sentence with the desired fine-grained attributes, and thus achieve content-preserving text style transfer.

The contributions of this paper could be summarized as below:

  • We propose a new method for fine-grained control of text style transfer task, which does not explicitly disentangle the content and the attribute and avoids the training difficulties caused by the use of adversarial learning in the previous methods.

  • Unlike most previous methods that only control a single binary attribute, the proposed method can simultaneously control multiple fine-grained attributes such as sentence length and the presence of specific words. To the best of our knowledge, it is the first text style transfer method that can control such fine-grained attributes.

  • Extensive experimental comparisons on three popular text style transfer tasks show that the proposed method significantly outperforms five state-of-the-art methods.

2 Related Works

We have witnessed an increasing interest in text style transfer under the setting of non-parallel data. Most such methods explicitly disentangle the content and the attribute. One line of research leverages the auto-encoder framework to encode the original sentence into an attribute-independent content representation with adversarial learning, which is then fed into the decoder with a style vector to output the transferred sentence.

In Hu et al. (2017); Shen et al. (2017); Prabhumoye et al. (2018), adversarial learning is utilized to ensure that the output sentence has the desired style. In order to disentangle the content and the attribute, Hu et al. (2017) enforces the output sentence to reconstruct the content representation, while Fu et al. (2018); Zhao et al. (2017); John et al. (2019) apply adversarial learning to discourage encoding style information into the content representation. Shen et al. (2017) utilizes adversarial learning to align the generated sentences from one style to the data domain of the other style. In (Yang et al., 2018), the authors extend the cross-align method (Shen et al., 2017) by employing a language model as the discriminator, which can provide a more stable and more informative training signal for adversarial learning.

However, as argued in (Li et al., 2018; Guillaume Lample, 2019), it is often easy to fool the discriminator without actually learning disentangled representations. Unlike the methods mentioned above that disentangle the content and the attribute with adversarial learning, another line of research (Prabhumoye et al., 2018; Logeswaran et al., 2018; Guillaume Lample, 2019) applies back-translation (Wintner et al., 2016) to rephrase a sentence while reducing its stylistic properties and encouraging content compatibility. Besides, the authors in (Li et al., 2018) directly mask out the words associated with the original style of the sentence to obtain the attribute-independent content text.

Instead of revising the sentence in the discrete space with prior knowledge as in (Li et al., 2018), our method maps the discrete sentence into a continuous representation space and revises the continuous representation with the gradient provided by the predictors. This method does not explicitly disentangle the content and the attribute and avoids the training difficulties caused by the use of adversarial learning in the previous methods. Similar ideas have been proposed in (Mueller et al., 2017; Luo et al., 2018) for sentence revision and neural architecture search. As pointed out in (Shen et al., 2017), the model proposed in (Mueller et al., 2017) does not necessarily enforce content preservation, while our method employs a content predictor to enhance content preservation. Furthermore, unlike most previous methods that only control a single binary attribute (e.g., positive and negative sentiments), our approach can further control multiple fine-grained attributes such as sentence length and the existence of specific words. To the best of our knowledge, these fine-grained attributes have not been studied before in the text style transfer task.

Figure 1: An example of content-preserving text sentiment transfer in which we also want the target sentence to be longer than the original sentence. The original sentence $x$ with negative sentiment is mapped to a continuous representation $z$ via the encoder. Then $z$ is revised into $z^*$ by minimizing the error with the sentiment predictor $f_1$, the length predictor $f_2$, and the content predictor $f_{bow}$. Afterwards, the target sentence $x^*$ is generated by decoding $z^*$ with beam search via the decoder [best viewed in color].

3 Methodology

Let $D$ denote a dataset which contains $n$ sentences $x$, each paired with a set of attributes $s$. Each $s$ has $k$ attributes of interest, $s = \{a_1, \dots, a_k\}$. Unlike most previous methods (Shen et al., 2017; Fu et al., 2018; Prabhumoye et al., 2018; Li et al., 2018; Yang et al., 2018) that only consider a single binary attribute (e.g., positive or negative sentiments), here we consider multiple fine-grained attributes such as sentence length and the presence of specific words (e.g., a pre-defined subject noun). For example, given an original sentence $x$ = “the salads are fresh and delicious.”, its attribute set can be $s$ = {sentiment=positive, length=7, subject_noun=salads}. Our task is to learn a generative model that can generate a new sentence $x^*$ with the required attributes $s^*$ while retaining the attribute-independent content of $x$ as much as possible.

3.1 Model Structure

The proposed model consists of three components: a variational auto-encoder (VAE), attribute predictors, and a content predictor.

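Variational auto-encoder. The VAE integrates a stochastic latent representation into the auto-encoder architecture. Its RNN encoder maps a sentence $x$ into a continuous latent representation $z$:

$z \sim q_E(z \mid x; \theta_{enc})$    (1)

and its RNN decoder maps the representation $z$ back to reconstruct the sentence:

$\hat{x} \sim p_G(\hat{x} \mid z; \theta_{dec})$    (2)

where $\theta_{enc}$ and $\theta_{dec}$ denote the parameters of the encoder and decoder. The VAE is then optimized to minimize the reconstruction error of input sentences while also minimizing the KL term, which encourages $q_E(z \mid x)$ to match the prior $p(z)$:

$\mathcal{L}_{VAE}(\theta_{enc}, \theta_{dec}) = -\mathbb{E}_{q_E(z \mid x)}[\log p_G(x \mid z)] + \mathrm{KL}(q_E(z \mid x) \,\|\, p(z))$    (3)

where $\mathrm{KL}(\cdot \,\|\, \cdot)$ is the KL-divergence. Compared with a traditional deterministic auto-encoder, the VAE offers two main advantages in our approach: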

(1) Deterministic auto-encoders often have “holes” in their latent space, where the latent representations may not be able to generate anything realistic (Roberts et al., 2018). In contrast, by imposing a standard normal prior $\mathcal{N}(0, I)$ on the latent representations, the VAE learns latent representations not as single isolated points, but as soft dense regions in the continuous latent space, which enables it to generate plausible examples from every point in the latent space (Bowman et al., 2016). This characteristic avoids the problem of a representation revised (optimized) by the gradient being unable to generate a plausible sentence.

(2) The continuous and smooth latent space learned by the VAE enables the sentences generated by adjacent latent representations to be similar in content and semantics (Bowman et al., 2016; Semeniuta et al., 2017; Goyal et al., 2017; Yang et al., 2017; Shen et al., 2018). Therefore, if we revise the representation within a reasonable range (i.e., small enough), the resulting new sentence would not differ much in content from the original sentence.
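
The KL term that shapes this smooth latent space has a closed form when the encoder outputs a diagonal Gaussian, which is the usual VAE setup. The sketch below is illustrative only: the diagonal-Gaussian posterior, the function names, and the `kl_weight` parameter are assumptions for this example and are not taken from the paper's implementation.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(sigma^2)) || N(0, I)), summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def vae_loss(reconstruction_nll, mu, log_var, kl_weight=1.0):
    """Reconstruction error plus the (optionally weighted) KL term."""
    return reconstruction_nll + kl_weight * kl_to_standard_normal(mu, log_var)
```

When the posterior already equals the prior (zero mean, unit variance) the KL term is zero, so only the reconstruction error remains; the further the encoder output drifts from the prior, the larger the penalty.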

Attribute predictors $f_1, \dots, f_k$. Each of them takes the representation $z$ as input and predicts one attribute of the decoder output sentence $\hat{x}$ generated from $z$. For example, the attribute predictor can be a binary classifier for positive-negative sentiment prediction or a regression model for sentence length prediction. With the gradients provided by the predictors, we can revise the continuous representation $z$ of the original sentence $x$ by gradient-based optimization to find a target sentence $x^*$ with the desired attributes $s^*$.

The attribute predictors are trained in two stages. We first train these attribute predictors jointly with the VAE. For $M$-class classification predictors, we have

$\mathcal{L}_{Attr, f_j}(\theta_{f_j}, \theta_{enc}) = -\mathbb{E}_{q_E(z \mid x)}[\log p(a_j \mid z)]$    (4)

where $p(a_j \mid z)$ is the probability that the predictor $f_j$ assigns to the label $a_j$ and $q_E(z \mid x)$ is the encoder distribution. For the regression predictors, we have

$\mathcal{L}_{Attr, f_j}(\theta_{f_j}, \theta_{enc}) = \mathbb{E}_{q_E(z \mid x)}[(f_j(z) - a_j)^2]$    (5)

where $f_j(z)$ is the predicted value. In this joint training, we take the attributes of the input sentence $x$ as the labels of the predictors. Since the predictors are designed to predict the attributes of the sentence generated from $z$, we further train each predictor individually after joint training. We sample a representation $\hat{z}$ from the latent space and feed it into the decoder to generate a new sentence $\hat{x}$. Afterwards, we feed $\hat{x}$ into the CNN text classifiers (Kim, 2014), which are trained on the training set, to predict its attributes $\hat{a}_j$ as the labels of the predictors (some attributes, such as the length of $\hat{x}$, can be obtained directly without using classifiers):

$\mathcal{L}_{Attr, f_j}(\theta_{f_j}) = -\mathbb{E}_{p_G(\hat{x} \mid \hat{z})}[\log p(\hat{a}_j \mid \hat{z})]$    (6)

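Content predictor $f_{bow}$. It is a multi-label classifier that takes $z$ as input and predicts the Bag-of-Words feature $x_{bow}$ of its decoder output sentence:

$p(x_{bow} \mid z) = f_{bow}(z)$    (7)

We assume $p(x_{bow} \mid z)$ is a $|x|$-trial multinomial distribution:

$\log p(x_{bow} \mid z) = \log \prod_{t=1}^{|x|} \frac{e^{o_{x_t}}}{\sum_{v=1}^{V} e^{o_v}}$    (8)

where $V$ is the size of the vocabulary, $|x|$ is the length of $x$, and $o_v$ is the output value of the $v$-th word in $f_{bow}(z)$.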
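
The multinomial assumption means the BoW log-likelihood is a sum of log-softmax scores over the words in the target bag. A minimal sketch, assuming the content predictor returns one raw score per vocabulary word (the function names and the integer word-id representation are hypothetical):

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over the vocabulary."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def bow_log_likelihood(logits, word_ids):
    """Sum of log-softmax scores over the tokens in the target bag of words."""
    log_probs = log_softmax(logits)
    return sum(log_probs[w] for w in word_ids)


def bow_loss(logits, word_ids):
    """Negative log-likelihood, the quantity minimized during training."""
    return -bow_log_likelihood(logits, word_ids)
```

With uniform scores over a four-word vocabulary, each target word contributes log(1/4), so a two-word bag has a loss of 2 log 4.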

The training of the content predictor also has two stages. First, it is jointly trained with the VAE:

$\mathcal{L}_{BOW}(\theta_{f_{bow}}, \theta_{enc}) = -\mathbb{E}_{q_E(z \mid x)}[\log p(x_{bow} \mid z)]$    (9)

After joint training, it is trained separately on sentences $\hat{x}$ decoded from sampled representations $\hat{z}$:

$\mathcal{L}_{BOW}(\theta_{f_{bow}}) = -\mathbb{E}_{p_G(\hat{x} \mid \hat{z})}[\log p(\hat{x}_{bow} \mid \hat{z})]$    (10)

During text style transfer, we can similarly revise the representation $z$ with the gradient provided by the content predictor to enhance content preservation. Here we consider two ways to enhance content preservation during style transfer. We can set $x_{bow}$ to contain all the words in the original sentence $x$, which means that we try to find a sentence with the desired attributes that keeps as many words of the original sentence as possible. However, retaining all the words is often not what we want. For example, the target sentence $x^*$ should not contain the original emotional words in the task of text sentiment transfer. Instead, the nouns in the original sentence should be retained in such a task (Melnyk et al., 2017; Li et al., 2018; John et al., 2019). Therefore, we can set $x_{bow}$ to contain only the nouns in $x$. Furthermore, we can set $x_{bow}$ to contain some desired specific words to achieve finer-grained control of target sentences.

Putting them together, the final joint training loss is as follows:

$\mathcal{L} = \mathcal{L}_{VAE} + \lambda_b \mathcal{L}_{BOW} + \lambda_a \sum_{j=1}^{k} \mathcal{L}_{Attr, f_j}$    (11)

where $\lambda_b$ and $\lambda_a$ are balancing hyper-parameters. It should be noted that $\mathcal{L}_{BOW}$ and $\mathcal{L}_{Attr, f_j}$ also act as regularizers that prevent the encoder from being trapped into a KL vanishing state (Bowman et al., 2016; Kingma et al., 2016; Yang et al., 2017; Shen et al., 2018; Alemi et al., 2018; Liu et al., 2019).

3.2 Text Style Transfer

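Given the original sentence $x$, the inference process of style transfer is performed in the continuous space. We revise its representation $z$ by gradient-based optimization as follows:

$z^* = z - \eta \left( \sum_{j=1}^{k} \nabla_z \mathcal{L}_{Attr, f_j}(z; a_j^*) + \lambda_c \nabla_z \mathcal{L}_{BOW}(z; x_{bow}) \right)$    (12)

where $\mathcal{L}_{Attr, f_j}$ and $\mathcal{L}_{BOW}$ are the attribute and content predictor losses evaluated at the desired attributes $a_j^*$ and the target bag of words $x_{bow}$, $\eta$ is the step size, and $\lambda_c$ is the trade-off parameter that balances content preservation against style transfer strength. We iterate this optimization to find $z^*$ until the confidence of the attribute predictors is greater than a threshold $\beta$ or the maximum number of rounds $T$ is reached. The target sentence $x^*$ is obtained by decoding $z^*$ with beam search (Och and Ney, 2004). An example procedure is shown in Figure 1.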
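
The revision loop can be sketched on a toy problem. Everything below is an assumption made for illustration: the attribute predictor is a single logistic unit, the content predictor is a linear layer followed by a softmax, and the names `revise`, `eta`, `lam_c`, `beta`, and `max_rounds` are hypothetical stand-ins for the step size, trade-off parameter, confidence threshold, and round limit described above. The paper's predictors are learned networks, and the revised representation is decoded with beam search, which this sketch omits.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attr_confidence(z, w, b):
    """Toy linear attribute predictor: probability of the desired class."""
    return sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)


def attr_grad(z, w, b):
    """Gradient of -log p(desired | z) with respect to z."""
    p = attr_confidence(z, w, b)
    return [-(1.0 - p) * wi for wi in w]


def bow_grad(z, W, bag):
    """Gradient of the BoW negative log-likelihood w.r.t. z for logits = W z."""
    logits = [sum(wij * zj for wij, zj in zip(row, z)) for row in W]
    probs = softmax(logits)
    grad = []
    for j in range(len(z)):
        expected = sum(p * row[j] for p, row in zip(probs, W))
        observed = sum(W[t][j] for t in bag)
        grad.append(len(bag) * expected - observed)
    return grad


def revise(z, w, b, W, bag, eta=0.5, lam_c=0.1, beta=0.9, max_rounds=100):
    """Iterate gradient steps until the confidence exceeds beta or rounds run out."""
    z = list(z)
    for step in range(max_rounds):
        if attr_confidence(z, w, b) > beta:
            return z, step
        g_attr = attr_grad(z, w, b)
        g_bow = bow_grad(z, W, bag)
        z = [zi - eta * (ga + lam_c * gb) for zi, ga, gb in zip(z, g_attr, g_bow)]
    return z, max_rounds
```

A larger `lam_c` weights the content gradient more heavily and so favors content preservation, while a higher `beta` demands a more confident attribute prediction before stopping, which is the trade-off the method exposes.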

4 Experiments

The experiments are designed for answering the following questions: Q1: Compared with the state-of-the-art methods, how well do our methods perform in the text style transfer tasks? To answer this question, we evaluate them on three publicly available datasets of sentiment transfer and gender style transfer tasks. Q2: Can our methods further control fine-grained attributes such as length and control multiple attributes at the same time? To verify this, we conduct several experiments on text sentiment transfer tasks and simultaneously control other fine-grained attributes such as length and keyword presence.

4.1 Text Sentiment Transfer

Data

We use two datasets, Yelp restaurant reviews and Amazon product reviews (He and McAuley, 2016), which are commonly used in prior works too (Shen et al., 2017; Fu et al., 2018; Li et al., 2018; Prabhumoye et al., 2018). These datasets can be downloaded at http://bit.ly/2LHMUsl. Following their experimental settings, we use the same pre-processing steps and similar experimental configurations.

Methods                                Accuracy ↑  PPL ↓   Overlap ↑  Noun% ↑  BLEU ↑  Suc% ↑
Original                               0.1         22.9    100.0      100.0    42.4    0.1
Human                                  91.8        76.9    47.2       78.5     100.0   83.3
CrossAligned (Shen et al., 2017)       73.6        72.0    41.1       42.9     18.4    27.9
StyleEmbedding (Fu et al., 2018)       7.2         93.9    75.4       74.2     31.9    2.1
MultiDecoder (Fu et al., 2018)         48.8        166.5   51.5       52.2     23.1    11.3
BST (Prabhumoye et al., 2018)          94.8        32.8    21.5       23.5     6.8     31.9
Delete, Retrieve, & Generate (Li et al., 2018):
  TemplateBased                        81.3        183.6   55.6       83.3     28.9    42.5
  DeleteOnly                           85.8        81.4    49.5       74.9     24.7    51.4
  RetrievalOnly                        98.4        25.7    15.8       39.6     4.7     51.0
  DeleteAndRetrieve                    89.5        96.1    49.4       74.0     24.9    55.7
Ours-1                                 88.2        26.5    46.6       77.4     21.8    66.9
Ours-2                                 92.3        18.3    38.9       69.3     18.8    67.9
Ours-3                                 95.7        20.6    39.7       61.5     17.9    66.3

Methods                                Accuracy ↑  PPL ↓   Overlap ↑  Noun% ↑  BLEU ↑  Suc% ↑
Original                               23.4        24.4    100.0      100.0    57.2    23.2
Human                                  88.1        62.9    60.5       85.0     100.0   81.2
CrossAligned (Shen et al., 2017)       69.6        18.3    19.3       20.4     5.0     28.8
StyleEmbedding (Fu et al., 2018)       40.5        87.7    42.2       41.8     22.1    13.2
MultiDecoder (Fu et al., 2018)         66.5        80.8    30.6       30.4     14.3    19.8
BST (Prabhumoye et al., 2018)          82.6        25.3    24.7       22.5     9.2     36.9
Delete, Retrieve, & Generate (Li et al., 2018):
  TemplateBased                        69.6        108.9   73.3       87.9     42.8    50.0
  DeleteOnly                           51.6        49.3    74.4       95.1     44.7    44.1
  RetrievalOnly                        87.2        28.7    21.0       44.5     6.7     51.2
  DeleteAndRetrieve                    55.2        48.2    69.1       92.6     41.8    48.7
Ours-1                                 81.9        35.0    37.7       76.0     11.5    59.1
Ours-2                                 85.1        21.8    49.3       49.8     21.5    55.9
Ours-3                                 90.0        15.9    39.5       41.4     16.3    54.5
Table 1: Evaluation results of the sentiment transfer tasks on Yelp (Top) and Amazon (Bottom). The notation ↑ means the higher the better, while ↓ means the lower the better. For our models, we report different results (denoted as Ours-1, Ours-2, and Ours-3) corresponding to different choices of hyper-parameters ($\lambda_c$ and $\beta$), which demonstrates our models’ ability to control the trade-off between attribute transfer and content preservation. For each evaluation criterion, we bold the best values (except for Human and Original). The accuracies of the classifier on the test set of Yelp and Amazon are 98.2% and 84.0%.

Comprehensive Quantitative Evaluation

There are three criteria for a good style transfer (Li et al., 2018; Prabhumoye et al., 2018). Concretely, the generated sentences should: 1) have the desired attributes; 2) be fluent; 3) preserve the attribute-independent content of the original sentence as much as possible. For the first and second criteria, we follow previous works (Shen et al., 2017; Fu et al., 2018; Li et al., 2018; Prabhumoye et al., 2018) in using model-based evaluation. We measure whether the style is successfully transferred according to the prediction of a pre-trained bidirectional LSTM classifier (Schuster and Paliwal, 1997; Hochreiter and Schmidhuber, 1997), and measure the language quality by the perplexity (PPL) of the generated sentences with a pre-trained language model. Following previous works, we use the trigram Kneser-Ney smoothed language model (Kneser and Ney, 1995) trained on the respective dataset. Since it is hard to measure content preservation, we follow previous works and report two metrics: 1) Word overlap, which is the unigram word overlap rate between the original sentence and the generated sentence; 2) Because most nouns in sentences are attribute-independent content (Melnyk et al., 2017; Li et al., 2018) in this task, we also calculate the percentage of nouns (e.g., as detected by a POS tagger) in the original sentence that appear in the generated sentence (denoted as Noun%). Because a good model should perform well on all three criteria, it is reasonable to propose a more comprehensive metric that serves as a lower bound of the transfer success percentage (denoted as Suc%): a sample is considered successfully transferred if the classifier's prediction matches the desired attribute, its language probability is no less than a threshold, and it contains at least one noun of the original sentence. There are 1000 human-annotated sentences provided as the ground truth of the transferred sentences in (Li et al., 2018). We also take them as references and report the bi-gram BLEU scores (Papineni et al., 2002).
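
The content-preservation metrics and the combined success test are simple to state in code. This is a sketch under stated assumptions, not the authors' evaluation script: the overlap is computed here as a Jaccard ratio over whitespace-split unigrams (the exact formula is not recoverable from the text above), nouns are passed in pre-extracted rather than tagged, and all function and parameter names are hypothetical.

```python
def word_overlap(original, generated):
    """Unigram overlap as |intersection| / |union| (Jaccard; an assumption)."""
    a, b = set(original.split()), set(generated.split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def noun_retention(original_nouns, generated):
    """Fraction of the original sentence's nouns that appear in the output."""
    nouns = set(original_nouns)
    if not nouns:
        return 0.0
    return len(nouns & set(generated.split())) / len(nouns)


def is_success(predicted_attr, desired_attr, lm_score, lm_threshold,
               original_nouns, generated):
    """All three criteria: right attribute, fluent enough, keeps some noun."""
    keeps_noun = bool(set(original_nouns) & set(generated.split()))
    return predicted_attr == desired_attr and lm_score >= lm_threshold and keeps_noun
```

Suc% is then the percentage of test samples for which `is_success` holds, which is why it acts as a lower bound: a sample failing any single criterion counts as a failure.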

We compare our method with several previous state-of-the-art methods (Shen et al., 2017; Fu et al., 2018; Li et al., 2018; Prabhumoye et al., 2018). We report the results of the human-written sentences as a strong baseline. The results of not making any changes to the original sentences (denoted as Original) are also reported. The effect of using different hyper-parameters and the ablation study are analyzed in Appendix A.

Table 1 shows the evaluation results on the two datasets. In general, StyleEmbedding and MultiDecoder achieve high content retention (Overlap, BLEU, and Noun%), but their fluency (PPL) and transfer accuracy are poor, resulting in low overall scores (Suc%). In contrast, BST achieves high fluency and transfer accuracy but preserves content poorly. CrossAligned is more fluent, but it performs well in neither content preservation nor sentiment transfer. Because the methods proposed in (Li et al., 2018) use prior knowledge to revise the original sentence in the discrete space, they (except for RetrievalOnly) can achieve both high content retention and transfer accuracy. However, the generated sentences are not fluent enough. Our methods revise the original sentence in a continuous space and do well in fluency, content preservation, and transfer accuracy. They achieve the highest overall scores among all methods. In addition, we can see that our methods can control the trade-off between transfer accuracy and content preservation.

Human Evaluation

We further conduct human evaluations on the two datasets to verify the performance of our methods. Following previous works (Li et al., 2018; Fu et al., 2018), we randomly select 50 original sentences and ask 7 evaluators (all of whom hold a Bachelor's or higher degree and are independent of the authors’ research group) to evaluate the sentences generated by different methods. Each generated sentence is rated on a scale of 1 to 5 in terms of transfer accuracy, preservation of content, and language fluency. The results are shown in Table 2. It can be seen that our models significantly outperform all the baselines on the percentage success rate (Suc%) for the two datasets. The generated examples can be found in Appendix B.

                                       Yelp                     Amazon
Methods                                Acc  Gra  Con  Suc%      Acc  Gra  Con  Suc%
Human                                  4.1  4.4  3.6  78        3.5  4.3  3.9  60
CrossAligned (Shen et al., 2017)       3.3  2.9  2.6  22        3.0  3.3  1.6  6
MultiDecoder (Fu et al., 2018)         2.4  3.0  3.1  12        2.3  2.7  2.5  6
BST (Prabhumoye et al., 2018)          3.9  3.7  1.8  26        2.8  3.3  1.8  8
DeleteAndRetrieve (Li et al., 2018)    3.8  3.6  3.5  54        2.4  3.5  3.8  28
Ours-1                                 3.6  4.1  3.1  66        3.4  4.0  2.8  42
Ours-2                                 3.7  4.3  3.2  72        3.7  4.0  2.4  40
Ours-3                                 3.8  4.1  3.0  60        3.8  4.5  2.5  50
Table 2: Human evaluation results of the sentiment transfer tasks on Yelp and Amazon. We show average human ratings for transfer accuracy (Acc), preservation of content (Con), and fluency of sentences (Gra) on a 1 to 5 scale. “Suc%” denotes the overall percentage success rate. We consider a generated output “successful” if it is rated no less than 3 on all three criteria (Acc, Con, Gra).
Methods                                Accuracy  PPL     Overlap  Noun%   Suc%
Original                               21.9      183.4   100.0    100.0   21.9
BST (Prabhumoye et al., 2018)          60.3      145.0   37.9     35.3    36.3
Ours-1                                 79.9      78.9    46.4     53.8    63.9
Ours-2                                 71.3      87.8    51.8     57.5    58.7
Ours-3                                 70.6      98.2    46.8     69.6    66.6
Table 3: Evaluation results of the gender transfer task on Yelp. For our models, we report different results corresponding to different choices of hyper-parameters ($\lambda_c$ and $\beta$) to demonstrate our models’ ability to control the trade-off between attribute transfer and content preservation. The accuracy of the classifier on the test set is 83.1%.

4.2 Text Gender Style Transfer

We use the same dataset as in (Prabhumoye et al., 2018), which can be downloaded at http://tts.speech.cs.cmu.edu/style_models/gender_classifier.tar and contains reviews from Yelp annotated with two sexes (they only consider male or female due to the absence of corpora with other gender annotations (Eckert and McConnell-Ginet, 2013)). Following (Prabhumoye et al., 2018), we use the same pre-processing steps and similar experimental configurations. We directly compare our method against BST (Prabhumoye et al., 2018), which has been shown to outperform the previous approach (Shen et al., 2017) on this task. We use the same metrics described in Section 4.1 except for the BLEU score, because this dataset does not provide human-annotated sentences. The results are shown in Table 3. We can see that our methods outperform BST (Prabhumoye et al., 2018) on all metrics. The generated examples are shown in Appendix C.

Methods                                Accuracy  PPL     Overlap  Noun%   Len%    Key%
Original                               0.1       22.9    100.0    100.0   100.0   7.8
Keywords                               16.7      43.9    39.2     56.0    98.1    92.3
Sentiment + Keywords                   91.6      52.6    24.5     42.4    106.0   78.3
Length ↑                               0.2       29.8    25.0     48.3    208.8   5.9
Sentiment + Length ↑                   97.7      25.4    21.4     51.7    189.5   9.2
Keywords + Length ↑                    25.6      44.5    29.8     61.8    165.0   83.2
Sentiment + Keywords + Length ↑        93.0      51.8    18.8     50.0    183.7   66.6
Length ↓                               0.2       31.3    30.7     25.2    40.8    6.3
Sentiment + Length ↓                   95.1      23.0    29.1     38.1    66.9    6.7
Keywords + Length ↓                    21.4      87.0    28.4     38.9    61.6    83.7
Sentiment + Keywords + Length ↓        87.6      123.8   16.3     23.7    60.9    63.0
Table 4: Results of fine-grained attribute control on Yelp. Different rows correspond to the set of attributes being controlled by the model. Length ↑ denotes doubling the sentence length and Length ↓ denotes halving it.

4.3 Multiple Fine-Grained Attributes Control

We conduct experiments on controlling fine-grained attributes (length or keyword presence) and simultaneously manipulating multiple attributes (length, keyword presence, and sentiment) of the original sentence. We use the same dataset, Yelp, and the same metrics used in Section 4.1. For the attribute of length, we design two experiments: 1) We hope that the target sentence can add some relevant content to the original sentence and double its length (denoted as Length ↑); 2) We hope that the target sentence can compress the content of the original sentence and reduce its length by half (denoted as Length ↓). For evaluation, we measure the length of the generated sentences as a percentage of the length of the original sentences (denoted as Len%). For the attribute of keyword presence, we hope that the target sentence can contain a pre-defined keyword and retain the content of the original sentence as much as possible (denoted as Keywords). In our experiments, we define a keyword as a noun that is semantically most relevant (computed by the cosine distance of pre-trained word embeddings) to the original sentence but does not appear in the original sentence. The percentage of the generated sentences that contain the pre-defined keyword (denoted as Key%) is reported.
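
Both fine-grained metrics reduce to a few lines. The sketch below assumes whitespace tokenization and, for Len%, the mean of per-sentence length ratios; the text does not say whether the ratio is averaged per sentence or computed over corpus totals, so that choice and the function names are assumptions.

```python
def len_percent(originals, generated):
    """Mean generated/original length ratio, as a percentage (assumed averaging)."""
    ratios = [len(g.split()) / len(o.split()) for o, g in zip(originals, generated)]
    return 100.0 * sum(ratios) / len(ratios)


def key_percent(generated, keywords):
    """Percentage of generated sentences that contain their assigned keyword."""
    hits = sum(1 for g, k in zip(generated, keywords) if k in g.split())
    return 100.0 * hits / len(generated)
```

Under this reading, a Len% near 200 means outputs are about twice as long as their sources, and a Len% near 50 means they are about half as long.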

The results are shown in Table 4. For a single fine-grained attribute, it can be observed that Keywords achieves a 92.3 Key% score, while Length ↑ and Length ↓ achieve 208.8 and 40.8 Len% scores respectively. At the same time, the fluency and content retention scores are still high. These results demonstrate that the proposed method can control such fine-grained attributes. When we further control the sentiment attribute, we can see that Sentiment + Keywords achieves 91.6% accuracy, while the accuracy of Sentiment + Length ↑ and Sentiment + Length ↓ is 97.7% and 95.1% respectively. Meanwhile, their other scores have not declined significantly. When simultaneously controlling all these attributes, Sentiment + Keywords + Length ↑ achieves 93.0% accuracy, a 183.7 Len% score, and a 66.6 Key% score, while Sentiment + Keywords + Length ↓ achieves 87.6% accuracy, a 60.9 Len% score, and a 63.0 Key% score. Since it is more difficult to reduce sentence length than to increase it while controlling other attributes, the fluency of Sentiment + Keywords + Length ↓ is worse than that of Sentiment + Keywords + Length ↑. We show some generated examples in Appendix D. These results indicate that our proposed method can control multiple attributes simultaneously.

5 Conclusion

In this paper, we explore a novel task setting for text style transfer, in which multiple fine-grained attributes must be manipulated simultaneously. We propose to address it by revising the original sentences in a continuous space with gradient-based optimization. Experimental results demonstrate that the proposed method can simultaneously manipulate multiple fine-grained attributes such as sentence length and the presence of specific words. To the best of our knowledge, this is the first time that a style transfer algorithm can control all those fine-grained attributes. Furthermore, extensive experiments on three popular text style transfer tasks show that our approach outperforms five previous state-of-the-art methods by a large margin.

References

  • Alemi et al. (2018) Alexander Alemi, Ben Poole, Ian Fischer, Joshua Dillon, Rif A Saurous, and Kevin Murphy. Fixing a broken elbo. In ICML, 2018.
  • Bowman et al. (2016) Samuel R Bowman, Luke Vilnis, Oriol Vinyals, Andrew M Dai, Rafal Jozefowicz, and Samy Bengio. Generating sentences from a continuous space. In CoNLL, 2016.
  • Che et al. (2017) Tong Che, Yanran Li, Ruixiang Zhang, R Devon Hjelm, Wenjie Li, Yangqiu Song, and Yoshua Bengio. Maximum-likelihood augmented discrete generative adversarial networks. arXiv preprint arXiv:1702.07983, 2017.
  • Eckert and McConnell-Ginet (2013) Penelope Eckert and Sally McConnell-Ginet. Language and gender. Cambridge University Press, 2013.
  • Fabius and van Amersfoort (2015) Otto Fabius and Joost R van Amersfoort. Variational recurrent auto-encoders. In ICLR (Workshop), 2015.
  • Fu et al. (2018) Zhenxin Fu, Xiaoye Tan, Nanyun Peng, Dongyan Zhao, and Rui Yan. Style transfer in text: Exploration and evaluation. In AAAI, 2018.
  • Goodfellow et al. (2014) Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NIPS, 2014.
  • Goyal et al. (2017) Anirudh Goyal Alias Parth Goyal, Alessandro Sordoni, Marc-Alexandre Côté, Nan Rosemary Ke, and Yoshua Bengio. Z-forcing: Training stochastic recurrent networks. In NIPS, 2017.
  • Lample et al. (2019) Guillaume Lample, Sandeep Subramanian, Eric Michael Smith, Ludovic Denoyer, Marc’Aurelio Ranzato, and Y-Lan Boureau. Multiple attribute text rewriting. In ICLR, 2019.
  • Guo et al. (2018) Jiaxian Guo, Sidi Lu, Han Cai, Weinan Zhang, Yong Yu, and Jun Wang. Long text generation via adversarial training with leaked information. In AAAI, 2018.
  • He and McAuley (2016) Ruining He and Julian McAuley. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In WWW, 2016.
  • Hochreiter and Schmidhuber (1997) Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 1997.
  • Hu et al. (2017) Zhiting Hu, Zichao Yang, Xiaodan Liang, Ruslan Salakhutdinov, and Eric P Xing. Toward controlled generation of text. In ICML, 2017.
  • John et al. (2019) Vineet John, Lili Mou, Hareesh Bahuleyan, and Olga Vechtomova. Disentangled representation learning for text style transfer. In AAAI, 2019.
  • Kim (2014) Yoon Kim. Convolutional neural networks for sentence classification. In EMNLP, 2014.
  • Kingma and Welling (2013) Diederik P Kingma and Max Welling. Auto-encoding variational bayes. In ICLR, 2013.
  • Kingma et al. (2016) Diederik P Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, and Max Welling. Improved variational inference with inverse autoregressive flow. In NIPS, 2016.
  • Kneser and Ney (1995) Reinhard Kneser and Hermann Ney. Improved backing-off for m-gram language modeling. In ICASSP, 1995.
  • Kusner and Hernández-Lobato (2016) Matt J Kusner and José Miguel Hernández-Lobato. Gans for sequences of discrete elements with the gumbel-softmax distribution. arXiv preprint arXiv:1611.04051, 2016.
  • Li et al. (2017) Jiwei Li, Will Monroe, Tianlin Shi, Sébastien Jean, Alan Ritter, and Dan Jurafsky. Adversarial learning for neural dialogue generation. In EMNLP, 2017.
  • Li et al. (2018) Juncen Li, Robin Jia, He He, and Percy Liang. Delete, retrieve, generate: A simple approach to sentiment and style transfer. In NAACL-HLT, 2018.
  • Lin et al. (2017) Kevin Lin, Dianqi Li, Xiaodong He, Zhengyou Zhang, and Ming-Ting Sun. Adversarial ranking for language generation. In NIPS, 2017.
  • Liu et al. (2018) Dayiheng Liu, Jie Fu, Qian Qu, and Jiancheng Lv. Bfgan: Backward and forward generative adversarial networks for lexically constrained sentence generation. arXiv preprint arXiv:1806.08097, 2018.
  • Liu et al. (2019) Dayiheng Liu, Xue Yang, Feng He, Yuanyuan Chen, and Jiancheng Lv. mu-forcing: Training variational recurrent autoencoders for text generation. arXiv preprint arXiv:1905.10072, 2019.
  • Logeswaran et al. (2018) Lajanugen Logeswaran, Honglak Lee, and Samy Bengio. Content preserving text generation with attribute controls. In NeurIPS, 2018.
  • Luo et al. (2018) Renqian Luo, Fei Tian, Tao Qin, Enhong Chen, and Tie-Yan Liu. Neural architecture optimization. In NeurIPS, 2018.
  • Melnyk et al. (2017) Igor Melnyk, Cicero Nogueira dos Santos, Kahini Wadhawan, Inkit Padhi, and Abhishek Kumar. Improved neural text attribute transfer with non-parallel data. In NIPS (Workshop), 2017.
  • Mueller et al. (2017) Jonas Mueller, David Gifford, and Tommi Jaakkola. Sequence to better sequence: continuous revision of combinatorial structures. In ICML, 2017.
  • Och and Ney (2004) Franz Josef Och and Hermann Ney. The alignment template approach to statistical machine translation. Computational linguistics, 2004.
  • Papineni et al. (2002) Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: a method for automatic evaluation of machine translation. In ACL, 2002.
  • Prabhumoye et al. (2018) Shrimai Prabhumoye, Yulia Tsvetkov, Ruslan Salakhutdinov, and Alan W Black. Style transfer through back-translation. In ACL, 2018.
  • Rezende et al. (2014) Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. Stochastic backpropagation and approximate inference in deep generative models. In ICML, 2014.
  • Roberts et al. (2018) Adam Roberts, Jesse Engel, Colin Raffel, Curtis Hawthorne, and Douglas Eck. A hierarchical latent vector model for learning long-term structure in music. In ICML, 2018.
  • Schuster and Paliwal (1997) Mike Schuster and Kuldip K Paliwal. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 1997.
  • Semeniuta et al. (2017) Stanislau Semeniuta, Aliaksei Severyn, and Erhardt Barth. A hybrid convolutional variational autoencoder for text generation. In EMNLP, 2017.
  • Shen et al. (2017) Tianxiao Shen, Tao Lei, Regina Barzilay, and Tommi Jaakkola. Style transfer from non-parallel text by cross-alignment. In NIPS, 2017.
  • Shen et al. (2018) Xiaoyu Shen, Hui Su, Shuzi Niu, and Vera Demberg. Improving variational encoder-decoders in dialogue generation. In AAAI, 2018.
  • Sutton et al. (2000) Richard S Sutton, David A McAllester, Satinder P Singh, and Yishay Mansour. Policy gradient methods for reinforcement learning with function approximation. In NIPS, 2000.
  • Wintner et al. (2016) Shuly Wintner, Shachar Mirkin, Lucia Specia, Ella Rabinovich, and Raj Nath Patel. Personalized machine translation: Preserving original author traits. In EACL, 2016.
  • Yang et al. (2017) Zichao Yang, Zhiting Hu, Ruslan Salakhutdinov, and Taylor Berg-Kirkpatrick. Improved variational autoencoders for text modeling using dilated convolutions. In ICML, 2017.
  • Yang et al. (2018) Zichao Yang, Zhiting Hu, Chris Dyer, Eric P Xing, and Taylor Berg-Kirkpatrick. Unsupervised text style transfer using language models as discriminators. In NeurIPS, 2018.
  • Yu et al. (2017) Lantao Yu, Weinan Zhang, Jun Wang, and Yong Yu. Seqgan: Sequence generative adversarial nets with policy gradient. In AAAI, 2017.
  • Zhang et al. (2017) Yizhe Zhang, Zhe Gan, Kai Fan, Zhi Chen, Ricardo Henao, Dinghan Shen, and Lawrence Carin. Adversarial feature matching for text generation. In ICML, 2017.
  • Zhao et al. (2017) Tiancheng Zhao, Ran Zhao, and Maxine Eskenazi. Learning discourse-level diversity for neural dialog models using conditional variational autoencoders. In ACL, 2017.

Appendix A Hyper-parameters and Ablation Study

We study the effect of the following hyper-parameters and configurations:

  1. The trade-off hyper-parameter in Equation 12, which balances content preservation against style transfer strength.

  2. The retraining of the predictors by Equations 6 and 10. We conduct an ablation study to verify the effect of the retraining.

  3. The target of the content predictor. As described in Section 3.1, we propose two kinds of target: (a) the target contains all the words in the original sentence (denoted as Cont-1); (b) the target contains only the nouns (detected by an NLTK POS tagger) in the original sentence (denoted as Cont-2). In addition, we test omitting the content predictor altogether (denoted as Cont-0).

  4. The KL loss of the variational auto-encoder (VAE). If the KL loss is too large, the VAE collapses into a plain auto-encoder. If the KL loss drops to 0, the VAE collapses into a plain language model. Ideally, it should be small but non-zero. Under different configurations (e.g., KL annealing and a weighted KL term), we obtain VAEs with different KL losses and then test their performance in our scenarios.
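The three content-predictor configurations in item 3 can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `content_target` is hypothetical, the target is represented as a plain set of words (an assumption), and the input is assumed to be a list of (word, tag) pairs with Penn Treebank tags, such as a tagger like NLTK's `nltk.pos_tag` would return. The example tags below were written by hand.

```python
def content_target(tagged_tokens, mode):
    """Return the set of words the content predictor should recover.

    tagged_tokens: list of (word, POS tag) pairs using Penn Treebank tags.
    mode: "Cont-0" (no content target), "Cont-1" (all words),
          or "Cont-2" (nouns only).
    """
    if mode == "Cont-0":
        return set()
    if mode == "Cont-1":
        return {word for word, _ in tagged_tokens}
    if mode == "Cont-2":
        # Penn Treebank noun tags are NN, NNS, NNP, and NNPS.
        return {word for word, tag in tagged_tokens if tag.startswith("NN")}
    raise ValueError(f"unknown mode: {mode}")


# Hand-tagged example sentence (the tags are an assumption, not tagger output).
tagged = [
    ("there", "EX"), ("was", "VBD"), ("only", "RB"),
    ("meat", "NN"), ("and", "CC"), ("bread", "NN"), (".", "."),
]
nouns_only = content_target(tagged, "Cont-2")
all_words = content_target(tagged, "Cont-1")
```

Restricting the target to nouns leaves sentiment-bearing words such as adjectives and adverbs outside the content constraint, which is one plausible reading of why Cont-2 and Cont-1 behave differently.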

Table 5 reports these results on the Yelp sentiment transfer task with the same settings as in Section 4.1. From the results we observe the following:

  1. When the value of the trade-off hyper-parameter is increased, the word overlap score and the Noun% score increase while the sentiment transfer accuracy decreases. This demonstrates that the hyper-parameter controls the trade-off between attribute transfer and content preservation.

  2. After retraining the sentiment predictor, the sentiment transfer accuracy increases from 88.3% to 93.1%. Retraining the content predictor further improves the word overlap score and the Noun% score. These results show that retraining the predictors by Equations 6 and 10 further improves performance.

  3. As expected, Cont-1 improves the word overlap score, while Cont-2 further improves the Noun% score and the sentiment transfer accuracy. Compared with Cont-0, both Cont-1 and Cont-2 significantly improve the success rate, which indicates the effectiveness of the content predictor.

  4. When the KL loss of the VAE is lower, the reconstruction error is higher. At the same time, the accuracy and the fluency are better, but the content preservation is worse. The KL term of the VAE can therefore also control the trade-off between attribute transfer and content preservation.
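For reference, the KL term discussed in the last observation has a closed form when the approximate posterior is a diagonal Gaussian and the prior is a standard normal. The sketch below computes it and combines it with a reconstruction loss under a linearly annealed weight. The function names, the linear schedule, and the default `warmup_steps` are assumptions for illustration; the exact annealing and weighting used in the experiments are not specified in this appendix.

```python
import math


def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), in nats."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def kl_weight(step, warmup_steps):
    """Linear KL annealing (an assumed schedule): 0 at step 0, 1 after warm-up."""
    return min(1.0, step / warmup_steps)


def vae_loss(reconstruction_nll, mu, log_var, step, warmup_steps=10000):
    """Reconstruction loss plus the annealed KL term."""
    return reconstruction_nll + kl_weight(step, warmup_steps) * gaussian_kl(mu, log_var)


# Hypothetical one-dimensional example halfway through warm-up.
example_loss = vae_loss(10.0, mu=[1.0], log_var=[0.0], step=50, warmup_steps=100)
```

A smaller final KL weight lets the encoder keep more information about the input in the latent code (larger KL, lower reconstruction error), while a larger weight pulls the posterior toward the prior.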

Settings                                      Accuracy  PPL   Overlap  Noun%  Suc%
1. Trade-off hyper-parameter (Equation 12):
   0.1                                        88.7      20.6  35.7     73.6   61.4
   0.05                                       93.4      19.8  34.8     68.6   64.5
2. Retraining:
   No Retraining                              88.3      19.4  39.4     58.6   59.4
   + Retrain Sentiment Predictor              93.1      22.8  39.7     60.0   62.9
   + Retrain Content Predictor                94.1      20.6  41.6     61.5   65.4
3. Content predictor target:
   Cont-0                                     91.9      12.6  33.2     43.4   49.4
   Cont-1                                     92.8      19.4  36.3     60.2   60.5
   Cont-2                                     93.4      19.8  35.7     68.6   64.5
4. KL loss:
   13.85                                      92.6      14.6  31.7     53.4   57.4
   17.27                                      88.8      19.8  38.1     69.1   64.0
   21.84                                      84.7      27.6  43.1     86.8   63.8
Table 5: Evaluation results of different hyper-parameters and configurations on the Yelp sentiment transfer task. Higher is better for Accuracy, Overlap, Noun%, and Suc%; lower is better for PPL.

Appendix B Samples of Sentiment Transfer

Samples of the sentiment transfer task produced by our method and the baselines on Yelp and Amazon are shown in Table 6 and Table 7, respectively.

Sentiment transfer from negative to positive (Yelp)
     Original we sit down and we got some really slow and lazy service .
     Human the service was quick and responsive .
     CrossAligned we went down and we were a good , friendly food .
     MultiDecoder we sit down and we got some really and fast food .
     DeleteAndRetrieve we got very nice place to sit down and we got some service .
     BackTranslation we got and i and it is very nice and friendly staff .
     Ours1 we sat down and got some really good service and friendly people .
     Ours2 we sat down the street and had some really nice and fast service .
     Ours3 we really sit down and the service and food were great .
     Original there was only meat and bread .
     Human there was a wide variety of meats and breads .
     CrossAligned there was amazing flavorful and .
     MultiDecoder there was only meat and bread .
     DeleteAndRetrieve meat and bread was very fresh .
     BackTranslation it was very nice and helpful .
     Ours1 the bread was fresh and the meat was tender .
     Ours2 the bread was good and the bread was fresh and plentiful .
     Ours3 the bread was fresh and very tasty .
     Original anyway , we got our coffee and will not return to this location .
     Human we got coffee and we ’ll think about going back .
     CrossAligned anyway , we got our food and will definitely return to this location .
     MultiDecoder anyway , we got our coffee and will not return to this location .
     DeleteAndRetrieve anyway , we got our coffee and would recommend it to everyone .
     BackTranslation everything in the staff is very nice and it was the best .
     Ours1 i will return to this location , and we will definitely return .
     Ours2 we will return to this location again , and the coffee was great .
     Ours3 we will definitely return , and this is our new favorite coffee place .
Sentiment transfer from positive to negative (Yelp)
     Original i love this place , the service is always great !
     Human hate this place , service was bad .
     CrossAligned i know this place , the food is just a horrible !
     MultiDecoder i love this place , the service is always great !
     DeleteAndRetrieve i did not like the homework of lasagna , not like it , .
     BackTranslation i wish i have been back , this place is a empty !
     Ours1 however , this place is the worst i have ever been to .
     Ours2 i do n’t know why i love this place , but the service is horrible .
     Ours3 i do n’t know why this place has the worst customer service ever .
     Original their pizza is the best i have ever had as well as their ranch !
     Human their pizza is the worst i have ever had as well as their ranch !
     CrossAligned their pizza is the other i have ever had as well as their onions !
     MultiDecoder their pizza is the best i have ever had as well at their job !
     DeleteAndRetrieve had their bad taste like ranch !
     BackTranslation their food is n’t the worst i ’ve ever had to go !
     Ours1 this is the worst pizza i have ever had as well as their ranch .
     Ours2 this is the worst pizza i have ever had as well as their bruchetta .
     Ours3 i have had the worst pizza i have ever had in my life as well .
     Original i will be going back and enjoying this great place !
     Human i wo n’t be going back and suffering at this terrible place !
     CrossAligned i will be going back because from the _num_ stars place !
     MultiDecoder i will be going back and often at no place !
     DeleteAndRetrieve i will be going back and will not be returning into this anymore .
     BackTranslation i will not be going back and this place is awful !
     Ours1 i will not be going back to this place for a while .
     Ours2 i will not be going back to this place for a while .
     Ours3 i wo n’t be going back to this place unless i ’m desperate .
Table 6: Samples of the sentiment transfer task produced by our method and the baselines on Yelp. Original denotes the input sentence, and Human denotes the human-annotated sentence. Samples of sentiment transfer from negative to positive and from positive to negative are shown at the top and bottom, respectively.
Sentiment transfer from negative to positive (Amazon)
     Original ridiculous ! i had trouble getting it on with zero bubbles .
     Human great ! i had no trouble getting it on with zero bubbles .
     CrossAligned so far i have been using it for years and now .
     MultiDecoder beautiful i have to replace it with after using the _num_
     DeleteAndRetrieve they are easy to use , i had trouble getting it on with zero bubbles .
     BackTranslation flavorful ! i don t have used it to work with _num_ years .
     Ours1 i have no trouble putting bubbles on it .
     Ours2 i have had no trouble getting bubbles on it .
     Ours3 i ve had no problems with bubbles on it .
     Original i ve used it twice and it has stopped working .
     Human used it without problems .
     CrossAligned i have it s so it s just work well .
     MultiDecoder i ve used it twice and it has gave together .
     DeleteAndRetrieve i ve used it twice and it has performed well .
     BackTranslation i ve been using this for _num_ years now and it works great .
     Ours1 i ve used it several times and it works great .
     Ours2 i ve used it several times and it has worked flawlessly .
     Ours3 i ve used it for several months now and it has been working great .
     Original i ve used these a few times and broke them very easily .
     Human i ve used these a few times and loved them .
     CrossAligned i ve had this for a few months and it s fine .
     MultiDecoder i ve used these a few times and use the iphone very quickly .
     DeleteAndRetrieve i ve used these a few times and broke them very easily ! .
     BackTranslation i ve had this case for _num_ years and it works great .
     Ours1 i ve used them a few times and they are very sturdy .
     Ours2 i ve used them several times a week and they are very sturdy .
     Ours3 i ve used these a few times and they are very sturdy .
Sentiment transfer from positive to negative (Amazon)
     Original this product does what it is suppose to do .
     Human this product does not do what it is supposed to do .
     CrossAligned this product isn t work and i have used .
     MultiDecoder this product does what it is supposed to do .
     DeleteAndRetrieve this product did not do what it was suppose to do .
     BackTranslation this product metropolis what it s like .
     Ours1 this product does not do what it claims to do .
     Ours2 this product does not do what it claims to do .
     Ours3 this product does not do what it claims to do .
     Original i would recommend to anyone who wants a pda .
     Human i would not recommend this to anyone who wants a pda .
     CrossAligned i would not recommend it to be a refund .
     MultiDecoder i would recommend to anyone who has it into .
     DeleteAndRetrieve i would not recommend this to anyone who wants a sensitive pda .
     BackTranslation i wish i would give them a lot of them .
     Ours1 i would not recommend this product to anyone .
     Ours2 i would not recommend this to anyone who wants a <UNK> .
     Ours3 i would not recommend this to anyone who wants a <UNK> .
     Original i have been extremely happy with my purchase .
     Human upset at purchase from the start .
     CrossAligned i have been using them for my hair .
     MultiDecoder i have been extremely happy with my review .
     DeleteAndRetrieve i have been extremely disappointed with this purchase .
     BackTranslation i was very disappointed with my phone .
     Ours1 i am very disappointed with this purchase and would not purchase again .
     Ours2 i have been extremely disappointed with my purchase .
     Ours3 i am very disappointed with this purchase .
Table 7: Samples of the sentiment transfer task produced by our method and the baselines on Amazon. Original denotes the input sentence, and Human denotes the human-annotated sentence. Samples of sentiment transfer from negative to positive and from positive to negative are shown at the top and bottom, respectively.

Appendix C Samples of Gender Style Transfer

Table 8 shows samples of the gender style transfer task produced by our method and the strong baseline.

Gender style transfer from male to female
     Original i wish there is more than 0 stats to give you .
     BackTranslation i think there ’ s than 0 stars to see you .
     Ours1 i wish i could give more stars .
     Ours2 i wish there would give more stars .
     Ours3 i wish i could give more stars .
     Original good vibe , good drinks and prices and unique decoration .
     BackTranslation good service , good service , and the service and décoration .
     Ours1 overall , the drinks were really good .
     Ours2 overall , the drinks are really good and unique .
     Ours3 the drinks are good , and the decor is cute .
     Original the food was n’t anything outstanding to justify the price .
     BackTranslation the food was kind of a good time to try the price .
     Ours1 i ca n’t wait to go back and the food was n’t anything special .
     Ours2 the food was n’t anything special .
     Ours3 the food was n’t anything special .
     Original the cost was more for the size than the quality .
     BackTranslation the service itself was very good for the price that ’ s hotels .
     Ours1 the portion size was more than enough for me .
     Ours2 the portion size was more than enough for the size .
     Ours3 the portion size was more than $ _num_ for the size of the portion .
Gender style transfer from female to male
     Original we went here for my fiance ’ s birthday .
     BackTranslation we went here for my wife ’ s anniversaire .
     Ours1 went here for my wife ’ s birthday .
     Ours2 went here for my wife ’ s birthday .
     Ours3 went here for my wife ’ s birthday .
     Original they always take such good care of me .
     BackTranslation they always do a good job .
     Ours1 they do a good job of taking care of you .
     Ours2 they always take care of you .
     Ours3 they do a good job of taking care of you .
     Original if you do come for breakfast get a croissant .
     BackTranslation if you are looking for lunch , has a stems .
     Ours1 if you come here for breakfast , you get a breakfast sandwich .
     Ours2 do n’t come here if you want a breakfast sandwich .
     Ours3 breakfast croissant is a must if you come here for breakfast .
     Original the only thing worth mentioning was their dessert .
     BackTranslation only compared to say was their service .
     Ours1 the only thing worth mentioning is the deserts .
     Ours2 the only thing worth mentioning is the deserts .
     Ours3 the only thing worth mentioning is the dessert .
Table 8: Samples of the gender style transfer task produced by our method and the baseline. Original denotes the input sentence. Samples of gender style transfer from male to female and from female to male are shown at the top and bottom, respectively.

Appendix D Samples of Multiple Fine-Grained Attributes Control

Samples of multiple fine-grained attribute control are shown in Table 9.

Multiple fine-grained attributes control (from negative to positive)
     Original i was very disappointed with this place .
     Keywords i love this place .
     Sentiment + Keywords i love this place too .
     Length i love this place , and i ’m so glad i went to the house .
     Sentiment + Length i was very disappointed with this place , and i was not impressed with it .
     Keywords + Length i was very impressed with this place and this place was very good .
     Sentiment + Keywords + Length i was very impressed with .
     Length very disappointed overall .
     Sentiment + Length love this .
     Keywords + Length i was very impressed with this place and love this place .
     Sentiment + Keywords + Length i love this place .
     Original at this location the service was terrible .
     Keywords the location at location was very convenient .
     Sentiment + Keywords the location is convenient and convenient .
     Length the location at this location was convenient and the service was horrible .
     Sentiment + Length this was the first time i went to this location and the service was terrible .
     Keywords + Length the service at this location was great and the food was very good .
     Sentiment + Keywords + Length the service at this location .
     Length terrible customer service .
     Sentiment + Length this location is convenient .
     Keywords + Length the location at this location is great and the location is very convenient .
     Sentiment + Keywords + Length location is convenient .
     Original i ’ll keep looking for a different salon .
     Keywords i love looking for this nail salon .
     Sentiment + Keywords i love this nail salon for sure .
     Length i love this place , and i ’ll be looking for a new nail .
     Sentiment + Length i ’ll be looking for a different nail salon , and i do n’t know .
     Keywords + Length i have been to this salon for a couple of the day , and it ’s always the same thing i have ever made .
     Sentiment + Keywords + Length love this salon .
     Length definitely a salon .
     Sentiment + Length i love this nail salon .
     Keywords + Length i love this place , and i ’ll be looking for a new nail .
     Sentiment + Keywords + Length i love this nail salon .
Multiple fine-grained attributes control (from positive to negative)
     Original the best mexican food in the phoenix area .
     Keywords this is the best mexican restaurant in the area .
     Sentiment + Keywords this was the worst chinese restaurant in the phoenix area .
     Length this is the best mexican food i have had in the area and the area .
     Sentiment + Length this was the worst chinese food i have had in the phoenix area in phoenix .
     Keywords + Length this is the best mexican food in the area and the best restaurant in phoenix .
     Sentiment + Keywords + Length this was the worst chinese restaurant i have ever been to in the entire area .
     Length best mexican food .
     Sentiment + Length the worst food in phoenix .
     Keywords + Length best mexican restaurant in phoenix .
     Sentiment + Keywords + Length the worst restaurant in the area .
     Original thank you amanda , i will be back !
     Keywords thanks again , thank you angela !
     Sentiment + Keywords no thanks , i will not be back .
     Length if you are in the mood , i will definitely be taking care of you .
     Sentiment + Length if you want to be treated rudely , i will be taking care of you .
     Keywords + Length thanks to steven , i will be back , thank you for my next experience !
     Sentiment + Keywords + Length if i asked him , i will be taking my car elsewhere , no thanks .
     Length thank you !
     Sentiment + Length i will not be back .
     Keywords + Length thanks again , thank you !
     Sentiment + Keywords + Length no thanks , thank you !
     Original service was great and food was even better .
     Keywords terrible customer service and even better customer service .
     Sentiment + Keywords the customer service was terrible even worse than it was .
     Length the food was great , and the service was even better than i remembered it .
     Sentiment + Length the food was terrible and the service was even worse than it was even worse .
     Keywords + Length the customer service was terrible and the food was even worse than i remembered it .
     Sentiment + Keywords + Length the customer service was terrible even though it was n’t even worse than before .
     Length service was great .
     Sentiment + Length the service was even worse .
     Keywords + Length customer service was even better .
     Sentiment + Keywords + Length even worse customer service was terrible .
Table 9: Samples of multiple fine-grained attribute control produced by our method. We bold the pre-defined keywords.