Different corpora may exhibit different language styles, characterized by variations in attitude, tense, word choice, et cetera. As human beings, we have an intuitive perception of style differences in texts. The linguistics literature has also developed a number of mature theories for characterizing style phenomena in daily life (Bell, 1984; Coupland, 2007; Ray, 2014). How is style information encoded by learning models?
In this paper, we share our novel observations on a specific version of this question: how does Sequence-to-Sequence (seq2seq; Sutskever et al. 2014), a prominent neural network architecture widely used in NLP and representation learning (Bahdanau et al., 2014; Li et al., 2015; Kiros et al., 2015), encode language styles?
In our preliminary studies, we applied a typical seq2seq model as an autoencoder to learn semantic vectors for sentences from the Yelp review dataset (https://www.yelp.com/dataset/challenge) and made a striking observation. After calculating the covariance matrices of the semantic vectors of reviews with different attitude polarity and intensity, we found that their second eigenvectors, i.e. the eigenvectors with the second largest eigenvalues, were roughly grouped into two parts according to polarity and showed slight differences according to intensity (the colored part of Fig. 1), while the first eigenvectors, illustrated in gray, formed a single cluster, as if they captured certain common attributes of the Yelp corpus, e.g. casual word choices. This phenomenon suggests that the covariance matrices probably encode language style in an informative way. Based on this observation, we introduce the notion of style matrix and investigate a number of far-reaching implications of it in the remainder of this paper.
Conjecture 1 (Style Matrix).
The style of a corpus is encoded in the covariance matrix of its semantic vectors, which we call the style matrix.
To the best of our knowledge, our research question and conjecture are novel, and there is barely any directly relevant prior work. The most related works are probably the recent studies on text style transfer, a task first investigated by Shen et al. (2017) for converting a given corpus from one style to another. Although most of them do not discuss style at a fundamental level, we identify three related perspectives in existing style transfer methods.
Perspective A (Discrete Label): A major proportion of state-of-the-art style transfer methods simply view corpus style as discrete labels. For instance, the styles of positive and negative reviews in the Yelp dataset are assigned binary labels (Shen et al., 2017; Hu et al., 2017; Chen et al., 2018).
In principle, our main conjecture is more compatible with the linguistic aspects of style than the aforementioned perspectives, especially in the following respects.
Style is a statistical phenomenon. According to the variationist view in sociolinguistics (Coupland, 2007), style emerges from variation in language usage and is always a global phenomenon rather than a property of a single sentence. In our conjecture, the style matrix by definition reflects the covariance of the corpus, while Perspective C can only characterize sentence-level style.
Style is inherent in semantics. As Ray (2014) suggests, expression often helps to form meaning. In our conjecture, the style matrix is an explicit function of semantic vectors, while Perspective B improperly assumes that the style embedding is independent of semantics.
Style is multi-modal. We usually recognize style in texts from many different aspects (Bell, 1984). For instance, I am very unhappy with this place is negative in attitude and, at the same time, present in tense. Moreover, one can also recognize slight differences in style intensity among negative sentences, e.g. with or without very in the example above. With experiments, we show that the style matrix is able to distinguish various intensity levels of style (§4.2.1) and capture multiple styles in one corpus (§4.2.2), while Perspective A, relying on discrete labels, is unable to characterize these delicate style differences.
In practice, based on the notion of style matrix, we propose a novel algorithm called Neutralization-Stylization (NS) for unpaired text style transfer. Given the style matrices of source and target corpora obtained from a pre-trained seq2seq autoencoder, our algorithm works in a fully learning-free manner by first preparing a pair of matrix transform operators from the style matrices. After this preparation, it simply applies these operators to the semantic vectors of the given corpus to accomplish text style transfer on the fly. By introducing additional style information as supervision in the learning process of the seq2seq autoencoder, we observe that the NS algorithm achieves performance comparable with state-of-the-art style transfer methods on each standard metric. Moreover, the flexibility of our method is further demonstrated by its ability to control the style of unlabeled sentences from other domains, i.e. out-of-domain text style transfer, which we propose as a much more challenging task to foster future research.
In summary, our contributions are as follows:
We present the notion of style matrix as an informative delegate to language style and explain for the first time how seq2seq models encode language styles (§2).
We introduce the challenging out-of-domain text style transfer task to further prove the flexibility of our proposed method (§4.4).
2 Style Matrix
2.1 A General Framework for Style Matrix Extraction
Given a corpus , where is a sequence of tokens, we ask whether there exists an explicit way to extract the global style of with no other external knowledge. Inspired by the variationist approach to language style in sociolinguistics (Coupland, 2007), we suggest exploiting the second-order statistics of semantics, specifically the covariance matrix, as an informative representation of the corpus style (Conjecture 1). Somewhat coincidentally, a similar viewpoint on visual style has recently been investigated in the computer vision community (Gatys et al., 2015).
Due to the discrete nature of language, a covariance matrix cannot be meaningfully computed on raw representations such as one-hot encodings. To fulfill the statement in our main conjecture, we require the semantic vectors to be both distributed and nearly lossless. The former property requires that the semantics of the original sentence can be compressed into a latent vector, while the latter requires that the original sentence can be near-optimally reconstructed from the semantic vector alone.
Formally, we first convert into distributed representations with a mapping (i.e. encoder) from to , a -dimensional real-valued vector space. To guarantee that is lossless, we further require the existence of a reverse mapping (i.e. decoder) from which satisfies , the identity mapping on the corpus. Once these conditions are satisfied, we call the distributed representations , which consists of s.t. , the semantic vectors of corpus . With a slight abuse of notation, we also use to represent the semantic vectors in matrix form, i.e. . Based on these notations, we provide the formal counterpart to Conjecture 1 as follows.
Definition 1 (Style Matrix).
Given corpus with a semantic encoder satisfying the requirements above, we define the style matrix as
where denotes after being centered, i.e. and .
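As a rough illustration of Definition 1, the following sketch computes a style matrix from an array of semantic vectors with NumPy. The function name `style_matrix`, the one-row-per-sentence layout, the 1/N normalization, and the random stand-in for encoder outputs are assumptions made for this example rather than details stated in the text.

```python
import numpy as np

def style_matrix(Z):
    """Covariance of semantic vectors; Z has shape (N, d), one row per sentence."""
    Z = np.asarray(Z, dtype=float)
    mu = Z.mean(axis=0, keepdims=True)   # corpus mean, shape (1, d)
    Zc = Z - mu                          # centered semantic vectors
    return Zc.T @ Zc / Z.shape[0]        # (d, d); the 1/N factor is an assumption

# Random vectors stand in for the output of a trained encoder.
rng = np.random.default_rng(0)
Z = rng.normal(size=(1000, 8))
S = style_matrix(Z)
```

The result is symmetric and positive semi-definite by construction, which is the property the eigendecomposition in Section 3.1 relies on.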
In recent studies of sentence embedding (e.g. Le and Mikolov 2014; Conneau et al. 2017; Pagliardini et al. 2017), there exist various choices for implementing the encoder . However, as most of them do not have an explicit notion of a decoder, the extracted style matrix cannot be further utilized in downstream style transfer tasks. Therefore, in the next section, we propose to leverage the seq2seq paradigm (Sutskever et al., 2014) as a practical tool that extracts a highly informative style matrix and, at the same time, facilitates style transfer tasks with the simultaneously trained decoder module. A detailed implementation is provided below.
2.2 Case Study: Seq2seq for Style Matrix Extraction
As an overview, we implement the encoder in Definition 1 with the encoder module of a seq2seq model, while its decoder module is trained alongside under the reconstruction loss to guarantee that the original semantics are largely preserved in the obtained semantic vectors . Given a sentence with each token from a vocabulary , we propose the learning process for style matrix extraction below.
where contains all the hidden states calculated by the GRU encoder.
Next, viewing the last hidden state of the encoder as the semantic vector , we further require that the decoder can reconstruct based on token by token with a GRU (denoted as ). Formally, at each step , takes the generated token and the previous hidden state as input to calculate the current state by
where the initial state is set as .
Subsequently, with a linear projection layer followed by a softmax transformation, the distribution of the next token over the vocabulary is calculated as
where is a learnable matrix in .
Following the convention of unsupervised learning (LeCun et al., 2015), we set the reconstruction objective as the categorical cross entropy between the input sequence and the distribution of the reconstructed sequence . In practice, we further apply the scheduled sampling technique (Bengio et al., 2015) to accelerate the aforementioned learning process.
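For readability, the encoding, decoding, and projection steps described above can be summarized in the following form. All symbols here ($x_t$, $h_t$, $s_t$, $z$, $W$, $V$, $T$, and the subscripted GRU names) are assumed notation introduced for this sketch and may differ from the original displayed equations.

```latex
\begin{align}
  h_t &= \mathrm{GRU}_{\mathrm{enc}}(x_t,\, h_{t-1}), \qquad t = 1, \dots, T, \qquad z = h_T, \\
  s_t &= \mathrm{GRU}_{\mathrm{dec}}(\hat{x}_{t-1},\, s_{t-1}), \qquad s_0 = z, \\
  p(\hat{x}_t \mid \hat{x}_{<t},\, z) &= \mathrm{softmax}(W s_t), \qquad W \in \mathbb{R}^{|V| \times d}, \\
  \mathcal{L}_{\mathrm{rec}} &= -\sum_{t=1}^{T} \log p(\hat{x}_t = x_t \mid \hat{x}_{<t},\, z).
\end{align}
```

Here the only path from encoder to decoder is the initial state $s_0 = z$, which is the property that motivates avoiding attention.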
It is worth noting that, in our implementation of seq2seq for autoencoding, we have intentionally avoided the attention mechanism (Luong et al., 2015). The main reason is that, with attention, the information flow from encoder to decoder is not limited to the semantic vector : the reconstruction process would also depend on the context vector. Therefore, although attention can yield a lower reconstruction loss even with a small hidden state size, it may cause semantic loss in the semantic vector and therefore compromise the quality of the extracted style matrix.
As a final remark, we demonstrate our method above with GRU modules only for the sake of concreteness. Besides GRU, there are various recurrent architectures available for implementing , such as the vanilla recurrent unit (Rumelhart et al., 1985), the Long Short-Term Memory network (LSTM; Hochreiter and Schmidhuber, 1997) and their bidirectional or stacked variants (Jurafsky, 2000). In experiments, we also report results with several typical architectures as a comprehensive self-comparison.
3 Style Transfer with Style Matrix
In this section, we propose a novel algorithm called Neutralization-Stylization (NS) for unpaired text style transfer, which directly aligns the style matrix of one corpus to that of the other with a pair of plug-and-play matrix operations. To achieve performance competitive with state-of-the-art style transfer methods, we further augment the unsupervised style matrix extraction process in Section 2.2 by introducing human-defined style information as external supervision.
3.1 Neutralization-Stylization algorithm
As a covariance matrix in essence, style matrix can be factorized into the following form due to its positive semi-definiteness
where is a diagonal matrix consisting of its eigenvalues and is an orthogonal matrix formed by its eigenvectors (Meyer, 2000).
Given two corpora and and a seq2seq autoencoder pretrained on as a larger corpus, we calculate Eq. 1 respectively on to obtain the style matrices . Using the eigenvalue decomposition in Eq. 5, we next introduce a pair of Neutralization and Stylization operators, which can be easily used for on-the-fly text style transfer in a plug-and-play manner. Note that both operators are defined on a set of semantic vectors rather than a single embedding, which accords with the statistical nature of language style (Coupland, 2007).
3.1.1 Plug-and-Play Style Transfer Operators
Neutralization. Neutralization operator is used to remove the style characteristic of corpus from a set of semantic vectors . Formally, in the spirit of Zero-phase Component Analysis (ZCA) (Bell and Sejnowski, 1997), neutralization operator is defined as
An intuitive way to understand how it works is to replace directly with the semantic vectors of . It is easy to check that has the identity matrix as its style matrix, which means the dimensions of semantics become uncorrelated after neutralization.
Stylization. The stylization operator is used to add the style characteristic of corpus to a set of semantic vectors by re-establishing the correlation among dimensions of semantics. Inspired by Hossain (2016), it is defined as
Similarly, by stylizing a neutral set of semantic vectors (i.e. ), we can easily check has the same style matrix as that of corpus , which hence demonstrates the properness of .
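A minimal NumPy sketch of the two operators, written as ZCA-style whitening followed by coloring, is given below. The helper names (`_eig_power`, `fit_operators`), the row-vector convention, the eigenvalue floor `eps`, and the choice to subtract the source mean and add the target mean are assumptions for illustration; the text does not specify how means are handled.

```python
import numpy as np

def _eig_power(S, power, eps=1e-8):
    """Return U diag(lam**power) U^T for a symmetric PSD matrix S."""
    lam, U = np.linalg.eigh(S)       # eigenvalues ascending, U orthogonal
    lam = np.clip(lam, eps, None)    # guard against zero or tiny negative eigenvalues
    return (U * lam**power) @ U.T

def fit_operators(Z_src, Z_tgt):
    """Prepare (neutralize, stylize) from source/target semantic vectors (rows)."""
    mu_s, mu_t = Z_src.mean(axis=0), Z_tgt.mean(axis=0)
    S_s = np.cov(Z_src, rowvar=False, bias=True)   # source style matrix
    S_t = np.cov(Z_tgt, rowvar=False, bias=True)   # target style matrix
    W_neut = _eig_power(S_s, -0.5)                 # whitening matrix
    W_styl = _eig_power(S_t, 0.5)                  # coloring matrix

    def neutralize(Z):
        return (Z - mu_s) @ W_neut

    def stylize(Z):
        return Z @ W_styl + mu_t

    return neutralize, stylize

# Synthetic correlated vectors stand in for encoder outputs of two corpora.
rng = np.random.default_rng(0)
Z_src = rng.normal(size=(2000, 4)) @ rng.normal(size=(4, 4))
Z_tgt = rng.normal(size=(2000, 4)) @ rng.normal(size=(4, 4)) + 3.0
neutralize, stylize = fit_operators(Z_src, Z_tgt)
Z_neutral = neutralize(Z_src)
Z_out = stylize(Z_neutral)
```

On this synthetic data, the covariance of `Z_neutral` is numerically the identity and the covariance of `Z_out` matches that of `Z_tgt`, mirroring the two checks described in the text.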
3.1.2 On-the-Fly Text Style Transfer
With the well-defined neutralization and stylization operators, our proposed learning-free NS algorithm works straightforwardly by: (1) encoding with ; (2) applying prepared operators successively; (3) decoding the semantic vector with . Formally, the target sentence is calculated as
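The effect of applying the two operators in succession can be verified with a short derivation. The notation is assumed for this sketch: $\Sigma_s = U_s \Lambda_s U_s^\top$ and $\Sigma_t = U_t \Lambda_t U_t^\top$ denote the source and target style matrices, $\mu_s$ and $\mu_t$ the corresponding means, $\Sigma_s$ is taken to be invertible, and the mean handling is an assumption.

```latex
\begin{align}
  \mathcal{N}(z) &= U_s \Lambda_s^{-1/2} U_s^\top (z - \mu_s) = \Sigma_s^{-1/2} (z - \mu_s), \\
  \mathcal{S}(n) &= U_t \Lambda_t^{1/2} U_t^\top n + \mu_t = \Sigma_t^{1/2}\, n + \mu_t, \\
  \operatorname{Cov}\!\big[\mathcal{S}(\mathcal{N}(z))\big]
    &= \Sigma_t^{1/2}\, \Sigma_s^{-1/2}\, \Sigma_s\, \Sigma_s^{-1/2}\, \Sigma_t^{1/2}
     = \Sigma_t^{1/2}\, I\, \Sigma_t^{1/2}
     = \Sigma_t .
\end{align}
```

Under these assumptions, semantic vectors drawn from the source corpus leave the pipeline with exactly the target style matrix before being passed to the decoder.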
Moreover, thanks to the flexibility of the style matrix perspective and the NS algorithm, we can even conduct out-of-domain style transfer, where the input sentence does not necessarily come from corpus or have style labels. For details, we present a case study on out-of-domain style transfer between the Yelp and Amazon datasets in Section 4.4.
3.2 Incorporate Human-Defined Style Label
In practice, we notice that the performance of the NS algorithm with the raw style matrix is not competitive with state-of-the-art methods specialized for this task. We speculate that the main reason is as follows: the style matrix is highly informative and probably incorporates even the most delicate aspects of the style of the underlying corpus. Therefore, its unsatisfactory performance on the style transfer task implies that the corpus actually has other latent style attributes besides the human-defined ones, as illustrated with the Yelp example in Section 1 by its clustered first eigenvectors (gray arrows in Fig. 1).
To enhance the quality of style transfer, we suggest augmenting the style matrix extraction process with a human-defined attribute (e.g. attitude). Concretely, we propose to train the encoder of the seq2seq model in a semi-supervised way by adding a nonlinear binary classifier on the semantic space, which provides a supervision signal simultaneously with the original unsupervised reconstruction process. Formally, given semantic vector , we define the classifier as
where is the trainable parameter and is the sigmoid activation.
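The following sketch shows one possible shape for such a classifier and for combining its loss with the reconstruction loss. The one-hidden-layer tanh architecture, the parameter names, and the function names are hypothetical; only the sigmoid output and the 10:1 weighting of reconstruction to classification loss (reported in Section 4.1) come from the text.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def classify(z, W1, b1, w2, b2):
    """Hypothetical one-hidden-layer classifier: returns P(style = 1 | z)."""
    hidden = np.tanh(z @ W1 + b1)
    return sigmoid(hidden @ w2 + b2)

def joint_loss(rec_loss, p, label, w_rec=10.0, w_cls=1.0):
    """Weighted sum of reconstruction loss and mean binary cross entropy."""
    p = np.clip(p, 1e-7, 1 - 1e-7)   # avoid log(0)
    bce = -(label * np.log(p) + (1 - label) * np.log(1 - p))
    return w_rec * rec_loss + w_cls * np.mean(bce)

# Toy batch: 5 semantic vectors of dimension 8 with made-up labels.
rng = np.random.default_rng(0)
z = rng.normal(size=(5, 8))
W1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
w2, b2 = rng.normal(size=16), 0.0
p = classify(z, W1, b1, w2, b2)
loss = joint_loss(rec_loss=2.3, p=p, label=np.array([1, 0, 1, 1, 0]))
```

In an actual training loop both terms would be backpropagated into the encoder, so that the semantic space is shaped by the label as well as by reconstruction.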
Notably, under both scenarios, our text style transfer algorithm is learning-free: we only need to pretrain a seq2seq model, either in a fully unsupervised or a semi-supervised way, to obtain a pair of encoder and decoder, and then prepare the operators with several matrix operations. Without time-consuming adversarial training (e.g. Shen et al., 2017), our augmented method achieved competitive transfer performance on each standard metric (§4.3).
4 Experiments
Code is provided at https://bit.ly/2QgEUNE.
4.1 Overall Settings
Datasets. We used the following two standard benchmark datasets for empirical studies.
Yelp: The Yelp dataset contains restaurant reviews from Yelp. Each sentence is associated with an integer rating ranging from 1 to 5, where a higher score implies a more positive attitude and vice versa. We treated reviews with ratings above 3 as positive and those below 3 as negative.
Amazon: The Amazon dataset contains the product reviews on Amazon. Each sentence is originally labeled with positive or negative attitudes (He and McAuley, 2016).
With an automatic tense analysis tool (Ramm et al., 2017), we annotated the tense attribute for sentences in Yelp and Amazon as an additional style factor. We filtered out the sentences which were not in past and present tense and split each processed dataset into train, validation and test sets. For statistics, please refer to Appendix A.
Evaluation Metric. We evaluated the performance of style transfer on the following two standard metrics.
Accuracy (Acc.): In order to evaluate whether the transferred sentences have the desired style, we followed the evaluation method in Shen et al. (2017) by pretraining a style classifier on the training set and utilizing its classification accuracy on the transferred sentences as a metric. Specifically, we used the TextCNN model (Kim, 2014) as a style classifier.
BLEU: In order to evaluate the quality of content preservation, we used the BLEU score (Papineni et al., 2002) between the generated and the source sentences as a measure. Intuitively, a higher BLEU score primarily indicates that the model has a stronger ability to preserve content by copying style-neutral words from the source sentence.
To evaluate the overall style transfer quality, we also calculated the geometric mean (i.e. G-Score) and arithmetic mean (i.e. Mean) of the Acc. and BLEU metrics.
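The two aggregate metrics can be computed as in the short sketch below. The function names and the example scores are made up for illustration and are not results from the experiments.

```python
import math

def g_score(acc, bleu):
    """Geometric mean of accuracy and BLEU."""
    return math.sqrt(acc * bleu)

def mean_score(acc, bleu):
    """Arithmetic mean of accuracy and BLEU."""
    return (acc + bleu) / 2.0

# Hypothetical scores on a 0-100 scale.
acc, bleu = 81.0, 16.0
print(g_score(acc, bleu))     # 36.0
print(mean_score(acc, bleu))  # 48.5
```

The geometric mean collapses toward zero whenever either metric is close to zero, which makes it more sensitive than the arithmetic mean to extreme trade-offs between Acc. and BLEU.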
Implementation Details. We embedded words into distributed representations (with dimension ) using CBOW (Mikolov et al., 2013) and froze the word embeddings during the training process. We implemented the seq2seq model with (1) a GRU of hidden units, (2) an LSTM of hidden units and (3) a bi-directional GRU of both forward and backward hidden units. For (2), we concatenated the final hidden state and cell state to form the -dimensional semantic vector, while for (3), we concatenated the forward and backward final hidden states. We trained each seq2seq model on the training set with the Adam optimizer (Kingma and Ba, 2014) and performed style transfer on the validation set. We set the weights of the reconstruction loss and the classification loss to 10:1. As observed in Section 4.3, the informativeness of the style matrix was insensitive to the choice of recurrent architecture, and hence we only report the results of the GRU implementation in other parts.
4.2 Explore the Styles of Yelp
As discussed in Section 1, the notion of style matrix conforms to the linguistic views that style is inherent in semantics and is multi-modal. To demonstrate that the style matrix can indeed capture these delicate style phenomena, we first mixed all the Yelp reviews with different ratings and trained a seq2seq model with the reconstruction loss only. We then divided the corpus into several sub-corpora according to well-designed criteria. Finally, we performed text style transfer with operators prepared from these pairs of sub-corpora. Detailed results and analyses follow in each part.
4.2.1 Style Intensity
We collected four corpora containing sentences with ratings 1, 2, 4 and 5 respectively (denoted as R1, R2, R4, R5) and discarded the neutral sentences with rating 3. The former two sub-corpora have the same attitude polarity (i.e. negative) but different intensity, and so do the latter two. For visualization, Fig. 2 plots the first eigenvectors of each style matrix, which shows a recognizable color gradient from the most negative corpus (with rating 1) to the most positive corpus (with rating 5).
Subsequently, we constructed three sets of operators from the stylistic pairs (R1, R5), (R1 ∪ R2, R4 ∪ R5) and (R2, R4), in decreasing order of style contrast. We performed style transfer on the same validation set with the three sets of prepared operators. The results are reported in Table 1. As we can see, the style transfer quality of each set of operators was positively related to the degree of style contrast, and we regard this phenomenon as an implicit validation of the ability of the style matrix to capture slight differences in style intensity.
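The partition into rating-based sub-corpora and the extraction of leading eigenvectors for visualization can be sketched as follows. The function names, the synthetic vectors and ratings, and the 1/N covariance normalization are assumptions for illustration; real inputs would be encoder outputs and Yelp ratings.

```python
import numpy as np

def leading_eigenvectors(Z, k=2):
    """Top-k eigenpairs of the covariance of Z, largest eigenvalue first."""
    S = np.cov(Z, rowvar=False, bias=True)
    lam, U = np.linalg.eigh(S)              # ascending order
    order = np.argsort(lam)[::-1][:k]       # indices of the k largest eigenvalues
    return lam[order], U[:, order]          # eigenvectors are the columns

def split_by_rating(Z, ratings, groups):
    """Map each group name to the rows of Z whose rating falls in that group."""
    ratings = np.asarray(ratings)
    return {name: Z[np.isin(ratings, list(values))] for name, values in groups.items()}

# Synthetic stand-ins: 500 semantic vectors with ratings in {1, ..., 5}.
rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 6))
ratings = rng.integers(1, 6, size=500)
groups = {"R1": {1}, "R2": {2}, "R4": {4}, "R5": {5}}   # rating 3 is discarded
parts = split_by_rating(Z, ratings, groups)
lam, U = leading_eigenvectors(parts["R1"], k=2)
```

Eigenvectors are only defined up to sign, so any comparison or plot across sub-corpora would need a consistent sign convention; merged pairs such as R1 ∪ R2 can be formed by passing `{1, 2}` as a group.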
4.2.2 Multiple Styles
Based on the attitude and tense annotations on Yelp, we partitioned the original corpus into two pairs of sub-corpora, namely the attitude pair (positive, negative) and the tense pair (present, past). Correspondingly, we calculated attitude (tense) transform operators on each pair and applied the prepared operators to transfer the target attribute with the other style attribute fixed. We report the transfer performance of the NS algorithm in Table 3, which empirically shows that the style matrix can simultaneously capture multiple style attributes.
4.3 Unpaired Text Style Transfer
In this part, we compared the performance of NS algorithm with the state-of-the-art methods on Yelp and Amazon datasets. We trained a seq2seq model in the semi-supervised way as described in Section 3.2 and transferred attitude of sentences with NS algorithm. We chose the following representative state-of-the-art style transfer methods as baselines.
Cross-Aligned: This method assumes a shared latent content distribution across corpora with different styles and leverages refined alignment of latent representations to perform style transfer (Shen et al., 2017).
Style-Embedding: This method learns separate content representations and style representations using adversarial networks. With the style information embedded into distributed vector representations, one single decoder is trained for different corpora (Fu et al., 2018).
As observed in Sec. 4.2.1, the transform operators have a stronger transfer capability when generated from a pair of corpora with higher style contrast, which inspires us to further enhance the performance of the NS algorithm by removing sentences with low confidence as judged by the simultaneously trained style classifier. Fig. 3 plots the model performance on different metrics over drop rates ranging from to with a fixed stride.
As we can see, the increase in drop rate caused an increase of Acc. and decrease of BLEU score. We speculate it is inevitable due to the tight interdependence between style and semantics. The result at drop rate provides further evidence on this phenomenon, that is, to change the style of the validation set to a corpus with extreme style feature would largely change their semantics.
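The filtering step behind the drop rate can be sketched as below. Measuring confidence as the distance of the classifier probability from 0.5, as well as the function name and the toy inputs, are assumptions for illustration; the text does not define confidence precisely.

```python
import numpy as np

def drop_low_confidence(Z, probs, drop_rate):
    """Remove the `drop_rate` fraction of rows with the lowest classifier confidence.

    Confidence is taken as |p - 0.5|, which is an assumption of this sketch.
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    confidence = np.abs(np.asarray(probs, dtype=float) - 0.5)
    n_drop = int(len(confidence) * drop_rate)
    keep = np.argsort(confidence)[n_drop:]   # indices of the most confident rows
    return Z[np.sort(keep)]                  # preserve the original row order

# Toy example: 5 semantic vectors and their predicted probabilities.
Z = np.arange(10).reshape(5, 2)
probs = [0.95, 0.55, 0.10, 0.48, 0.70]
kept = drop_low_confidence(Z, probs, drop_rate=0.4)
```

With a drop rate of 0.4, the two least confident rows (probabilities 0.48 and 0.55) are removed before the style matrices and operators are computed from the remaining vectors.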
Since the trade-off between transfer ability and content preservation can be controlled, it is hard to select one balanced point that fully characterizes the performance of our method. As a complement, we suggest using Mean as an overall performance measure, which is more stable than G-Score as observed in Fig. 3. Table 2 shows the performance of our method with different recurrent architectures alongside the baselines. As we can see, our method achieved performance comparable with the two baselines and on average outperformed them on Amazon, the benchmark with a larger vocabulary size. For an illustrative comparison, we further provide some generated samples from each method in Appendix A.
4.4 Out-of-Domain Style Transfer
In the final part, we propose out-of-domain style transfer as a much more challenging task for text style transfer, where, given a corpus with style labels, the style transfer models are required to control the style of unlabeled sentences coming from out-of-domain corpora. To validate the NS algorithm's performance on this task, we use Yelp with attitude labels only and Amazon with tense labels only, and control the style of each corpus on the attribute that is not observed on it. In other words, we transfer the tenses of sentences in Yelp with the operators prepared from Amazon and vice versa.
In this scenario, we only need a slight modification of our proposed method in Sec. 3, that is, to train the seq2seq model on Yelp ∪ Amazon with two style classifiers, namely an attitude classifier on Yelp and a tense classifier on Amazon. After preparing the style transform operators on the other domain, it is straightforward to perform out-of-domain style transfer. The results are reported in Table 5. It is worth noting that, even though the validation set is unlabeled in the style attribute we want to transfer, our NS algorithm still achieves superior performance in both cases, which further validates the flexibility of the style matrix perspective and the effectiveness of the NS algorithm. We also provide some illustrative results in Table 4. Notably, the capability of out-of-domain style transfer allows us to leverage several corpora, each annotated with a single style attribute, for controlling multiple styles on each corpus.
5 Related work
Unpaired Text Style Transfer. A major proportion of works propose to learn style-independent semantic representations of sentences for downstream transfer tasks (Fu et al., 2018; Shen et al., 2017; Hu et al., 2017; Chen et al., 2018). These works minimize the reconstruction loss of a variational autoencoder (Kingma and Welling, 2013) to compress the sentences and align the distributions of these vectors by adversarial training. Some other works utilize heuristic transformations to accomplish style transfer by explicitly dividing the sentence into semantic words and style words (Li et al., 2018; Xu et al., 2018). Essentially different from these previous works, our work focuses on studying how seq2seq models perceive language styles, and the competitive performance of our proposed style transfer algorithm is therefore better considered as an implicit justification of our style matrix view of language style.
Gatys et al. (2016) showed that the Gram matrices of the feature maps extracted by a pre-trained convolutional neural network are able to capture the visual style of an image. This was immediately followed by numerous works that transfer style by matching the generated Gram matrices (e.g. Ulyanov et al. 2016, 2017; Johnson et al. 2016; Chen et al. 2017), and Li et al. (2017) theoretically proved that this is equivalent to minimizing the maximum mean discrepancy between two distributions.
In this paper, we have investigated the style matrix encoded by seq2seq models as an informative delegate to language style. The notion of style matrix conforms well to human experience and existing linguistic theories of language style. In practice, we have also proposed the NS algorithm as a plug-and-play solution to unpaired text style transfer, which achieved transfer quality competitive with state-of-the-art methods while showing superior flexibility in various use cases. In the future, we plan to discuss how the quality of semantic vectors impacts the informativeness of the style matrix and to study what is encoded in higher-order statistics of semantic vectors.
- Bahdanau et al. (2014) Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
- Bell (1984) Allan Bell. 1984. Language style as audience design. Language in society, 13(2):145–204.
- Bell and Sejnowski (1997) Anthony J Bell and Terrence J Sejnowski. 1997. Edges are the 'independent components' of natural scenes. In Advances in neural information processing systems, pages 831–837.
- Bengio et al. (2015) Samy Bengio, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer. 2015. Scheduled sampling for sequence prediction with recurrent neural networks. In Advances in Neural Information Processing Systems, pages 1171–1179.
- Chen et al. (2017) Dongdong Chen, Lu Yuan, Jing Liao, Nenghai Yu, and Gang Hua. 2017. Stylebank: An explicit representation for neural image style transfer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1897–1906.
- Chen et al. (2018) Liqun Chen, Shuyang Dai, Chenyang Tao, Haichao Zhang, Zhe Gan, Dinghan Shen, Yizhe Zhang, Guoyin Wang, Ruiyi Zhang, and Lawrence Carin. 2018. Adversarial text generation via feature-mover's distance. In Advances in Neural Information Processing Systems, pages 4666–4677.
- Cho et al. (2014) Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
- Conneau et al. (2017) Alexis Conneau, Douwe Kiela, Holger Schwenk, Loic Barrault, and Antoine Bordes. 2017. Supervised learning of universal sentence representations from natural language inference data. arXiv preprint arXiv:1705.02364.
- Coupland (2007) Nikolas Coupland. 2007. Style: Language variation and identity. Cambridge University Press.
- Fu et al. (2018) Zhenxin Fu, Xiaoye Tan, Nanyun Peng, Dongyan Zhao, and Rui Yan. 2018. Style transfer in text: Exploration and evaluation. In Thirty-Second AAAI Conference on Artificial Intelligence.
- Gatys et al. (2015) Leon Gatys, Alexander S Ecker, and Matthias Bethge. 2015. Texture synthesis using convolutional neural networks. In Advances in neural information processing systems, pages 262–270.
- Gatys et al. (2016) Leon A Gatys, Alexander S Ecker, and Matthias Bethge. 2016. Image style transfer using convolutional neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2414–2423.
- He and McAuley (2016) Ruining He and Julian McAuley. 2016. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In proceedings of the 25th international conference on world wide web, pages 507–517. International World Wide Web Conferences Steering Committee.
- Hochreiter and Schmidhuber (1997) Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation, 9(8):1735–1780.
- Hossain (2016) M Hossain. 2016. Whitening and coloring transforms for multivariate gaussian random variables. Project Rhea.
- Hu et al. (2017) Zhiting Hu, Zichao Yang, Xiaodan Liang, Ruslan Salakhutdinov, and Eric P Xing. 2017. Toward controlled generation of text. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 1587–1596. JMLR. org.
- Johnson et al. (2016) Justin Johnson, Alexandre Alahi, and Li Fei-Fei. 2016. Perceptual losses for real-time style transfer and super-resolution. In European conference on computer vision, pages 694–711. Springer.
- Jurafsky (2000) Dan Jurafsky. 2000. Speech & language processing. Pearson Education India.
- Kim (2014) Yoon Kim. 2014. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882.
- Kingma and Ba (2014) Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- Kingma and Welling (2013) Diederik P Kingma and Max Welling. 2013. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114.
- Kiros et al. (2015) Ryan Kiros, Yukun Zhu, Ruslan R Salakhutdinov, Richard Zemel, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Skip-thought vectors. In Advances in neural information processing systems, pages 3294–3302.
- Le and Mikolov (2014) Quoc Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In International conference on machine learning, pages 1188–1196.
- LeCun et al. (2015) Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. nature, 521(7553):436.
- Li et al. (2015) Jiwei Li, Minh-Thang Luong, and Dan Jurafsky. 2015. A hierarchical neural autoencoder for paragraphs and documents. arXiv preprint arXiv:1506.01057.
- Li et al. (2018) Juncen Li, Robin Jia, He He, and Percy Liang. 2018. Delete, retrieve, generate: A simple approach to sentiment and style transfer. arXiv preprint arXiv:1804.06437.
- Li et al. (2017) Yanghao Li, Naiyan Wang, Jiaying Liu, and Xiaodi Hou. 2017. Demystifying neural style transfer. arXiv preprint arXiv:1701.01036.
- Luong et al. (2015) Minh-Thang Luong, Hieu Pham, and Christopher D Manning. 2015. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025.
- Meyer (2000) Carl D Meyer. 2000. Matrix analysis and applied linear algebra, volume 71. Siam.
- Mikolov et al. (2013) Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pages 3111–3119.
- Pagliardini et al. (2017) Matteo Pagliardini, Prakhar Gupta, and Martin Jaggi. 2017. Unsupervised learning of sentence embeddings using compositional n-gram features. arXiv preprint arXiv:1703.02507.
- Papineni et al. (2002) Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting on association for computational linguistics, pages 311–318. Association for Computational Linguistics.
- Ramm et al. (2017) Anita Ramm, Sharid Loáiciga, Annemarie Friedrich, and Alexander Fraser. 2017. Annotating tense, mood and voice for English, French and German. Proceedings of ACL 2017, System Demonstrations, pages 1–6.
- Ray (2014) Brian Ray. 2014. Style: An Introduction to History, Theory, Research, and Pedagogy. Parlor Press.
- Rumelhart et al. (1985) David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. 1985. Learning internal representations by error propagation. Technical report, California Univ San Diego La Jolla Inst for Cognitive Science.
- Shen et al. (2017) Tianxiao Shen, Tao Lei, Regina Barzilay, and Tommi Jaakkola. 2017. Style transfer from non-parallel text by cross-alignment. In Advances in neural information processing systems, pages 6830–6841.
- Sutskever et al. (2014) Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning with neural networks. In Advances in neural information processing systems, pages 3104–3112.
- Ulyanov et al. (2016) Dmitry Ulyanov, Vadim Lebedev, Andrea Vedaldi, and Victor S Lempitsky. 2016. Texture networks: Feed-forward synthesis of textures and stylized images. In ICML, volume 1, page 4.
- Ulyanov et al. (2017) Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky. 2017. Improved texture networks: Maximizing quality and diversity in feed-forward stylization and texture synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6924–6932.
- Xu et al. (2018) Jingjing Xu, Xu Sun, Qi Zeng, Xuancheng Ren, Xiaodong Zhang, Houfeng Wang, and Wenjie Li. 2018. Unpaired sentiment-to-sentiment translation: A cycled reinforcement learning approach. arXiv preprint arXiv:1805.05181.
Appendix A Omitted Experimental Details
A.1 Dataset Statistics
The following table provides the statistics of the Amazon and Yelp datasets used in our experiments.
A.2 Procedures to Produce Fig. 1
To visualize the similarities and differences among the styles captured by the style matrix, we computed four style matrices, one for each corpus associated with a different rating, using the style matrix extraction method introduced in Sec. 4.3.1. For each style matrix, we took the first and second eigenvectors, i.e. those corresponding to the two largest eigenvalues, and projected them onto the 2-D plane with Multi-Dimensional Scaling (MDS).
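The procedure can be illustrated with a minimal NumPy sketch. It is not the code used to produce Fig. 1 and rests on several assumptions: the style matrix is taken to be the plain sample covariance of the sentence vectors (following Conjecture 1, rather than the extraction method of Sec. 4.3.1), the MDS step is classical (Torgerson) MDS on Euclidean distances, which may differ from the MDS variant actually used, and the sentence vectors are random placeholders. The function names and the corpus labels `r1` to `r4` are hypothetical.

```python
import numpy as np


def style_matrix(vectors):
    """Sample covariance of sentence vectors (one row per sentence)."""
    return np.cov(vectors, rowvar=False)


def top_eigenvectors(matrix, k=2):
    """Return the k eigenvectors with the largest eigenvalues, as rows."""
    _, eigvecs = np.linalg.eigh(matrix)  # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :k].T.copy()
    # Eigenvectors are only defined up to sign. Flip each one so that its
    # largest-magnitude component is positive, to keep them comparable.
    for v in top:
        if v[np.argmax(np.abs(v))] < 0:
            v *= -1
    return top


def classical_mds(points, dim=2):
    """Classical (Torgerson) MDS on Euclidean distances between rows."""
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    n = len(points)
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ sq_dist @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:dim]
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))


# Placeholder data: four corpora of 500 random 16-dimensional vectors.
rng = np.random.default_rng(0)
corpora = {name: rng.normal(size=(500, 16)) for name in ("r1", "r2", "r3", "r4")}

eigs = [top_eigenvectors(style_matrix(v)) for v in corpora.values()]
first = np.stack([e[0] for e in eigs])   # first eigenvector of each corpus
second = np.stack([e[1] for e in eigs])  # second eigenvector of each corpus

# Embed all eight eigenvectors jointly; rows 0-3 are the first eigenvectors
# and rows 4-7 the second eigenvectors, in corpus order.
coords = classical_mds(np.vstack([first, second]), dim=2)
```

The sign normalization matters because an eigenvector and its negation describe the same direction but would land far apart in the embedding. With real sentence vectors, the rows of `coords` would then be plotted and colored by rating.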
A.3 Sampled Sentences with Different Style Transfer Methods
|From negative to positive (Yelp)|
|Source||the food tasted awful .|
|Cross Aligned||the food is amazing .|
|Style Embedding||the food tasted awful .|
|Ours||the food tasted amazing .|
|Source||i love the food … however service here is horrible .|
|Cross Aligned||i love the food here is great service great .|
|Style Embedding||i love the food … however service here is horrible .|
|Ours||i love the food , service here is great .|
|Source||customer service is horrible , and their prices are well above internet pricing .|
|Cross Aligned||great service , and prices are great , well quality their people .|
|Style Embedding||customer service is horrible , and their prices are well above internet pricing .|
|Ours||customer service is excellent and their prices are nice internet pricing .|
|From positive to negative (Yelp)|
|Source||lol , we all love love love this deli .|
|Cross Aligned||then , we love , but i love this salon .|
|Style Embedding||lol , we all love love love this deli .|
|Ours||lol , everyone really do n’t love this deli .|
|Source||one of the best service experiences i ’ve ever had .|
|Cross Aligned||one of the time i would i had ever had to .|
|Style Embedding||one of the best service experiences i ’ve ever had .|
|Ours||one of the worst service experiences i ’ve ever had .|
|Source||would definitely recommend this place for anyone looking for a good sandwich .|
|Cross Aligned||would not recommend this place for a good for _num_ for a food .|
|Style Embedding||would definitely recommend this place for anyone looking for a good sandwich .|
|Ours||would not recommend this place for anyone looking for a good sandwich .|
|From negative to positive (Amazon)|
|Source||there are so many expensive natural products out there .|
|Cross Aligned||there are more than other than there are there .|
|Style Embedding||there are so many expensive natural products out there .|
|Ours||there are so many natural products out there .|
|Source||toaster looks better but performs far worse than $ ones i ve had .|
|Cross Aligned||the price works great as far as far i have ever ordered .|
|Style Embedding||toaster looks better but performs much similar awesome if i were this phone .|
|Ours||toaster looks better but far better than $ ones i ve had .|
|From positive to negative (Amazon)|
|Source||the product was delivered on the agreed date .|
|Cross Aligned||the product was was on the same problem .|
|Style Embedding||the product was delivered on the page said .|
|Ours||the product was delivered on the expiration date .|
|Source||i bought a second one with the same wonderful results .|
|Cross Aligned||i bought a replacement one of the same unk problem .|
|Style Embedding||i bought a second one with the same wonderful results .|
|Ours||i bought a second one with the same results .|
A.4 More Samples by NS Algorithm on Out-of-Domain Style Transfer
|Yelp on Tense|
|Source||staff are nice and friendly .|
|Past||staff were nice and friendly .|
|Pres||staff are nice and friendly .|
|Source||i never realized the beauty of the desert until i moved here !|
|Past||i never realized the beauty of the desert until i moved here !|
|Pres||i never assume the beauty of the desert i ’m coming here !|
|Amazon on Attitude|
|Source||yep , i thought $ was a pretty good price .|
|Neg||yep , i thought $ was not a pretty good price .|
|Pos||yep , i thought $ was pretty good .|
|Source||i even tried it once and it is absolutely delicious .|
|Neg||i even tried it once and it is absolutely just seasoned .|
|Pos||i even tried it once and it is delicious !|