Syntactic Recurrent Neural Network for Authorship Attribution

02/26/2019 ∙ by Fereshteh Jafariakinabad, et al. ∙ University of Central Florida

Writing style is a combination of consistent decisions at different levels of language production, including lexical, syntactic, and structural, associated with a specific author (or author group). While lexical-based models have been widely explored in style-based text classification, their reliance on context makes them less scalable when dealing with heterogeneous data comprising various topics. Syntactic models, on the other hand, are context-independent and therefore more robust against topic variance. In this paper, we introduce a syntactic recurrent neural network to encode the syntactic patterns of a document in a hierarchical structure. The model first learns the syntactic representation of sentences from the sequence of part-of-speech tags. For this purpose, we exploit both convolutional filters and long short-term memories to investigate the short-term and long-term dependencies of part-of-speech tags in the sentences. Subsequently, the syntactic representations of sentences are aggregated into a document representation using recurrent neural networks. Our experimental results on the PAN 2012 dataset for the authorship attribution task show that the syntactic recurrent neural network outperforms the lexical model with the identical architecture by approximately 14% in terms of accuracy.




1 Introduction

Individuals express their thoughts in different ways due to many factors, including the conventions of the language, educational background, and intended audience. In written language, the combination of consistent conscious or unconscious decisions in language production, known as writing style, has been studied widely. Early work on computational stylometry was introduced in the 1960s by Mosteller and Wallace in their study of the Federalist Papers Mosteller and Wallace (1964). The unprecedented availability of digital data in recent years, along with advances in machine learning techniques, has led to an increase in scholarly attention to the field of computational stylometry Koppel et al. (2009); Neal et al. (2017).

Stylistic features are generally content-independent which means that they are mainly consistent across different documents written by a specific author or author groups. Lexical, syntactic, and structural features are three main families of stylistic features. Lexical features represent author’s character and word use preferences, while syntactic features capture the syntactic patterns of sentences in a document. Structural features reveal information about how an author organizes the structure of a document.

One of the basic problems which is rarely addressed in the literature is the interaction of style and content. While content words can be predictive features of authorial writing style due to the fact that they carry information about author’s lexical choice, excluding content words as features is a fundamental step for avoiding topic detection rather than style detection Argamon-Engelson et al. (1998). However, syntactic and structural features are content-independent which makes them robust against divergence of topics.

The early methods proposed for style detection are conventional machine learning techniques based on count-based features. Although deep neural networks have since been widely explored in several domains of natural language processing, only a few studies have applied them to stylometry and authorship attribution Gagala (2018). The deep neural network approaches adopted for style-based text classification mainly focus on lexical features, despite the fact that lexical-based language models have very limited scalability when dealing with datasets containing diverse topics and genres.

While previously proposed deep neural network approaches focus on the lexical level, we introduce a syntactic recurrent neural network which hierarchically learns and encodes the syntactic structure of documents. First, the syntactic representations of sentences are learned from the sequence of part-of-speech (POS) tags, and then they are aggregated into a document representation using recurrent neural networks. Afterwards, we use an attention mechanism to reward the sentences which contribute more to the detection of authorial writing style. In order to investigate the effect of long-term and short-term dependencies of POS tags in a sentence, we employ long short-term memory (LSTM) and convolutional neural networks (CNN), respectively. The proposed model is expected to be more effective than the conventional count-based models.

The remainder of this paper is organized as follows. In Section 2, we review the proposed methods in the literature for style-based text classification. We elaborate our proposed approach in Section 3. In Section 4, we discuss the dataset followed by performance study. Finally, we conclude the paper in Section 5.

2 Related Work

Writing style is a combination of consistent decisions at different levels of language production, including lexical, syntactic, and structural, associated with a specific author (or author group, e.g. female authors or teenage authors) Daelemans (2013). Nowadays, computational stylometry has a wide range of applications in literary science Kabbara and Cheung (2016); van der Lee and van den Bosch (2017), forensics Brennan et al. (2012); Afroz et al. (2012); Wang (2017), and psycholinguistics Newman et al. (2003); Pennebaker and King (1999). Style-based text classification was proposed by Argamon-Engelson et al. (1998). The authors used basic stylistic features (the frequency of function words and part-of-speech trigrams) to classify news documents by publisher (newspaper or magazine) as well as by text genre (editorial or news item).

2.1 Syntax for Style Detection

Syntactic n-grams have been shown to achieve promising results in different stylometric tasks, including author profiling Posadas-Durán et al. (2015) and author verification Krause (2014). In particular, Raghavan et al. investigated the use of syntactic information by proposing a probabilistic context-free grammar for authorship attribution and used it as a language model for classification Raghavan et al. (2010). A combination of lexical and syntactic features has also been shown to enhance model performance. Sundararajan et al. argue that, although syntax can be helpful for cross-genre authorship attribution, combining syntax and lexical information can further boost the performance for cross-topic attribution and single-domain attribution Sundararajan and Woodard (2018). Further studies which combine lexical and syntactic features include Soler and Wanner (2017); Schwartz et al. (2017); Kreutz and Daelemans (2018).

2.2 Neural Network in Stylometry

With the recent advances in deep learning, there exists a large body of work in the literature which employs deep neural networks for stylometry and authorship attribution. For instance, Ge et al. used a feed-forward neural network language model on an authorship attribution task. The model achieves promising results compared to the n-gram baseline Ge et al. (2016). Bagnall et al. employed a recurrent neural network with a shared recurrent state which outperforms other proposed methods in the PAN 2015 task Bagnall (2016).

Methods that specifically use CNNs for stylometry include the following. Shrestha et al. applied a CNN based on character n-grams to identify the authors of tweets. Given that each tweet is short in nature, their approach shows that feeding a sequence of character n-grams as input to a CNN allows the architecture to capture character-level interactions, which are afterwards aggregated to learn higher-level patterns for modeling the style Shrestha et al. (2017). Hitschler et al. propose a CNN based on pretrained word embedding vectors concatenated with one-hot encodings of POS tags; however, they do not present an ablation study reporting the contribution of POS tags to the final performance Hitschler et al. (2017). Alharthi et al. propose a book recommendation system, using an author prediction task to learn a representation which is transferable to the book recommendation process Alharthi et al. (2018).

3 The Proposed Model: Syntactic Recurrent Neural Network

We introduce a syntactic recurrent neural network to encode the syntactic patterns of a document in a hierarchical structure. First, we represent each sentence as a sequence of POS tags, and each POS tag is embedded into a low-dimensional vector. A POS encoder (which can be a CNN or an LSTM) then learns the syntactic representation of sentences. Subsequently, the learned sentence representations are aggregated into the document representation. Moreover, we use an attention mechanism to reward the sentences which contribute more to the prediction of labels. Afterwards, we use a softmax classifier to compute the probability distribution over class labels. The overall architecture of the network is shown in Figure 1. In the following sections, we elaborate on the main components of the model.

Figure 1: The Overall Architecture of Syntactic Recurrent Neural Network for Style-based Text Classification

3.1 POS Embedding

We assume that each document is a sequence of $M$ sentences and each sentence is a sequence of $N$ words, where $M$ and $N$ are model hyperparameters whose best values are explored in the hyperparameter tuning phase (Section 4.3). Given a sentence, we convert each word into its corresponding POS tag and then embed each POS tag into a low-dimensional vector $P_i \in \mathbb{R}^{d_p}$ using a trainable lookup table $\theta_P \in \mathbb{R}^{|T| \times d_p}$, where $T$ is the set of all possible POS tags in the language. We use the NLTK part-of-speech tagger Bird et al. (2009) for tagging and use the following set of POS tags in our model.

T = { CC, CD, DT, EX, FW, IN, JJ, JJR, JJS, LS, MD, NN, NNS, NNP, NNPS, PDT, POS, PRP, PRP$, RB, RBR, RBS, RP, SYM, TO, UH, VB, VBD, VBG, VBN, VBP, VBZ, WDT, WP, WP$, WRB, ‘,’, ‘:’, ‘…’, ‘;’, ‘?’, ‘!’, ‘.’, ‘$’, ‘(’, ‘)’, “‘ ’, ‘” ’}

One of the advantages of using POS tags instead of words is its low dimensional lookup table compared to the word embeddings, where the size of vocabulary in large datasets usually surpasses 50K words. On the other hand, the size of POS embedding lookup table is significantly smaller, fixed, and independent of the dataset which makes the proposed model less likely to have out-of-vocabulary words.
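The lookup step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the padding symbol, the helper names, the embedding dimension, and the initialisation range are hypothetical, punctuation tags are left out for brevity, and the example starts from an already tagged sentence rather than calling a tagger.

```python
import random

# Word-level tags from the set T listed above (punctuation tags omitted for brevity).
POS_TAGS = ("CC CD DT EX FW IN JJ JJR JJS LS MD NN NNS NNP NNPS PDT POS PRP PRP$ "
            "RB RBR RBS RP SYM TO UH VB VBD VBG VBN VBP VBZ WDT WP WP$ WRB").split()

PAD = "<PAD>"  # hypothetical padding symbol, assigned index 0
TAG_TO_INDEX = {tag: i for i, tag in enumerate([PAD] + POS_TAGS)}


def encode_tags(tags, sentence_length):
    """Map a tagged sentence to exactly `sentence_length` indices (truncate or pad)."""
    ids = [TAG_TO_INDEX[t] for t in tags[:sentence_length]]
    return ids + [TAG_TO_INDEX[PAD]] * (sentence_length - len(ids))


def make_embedding_table(num_tags, dim, seed=0):
    """Randomly initialised lookup table; in a real model these rows are trained."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(num_tags)]


# "The cat sat on the mat" as it might be tagged.
tags = ["DT", "NN", "VBD", "IN", "DT", "NN"]
ids = encode_tags(tags, sentence_length=8)
table = make_embedding_table(len(TAG_TO_INDEX), dim=4)
embedded = [table[i] for i in ids]  # 8 vectors with 4 dimensions each
```

Because the table has one row per tag rather than one row per word, its size stays fixed no matter how large the corpus vocabulary grows.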

3.2 POS Encoder

POS encoder learns the syntactic representation of sentences from the output of POS embedding layer. In order to investigate the effect of short-term and long-term dependencies of POS tags in the sentence, we exploit both CNNs and LSTMs.

3.2.1 Short-term Dependencies

CNNs generally capture the short-term dependencies of words in sentences, which makes them robust to the varying length of sentences in the documents. Lexical-based CNN models have been widely used for text classification and sentiment analysis Johnson and Zhang (2014); Wang et al. (2012); Kim (2014); Collobert et al. (2011), and they generally outperform the conventional n-gram vector-based methods.

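Let $S_i = [P_1; P_2; \dots; P_N]$ be the vector representation of sentence $i$, built from its embedded POS tags $P_t \in \mathbb{R}^{d_p}$, and let $W \in \mathbb{R}^{r \times d_p}$ be a convolutional filter with a receptive field size of $r$. We apply a single layer of convolutional filters with varying window sizes, each followed by a rectified linear unit (relu) with a bias term $b$, and then a temporal max-pooling layer which returns only the maximum value of each feature map $C_i^r \in \mathbb{R}^{N-r+1}$. Consequently, each sentence is represented by its most important syntactic n-grams, independent of their position in the sentence. Variable receptive field sizes are used to compute vectors for different n-grams in parallel, and these are concatenated into a final feature vector $h_i^s \in \mathbb{R}^{K}$, where $K$ is the total number of filters and $r_k$ is the receptive field size of the $k$-th filter:

$$C_{ij}^r = \mathrm{relu}\left(W \cdot S_{i,\,j:j+r-1} + b\right), \quad j = 1, \dots, N-r+1$$

$$\hat{C}_i^r = \max_{j} C_{ij}^r$$

$$h_i^s = \hat{C}_i^{r_1} \oplus \hat{C}_i^{r_2} \oplus \dots \oplus \hat{C}_i^{r_K}$$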
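To make the convolution and max-over-time pooling concrete, here is a small pure-Python sketch. It is not the authors' implementation: the function names, the toy embeddings, and the filter values are assumptions chosen so the arithmetic is easy to follow, and the sentence is assumed to be at least as long as the largest filter.

```python
def relu(x):
    return x if x > 0.0 else 0.0


def conv_max_pool(sentence, filt, bias):
    """Slide one filter over the sentence and keep its maximum activation.

    sentence: list of N embedding vectors, each of length d
    filt:     list of r weight vectors, each of length d (receptive field size r)
    Assumes N >= r, otherwise there is no window to evaluate.
    """
    r = len(filt)
    feature_map = []
    for start in range(len(sentence) - r + 1):
        window = sentence[start:start + r]
        total = bias + sum(w * x
                           for w_row, x_row in zip(filt, window)
                           for w, x in zip(w_row, x_row))
        feature_map.append(relu(total))
    return max(feature_map)  # temporal max pooling


def cnn_sentence_vector(sentence, filters):
    """Concatenate the pooled output of every (filter, bias) pair."""
    return [conv_max_pool(sentence, f, b) for f, b in filters]


# Five toy tag embeddings with two dimensions each.
sentence = [[1, 0], [0, 1], [1, 1], [0, 0], [1, 0]]
filters = [
    ([[1, 0], [1, 0], [1, 0]], 0.0),   # size 3: sums the first dimension
    ([[0, 1], [0, 1]], -1.0),          # size 2: sums the second dimension, minus 1
]
vector = cnn_sentence_vector(sentence, filters)
```

Each entry of `vector` comes from the single strongest window for that filter, which is why the representation does not depend on where in the sentence the pattern occurs.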

3.2.2 Long-term Dependencies

Recurrent neural networks, especially LSTMs, are capable of capturing long-term relations in sequences, which makes them more effective than conventional n-gram models, where increasing the length of the sequences results in a sparse matrix representation of documents. Lexical-based recurrent neural networks have been widely used for text classification tasks Tang et al. (2015); Yang et al. (2016).

Let $S_i = [P_1; P_2; \dots; P_N]$ be the vector representation of sentence $i$. As an alternative to the CNN, we use a bidirectional LSTM to encode each sentence. The forward LSTM reads the sentence $S_i$ from $P_1$ to $P_N$ and the backward LSTM reads it from $P_N$ to $P_1$. The feature vector $h_t^p \in \mathbb{R}^{2 d_l}$ is the concatenation of the forward and backward LSTM states at position $t$, where $d_l$ is the dimensionality of the hidden state. The final vector representation of sentence $i$, $h_i^s \in \mathbb{R}^{2 d_l}$, is computed as the unweighted sum of the learned vector representations of the POS tags in the sentence, $h_i^s = \sum_{t=1}^{N} h_t^p$. This allows us to represent a sentence by its overall syntactic pattern.

3.3 Sentence Encoder

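The sentence encoder learns the syntactic representation of a document from the sequence of sentence representations output by the POS encoder. We use a bidirectional LSTM to capture how sentences with different syntactic patterns are structured in a document. The output vector of the sentence encoder is calculated as follows, where $h_i^s$ is the representation of sentence $i$ produced by the POS encoder and $M$ is the number of sentences:

$$h_i^d = \left[\overrightarrow{\mathrm{LSTM}}(h_i^s);\ \overleftarrow{\mathrm{LSTM}}(h_i^s)\right], \quad i \in [1, M]$$

Not all sentences are equally informative about the authorial style of a document. Therefore, we incorporate an attention mechanism to reveal the sentences that contribute more to detecting the writing style. We define a sentence-level vector $u_s$ and use it to measure the importance of sentence $i$ as follows:

$$u_i = \tanh(W_s h_i^d + b_s)$$

$$\alpha_i = \frac{\exp(u_i^\top u_s)}{\sum_{j=1}^{M} \exp(u_j^\top u_s)}$$

$$V = \sum_{i=1}^{M} \alpha_i h_i^d$$

where $W_s$ and $b_s$ are a learnable weight matrix and bias, $u_s$ is a learnable vector that is randomly initialized at the start of training, and $V$ is the vector representation of the document, computed as the weighted sum of the vector representations of all its sentences.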
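The attention step can be illustrated with a short pure-Python sketch. It follows the common formulation of hierarchical attention (a tanh projection, a dot product with a learnable context vector, a softmax over sentences, then a weighted sum); the function name, argument names, and toy values are assumptions made for this example, not the authors' code.

```python
import math


def attention_pool(sentence_vectors, weight, bias, context):
    """Return (attention weights, document vector).

    sentence_vectors: M sentence representations, each of length d
    weight:           a x d projection matrix
    bias, context:    vectors of length a (context plays the role of the
                      learnable sentence-level vector)
    """
    scores = []
    for h in sentence_vectors:
        u = [math.tanh(sum(w * x for w, x in zip(row, h)) + b)
             for row, b in zip(weight, bias)]
        scores.append(sum(ui * ci for ui, ci in zip(u, context)))

    # Softmax over sentences, shifted by the maximum for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    dim = len(sentence_vectors[0])
    document = [sum(a * h[k] for a, h in zip(weights, sentence_vectors))
                for k in range(dim)]
    return weights, document


sentences = [[2.0, 0.0], [0.0, 0.0], [-2.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
weights, document = attention_pool(sentences, identity, [0.0, 0.0], [1.0, 0.0])
```

In this toy setting the first sentence aligns best with the context vector, so it receives the largest weight and dominates the document vector.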

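| Candidate | Train Data I: Word Count | Train Data I: Sentence Length | Train Data II: Word Count | Train Data II: Sentence Length | Test Data: Word Count | Test Data: Sentence Length |
|---|---|---|---|---|---|---|
| Candidate 01 | 73,449 | 17 | 76,602 | 19 | 70,112 | 20 |
| Candidate 02 | 180,660 | 13 | 117,024 | 14 | 82,317 | 13 |
| Candidate 03 | 158,306 | 17 | 121,301 | 19 | 151,049 | 15 |
| Candidate 04 | 84,080 | 14 | 79,413 | 18 | 93,055 | 14 |
| Candidate 05 | 109,857 | 18 | 141,086 | 15 | 96,663 | 15 |
| Candidate 06 | 61,644 | 19 | 46,549 | 16 | 42,808 | 16 |
| Candidate 07 | 71,106 | 16 | 70,563 | 18 | 84,996 | 21 |
| Candidate 08 | 106,024 | 18 | 113,475 | 15 | 94,700 | 13 |
| Candidate 09 | 66,840 | 15 | 41,093 | 15 | 194,547 | 15 |
| Candidate 10 | 86,681 | 14 | 35,699 | 16 | 60,998 | 16 |
| Candidate 11 | 53,960 | 19 | 48,037 | 13 | 80,330 | 24 |
| Candidate 12 | 49,543 | 25 | 64,495 | 26 | 50,636 | 27 |
| Candidate 13 | 32,900 | 21 | 153,994 | 32 | 77,780 | 27 |
| Candidate 14 | 89,908 | 23 | 71,058 | 22 | 52,633 | 35 |

Table 1: Corpus statistics. Sentence length is the average sentence length of the document.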

3.4 Classification

The learned vector representations of documents are fed into a softmax classifier to compute the probability distribution over class labels. Suppose $V$ is the vector representation of a document learned by the attention layer. The prediction $\tilde{y}$ is the output of the softmax layer and is computed as:

$$\tilde{y} = \mathrm{softmax}(W_c V + b_c)$$

where $W_c$ and $b_c$ are the learnable weight and learnable bias, respectively, and $\tilde{y}$ is a $C$-dimensional vector ($C$ is the number of classes). We use the cross-entropy loss to measure the discrepancy between the predictions and the true labels $y$. The model parameters are optimized to minimize the cross-entropy loss over all the documents in the training corpus. Hence, the regularized loss function over $D$ documents, denoted by $J(\theta)$, with regularization coefficient $\lambda$, is:

$$J(\theta) = -\frac{1}{D} \sum_{i=1}^{D} \sum_{k=1}^{C} y_{ik} \log \tilde{y}_{ik} + \lambda \lVert \theta \rVert$$
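As a worked illustration of the classification step, the following pure-Python sketch computes a softmax over class scores and the mean cross-entropy loss. The function names and the toy inputs are assumptions, and the regularization term is deliberately left out to keep the example short.

```python
import math


def softmax(logits):
    top = max(logits)  # shift by the maximum for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(document_vector, weights, bias):
    """Linear layer followed by softmax: one probability per class."""
    logits = [sum(w * v for w, v in zip(row, document_vector)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


def mean_cross_entropy(predictions, true_classes):
    """Average negative log-probability assigned to the true class."""
    losses = [-math.log(p[c]) for p, c in zip(predictions, true_classes)]
    return sum(losses) / len(losses)


# An untrained classifier with all-zero parameters over 14 candidate authors.
num_classes, dim = 14, 3
weights = [[0.0] * dim for _ in range(num_classes)]
bias = [0.0] * num_classes
probs = predict([0.5, -1.0, 2.0], weights, bias)
loss = mean_cross_entropy([probs], [4])
```

With all-zero parameters every author receives probability 1/14, so the loss equals log(14); training should drive it below this chance-level value.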

4 Experimental Results

4.1 Dataset

We evaluate our proposed method on a commonly used benchmark dataset from the PAN 2012 authorship attribution shared task. We chose the Task I dataset, which corresponds to authorship attribution among a closed set of 14 authors. The training set comprises 28 novel-length documents (two per candidate author), ranging from about 32,000 words up to about 180,000 words. The test set consists of 14 novels (one per candidate author), ranging in length from about 42,000 words up to about 190,000 words. Table 1 reports the word count and the average sentence length of the documents in both the training and test sets for each candidate author.

In order to generate enough training and test samples, we split the novels into segments of $M$ sentences each, where $M$ is the sequence length. The best value of $M$ is explored in the hyperparameter tuning phase (Section 4.3). Accordingly, the performance measures include segment-level categorical accuracy as well as document-level categorical accuracy. For the latter, we use majority voting to label a document based on its segment-level predictions.
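The segmentation and voting procedure can be sketched as follows. The helper names are hypothetical, and two details are assumptions because the text does not specify them: a trailing remainder shorter than one segment is dropped, and ties in the vote go to the label that appears first.

```python
from collections import Counter


def split_into_segments(sentences, segment_size):
    """Cut a document into consecutive segments of `segment_size` sentences.

    Assumption: a trailing remainder shorter than one segment is discarded.
    """
    full = len(sentences) // segment_size
    return [sentences[i * segment_size:(i + 1) * segment_size] for i in range(full)]


def document_label(segment_predictions):
    """Majority vote over segment-level predictions.

    Assumption: ties are resolved in favour of the label seen first.
    """
    return Counter(segment_predictions).most_common(1)[0][0]


novel = [f"sentence {i}" for i in range(250)]
segments = split_into_segments(novel, segment_size=100)
label = document_label(["author_03", "author_07", "author_03"])
```

A novel can therefore be labelled correctly at the document level even when a sizeable share of its segments is misclassified, which is why document-level accuracy can exceed segment-level accuracy.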

4.2 Baselines

For our baselines, we employ a standard syntactic n-gram model as the syntactic approach and a word n-gram model as the lexical approach. For both models, we use a Support Vector Machine (SVM) classifier with a linear kernel. Moreover, in order to compare the performance of the syntactic recurrent neural network to lexical-based approaches, we feed the sequence of words to a neural network with the identical architecture. We use 300-dimensional pretrained GloVe embeddings Pennington et al. (2014) for the embedding layer in the network. In order to reduce the effect of the out-of-vocabulary problem, we retain only the 50,000 most frequent words.
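The count-based features behind the syntactic n-gram baseline can be illustrated with a few lines of Python. This sketch covers only the feature extraction; the value of n, the function name, and the toy tag sequence are assumptions, and training the linear SVM on the resulting counts is not shown.

```python
from collections import Counter


def pos_ngram_counts(tag_sequence, n):
    """Count every contiguous run of n POS tags in the sequence."""
    return Counter(tuple(tag_sequence[i:i + n])
                   for i in range(len(tag_sequence) - n + 1))


# Tags for a toy segment such as "the dog chased the cat".
tags = ["DT", "NN", "VBD", "DT", "NN"]
bigrams = pos_ngram_counts(tags, n=2)
```

Such counts record how often a pattern occurs but not where it occurs or how sentences follow one another, which is the information the hierarchical model is designed to add.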

4.3 Hyperparameter Tuning

In this part, we examine the effect of different hyperparameters on the performance of the proposed model. All the performance metrics are the mean of segment-level accuracy (on the test set) calculated over 10 runs with a 0.9/0.1 train/validation split. We use the Nadam optimizer Sutskever et al. (2013) to optimize the cross-entropy loss over 30 epochs of training.

4.3.1 CNN for POS encoding

Figure 2: The effect of different receptive fields sizes and number of layers (n_layers) on the performance of syntactic recurrent neural network

Figure 2 illustrates the performance of the syntactic recurrent neural network when a CNN is used as the POS encoder, across different receptive field sizes and numbers of layers, while other parameters are kept constant. We observe that increasing the number of convolutional layers generally lowers the performance. This may be because each layer adds to the complexity of the model and increases the number of parameters, and with limited training data the larger model performs worse. Moreover, with one convolutional layer, the accuracy generally increases with the size of the receptive field, because larger receptive fields capture longer syntactic sequences, which are more informative.

In our experiments, we also observed that having parallel convolutional layers with different receptive field sizes improves the performance. Therefore, in the final model, we use one layer of multiple convolutional filters with receptive field sizes of 3 and 5.

| Features | Model | Segment-Level Accuracy (%): Validation | Segment-Level Accuracy (%): Test | Document-Level Accuracy (%) |
|---|---|---|---|---|
| Lexical | Word N-grams-SVM | 90.71 | 58.35 | 78.57 (11/14 novels) |
| Lexical | CNN-LSTM | 98.88 | 64.12 | 78.57 (11/14 novels) |
| Lexical | LSTM-LSTM | 96.83 | 63.92 | 85.71 (12/14 novels) |
| Syntactic | POS N-grams-SVM | 89.60 | 69.66 | 92.85 (13/14 novels) |
| Syntactic | CNN-LSTM | 93.22 | 78.76 | 100.00 (14/14 novels) |
| Syntactic | LSTM-LSTM | 95.00 | 74.40 | 100.00 (14/14 novels) |

Table 2: The performance results of models on PAN 2012 dataset for authorship attribution task.

4.3.2 LSTM for POS encoding

Figure 3 shows the accuracy of the proposed model when an LSTM is employed as the POS encoder, across different values of sentence length ($N$) and sequence length ($M$: the number of sentences in each segment). We observe from the figure that increasing the sequence length boosts the performance: the model achieves higher accuracy on segments with 100 sentences (74.40%) than on segments with only 20 sentences (60.02%). This observation confirms that investigating writing style in short documents is more challenging Neal et al. (2017).

As shown in Table 1, the average sentence length in the dataset ranges from 13 to 35. Therefore, we examined sentence lengths of 10, 20, 30, and 40 (the performance of the model is identical when the sentence length is 30 and 40, so we have not included the latter results in the figure). We observe that increasing the sentence length to 30 words improves the performance, primarily because a shorter sentence length discards several words in each sentence, which leads to notable information loss. To sum up, the syntactic neural network accepts segments as inputs, where each segment contains 100 sentences and the length of each sentence is 30.

Figure 3: The effect of sentence length and sequence length on the performance of syntactic recurrent neural network

4.4 Results

We report both segment-level and document-level accuracy. As mentioned before, each document (novel) is divided into segments of 100 sentences. Each segment in a novel is classified independently, and the label of each document is then determined by majority voting over its constituent segments. Table 2 reports the performance of the baselines and the proposed model (with both CNN and LSTM as POS encoder) on the PAN 2012 dataset. According to the segment-level accuracy, the performance of all models drops significantly on the test set, mainly because of insufficient training data. We expect that if the models were trained on enough writing samples per author, the test results would be closer to the validation results.

Unsurprisingly, the syntactic CNN-LSTM model outperforms the conventional POS n-gram model (POS N-grams-SVM), improving segment-level test accuracy from 69.66% to 78.76% and document-level accuracy from 92.85% to 100.00% (Table 2). This is primarily because the syntactic CNN-LSTM not only represents a sentence by its important syntactic n-grams but also learns how these sentences are structured in a document. The POS N-grams-SVM model, on the other hand, only captures the frequency of different n-grams in the document.

4.4.1 Syntactic vs. Lexical

According to Table 2, both syntactic recurrent neural networks (CNN-LSTM and LSTM-LSTM) outperform the lexical models by achieving the highest document-level accuracy (100.00%). The syntactic recurrent neural networks correctly classify all 14 novels in the test set, while the lexical LSTM-LSTM achieves the highest document-level accuracy among the lexical models (85.71%) by correctly classifying 12 novels.

In segment-level classification, the syntactic recurrent neural networks outperform the lexical models at test time; however, the lexical models achieve higher validation accuracy. This observation may imply that lexical models generalize less well than syntactic models in style-based text classification.

4.4.2 Short-Term vs. Long-Term

According to the results in Table 2, the syntactic CNN-LSTM model outperforms the syntactic LSTM-LSTM model in segment-level test accuracy (78.76% versus 74.40%). The primary difference between the two models is the way they represent a sentence. In the syntactic CNN-LSTM, each sentence is represented by its important syntactic n-grams, independent of their position in the sentence. The syntactic LSTM-LSTM, however, mainly captures the overall syntactic pattern of a sentence by summing all the learned vector representations of the POS tags in the sentence.

4.4.3 Short Documents vs. Long Documents

We conducted a controlled study on the effect of document length on the performance of both the CNN-LSTM and LSTM-LSTM models. For this purpose, we trained each model on only a specific fraction of each training document and then tested the trained model on the whole test set. We keep the number of parameters in both models approximately equal to eliminate the effect of data limitation on the training process. Figure 4 shows the performance of the models when trained on only an initial portion of the segments in each document.

Figure 4: The performance of CNN-LSTM and LSTM-LSTM models when trained on the different number of segments per document
Figure 5: The confusion matrices of lexical and syntactic recurrent neural network. The labels in vertical and horizontal axis indicate class labels. (a) Lexical CNN-LSTM model (b) Lexical LSTM-LSTM model (c) Syntactic CNN-LSTM model (d) Syntactic LSTM-LSTM model

We observe that when only a small portion of the segments is used for training, LSTM-LSTM models achieve higher test accuracy than CNN-LSTM models in both the syntactic and lexical settings. On the other hand, CNN-LSTM models slightly outperform LSTM-LSTM models as the number of segments used for training in each document increases. In other words, LSTM-LSTM models appear to capture authorial writing style more quickly than CNN-LSTM models, which makes them a potentially preferable choice when investigating authorial writing style in a dataset of short documents.

4.4.4 Class-wise Performance

Figure 5 illustrates the segment-level recall for each class label for both the lexical (a and b) and syntactic (c and d) recurrent neural networks. Cell [i,j] reports the fraction of segments in the document written by author i that were attributed to author j. Among the lexical networks, LSTM-LSTM has a lower misclassification rate (2 incorrectly classified documents) than CNN-LSTM (3 incorrectly classified documents). The syntactic CNN-LSTM and LSTM-LSTM achieve the highest recall and correctly classify all 14 documents in the test set. Both lexical models have relatively low recall for class labels 1, 4, 7, 11, and 12, while both syntactic models show low recall for class label 13. Moreover, both lexical models as well as the syntactic CNN-LSTM show lower recall for class labels 11 and 12; however, the syntactic LSTM-LSTM shows higher recall for these classes.
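For readers who want to reproduce this kind of matrix, the sketch below builds a confusion matrix from segment-level predictions and normalises each row so that cell [i][j] is the fraction of author i's segments attributed to author j. The function names and the toy labels are assumptions made for the example.

```python
def confusion_counts(true_labels, predicted_labels, num_classes):
    """counts[i][j] = number of segments by author i predicted as author j."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        counts[t][p] += 1
    return counts


def row_normalize(counts):
    """Divide each row by its total; the diagonal then holds per-class recall."""
    rows = []
    for row in counts:
        total = sum(row)
        rows.append([c / total if total else 0.0 for c in row])
    return rows


# Toy example with two authors: four segments by author 0, two by author 1.
true_labels = [0, 0, 0, 0, 1, 1]
predicted = [0, 0, 1, 0, 1, 0]
matrix = row_normalize(confusion_counts(true_labels, predicted, num_classes=2))
recall = [matrix[i][i] for i in range(2)]
```

Under majority voting, a document is labelled correctly as long as the diagonal entry of its row is larger than every other entry in that row, even if the recall itself is well below 1.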

5 Conclusion and Future Work

In this paper, we introduced a syntactic recurrent neural network to encode the syntactic patterns of documents in a hierarchical structure and used the learned syntactic representation of a document for style-based text classification. We investigated both long-term and short-term dependencies of part-of-speech (POS) tags in sentences. According to our experimental results on the PAN 2012 dataset, syntactic recurrent neural networks outperform their lexical counterparts in segment-level test accuracy (78.76% versus 64.12% for CNN-LSTM and 74.40% versus 63.92% for LSTM-LSTM). Moreover, we observed that LSTM-based POS encoders capture authorial writing style more quickly than CNN-based POS encoders, which makes them preferable when investigating authorial writing style in a dataset of short documents.


References

  • Afroz et al. (2012) Sadia Afroz, Michael Brennan, and Rachel Greenstadt. 2012. Detecting hoaxes, frauds, and deception in writing style online. In Security and Privacy (SP), 2012 IEEE Symposium on, pages 461–475. IEEE.
  • Alharthi et al. (2018) Haifa Alharthi, Diana Inkpen, and Stan Szpakowicz. 2018. Authorship identification for literary book recommendations. In Proceedings of the 27th International Conference on Computational Linguistics, pages 390–400.
  • Argamon-Engelson et al. (1998) Shlomo Argamon-Engelson, Moshe Koppel, and Galit Avneri. 1998. Style-based text categorization: What newspaper am i reading. In Proc. of the AAAI Workshop on Text Categorization, pages 1–4.
  • Bagnall (2016) Douglas Bagnall. 2016. Authorship clustering using multi-headed recurrent neural networks. arXiv preprint arXiv:1608.04485.
  • Bird et al. (2009) Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural language processing with Python: analyzing text with the natural language toolkit. O’Reilly Media, Inc.
  • Brennan et al. (2012) Michael Brennan, Sadia Afroz, and Rachel Greenstadt. 2012. Adversarial stylometry: Circumventing authorship recognition to preserve privacy and anonymity. ACM Transactions on Information and System Security (TISSEC), 15(3):12.
  • Collobert et al. (2011) Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12(Aug):2493–2537.
  • Daelemans (2013) Walter Daelemans. 2013. Explanation in computational stylometry. In International Conference on Intelligent Text Processing and Computational Linguistics, pages 451–462. Springer.
  • Gagala (2018) Lukasz Gagala. 2018. Authorship attribution with neural networks and multiple features.
  • Ge et al. (2016) Zhenhao Ge, Yufang Sun, and Mark JT Smith. 2016. Authorship attribution using a neural network language model. In AAAI, pages 4212–4213.
  • Hitschler et al. (2017) Julian Hitschler, Esther van den Berg, and Ines Rehbein. 2017. Authorship attribution with convolutional neural networks and pos-eliding. In Proceedings of the Workshop on Stylistic Variation, pages 53–58.
  • Johnson and Zhang (2014) Rie Johnson and Tong Zhang. 2014. Effective use of word order for text categorization with convolutional neural networks. arXiv preprint arXiv:1412.1058.
  • Kabbara and Cheung (2016) Jad Kabbara and Jackie Chi Kit Cheung. 2016. Stylistic transfer in natural language generation systems using recurrent neural networks. In Proceedings of the Workshop on Uphill Battles in Language Processing: Scaling Early Achievements to Robust Methods, pages 43–47.
  • Kim (2014) Yoon Kim. 2014. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882.
  • Koppel et al. (2009) Moshe Koppel, Jonathan Schler, and Shlomo Argamon. 2009. Computational methods in authorship attribution. Journal of the American Society for information Science and Technology, 60(1):9–26.
  • Krause (2014) Markus Krause. 2014. A behavioral biometrics based authentication method for mooc’s that is robust against imitation attempts. In Proceedings of the first ACM conference on Learning@ scale conference, pages 201–202. ACM.
  • Kreutz and Daelemans (2018) Tim Kreutz and Walter Daelemans. 2018. Exploring classifier combinations for language variety identification. In Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018), pages 191–198.
  • van der Lee and van den Bosch (2017) Chris van der Lee and Antal van den Bosch. 2017. Exploring lexical and syntactic features for language variety identification. In Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial), pages 190–199.
  • Mosteller and Wallace (1964) Frederick Mosteller and David Wallace. 1964. Inference and disputed authorship: The federalist.
  • Neal et al. (2017) Tempestt Neal, Kalaivani Sundararajan, Aneez Fatima, Yiming Yan, Yingfei Xiang, and Damon Woodard. 2017. Surveying stylometry techniques and applications. ACM Computing Surveys (CSUR), 50(6):86.
  • Newman et al. (2003) Matthew L Newman, James W Pennebaker, Diane S Berry, and Jane M Richards. 2003. Lying words: Predicting deception from linguistic styles. Personality and social psychology bulletin, 29(5):665–675.
  • Pennebaker and King (1999) James W Pennebaker and Laura A King. 1999. Linguistic styles: Language use as an individual difference. Journal of personality and social psychology, 77(6):1296.
  • Pennington et al. (2014) Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pages 1532–1543.
  • Posadas-Durán et al. (2015) Juan-Pablo Posadas-Durán, Ilia Markov, Helena Gómez-Adorno, Grigori Sidorov, Ildar Batyrshin, Alexander Gelbukh, and Obdulia Pichardo-Lagunas. 2015. Syntactic n-grams as features for the author profiling task. Working Notes Papers of the CLEF.
  • Raghavan et al. (2010) Sindhu Raghavan, Adriana Kovashka, and Raymond Mooney. 2010. Authorship attribution using probabilistic context-free grammars. In Proceedings of the ACL 2010 Conference Short Papers, pages 38–42. Association for Computational Linguistics.
  • Schwartz et al. (2017) Roy Schwartz, Maarten Sap, Ioannis Konstas, Li Zilles, Yejin Choi, and Noah A Smith. 2017. The effect of different writing tasks on linguistic style: A case study of the roc story cloze task. arXiv preprint arXiv:1702.01841.
  • Shrestha et al. (2017) Prasha Shrestha, Sebastian Sierra, Fabio Gonzalez, Manuel Montes, Paolo Rosso, and Thamar Solorio. 2017. Convolutional neural networks for authorship attribution of short texts. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, volume 2, pages 669–674.
  • Soler and Wanner (2017) Juan Soler and Leo Wanner. 2017. On the relevance of syntactic and discourse features for author profiling and identification. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, volume 2, pages 681–687.
  • Sundararajan and Woodard (2018) Kalaivani Sundararajan and Damon Woodard. 2018. What represents “style” in authorship attribution? In Proceedings of the 27th International Conference on Computational Linguistics, pages 2814–2822.
  • Sutskever et al. (2013) Ilya Sutskever, James Martens, George Dahl, and Geoffrey Hinton. 2013. On the importance of initialization and momentum in deep learning. In International conference on machine learning, pages 1139–1147.
  • Tang et al. (2015) Duyu Tang, Bing Qin, and Ting Liu. 2015. Document modeling with gated recurrent neural network for sentiment classification. In Proceedings of the 2015 conference on empirical methods in natural language processing, pages 1422–1432.
  • Wang et al. (2012) Tao Wang, David J Wu, Adam Coates, and Andrew Y Ng. 2012. End-to-end text recognition with convolutional neural networks. In Pattern Recognition (ICPR), 2012 21st International Conference on, pages 3304–3308. IEEE.
  • Wang (2017) William Yang Wang. 2017. “Liar, liar pants on fire”: A new benchmark dataset for fake news detection. arXiv preprint arXiv:1705.00648.
  • Yang et al. (2016) Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. 2016. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1480–1489.