A Self-supervised Representation Learning of Sentence Structure for Authorship Attribution

10/14/2020
by   Fereshteh Jafariakinabad, et al.

The syntactic structure of the sentences in a document substantially informs its authorial writing style. Sentence representation learning has been widely explored in recent years, and it has been shown to improve the generalization of various downstream tasks across many domains. Even though probing studies suggest that these learned contextual representations implicitly encode some amount of syntax, explicit syntactic information further improves the performance of deep neural models in the domain of authorship attribution. These observations motivate us to investigate explicit representation learning of the syntactic structure of sentences. In this paper, we propose a self-supervised framework for learning structural representations of sentences. The self-supervised network contains two components: a lexical sub-network and a syntactic sub-network, which take the sequence of words and their corresponding structural labels as input, respectively. Due to the n-to-1 mapping of words to their structural labels, each word is embedded into a vector representation that mainly carries structural information. We evaluate the learned structural representations of sentences using different probing tasks, and subsequently use them in the authorship attribution task. Our experimental results indicate that the structural embeddings significantly improve the classification tasks when concatenated with existing pre-trained word embeddings.
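The core idea of the abstract can be sketched in a few lines: each word is paired with a structural label (coarse POS tags are used here purely as a stand-in; the paper's actual label set, vocabularies, and embedding dimensions are not given), many words map to one label (the n-to-1 mapping), and the label's structural embedding is concatenated with the word's pre-trained lexical embedding before classification. This is a minimal illustrative sketch, not the authors' implementation.

```python
# Minimal sketch of structural + lexical embedding concatenation.
# Assumptions: toy vocabularies, random embeddings, POS tags as structural labels.
import random

random.seed(0)

word_vocab = ["the", "cat", "sat"]
# Several words share one structural label (the n-to-1 mapping).
word_to_label = {"the": "DET", "cat": "NOUN", "sat": "VERB"}
labels = ["DET", "NOUN", "VERB"]

D_WORD, D_STRUCT = 8, 4  # hypothetical embedding sizes
word_emb = {w: [random.random() for _ in range(D_WORD)] for w in word_vocab}
struct_emb = {l: [random.random() for _ in range(D_STRUCT)] for l in labels}

def encode(sentence):
    """Concatenate each word's pre-trained lexical embedding with the
    structural embedding of its label; the result would feed a classifier."""
    return [word_emb[w] + struct_emb[word_to_label[w]] for w in sentence]

vectors = encode(["the", "cat", "sat"])
print(len(vectors), len(vectors[0]))  # 3 12
```

In the paper, the structural embeddings are learned by the self-supervised syntactic sub-network rather than drawn at random; the sketch only shows how they augment the lexical representation.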


Related research

- 09/12/2019, Style-aware Neural Model with Application in Authorship Attribution
  Writing style is a combination of consistent decisions associated with a...
- 02/26/2019, Syntactic Recurrent Neural Network for Authorship Attribution
  Writing style is a combination of consistent decisions at different leve...
- 10/30/2020, SLM: Learning a Discourse Language Representation with Sentence Unshuffling
  We introduce Sentence-level Language Modeling, a new pre-training object...
- 12/13/2020, Syntactic representation learning for neural network based TTS with syntactic parse tree traversal
  Syntactic structure of a sentence text is correlated with the prosodic s...
- 11/27/2018, Verb Argument Structure Alternations in Word and Sentence Embeddings
  Verbs occur in different syntactic environments, or frames. We investiga...
- 01/18/2018, Natural Language Multitasking: Analyzing and Improving Syntactic Saliency of Hidden Representations
  We train multi-task autoencoders on linguistic tasks and analyze the lea...
- 03/02/2017, Structural Embedding of Syntactic Trees for Machine Comprehension
  Deep neural networks for machine comprehension typically utilizes only w...
