Whodunit? Learning to Contrast for Authorship Attribution

09/23/2022
by   Bo Ai, et al.

Authorship attribution is the task of identifying the author of a given text. Most existing approaches use manually designed features that capture a dataset's content and style. However, this dataset-dependent approach yields inconsistent performance. Thus, we propose to fine-tune pre-trained language representations using a combination of contrastive learning and supervised learning (Contra-X). We show that Contra-X advances the state-of-the-art on multiple human and machine authorship attribution benchmarks, enabling improvements of up to 6.8% over cross-entropy fine-tuning across different data regimes. Crucially, we present qualitative and quantitative analyses of these improvements. Our learned representations form highly separable clusters for different authors. However, we find that contrastive learning improves overall accuracy at the cost of sacrificing performance for some authors. Resolving this tension will be an important direction for future work. To the best of our knowledge, we are the first to analyze the effect of combining contrastive learning with cross-entropy fine-tuning for authorship attribution.
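
To make the idea of combining a contrastive objective with cross-entropy concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' formulation: it assumes a supervised contrastive term in which texts by the same author count as positives, and the function names, the `weight` parameter, and the `temperature` default are all hypothetical.

```python
import math

def cross_entropy(logits, label):
    """Cross-entropy for one example, computed from raw class logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]

def _normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def supervised_contrastive(embeddings, labels, temperature=0.1):
    """Mean contrastive loss over anchors that have at least one positive.

    Assumption: examples sharing a label (author) are positives,
    every other example in the batch is a negative.
    """
    z = [_normalize(v) for v in embeddings]
    n = len(z)
    losses = []
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarity to every other example.
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(n) if j != i}
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        losses.append(-sum(sims[j] - log_denom for j in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0

def combined_loss(logits_batch, embeddings, labels, weight=1.0, temperature=0.1):
    """Mean cross-entropy plus a weighted contrastive term (hypothetical weighting)."""
    ce = sum(cross_entropy(l, y) for l, y in zip(logits_batch, labels)) / len(labels)
    return ce + weight * supervised_contrastive(embeddings, labels, temperature)
```

In this sketch the contrastive term is small when embeddings of the same author point in similar directions and large when they are mixed with other authors, which is one way to read the abstract's remark about separable clusters. How the two terms are actually weighted and how positives are chosen in Contra-X is described in the full paper, not here.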

research
11/03/2020

Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning

State-of-the-art natural language understanding classification models fo...
research
11/12/2020

Bi-tuning of Pre-trained Representations

It is common within the deep learning community to first pre-train a dee...
research
02/12/2021

Unleashing the Power of Contrastive Self-Supervised Visual Models via Contrast-Regularized Fine-Tuning

Contrastive self-supervised learning (CSL) leverages unlabeled data to t...
research
10/12/2022

AD-DROP: Attribution-Driven Dropout for Robust Language Model Fine-Tuning

Fine-tuning large pre-trained language models on downstream tasks is apt...
research
10/27/2022

Dictionary-Assisted Supervised Contrastive Learning

Text analysis in the social sciences often involves using specialized di...
research
06/21/2022

TraSE: Towards Tackling Authorial Style from a Cognitive Science Perspective

Stylistic analysis of text is a key task in research areas ranging from ...
research
04/17/2021

The Topic Confusion Task: A Novel Scenario for Authorship Attribution

Authorship attribution is the problem of identifying the most plausible ...
