On the State of the Art in Authorship Attribution and Authorship Verification

09/14/2022
by Jacob Tyo, et al.

Despite decades of research on authorship attribution (AA) and authorship verification (AV), inconsistent dataset splits and filtering, together with mismatched evaluation methods, make it difficult to assess the state of the art. In this paper, we survey both fields, resolve points of confusion, introduce Valla, which standardizes and benchmarks AA/AV datasets and metrics, and present a large-scale empirical evaluation with apples-to-apples comparisons between existing methods. We evaluate eight promising methods on fifteen datasets (including distribution-shifted challenge sets) and introduce a new large-scale dataset based on texts archived by Project Gutenberg. Surprisingly, we find that a traditional n-gram-based model performs best on 5 of 7 AA tasks, achieving an average macro-accuracy of 76.50% (compared to 66.71% for a BERT-based model). However, on the two AA datasets with the greatest number of words per author, as well as on the AV datasets, BERT-based models perform best. While AV methods are easily applied to AA, they are seldom included as baselines in AA papers. We show that, with hard-negative mining, AV methods are competitive alternatives to AA methods. Valla and all experiment code are available at https://github.com/JacobTyo/Valla
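The abstract reports results as macro-accuracy without defining it here. The sketch below assumes the common reading of the term: accuracy is computed separately for each author and then averaged with equal weight, so authors with few texts count as much as prolific ones. The function name `macro_accuracy` and the toy labels are illustrative and are not taken from Valla.

```python
from collections import defaultdict


def macro_accuracy(true_authors, predicted_authors):
    """Unweighted mean of per-author accuracy (assumed definition)."""
    if len(true_authors) != len(predicted_authors):
        raise ValueError("label lists must have the same length")
    if not true_authors:
        raise ValueError("label lists must not be empty")

    correct = defaultdict(int)  # correct predictions per true author
    total = defaultdict(int)    # number of texts per true author
    for truth, guess in zip(true_authors, predicted_authors):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1

    # Average the per-author accuracies, one equal vote per author.
    return sum(correct[a] / total[a] for a in total) / len(total)


# Toy example: a model that always predicts the majority author "a".
truth = ["a", "a", "a", "a", "b"]
preds = ["a", "a", "a", "a", "a"]
print(macro_accuracy(truth, preds))  # 0.5, although plain accuracy is 0.8
```

Under this assumption, a model that ignores rare authors is penalized even when its overall accuracy looks high, which matters for datasets with very uneven numbers of texts per author.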


