DT-grams: Structured Dependency Grammar Stylometry for Cross-Language Authorship Attribution

06/10/2021
by Benjamin Murauer, et al.

Cross-language authorship attribution relies either on translation, which enables the use of single-language features, or on language-independent feature extraction methods. Until recently, the lack of datasets for this problem hindered the development of the latter, and single-language solutions were applied to machine-translated corpora. In this paper, we present a novel language-independent feature for authorship analysis based on dependency graphs and universal part-of-speech tags, called DT-grams (dependency tree grams), which are constructed by selecting specific sub-parts of the dependency graph of sentences. We evaluate DT-grams by performing cross-language authorship attribution on untranslated datasets of bilingual authors, showing that, on average, they achieve a macro-averaged F1 score that is 0.081 higher than that of previous methods across five different language pairs. Additionally, by reporting results for a diverse set of features for comparison, we provide a baseline for the previously undocumented task of untranslated cross-language authorship attribution.
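To make the idea of counting sub-parts of a dependency graph over universal part-of-speech tags more concrete, the following is a minimal sketch in Python. It is not the authors' definition of DT-grams: the abstract does not say which sub-parts are selected, so the sketch assumes one simple shape, a chain of tags obtained by walking from a token up through its heads. The toy parse, the function names `ancestor_chains` and `chain_counts`, and the `length` parameter are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical, hand-written parse of "The cat chased a mouse".
# Each entry is (token index, universal POS tag, head index); head 0 marks the root.
SENTENCE = [
    (1, "DET", 2),
    (2, "NOUN", 3),
    (3, "VERB", 0),
    (4, "DET", 5),
    (5, "NOUN", 3),
]


def ancestor_chains(sentence, length):
    """Yield tuples of POS tags found by walking from each token up its head links.

    This chain shape is an assumption made for illustration; it stands in for
    whatever sub-parts of the dependency graph a real feature would select.
    """
    tags = {idx: tag for idx, tag, _ in sentence}
    heads = {idx: head for idx, _, head in sentence}
    for idx in tags:
        chain, node = [], idx
        # Collect tags until the chain is long enough or the root is passed.
        while node != 0 and len(chain) < length:
            chain.append(tags[node])
            node = heads[node]
        if len(chain) == length:
            yield tuple(chain)


def chain_counts(sentence, length=2):
    """Count every tag chain of the given length in one parsed sentence."""
    return Counter(ancestor_chains(sentence, length))


counts = chain_counts(SENTENCE, 2)
for chain, freq in sorted(counts.items()):
    print(" -> ".join(chain), freq)
```

Because the counts are built only from part-of-speech tags and head links, no word forms enter the feature, which is what lets such a representation be compared across texts written in different languages.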

research
07/13/2016

A Supervised Authorship Attribution Framework for Bengali Language

Authorship Attribution is a long-standing problem in Natural Language Pr...
research
04/11/2022

A Multilingual Perspective Towards the Evaluation of Attribution Methods in Natural Language Inference

Most evaluations of attribution methods focus on the English language. I...
research
04/26/2021

Towards Rigorous Interpretations: a Formalisation of Feature Attribution

Feature attribution is often loosely presented as the process of selecti...
research
05/02/2020

A Girl Has A Name: Detecting Authorship Obfuscation

Authorship attribution aims to identify the author of a text based on th...
research
04/17/2021

The Topic Confusion Task: A Novel Scenario for Authorship Attribution

Authorship attribution is the problem of identifying the most plausible ...
research
09/15/2015

Dependency length minimization: Puzzles and Promises

In the recent issue of PNAS, Futrell et al. claims that their study of 3...
research
06/21/2022

TraSE: Towards Tackling Authorial Style from a Cognitive Science Perspective

Stylistic analysis of text is a key task in research areas ranging from ...
