Transferring BERT-like Transformers' Knowledge for Authorship Verification

12/09/2021
by   Andrei Manolache, et al.

The task of identifying the author of a text has been studied for several decades using linguistics, statistics, and, more recently, machine learning. Inspired by the impressive performance gains of transformers across a broad range of natural language processing tasks and by the recent availability of the large-scale PAN authorship dataset, we first study the effectiveness of several BERT-like transformers for the task of authorship verification. Such models consistently achieve very high scores. Next, we empirically show that they focus on topical clues rather than on the author's writing style, taking advantage of existing biases in the dataset. To address this problem, we provide new splits for PAN-2020 in which training and test data are sampled from disjoint topics or authors. Finally, we introduce DarkReddit, a dataset with a different input data distribution. We further use it to analyze the domain generalization of the models in a low-data regime and to study how performance varies when the proposed PAN-2020 splits are used for fine-tuning. We show that those splits can enhance the models' ability to transfer knowledge to a new, significantly different dataset.
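To make the verification setup concrete, the following is a minimal sketch of authorship verification framed as sequence-pair classification with a BERT-like model from the HuggingFace transformers library. The checkpoint name, label convention, and example texts are placeholder assumptions for illustration, not the authors' exact configuration or training code.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder checkpoint; the paper evaluates several BERT-like transformers.
model_name = "bert-base-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
model.eval()

# Two documents whose authorship we want to compare (illustrative snippets).
text_a = "An excerpt known to be written by author A."
text_b = "An excerpt of unknown authorship."

# Encode the pair as a single sequence: [CLS] text_a [SEP] text_b [SEP]
inputs = tokenizer(text_a, text_b, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Assumed label convention: index 1 = "same author".
same_author_prob = torch.softmax(logits, dim=-1)[0, 1].item()
print(f"P(same author) = {same_author_prob:.3f}")

In practice, the classification head would first be fine-tuned on same-author/different-author pairs (e.g., drawn from the PAN-2020 splits) before the predicted probability is meaningful.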


