A Machine Learning Framework for Authorship Identification From Texts

12/21/2019
by Rahul Radhakrishnan Iyer, et al.

Authorship identification is the task of determining who wrote a given text. Most known literary texts are easily attributed to an author because they are, for example, signed. Sometimes, however, we find unfinished works or large collections of manuscripts with many possible authors. To assess the importance of such a manuscript, it is vital to know who wrote it. In this work, we develop a machine learning framework to determine authorship effectively. We formulate the task as a single-label multi-class text categorization problem and propose a supervised machine learning framework incorporating stylometric features. The task is highly interdisciplinary, drawing on machine learning, information retrieval, and natural language processing. We present an approach and a model that learns the differences in writing style among 50 authors and predicts the author of a new text with high accuracy. Accuracy increases significantly when certain linguistic stylometric features are added to the text features.
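To make the single-label multi-class formulation concrete, here is a minimal Python sketch. It is not the authors' framework: the four features in `stylometric_features`, the nearest-centroid classifier, and the toy texts are all assumptions chosen for illustration, since the abstract does not specify which features or model the paper uses.

```python
import math
import re
from collections import defaultdict


def stylometric_features(text):
    """Return a small, hypothetical stylometric feature vector for one text."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)  # guard against empty input
    return [
        sum(len(w) for w in words) / n_words,         # mean word length
        len(words) / max(len(sentences), 1),          # mean sentence length (words)
        len(set(words)) / n_words,                    # type-token ratio
        sum(text.count(c) for c in ",;:") / n_words,  # punctuation marks per word
    ]


def train_centroids(texts, authors):
    """Average the feature vectors of each author's training texts."""
    grouped = defaultdict(list)
    for text, author in zip(texts, authors):
        grouped[author].append(stylometric_features(text))
    return {
        author: [sum(column) / len(column) for column in zip(*vectors)]
        for author, vectors in grouped.items()
    }


def predict_author(text, centroids):
    """Single-label prediction: pick the author whose centroid is closest."""
    features = stylometric_features(text)
    return min(centroids, key=lambda a: math.dist(features, centroids[a]))


# Toy data (invented for this sketch): author "A" writes short, plain
# sentences; author "B" writes long sentences with long words and commas.
train_texts = [
    "I ran. I sat. We ate.",
    "He is ok. It is so. Go on.",
    "Notwithstanding considerable difficulties, the expedition continued, "
    "persevering through extraordinary circumstances.",
    "Considering everything, the magnificent architecture, unquestionably "
    "remarkable, overwhelmed sophisticated visitors.",
]
train_authors = ["A", "A", "B", "B"]

centroids = train_centroids(train_texts, train_authors)
print(predict_author("We go. I am up. It is.", centroids))
```

Each text receives exactly one label from a fixed set of authors, which is what "single-label multi-class" means here. A realistic system would scale the features, combine them with text features such as word or character n-grams, and use a stronger classifier.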


research
04/03/2023

Every Author as First Author

We propose a new standard for writing author names on papers and in bibl...
research
03/30/2015

Infinite Author Topic Model based on Mixed Gamma-Negative Binomial Process

Incorporating the side information of text corpus, i.e., authors, time s...
research
12/13/2019

An Unsupervised Domain-Independent Framework for Automated Detection of Persuasion Tactics in Text

With the increasing growth of social media, people have started relying ...
research
01/28/2021

DRAG: Director-Generator Language Modelling Framework for Non-Parallel Author Stylized Rewriting

Author stylized rewriting is the task of rewriting an input text in a pa...
research
11/15/2022

Classifying text using machine learning models and determining conversation drift

Text classification helps analyse texts for semantic meaning and relevan...
research
11/16/2016

The Life of Lazarillo de Tormes and of His Machine Learning Adversities

Summit work of the Spanish Golden Age and forefather of the so-called pi...
research
10/06/2020

Towards Coalgebras in Stylometry

The syntactic behaviour of texts can highly vary depending on their cont...
