Integrating Bidirectional Long Short-Term Memory with Subword Embedding for Authorship Attribution

06/26/2023
by Abiodun Modupe, et al.

The problem of identifying the author of a given text document from among multiple candidate authors is called authorship attribution. A variety of word-based stylistic markers have been used successfully in deep learning methods for authorship attribution. Unfortunately, the performance of word-based authorship attribution systems is limited by the vocabulary of the training corpus. The literature has recommended character-based stylistic markers as an alternative that overcomes the hidden word problem (words absent from the training vocabulary). However, character-based methods often fail to capture the sequential relationship of words in texts, which leaves room for further improvement. The question addressed in this paper is whether it is possible to resolve the ambiguity of hidden words in text documents while preserving the sequential context of words. To that end, a method based on bidirectional long short-term memory (BLSTM) with a 2-dimensional convolutional neural network (CNN) is proposed to capture sequential writing styles for authorship attribution. The BLSTM was used to obtain the sequential relationship among characteristics using subword information. The 2-dimensional CNN was applied to understand the local syntactical position of the style from unlabeled input text. The proposed method was experimentally evaluated against numerous state-of-the-art methods across the public corpora CCAT50, IMDb62, Blog50, and Twitter50. Experimental results indicate accuracy improvements of 1.07% and 0.96% on CCAT50 and Twitter50, respectively, and comparable results on the remaining datasets.
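
To illustrate why subword information can help with hidden words, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the subword scheme, so the sketch assumes fastText-style character n-grams with `<` and `>` as word-boundary markers. The function names `char_ngrams`, `build_subword_vocab`, and `coverage` are hypothetical and chosen only for this example.

```python
from typing import List, Set


def char_ngrams(word: str, n_min: int = 3, n_max: int = 4) -> List[str]:
    """Return the character n-grams of a word padded with boundary markers.

    Assumption: fastText-style subwords; the paper's exact scheme may differ.
    """
    padded = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of width n over the padded word.
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


def build_subword_vocab(corpus: List[str], n_min: int = 3, n_max: int = 4) -> Set[str]:
    """Collect every character n-gram seen in a whitespace-tokenized corpus."""
    vocab: Set[str] = set()
    for text in corpus:
        for word in text.lower().split():
            vocab.update(char_ngrams(word, n_min, n_max))
    return vocab


def coverage(word: str, vocab: Set[str], n_min: int = 3, n_max: int = 4) -> float:
    """Fraction of a word's n-grams that already occur in the subword vocabulary."""
    grams = char_ngrams(word.lower(), n_min, n_max)
    if not grams:
        return 0.0
    return sum(g in vocab for g in grams) / len(grams)


if __name__ == "__main__":
    training = ["the writer writes", "reading books"]
    word_vocab = {w for text in training for w in text.lower().split()}
    subword_vocab = build_subword_vocab(training)

    unseen = "writing"
    # The word itself is hidden from a word-level vocabulary...
    print(unseen in word_vocab)
    # ...but part of its character n-grams were seen during training.
    print(coverage(unseen, subword_vocab))
```

In a sketch like this, a word that never appeared in training still shares n-grams with seen words, so a model fed subword units receives a non-empty representation for it, while the word order of the input sequence is left intact for a sequential model such as a BLSTM.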
