Structure-Tags Improve Text Classification for Scholarly Document Quality Prediction

Training recurrent neural networks on long texts, in particular scholarly documents, causes problems for learning. While hierarchical attention networks (HANs) are effective in solving these problems, they still lose important information about the structure of the text. To tackle these problems, we propose the use of HANs combined with structure-tags which mark the role of sentences in the document. Adding tags to sentences, marking them as corresponding to title, abstract or main body text, yields improvements over the state-of-the-art for scholarly document quality prediction: substantial gains on average against other models and consistent improvements over HANs without structure-tags. The proposed system is applied to the task of accept/reject prediction on the PeerRead dataset and compared against a recent BiLSTM-based model and joint textual+visual model. It gains 4.7 the best of both models on the computation and language domain and loses 2.4 against the best of both on the machine learning domain. Compared to plain HANs, accuracy increases on both domains, with 1.5 also obtain improvements when introducing the tags for prediction of the number of citations for 88k scientific publications that we compiled from the Allen AI S2ORC dataset. For our HAN-system with structure-tags we reach 28.5 variance, an improvement of 1.0

READ FULL TEXT
research
08/15/2023

MultiSChuBERT: Effective Multimodal Fusion for Scholarly Document Quality Prediction

Automatic assessment of the quality of scholarly documents is a difficul...
research
02/26/2019

Syntactic Recurrent Neural Network for Authorship Attribution

Writing style is a combination of consistent decisions at different leve...
research
10/14/2021

Tagged Documents Co-Clustering

Tags are short sequences of words allowing to describe textual and non-t...
research
10/08/2019

Read, Highlight and Summarize: A Hierarchical Neural Semantic Encoder-based Approach

Traditional sequence-to-sequence (seq2seq) models and other variations o...
research
03/12/2016

Sequential Short-Text Classification with Recurrent and Convolutional Neural Networks

Recent approaches based on artificial neural networks (ANNs) have shown ...
research
06/10/2018

Cross-Lingual Task-Specific Representation Learning for Text Classification in Resource Poor Languages

Neural network models have shown promising results for text classificati...
research
08/15/2018

Folksonomication: Predicting Tags for Movies from Plot Synopses Using Emotion Flow Encoded Neural Network

Folksonomy of movies covers a wide range of heterogeneous information ab...

Please sign up or login with your details

Forgot password? Click here to reset