Aspect-based Document Similarity for Research Papers

10/13/2020
by Malte Ostendorff, et al.

Traditional document similarity measures provide a coarse-grained distinction between similar and dissimilar documents. Typically, they do not consider in which aspects two documents are similar. This limits the granularity of applications like recommender systems that rely on document similarity. In this paper, we extend similarity with aspect information by performing a pairwise document classification task. We evaluate our aspect-based document similarity for research papers. Paper citations indicate the aspect-based similarity, i.e., the title of the section in which a citation occurs acts as a label for the pair of citing and cited paper. We apply a series of Transformer models such as RoBERTa, ELECTRA, XLNet, and BERT variations and compare them to an LSTM baseline. We perform our experiments on two newly constructed datasets of 172,073 research paper pairs from the ACL Anthology and CORD-19 corpus. Our results show that SciBERT is the best-performing system. A qualitative examination validates our quantitative results. Our findings motivate future research on aspect-based document similarity and the development of a recommender system based on the evaluated techniques. We make our datasets, code, and trained models publicly available.
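The labeling idea in the abstract, where the section title of a citation becomes the label for the (citing, cited) pair, can be illustrated with a minimal sketch. Everything below is hypothetical: the record layout, the paper IDs, the function name `build_pair_labels`, and the lowercase normalization of section titles are assumptions made for illustration, not the authors' preprocessing. The sketch also assumes a pair may carry several labels when one paper cites another in more than one section.

```python
from collections import defaultdict

# Hypothetical citation records: (citing_paper_id, cited_paper_id, section_title).
citations = [
    ("P1", "P2", "Introduction"),
    ("P1", "P2", "Related Work"),
    ("P1", "P3", "Experiments"),
    ("P4", "P2", "related work"),
]


def build_pair_labels(records):
    """Group citations by (citing, cited) pair and collect section titles as labels."""
    labels = defaultdict(set)
    for citing, cited, section in records:
        # Lowercasing is an illustrative normalization, not the paper's scheme.
        labels[(citing, cited)].add(section.strip().lower())
    # Sort each label set so the result is deterministic.
    return {pair: sorted(sections) for pair, sections in labels.items()}


pair_labels = build_pair_labels(citations)
for pair, sections in pair_labels.items():
    print(pair, sections)
```

Each resulting pair with its label set would then serve as one training example for a pairwise classifier, such as the Transformer models named in the abstract.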


