MIReAD: Simple Method for Learning High-quality Representations from Scientific Documents

05/07/2023
by Anastasia Razdaibiedina, et al.

Learning semantically meaningful representations from scientific documents can facilitate academic literature search and improve the performance of recommendation systems. Pre-trained language models have been shown to learn rich textual representations, yet they cannot provide powerful document-level representations for scientific articles. We propose MIReAD, a simple method that learns high-quality representations of scientific papers by fine-tuning a transformer model to predict the target journal class from the abstract. We train MIReAD on more than 500,000 PubMed and arXiv abstracts across over 2,000 journal classes. We show that MIReAD produces representations that can be used for similar-paper retrieval, topic categorization, and literature search. Our proposed approach outperforms six existing models for representation learning on scientific documents across four evaluation standards.
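As a rough illustration of the similar-paper retrieval use case mentioned in the abstract, the sketch below ranks papers by cosine similarity between their embedding vectors. It is a minimal sketch under stated assumptions, not the authors' evaluation code: the function names `cosine` and `most_similar`, the paper ids, and the 3-dimensional vectors are all hypothetical, and real embeddings would come from the fine-tuned encoder rather than being written by hand.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def most_similar(query_id, embeddings, k=2):
    """Return the ids of the k papers closest to the query paper."""
    query = embeddings[query_id]
    scored = [
        (cosine(query, vec), pid)
        for pid, vec in embeddings.items()
        if pid != query_id  # never return the query itself
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [pid for _, pid in scored[:k]]


# Hypothetical toy embeddings (assumption: hand-written for illustration only).
toy_embeddings = {
    "paper_a": [0.9, 0.1, 0.0],
    "paper_b": [0.8, 0.2, 0.1],
    "paper_c": [0.0, 0.1, 0.9],
    "paper_d": [0.1, 0.9, 0.1],
}

print(most_similar("paper_a", toy_embeddings, k=2))
```

In a realistic setting the dictionary would hold one high-dimensional vector per abstract, and an approximate nearest-neighbour index would likely replace the exhaustive scan used here.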

research
04/15/2020

SPECTER: Document-level Representation Learning using Citation-informed Transformers

Representation learning is a critical ingredient for natural language pr...
research
09/13/2023

Beyond original Research Articles Categorization via NLP

This work proposes a novel approach to text categorization – for unknown...
research
11/23/2022

SciRepEval: A Multi-Format Benchmark for Scientific Document Representations

Learned representations of scientific documents can serve as valuable in...
research
06/28/2018

Peerus Review: a tool for scientific experts finding

We propose a tool for experts finding applied to academic data generated...
research
02/28/2019

Representation Learning for Recommender Systems with Application to the Scientific Literature

The scientific literature is a large information network linking various...
research
05/12/2022

SimCPSR: Simple Contrastive Learning for Paper Submission Recommendation System

The recommendation system plays a vital role in many areas, especially a...
